DVC add and push after adding a couple of images

Hi, I have 400 GB of images and videos in a self-hosted MinIO S3 bucket that I use as a DVC remote. I also have the Git repo with all the data on my local machine. When I add a few images to the local data and run dvc add, it takes a long time, and dvc status is also slow. Before this, when I used Git LFS to track my data, both git add and git status ran almost instantly. Is this normal, or am I doing something wrong? This is the process I go through when I modify data in my Git repo, which is integrated with DVC:

  1. dvc add data
  2. git add the .dvc files that were modified.
  3. git commit…
  4. git tag…
  5. git push
  6. dvc push

Can you please run dvc doctor and post the output here?

Here is the output:

Hi, it is hard to say without knowing how many files you have. But yes, DVC is slow for 400 GB of files.
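One way to report the number of files is a small script like the one below. This is only a sketch: the function name summarize_tree is made up for illustration, and "data" is assumed to be the tracked directory mentioned in this thread.

```python
import os


def summarize_tree(root):
    """Return (file_count, total_bytes) for the files under `root`."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            file_count += 1
    return file_count, total_bytes


if __name__ == "__main__":
    # "data" is the directory name used in this thread; adjust as needed.
    count, size = summarize_tree("data")
    print(f"{count} files, {size / 1e9:.1f} GB")
```

Posting the two numbers alongside the dvc doctor output should make it easier to judge whether the timings are expected.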

when I add a few images to the data in my local and run dvc add, it takes a long time

When updating an existing dataset, instead of running dvc add data, you can ask DVC to update only the part of the dataset that changed. It will be faster that way.

For example:

dvc add data/images/00/

If you can, please provide profiling data. You can generate it as follows:

dvc add --cprofile-dump add.prof ...

Also see the "Debugging, Profiling and Benchmarking DVC" page in the iterative/dvc wiki on GitHub.
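If you want to look at the profile yourself before posting it, something like the following may help. It is a rough sketch that assumes add.prof is a standard Python cProfile dump readable by the pstats module; the helper name top_functions is hypothetical.

```python
import io
import pstats


def top_functions(prof_path, limit=15):
    """Return a text report of the top `limit` entries by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(prof_path, stream=buffer)
    # strip_dirs() shortens file paths; "cumulative" sorts by time spent
    # in a function including everything it calls.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    # "add.prof" is the file name from the dvc add command in this thread.
    try:
        print(top_functions("add.prof"))
    except FileNotFoundError:
        print("add.prof not found; generate it with dvc add first")
```

The entries at the top of the report show where dvc add spends most of its time, which is usually the most useful part to share.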