I have a folder containing 7000 images that I track with DVC, and I am using an Azure blob container as my remote.
The remote is version-aware, so when I browse the blob I can see all the images under their original, human-readable file names.
I add 1000 new images to the folder and run:
dvc add folder
When I open my folder.dvc file, I can see the names of all my files, old and new.
I then run:
dvc push
The output shows it sending those 1000 images to the blob. But once that's done and I look at my folder.dvc file, the old file names have been removed and only the names of the 1000 new images remain. In the blob, I can now only see the 1000 new images; the old images have been deleted.
If I run dvc add folder and dvc push again, it sends the 7000 old images to the remote, removes the 1000 new images from the blob, and removes their names from folder.dvc.
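To pin down exactly which entries go missing, one option is to compare the files on disk with the file names listed in folder.dvc, once right after dvc add and again right after dvc push. Below is a minimal Python sketch of such a check; it is not part of DVC. It assumes that each tracked file appears in the .dvc file on a line of the form relpath: <name> (my understanding of how the per-file list of a version-aware output is written, not something confirmed here), and the helper names tracked_relpaths, workspace_relpaths and compare are hypothetical.

```python
import re
from pathlib import Path
from typing import Set, Tuple

# ASSUMPTION: every tracked file shows up in the .dvc file on a line like
#     relpath: img_0001.jpg
# optionally preceded by "- " when relpath is the first key of a list entry.
RELPATH_RE = re.compile(r"^[ \t]*(?:-[ \t]+)?relpath:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def tracked_relpaths(dvc_text: str) -> Set[str]:
    """File names listed in the .dvc file, with any YAML quotes stripped."""
    return {m.group(1).strip("'\"") for m in RELPATH_RE.finditer(dvc_text)}


def workspace_relpaths(folder: str) -> Set[str]:
    """Every regular file under `folder` (symlinks to the cache included),
    relative to `folder`, using forward slashes."""
    root = Path(folder)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare(folder: str, dvc_file: str) -> Tuple[Set[str], Set[str]]:
    """Return (files on disk not listed in the .dvc file,
    entries in the .dvc file with no file on disk)."""
    tracked = tracked_relpaths(Path(dvc_file).read_text())
    on_disk = workspace_relpaths(folder)
    return on_disk - tracked, tracked - on_disk
```

Calling compare("folder", "folder.dvc") after dvc add should return two empty sets; if the push really drops the old entries, the first set returned after dvc push would be expected to contain the 7000 old image names.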
Am I doing something wrong or misunderstanding how version-aware remotes work, or is this some kind of bug?
If I reinitialize DVC and switch to a different remote, the same thing happens.
Here is my dvc doctor output:
DVC version: 3.42.0 (pip)
Platform: Python 3.8.13 on Linux-6.5.0-44-generic-x86_64-with-glibc2.10
Subprojects:
dvc_data = 3.8.0
dvc_objects = 3.0.6
dvc_render = 1.0.1
dvc_task = 0.3.0
scmrepo = 2.0.4
Supports:
azure (adlfs = 2024.2.0, knack = 0.11.0, azure-identity = 1.15.0),
gdrive (pydrive2 = 1.20.0),
gs (gcsfs = 2024.2.0),
hdfs (fsspec = 2024.2.0, pyarrow = 7.0.0),
http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.2.0, boto3 = 1.34.131),
ssh (sshfs = 2024.6.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.2.0)
Config:
Global: /root/.config/dvc
System: /etc/xdg/dvc
Cache types: symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/668d5746c2a68091019bea1a109aaea0