DVC says "everything is up to date" when it is not

Summary: I’m running dvc pull on a .dvc file that references a directory cache file (a .dir object). That cache file does not exist in the local cache, but DVC reports that everything is up to date.

I don’t have an MWE for this. I think something is broken in my DVC cache, but I’m not sure, and I’d like to know whether anyone has seen the behavior I’m experiencing.

First, here is my dvc doctor output:

    DVC version: 3.2.5.dev1+ge29da6eae.d20230701
    Platform: Python 3.11.2 on Linux-5.19.0-45-generic-x86_64-with-glibc2.35
        dvc_data = 2.3.2.dev1+gde5c685
        dvc_objects = 0.23.1.dev1+g76c3665
        dvc_render = 0.3.1
        dvc_task = 0.3.0
        scmrepo = 1.0.4
        azure (adlfs = 2023.4.0, knack = 0.10.1, azure-identity = 1.12.0),
        gdrive (pydrive2 = 1.15.4),
        gs (gcsfs = 2023.6.0),
        hdfs (fsspec = 2023.6.0, pyarrow = 11.0.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        oss (ossfs = 2021.8.0),
        s3 (s3fs = 2023.6.0, boto3 = 1.26.76),
        ssh (sshfs = 2023.4.1),
        webdav (webdav4 = 0.9.8),
        webdavs (webdav4 = 0.9.8),
        webhdfs (fsspec = 2023.6.0)
        Global: /home/joncrall/.config/dvc
        System: /etc/xdg/xdg-ubuntu/dvc
    Cache types: reflink, hardlink, symlink
    Cache directory: btrfs on /dev/nvme1n1
    Caches: local
    Remotes: s3, ssh, local, ssh, ssh, ssh, local, local, ssh, ssh, ssh, ssh, ssh, ssh, ssh, ssh, ssh, local
    Workspace directory: btrfs on /dev/nvme1n1
    Repo: dvc, git
    Repo.site_cache_dir: /var/tmp/dvc/repo/e5b7dab284e2ab932c373f6d94804a2b

Now the setup:

I have two local DVC repos. One is on an HDD (zfs RAID-10) and the other is on an SSD (btrfs, single disk). The part of the dvc doctor output that differs for the HDD repo is:

> Cache types: hardlink, symlink
> Cache directory: zfs on data

The SSD repo has a local remote called hdd that points at the local path of the HDD repo’s cache.

A while back, I ran dvc add on some folders in the HDD repo, and everything looked fine. At some point I ran dvc pull -r hdd -R . to get a specific subdirectory from my HDD onto my SSD, but afterwards I noticed that some files seemed to be missing.

I found what was missing: there should have been a folder called S2 associated with S2.dvc, but it wasn’t there. On the SSD I ran dvc pull -r hdd S2.dvc, and it said “Everything is up to date.”, but afterwards the folder was still missing. I reran with -vv and got this additional output:

    ... (a bunch of collect stages trace output) ...

    2023-07-03 11:21:43,986 DEBUG: Preparing to transfer data from '/data/joncrall/dvc-repos/smart_data_dvc-hdd/.dvc/cache' to '/media/joncrall/flash1/smart_data_dvc/.dvc/cache'
    2023-07-03 11:21:43,986 DEBUG: Preparing to collect status from '/media/joncrall/flash1/smart_data_dvc/.dvc/cache'
    2023-07-03 11:21:43,986 DEBUG: Collecting status from '/media/joncrall/flash1/smart_data_dvc/.dvc/cache'
    2023-07-03 11:21:43,990 DEBUG: Preparing to transfer data from '/data/joncrall/dvc-repos/smart_data_dvc-hdd/.dvc/cache/files/md5' to '/media/joncrall/flash1/smart_data_dvc/.dvc/cache/files/md5'
    2023-07-03 11:21:43,990 DEBUG: Preparing to collect status from '/media/joncrall/flash1/smart_data_dvc/.dvc/cache/files/md5'
    2023-07-03 11:21:43,990 DEBUG: Collecting status from '/media/joncrall/flash1/smart_data_dvc/.dvc/cache/files/md5'
    Everything is up to date.
    2023-07-03 11:21:44,001 DEBUG: Analytics is disabled.

Nothing here seemed helpful, so I started digging manually. I saw that the content of S2.dvc was:

    - md5: 82ece028a0e44746fea5aee2e2603f6a.dir
      size: 3672919
      nfiles: 630
      hash: md5
      path: S2

so I checked whether that file exists in my cache. In the SSD repo I ran:

    ( test -f $(dvc cache dir)/files/md5/82/ece028a0e44746fea5aee2e2603f6a.dir && echo "exist" ) || echo "missing"

and got “missing”. Running the same command in the HDD repo printed “exist”. So the file that S2.dvc references exists on the remote but not locally, and dvc pull is not detecting that?
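Because S2.dvc points at a .dir manifest rather than a single file, it can also be useful to check whether the files listed inside that manifest are present in a given cache. The sketch below is a hypothetical helper (dir_manifest_missing is not a DVC command). It assumes the DVC 3.x cache layout (files/md5/<first 2 chars>/<rest>), assumes the manifest is DVC’s JSON list of entries that each carry an "md5" key, and uses a crude GNU grep text match instead of a real JSON parser.

```shell
# Hypothetical helper (not part of DVC): print the md5 entries listed in a
# .dir manifest that have no matching object in a DVC 3.x cache.
# Assumption: the manifest is a JSON list of entries with an "md5" key;
# this is a text match, not a JSON parser. Requires GNU grep (-o).
dir_manifest_missing() {
    # $1 = cache dir, $2 = path to the .dir manifest file
    grep -o '"md5": *"[0-9a-f]\{32\}"' "$2" |
        sed 's/.*"\([0-9a-f]\{32\}\)"$/\1/' |
        while read -r h; do
            # DVC 3.x layout: <cache>/files/md5/<first 2 chars>/<remaining chars>
            obj="$1/files/md5/$(printf '%s' "$h" | cut -c1-2)/$(printf '%s' "$h" | cut -c3-)"
            [ -f "$obj" ] || echo "missing $h"
        done
}
```

For example, running dir_manifest_missing /data/joncrall/dvc-repos/smart_data_dvc-hdd/.dvc/cache /data/joncrall/dvc-repos/smart_data_dvc-hdd/.dvc/cache/files/md5/82/ece028a0e44746fea5aee2e2603f6a.dir on the HDD side would print nothing if every entry of the S2 manifest is present in the HDD cache.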

For good measure, I also reran the check using the DVC 2.x cache layout:

    ( test -f $(dvc cache dir)/82/ece028a0e44746fea5aee2e2603f6a.dir && echo "exist" ) || echo "missing"

which printed “missing” in both repos (as expected).
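The two manual checks above can be wrapped into one small script. This is a hypothetical sketch (the function names are made up and are not DVC commands): it reads the first md5: value out of a .dvc file and reports whether that object exists under the DVC 3.x layout (files/md5/<2 chars>/<rest>) and the DVC 2.x layout (<2 chars>/<rest>). It assumes the .dvc file contains a "- md5: <hash>" line like the S2.dvc excerpt, and it takes the cache directory as an argument so it can be given the output of dvc cache dir.

```shell
# Hypothetical helpers (not DVC commands) for locating a .dvc file's object
# in a local cache under both the DVC 3.x and DVC 2.x layouts.

dvc_hash_from_file() {
    # Print the first md5 value in the given .dvc file
    # (assumes a line like "- md5: <hash>").
    sed -n 's/^[- ]*md5: *//p' "$1" | head -n 1
}

dvc_cache_paths() {
    # $1 = cache dir, $2 = hash (may end in .dir)
    prefix=$(printf '%s' "$2" | cut -c1-2)
    rest=$(printf '%s' "$2" | cut -c3-)
    echo "$1/files/md5/$prefix/$rest"   # DVC 3.x layout
    echo "$1/$prefix/$rest"             # DVC 2.x layout
}

dvc_check_cache() {
    # $1 = cache dir, $2 = .dvc file; prints "exist" or "missing" per layout
    hash=$(dvc_hash_from_file "$2")
    if [ -z "$hash" ]; then
        echo "no md5 found in $2" >&2
        return 1
    fi
    dvc_cache_paths "$1" "$hash" | while read -r path; do
        if [ -f "$path" ]; then
            echo "exist   $path"
        else
            echo "missing $path"
        fi
    done
}
```

Usage, from inside either repo: dvc_check_cache "$(dvc cache dir)" S2.dvc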

I also ran dvc checkout S2.dvc, and it didn’t do anything: it just collected stages, printed “Analytics is disabled”, and exited with return code 0.

Because I was using a development version of DVC for this, I also ran a final test: I uninstalled my editable installs of dvc-data, dvc-objects, and dvc, then installed the latest dvc from PyPI, which gave me these versions:

	dvc_data = 2.3.3
	dvc_objects = 0.23.0
	dvc_render = 0.3.1
	dvc_task = 0.3.0
	scmrepo = 1.0.4

Both dvc checkout and dvc pull behaved the same way. I also tried DVC 3.2.0 and DVC 3.1.0, with no change.

In case it matters, the only non-remote parts of my SSD dvc config / config.local are:

    type = "symlink,reflink,hardlink,copy"
    shared = group
    protected = true
    analytics = false
    autostage = true

I’m pretty stumped by this. Is there any reason why pulling a .dvc file whose referenced cache object (here, the .dir file) doesn’t exist locally would report that everything is up to date?

Further weirdness.

Because I have a DVC repo on my HDD with all of the files correctly in its cache, I figured I’d just copy them over manually.

I picked a particular tracked file (just a file, not a directory), KR_R001.kwcoco.zip.dvc:

    - md5: 96949f9f821b9b324e25e4e66cda5e68
      size: 87238
      hash: md5
      path: KR_R001.kwcoco.zip

and copied the data from $HDD_CACHE/files/md5/96/949f9f821b9b324e25e4e66cda5e68 to $SSD_CACHE/files/md5/96/949f9f821b9b324e25e4e66cda5e68
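The manual copy can be scripted with a check that the copied bytes match the object name. This is a minimal sketch under several assumptions: both caches use the DVC 3.x layout, objects use DVC 3.x hashing (hash: md5, where a file object is named by the md5 of its contents), and GNU md5sum is available. copy_cache_object is a hypothetical helper, not a DVC command. .dir manifests are copied but not verified, and the entries they list would need to be copied separately. It also does not try to reproduce whatever permissions DVC would set for shared = group.

```shell
# Hypothetical helper (not a DVC command): copy one object between two
# DVC 3.x local caches and, for plain-file objects, check that the copy's
# md5 matches the hash in its name. Assumes GNU coreutils (md5sum).
copy_cache_object() {
    # $1 = source cache dir, $2 = destination cache dir, $3 = object hash
    prefix=$(printf '%s' "$3" | cut -c1-2)
    rest=$(printf '%s' "$3" | cut -c3-)
    src="$1/files/md5/$prefix/$rest"
    dst="$2/files/md5/$prefix/$rest"
    if [ ! -f "$src" ]; then
        echo "source object not found: $src" >&2
        return 1
    fi
    # Leave an existing destination object alone rather than overwriting it.
    if [ ! -e "$dst" ]; then
        mkdir -p "$(dirname "$dst")" && cp "$src" "$dst" || return 1
    fi
    case "$3" in
        *.dir) return 0 ;;  # .dir manifests are copied but not verified here
    esac
    actual=$(md5sum < "$dst" | cut -d ' ' -f 1)
    if [ "$actual" != "$3" ]; then
        echo "md5 mismatch for $dst: expected $3, got $actual" >&2
        return 1
    fi
}
```

Usage for the file above: copy_cache_object "$HDD_CACHE" "$SSD_CACHE" 96949f9f821b9b324e25e4e66cda5e68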

Now, running this in the SSD repo:

    test -f $(dvc cache dir)/files/md5/96/949f9f821b9b324e25e4e66cda5e68 && echo "exists"

I get “exists”. The absolute path on my machine is /media/joncrall/flash1/smart_data_dvc/.dvc/cache/files/md5/96/949f9f821b9b324e25e4e66cda5e68, and I can verify that it is indeed the file I’m interested in. However, when I run:

    dvc checkout KR_R001.kwcoco.zip.dvc

nothing happens…

What computer god did I offend to get myself into such a borked state?