I’m quite new to DVC and I’m still confused by some concepts. Right now I’m trying to understand the output of dvc status. This is what I see:
$ dvc status
somefolder/somefile.dvc:
    changed outs:
        deleted: somefolder/somefile
What does “changed outs” mean here? I’ve just cloned this git repository, and I have not made any changes at all. Why would dvc claim that something has been deleted?
The .dvc file looks correct:
$ cat somefolder/somefile.dvc
outs:
- md5: c06520abe0140c72004dbe4494a78b23.dir
  size: 692847854
  nfiles: 8
  hash: md5
  path: somefile
This is how I’ve configured DVC:
$ cat .dvc/config
[cache]
    dir = /srv/shared-dvc-cache/
    shared = group
    type = hardlink
[core]
    remote = myremote
['remote "myremote"']
    url = gs://some-bucket/dvc
And I do see the md5 for somefolder/somefile.dvc in the cache directory (/srv/shared-dvc-cache/):
$ ls -ahl /srv/shared-dvc-cache/files/md5/c0/6520abe0140c72004dbe4494a78b23.dir
-r--r--r-- 1 stian stian 686 Aug 2 14:56 /srv/shared-dvc-cache/files/md5/c0/6520abe0140c72004dbe4494a78b23.dir
I can also find it in the bucket:
$ gsutil ls -ahl gs://some-bucket/dvc/files/md5/c0/6520abe0140c72004dbe4494a78b23.dir
686 B 2023-08-03T07:01:49Z gs://s/dvc/files/md5/c0/6520abe0140c72004dbe4494a78b23.dir#1691046109228732 metageneration=1
TOTAL: 1 objects, 686 bytes (686 B)
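For reference, both listings above follow the same layout: the first two characters of the md5 become a directory under files/md5, and the rest becomes the file name. A minimal sketch of that mapping, inferred only from the paths shown in this post; object_path is a hypothetical helper, not a DVC function:

```python
def object_path(base: str, md5: str) -> str:
    """Build the expected object location under a cache dir or remote URL.

    Hypothetical helper: the files/md5/<2 chars>/<rest> layout is taken
    from the paths in this post, not from the DVC API.
    """
    # Drop a trailing slash so the result has no doubled separators.
    return f"{base.rstrip('/')}/files/md5/{md5[:2]}/{md5[2:]}"


md5 = "c06520abe0140c72004dbe4494a78b23.dir"

print(object_path("/srv/shared-dvc-cache/", md5))
# /srv/shared-dvc-cache/files/md5/c0/6520abe0140c72004dbe4494a78b23.dir

print(object_path("gs://some-bucket/dvc", md5))
# gs://some-bucket/dvc/files/md5/c0/6520abe0140c72004dbe4494a78b23.dir
```

So the .dir entry referenced by the .dvc file is present both in the shared cache and on the remote.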
If I try to pull, then I get a weird error:
$ dvc pull -vv
2023-08-22 11:31:15,846 DEBUG: v3.10.1 (pip), CPython 3.10.12 on Linux-5.4.0-152-generic-x86_64-with-glibc2.31
2023-08-22 11:31:15,846 DEBUG: command: /home/stian/some_repo/venv/bin/dvc pull -vv
2023-08-22 11:31:15,846 TRACE: Namespace(quiet=0, verbose=2, cprofile=False, cprofile_dump=None, yappi=False, yappi_separate_threads=False, viztracer=False, viztracer_depth=None, viztracer_async=False, pdb=False, instrument=False, instrument_open=False, show_stack=False, cd='.', cmd='pull', jobs=None, targets=[], remote=None, all_branches=False, all_tags=False, all_commits=False, force=False, with_deps=False, recursive=False, run_cache=False, glob=False, allow_missing=False, func=<class 'dvc.commands.data_sync.CmdDataPull'>, parser=DvcParser(prog='dvc', usage=None, description='Data Version Control', formatter_class=<class 'argparse.RawTextHelpFormatter'>, conflict_handler='error', add_help=False))
2023-08-22 11:31:16,149 TRACE: 3.64 ms in collecting stages from /home/stian/some_repo
<redacted>
2023-08-22 11:31:16,376 DEBUG: Preparing to transfer data from 'some-bucket/dvc/files/md5' to '/srv/shared-dvc-cache/files/md5'
2023-08-22 11:31:16,376 DEBUG: Preparing to collect status from '/srv/shared-dvc-cache/files/md5'
2023-08-22 11:31:16,376 DEBUG: Collecting status from '/srv/shared-dvc-cache/files/md5'
2023-08-22 11:31:16,394 DEBUG: failed to create '/home/stian/some-repo/testdata/some-folder/some-file.txt' from '/srv/shared-dvc-cache/files/md5/3e/ee94fe51c2bc3978b48d0205b6c77d' - [Errno 95] no more link types left to try out: [Errno 1] Operation not permitted: '/srv/shared-dvc-cache/files/md5/3e/ee94fe51c2bc3978b48d0205b6c77d' -> '/home/stian/some-repo/testdata/some-folder/some-file.txt'
Traceback (most recent call last):
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 250, in _try_links
    _link(link, from_fs, from_path, to_fs, to_path)
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 62, in _link
    func(from_path, to_path)
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 381, in link
    return self.fs.link(from_info, to_info)
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc_objects/fs/local.py", line 166, in link
    return system.hardlink(path1, path2)
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc_objects/fs/system.py", line 32, in hardlink
    os.link(src, link_name)
PermissionError: [Errno 1] Operation not permitted: '/srv/shared-dvc-cache/files/md5/3e/ee94fe51c2bc3978b48d0205b6c77d' -> '/home/stian/some-repo/testdata/some-folder/some-file.txt'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 308, in transfer
    _try_links(
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 267, in _try_links
    raise OSError(errno.ENOTSUP, "no more link types left to try out") from error
OSError: [Errno 95] no more link types left to try out
2023-08-22 11:31:16,395 ERROR: unexpected error - list index out of range
Traceback (most recent call last):
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc/cli/__init__.py", line 209, in main
    ret = cmd.do_run()
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc/cli/command.py", line 26, in do_run
    return self.run()
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 31, in run
    stats = self.repo.pull(
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 64, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc/repo/pull.py", line 43, in pull
    stats = self.checkout(
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 64, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc/repo/checkout.py", line 184, in checkout
    apply(
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc_data/index/checkout.py", line 351, in apply
    _create_files(
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc_data/index/checkout.py", line 122, in _create_files
    transfer(
  File "/home/stian/some-repo/venv/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 296, in transfer
    if links[0] == "copy":
IndexError: list index out of range
2023-08-22 11:31:16,414 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2023-08-22 11:31:16,414 DEBUG: Removing '/home/stian/.PKve26BjXmzCXh4ZsBJg8H.tmp'
2023-08-22 11:31:16,415 DEBUG: Removing '/home/stian/.PKve26BjXmzCXh4ZsBJg8H.tmp'
2023-08-22 11:31:16,415 DEBUG: Removing '/home/stian/.PKve26BjXmzCXh4ZsBJg8H.tmp'
2023-08-22 11:31:16,415 DEBUG: Removing '/srv/shared-dvc-cache/files/md5/.R67LoavoyLcfYk2qnVcXn3.tmp'
2023-08-22 11:31:16,421 DEBUG: Version info for developers:
DVC version: 3.10.1 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-5.4.0-152-generic-x86_64-with-glibc2.31
Subprojects:
    dvc_data = 2.8.1
    dvc_objects = 0.24.1
    dvc_render = 0.5.3
    dvc_task = 0.3.0
    scmrepo = 1.1.0
Supports:
    gs (gcsfs = 2023.6.0),
    http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3)
Config:
    Global: /home/stian/.config/dvc
    System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/md2
Caches: local
Remotes: gs
Workspace directory: ext4 on /dev/md2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/e42a95986bfa50153b451739105fc96b
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-08-22 11:31:16,423 DEBUG: Analytics is enabled.
2023-08-22 11:31:16,478 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmppwgh5izm']'
2023-08-22 11:31:16,479 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmppwgh5izm']'
Does this make any sense at all?
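The innermost frame of the first traceback is a plain os.link call, so the PermissionError can probably be reproduced outside DVC. Below is a minimal sketch of such a probe, assuming nothing beyond the Python standard library; try_hardlink and the probe file name are hypothetical and not part of DVC. The demo at the bottom runs against temporary files only, so it does not touch the cache or the workspace.

```python
import os
import tempfile
import uuid
from typing import Optional


def try_hardlink(src: str, dst_dir: str) -> Optional[OSError]:
    """Try to hard-link src into dst_dir; return the error, or None on success."""
    # Hypothetical, unique probe name so no existing file is overwritten.
    dst = os.path.join(dst_dir, f".hardlink-probe-{uuid.uuid4().hex}")
    try:
        os.link(src, dst)  # same call as in dvc_objects/fs/system.py above
    except OSError as exc:
        return exc
    os.remove(dst)  # removes only the new link; src keeps its data
    return None


# Self-contained demo on temporary files.
with tempfile.TemporaryDirectory() as cache_dir, tempfile.TemporaryDirectory() as work_dir:
    sample = os.path.join(cache_dir, "sample")
    with open(sample, "w") as fh:
        fh.write("data")
    print("hardlink error:", try_hardlink(sample, work_dir))
```

Calling try_hardlink with the cache file from the log ('/srv/shared-dvc-cache/files/md5/3e/ee94fe51c2bc3978b48d0205b6c77d') and the workspace directory would show whether the same Errno 1 appears without DVC involved.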
Here are the dvc package versions:
$ pip list | grep -i dvc
dvc 3.10.1
dvc-data 2.8.1
dvc-gs 2.22.1
dvc-http 2.30.2
dvc-objects 0.24.1
dvc-render 0.5.3
dvc-studio-client 0.11.0
dvc-task 0.3.0