Hello
I have a large number of datasets, each in its own subdirectory. So far, I have always pulled all datasets with git clone ... & dvc pull, which works perfectly.
Now, I would like to load only a subset of my datasets. I tried to achieve this by doing a sparse checkout of the git repository. However, if I run git sparse-checkout init & git sparse-checkout add .dvc dataset_name followed by dvc pull, I get the traceback shown at the end of this post.
Is there a way to fix this, or maybe a better way to solve the problem in the first place?
I am aware that I could simply run dvc pull --recursive dataset_name - but given the large number of directories I am dealing with, I’d prefer to clone only those that I actually need.
Traceback (most recent call last):
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/dvc/cli/__init__.py", line 183, in main
    cmd = args.func(args)
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/dvc/cli/command.py", line 20, in __init__
    self.repo: "Repo" = Repo(uninitialized=self.UNINITIALIZED)
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/dvc/repo/__init__.py", line 248, in __init__
    self._ignore()
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/dvc/repo/__init__.py", line 416, in _ignore
    self.scm_context.ignore(file)
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/funcy/objects.py", line 28, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/dvc/repo/__init__.py", line 313, in scm_context
    return SCMContext(self.scm, self.config)
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/funcy/objects.py", line 28, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/dvc/repo/__init__.py", line 301, in scm
    return SCM(self.root_dir, no_scm=no_scm)
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/dvc/scm.py", line 102, in SCM
    return Git(
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 98, in __init__
    first_ = first(self.backends.values())
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/funcy/seqs.py", line 55, in first
    return next(iter(seq), None)
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/_collections_abc.py", line 869, in __iter__
    yield self._mapping[key]
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 49, in __getitem__
    initialized = backend(*self.args, **self.kwargs)
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 146, in __init__
    self.repo = Repo.discover(start=root_dir)
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/dulwich/repo.py", line 1240, in discover
    return cls(path)
  File "/Users/marc/miniconda3/envs/py39/lib/python3.9/site-packages/dulwich/repo.py", line 1172, in __init__
    raise UnsupportedExtension(extension)
dulwich.repo.UnsupportedExtension: (b'worktreeConfig', b'true')
Unfortunately, our default git backend (dulwich) does not currently support the worktreeConfig extension, which is what the UnsupportedExtension error in your traceback is reporting. The good news is that support has just been merged upstream (Add support for worktreeconfig extension · jelmer/dulwich@a9bbc16 · GitHub) and will be included in the next dulwich release, so we'll be able to fix the issue as soon as that release is out. For now, if it is an option, I'd suggest disabling sparse checkout.
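If you want to check whether a given clone is affected, one option is to look for the extension in the repository's .git/config. The following is only a minimal sketch: the helper name has_worktree_config_extension is made up for illustration, it is not part of DVC or dulwich, and it uses a simplified line-based reading of the config file rather than git's full config syntax.

```python
from pathlib import Path


def has_worktree_config_extension(config_text: str) -> bool:
    """Return True if the config text enables extensions.worktreeConfig.

    Hypothetical helper: a simplified parser that only understands
    plain "[section]" headers and "key = value" lines.
    """
    section = None
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line[0] in "#;":
            continue
        # Track the current section; git section names are case-insensitive.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section != "extensions":
            continue
        key, sep, value = line.partition("=")
        if key.strip().lower() == "worktreeconfig":
            # A bare key without "=" counts as true in git config files.
            return not sep or value.strip().lower() in ("true", "yes", "on", "1")
    return False


if __name__ == "__main__":
    # Assumes the script is run from the root of the cloned repository.
    config_path = Path(".git") / "config"
    if config_path.exists():
        print(has_worktree_config_extension(config_path.read_text()))
```

If this reports True, the traceback above is the expected outcome until the dulwich fix is released. Sparse checkout itself can be turned off with git sparse-checkout disable; whether that alone also clears the extension entry from .git/config is something I have not verified, so it is worth re-running the check afterwards.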