Hello,
I am currently implementing DVC in my client's MLOps infrastructure, as they need dataset version control.
So far it has worked great. I was able to:
- Create a repository containing some dummy data
- Commit the data to DVC, and the .dvc and config to Git
- Link the repository to the company GitHub account
- Link the repository to the default remote, an S3 bucket on MinIO
- Push the dummy data files to the remote
- Create multiple versions of the dataset using Git tags
- Clone the dataset using git clone, then dvc pull
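For reference, the steps above correspond roughly to the commands below. This is a minimal sketch rather than the exact commands that were run: the function names are hypothetical helpers, the data file is assumed to sit in the repository root, and the remote name, bucket URL, endpoint URL, and tag are all placeholders passed in as arguments.

```shell
#!/bin/sh
# Sketch of the setup steps listed above. All arguments are placeholders;
# none of the values are taken from the real setup.

# Track a data file with DVC, then commit the pointer file and config to Git.
# Assumes the file lives in the repository root and `dvc init` was already run.
track_and_commit() {
    data_file="$1"
    dvc add "$data_file" &&
    git add "$data_file.dvc" .gitignore .dvc/config &&
    git commit -m "Track $data_file with DVC"
}

# Register an S3-compatible bucket (e.g. MinIO) as the default remote and push.
configure_remote_and_push() {
    remote_name="$1"; bucket_url="$2"; endpoint_url="$3"
    dvc remote add -d "$remote_name" "$bucket_url" &&
    dvc remote modify "$remote_name" endpointurl "$endpoint_url" &&
    dvc push
}

# Mark the current commit as a dataset version and publish the tag.
tag_version() {
    tag="$1"
    git tag -a "$tag" -m "Dataset version $tag" &&
    git push origin "$tag"
}
```

A call sequence might look like `track_and_commit data.csv`, then `configure_remote_and_push minio s3://my-bucket/datasets http://localhost:9000`, then `tag_version v1.0`, where every value is illustrative.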
Now that this is done, I want to be able to download a specific file or directory from a specific repository at a specific tag.
From my understanding, I need to use: dvc get path/to/github/repo path/to/file --rev tag_name
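To make the intended usage concrete, here is a small sketch of that call wrapped in a helper. The function name `dvc_get_at_rev` and the optional output argument are illustrative assumptions; `--rev` and `-o` are the `dvc get` options for the Git revision and the output path.

```shell
#!/bin/sh
# Fetch one DVC-tracked file or directory from a repository at a given
# Git revision (tag, branch, or commit).
# Arguments: repo URL, path inside the repo, revision, optional output path.
dvc_get_at_rev() {
    repo_url="$1"; path_in_repo="$2"; rev="$3"; out="${4:-}"
    if [ -n "$out" ]; then
        # Write the result to an explicit location.
        dvc get "$repo_url" "$path_in_repo" --rev "$rev" -o "$out"
    else
        # Default: write into the current directory under the same name.
        dvc get "$repo_url" "$path_in_repo" --rev "$rev"
    fi
}
```

For example, `dvc_get_at_rev https://github.foyer.lu/Intelligence-Artificielle/Dataset-test data.csv v1.0`, where `v1.0` stands in for whichever tag exists in the repository.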
But when I try to run:
dvc get https://github.foyer.lu/Intelligence-Artificielle/Dataset-test data.csv
I get an error that I do not understand:
dvc get https://github.foyer.lu/Intelligence-Artificielle/Dataset-test newdata.csv --verbose
2024-11-05 11:52:47,011 DEBUG: v3.56.0 (pip), CPython 3.9.18 on Linux-5.14.0-362.8.1.el9_3.x86_64-x86_64-with-glibc2.34
2024-11-05 11:52:47,011 DEBUG: command: /home/llama/.local/bin/dvc get https://github.foyer.lu/Intelligence-Artificielle/Dataset-test newdata.csv --verbose
2024-11-05 11:52:47,133 DEBUG: Creating external repo https://github.foyer.lu/Intelligence-Artificielle/Dataset-test@None
2024-11-05 11:52:47,133 DEBUG: erepo: git clone 'https://github.foyer.lu/Intelligence-Artificielle/Dataset-test' to a temporary dir
2024-11-05 11:52:48,093 ERROR: failed to get 'newdata.csv' from 'https://github.foyer.lu/Intelligence-Artificielle/Dataset-test' - Git failed to fetch ref from 'https://github.foyer.lu/Intelligence-Artificielle/Dataset-test'
Traceback (most recent call last):
File "/home/llama/.local/lib/python3.9/site-packages/funcy/flow.py", line 84, in reraise
yield
File "/home/llama/.local/lib/python3.9/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 704, in fetch_refspecs
for head in remote.ls_remotes(callbacks=cb, proxy=True)
File "/home/llama/.local/lib/python3.9/site-packages/pygit2/remotes.py", line 176, in ls_remotes
self.connect(callbacks=callbacks, proxy=proxy)
File "/home/llama/.local/lib/python3.9/site-packages/pygit2/remotes.py", line 117, in connect
payload.check_error(err)
File "/home/llama/.local/lib/python3.9/site-packages/pygit2/callbacks.py", line 99, in check_error
check_error(error_code)
File "/home/llama/.local/lib/python3.9/site-packages/pygit2/errors.py", line 66, in check_error
raise GitError(message)
_pygit2.GitError: unexpected EOF
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/llama/.local/lib/python3.9/site-packages/dvc/scm.py", line 60, in map_scm_exception
yield
File "/usr/lib64/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 23, in _external_repo
path = _cached_clone(url, rev)
File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 134, in _cached_clone
clone_path, shallow = _clone_default_branch(url, rev)
File "/home/llama/.local/lib/python3.9/site-packages/funcy/decorators.py", line 47, in wrapper
return deco(call, *dargs, **dkwargs)
File "/home/llama/.local/lib/python3.9/site-packages/funcy/flow.py", line 246, in wrap_with
return call()
File "/home/llama/.local/lib/python3.9/site-packages/funcy/decorators.py", line 68, in __call__
return self._func(*self._args, **self._kwargs)
File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 198, in _clone_default_branch
git = clone(url, clone_path)
File "/home/llama/.local/lib/python3.9/site-packages/dvc/scm.py", line 152, in clone
fetch_all_exps(git, url, progress=pbar.update_git)
File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/experiments/utils.py", line 280, in fetch_all_exps
scm.fetch_refspecs(url, refspecs, progress=progress, **kwargs)
File "/home/llama/.local/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 359, in fetch_refspecs
return self._fetch_refspecs(
File "/home/llama/.local/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 307, in _backend_func
result = func(*args, **kwargs)
File "/home/llama/.local/lib/python3.9/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 709, in fetch_refspecs
remote.fetch(
File "/usr/lib64/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/home/llama/.local/lib/python3.9/site-packages/funcy/flow.py", line 88, in reraise
raise into from e
scmrepo.exceptions.SCMError: Git failed to fetch ref from 'https://github.foyer.lu/Intelligence-Artificielle/Dataset-test'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/llama/.local/lib/python3.9/site-packages/dvc/commands/get.py", line 37, in _get_file_from_repo
Repo.get(
File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/get.py", line 45, in get
with Repo.open(
File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 302, in open
return open_repo(url, *args, **kwargs)
File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 60, in open_repo
return _external_repo(url, *args, **kwargs)
File "/usr/lib64/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/usr/lib64/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/home/llama/.local/lib/python3.9/site-packages/dvc/scm.py", line 65, in map_scm_exception
raise into # noqa: B904
dvc.scm.SCMError: Git failed to fetch ref from 'https://github.foyer.lu/Intelligence-Artificielle/Dataset-test'
2024-11-05 11:52:48,100 DEBUG: Analytics is enabled.
2024-11-05 11:52:48,119 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpxz8qj2pw', '-v']
2024-11-05 11:52:48,124 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpxz8qj2pw', '-v'] with pid 560884
I also tried specifying the remote, but I get the same error.
The only reference I could find to the error Git failed to fetch ref from is this topic: `dvc import` fails with `Git failed to fetch ref from`.
But that error is different from mine: the issue there was with SSH, whereas I use HTTPS.
What is weird is that the clone itself seems to work when running dvc get, as I can find it in /tmp, but I think it then fails to download the file from the remote.
What is weirder is that if I clone the repo myself using git clone https://github.foyer.lu/Intelligence-Artificielle/Dataset-test and then run dvc pull inside it, it works and I get all the data files.
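The manual route could presumably be narrowed to a single tag and a single file, which is what dvc get is meant to do in one step. The sketch below is an assumption about how that would look, not something taken from the logs above: the helper name `clone_and_pull_at_tag` and its arguments are hypothetical, and it relies on `git clone --branch` accepting a tag name and on `dvc pull` accepting a target path.

```shell
#!/bin/sh
# Clone a repository at one tag (shallow) and pull a single DVC-tracked path.
# Arguments: repo URL, tag, DVC-tracked target, destination directory.
clone_and_pull_at_tag() {
    repo_url="$1"; tag="$2"; target="$3"; dest="$4"
    git clone --branch "$tag" --depth 1 "$repo_url" "$dest" &&
    # Run dvc inside the clone without changing the caller's directory.
    ( cd "$dest" && dvc pull "$target" )
}
```

For example, `clone_and_pull_at_tag https://github.foyer.lu/Intelligence-Artificielle/Dataset-test v1.0 data.csv Dataset-test`, with `v1.0` as a placeholder tag.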
I have no idea how to solve this and I’d truly appreciate any help. Thank you!
Best regards,
Adrien