`dvc get` fails with `Git failed to fetch ref from`

Hello,

I am currently implementing DVC in my client's MLOps infrastructure, as they need dataset version control.

So far it has worked great. I was able to:

  1. Create a repository containing some dummy data
  2. Commit the data to DVC, and the .dvc files and config to Git
  3. Link the repository to the company GitHub account
  4. Link the repository to the default remote, an S3 bucket on MinIO
  5. Push the dummy data files to the remote
  6. Create multiple versions of the dataset using Git tags
  7. Clone the dataset using `git clone`, then `dvc pull`
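
For readers who want to reproduce the setup, the steps above could look roughly like the command sequence below. This is only a sketch: the remote name, bucket URL, endpoint URL, tag name, and commit message are hypothetical placeholders and do not come from the actual project.

```python
import shlex

# Hypothetical values; none of these come from the original post.
GIT_URL = "https://github.foyer.lu/Intelligence-Artificielle/Dataset-test"
REMOTE_NAME = "minio"
BUCKET_URL = "s3://datasets/dataset-test"
ENDPOINT_URL = "https://minio.example.internal:9000"
TAG = "v1.0"

SETUP_COMMANDS = [
    ["git", "init"],
    ["dvc", "init"],
    # Track the data file; DVC writes data.csv.dvc and a .gitignore entry.
    ["dvc", "add", "data.csv"],
    # Register the S3-compatible bucket as the default DVC remote.
    ["dvc", "remote", "add", "-d", REMOTE_NAME, BUCKET_URL],
    ["dvc", "remote", "modify", REMOTE_NAME, "endpointurl", ENDPOINT_URL],
    # Git only stores the small metadata files, not the data itself.
    ["git", "add", "data.csv.dvc", ".gitignore", ".dvc/config"],
    ["git", "commit", "-m", "Track data.csv with DVC"],
    ["git", "remote", "add", "origin", GIT_URL],
    # Upload the data to the DVC remote, then publish the commit and a tag.
    ["dvc", "push"],
    ["git", "tag", TAG],
    ["git", "push", "-u", "origin", "HEAD"],
    ["git", "push", "origin", TAG],
]

for argv in SETUP_COMMANDS:
    print(shlex.join(argv))
```

Each dataset version would repeat the `dvc add`, `git commit`, `dvc push`, `git tag` part with a new tag name.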

Now that this is done, I want to be able to download a specific file or directory from a specific repository at a specific tag.
From my understanding, I need to use `dvc get path/to/github/repo path/to/file --rev tag_name`.
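
As a small illustration of that command shape, the sketch below assembles the `dvc get` argument list with an optional `--rev` and `--out`. The helper name `build_dvc_get_command` and the tag `v1.0` are hypothetical; only the `dvc get` options themselves are real.

```python
import shlex
from typing import List, Optional


def build_dvc_get_command(
    repo_url: str,
    path: str,
    rev: Optional[str] = None,
    out: Optional[str] = None,
) -> List[str]:
    """Return the argv for `dvc get`, optionally pinned to a Git revision."""
    cmd = ["dvc", "get", repo_url, path]
    if rev is not None:
        # --rev accepts a branch, tag, or commit of the source repository.
        cmd += ["--rev", rev]
    if out is not None:
        # --out sets the destination path for the downloaded file or directory.
        cmd += ["--out", out]
    return cmd


cmd = build_dvc_get_command(
    "https://github.foyer.lu/Intelligence-Artificielle/Dataset-test",
    "data.csv",
    rev="v1.0",  # hypothetical tag name
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where DVC is installed.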

But when I try to run:
dvc get https://github.foyer.lu/Intelligence-Artificielle/Dataset-test data.csv

I get an error that I do not understand:

dvc get https://github.foyer.lu/Intelligence-Artificielle/Dataset-test newdata.csv --verbose
2024-11-05 11:52:47,011 DEBUG: v3.56.0 (pip), CPython 3.9.18 on Linux-5.14.0-362.8.1.el9_3.x86_64-x86_64-with-glibc2.34
2024-11-05 11:52:47,011 DEBUG: command: /home/llama/.local/bin/dvc get https://github.foyer.lu/Intelligence-Artificielle/Dataset-test newdata.csv --verbose
2024-11-05 11:52:47,133 DEBUG: Creating external repo https://github.foyer.lu/Intelligence-Artificielle/Dataset-test@None
2024-11-05 11:52:47,133 DEBUG: erepo: git clone 'https://github.foyer.lu/Intelligence-Artificielle/Dataset-test' to a temporary dir
2024-11-05 11:52:48,093 ERROR: failed to get 'newdata.csv' from 'https://github.foyer.lu/Intelligence-Artificielle/Dataset-test' - Git failed to fetch ref from 'https://github.foyer.lu/Intelligence-Artificielle/Dataset-test'                                            
Traceback (most recent call last):
  File "/home/llama/.local/lib/python3.9/site-packages/funcy/flow.py", line 84, in reraise
    yield
  File "/home/llama/.local/lib/python3.9/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 704, in fetch_refspecs
    for head in remote.ls_remotes(callbacks=cb, proxy=True)
  File "/home/llama/.local/lib/python3.9/site-packages/pygit2/remotes.py", line 176, in ls_remotes
    self.connect(callbacks=callbacks, proxy=proxy)
  File "/home/llama/.local/lib/python3.9/site-packages/pygit2/remotes.py", line 117, in connect
    payload.check_error(err)
  File "/home/llama/.local/lib/python3.9/site-packages/pygit2/callbacks.py", line 99, in check_error
    check_error(error_code)
  File "/home/llama/.local/lib/python3.9/site-packages/pygit2/errors.py", line 66, in check_error
    raise GitError(message)
_pygit2.GitError: unexpected EOF

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/llama/.local/lib/python3.9/site-packages/dvc/scm.py", line 60, in map_scm_exception
    yield
  File "/usr/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 23, in _external_repo
    path = _cached_clone(url, rev)
  File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 134, in _cached_clone
    clone_path, shallow = _clone_default_branch(url, rev)
  File "/home/llama/.local/lib/python3.9/site-packages/funcy/decorators.py", line 47, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/llama/.local/lib/python3.9/site-packages/funcy/flow.py", line 246, in wrap_with
    return call()
  File "/home/llama/.local/lib/python3.9/site-packages/funcy/decorators.py", line 68, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 198, in _clone_default_branch
    git = clone(url, clone_path)
  File "/home/llama/.local/lib/python3.9/site-packages/dvc/scm.py", line 152, in clone
    fetch_all_exps(git, url, progress=pbar.update_git)
  File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/experiments/utils.py", line 280, in fetch_all_exps
    scm.fetch_refspecs(url, refspecs, progress=progress, **kwargs)
  File "/home/llama/.local/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 359, in fetch_refspecs
    return self._fetch_refspecs(
  File "/home/llama/.local/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 307, in _backend_func
    result = func(*args, **kwargs)
  File "/home/llama/.local/lib/python3.9/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 709, in fetch_refspecs
    remote.fetch(
  File "/usr/lib64/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/llama/.local/lib/python3.9/site-packages/funcy/flow.py", line 88, in reraise
    raise into from e
scmrepo.exceptions.SCMError: Git failed to fetch ref from 'https://github.foyer.lu/Intelligence-Artificielle/Dataset-test'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/llama/.local/lib/python3.9/site-packages/dvc/commands/get.py", line 37, in _get_file_from_repo
    Repo.get(
  File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/get.py", line 45, in get
    with Repo.open(
  File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 302, in open
    return open_repo(url, *args, **kwargs)
  File "/home/llama/.local/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 60, in open_repo
    return _external_repo(url, *args, **kwargs)
  File "/usr/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/lib64/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/llama/.local/lib/python3.9/site-packages/dvc/scm.py", line 65, in map_scm_exception
    raise into  # noqa: B904
dvc.scm.SCMError: Git failed to fetch ref from 'https://github.foyer.lu/Intelligence-Artificielle/Dataset-test'

2024-11-05 11:52:48,100 DEBUG: Analytics is enabled.
2024-11-05 11:52:48,119 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpxz8qj2pw', '-v']
2024-11-05 11:52:48,124 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpxz8qj2pw', '-v'] with pid 560884

I tried specifying the remote, but I get the same error.

The only reference I found to the error `Git failed to fetch ref from` is this thread: `dvc import` fails with `Git failed to fetch ref from`.
But that error is different from mine: that issue was with SSH, whereas I use HTTPS.

What is weird is that the clone works when running `dvc get`, as I can find it in /tmp, but I think it then fails to download the file from the remote.
What is weirder is that if I clone the repo myself using `git clone https://github.foyer.lu/Intelligence-Artificielle/Dataset-test` and then run `dvc pull` inside it, it works and I get all the data files.
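
Since the manual clone followed by `dvc pull` works, one possible stopgap (a workaround sketch, not an explanation of why `dvc get` fails) is to script that same sequence for a single file at a single tag. The helper name `manual_get_steps`, the tag `v1.0`, and the directory `dataset-checkout` are hypothetical.

```python
import shlex
from typing import List, Tuple

Step = Tuple[List[str], str]  # (argv, working directory)


def manual_get_steps(
    repo_url: str, tag: str, target: str, checkout_dir: str
) -> List[Step]:
    """Plan the clone-then-pull sequence for one tracked path at one tag."""
    return [
        # `git clone --branch` also accepts a tag name; --depth 1 skips history.
        (["git", "clone", "--depth", "1", "--branch", tag, repo_url, checkout_dir], "."),
        # Inside the clone, ask DVC for just the one tracked target.
        (["dvc", "pull", target], checkout_dir),
    ]


steps = manual_get_steps(
    "https://github.foyer.lu/Intelligence-Artificielle/Dataset-test",
    tag="v1.0",  # hypothetical tag name
    target="newdata.csv",
    checkout_dir="dataset-checkout",  # hypothetical directory name
)
for argv, cwd in steps:
    print(f"(in {cwd}) {shlex.join(argv)}")
```

Each step can then be executed with `subprocess.run(argv, cwd=cwd, check=True)`. Because the clone goes through the `git` executable, it uses whatever credentials and network settings already make the manual clone succeed.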

I have no idea how to solve this and I’d truly appreciate any help. Thank you!

Best regards,
Adrien

Could you please share the output of `dvc version` and `pip freeze`?

Also, how do you authenticate with the Git server? Where do you store the username and password?

Hello,

Thank you for your response!

Here is the output of both commands:

[llama@sfpl-ai-01:~/Dataset/get_test]$ dvc version
DVC version: 3.56.0 (pip)
-------------------------
Platform: Python 3.9.18 on Linux-5.14.0-362.8.1.el9_3.x86_64-x86_64-with-glibc2.34
Subprojects:
        dvc_data = 3.16.6
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.40.2
        scmrepo = 3.3.8
Supports:
        http (aiohttp = 3.10.10, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.10.10, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.10.0, boto3 = 1.35.36)
Config:
        Global: /home/llama/.config/dvc
        System: /etc/xdg/dvc
[llama@sfpl-ai-01:~/Dataset/get_test]$ pip freeze
WARNING: Ignoring invalid distribution -orch (/home/llama/.local/lib/python3.9/site-packages)
absl-py==2.1.0
accelerate==0.33.0
addict==2.4.0
aiobotocore==2.15.2
aiofiles==23.2.1
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiohttp-retry==2.8.3
aioitertools==0.12.0
aiosignal==1.3.1
alembic==1.13.1
aliyun-python-sdk-core==2.15.1
aliyun-python-sdk-kms==2.16.3
altair==4.2.2
amqp==5.2.0
aniso8601==9.0.1
annotated-types==0.6.0
anthropic==0.13.0
antlr4-python3-runtime==4.9.3
anyio==3.7.1
appdirs==1.4.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asgiref==3.7.2
asttokens==2.4.1
async-lru==2.0.4
async-timeout==4.0.3
asyncssh==2.17.0
atpublic==5.0
attrdict==2.0.1
attrs==23.2.0
audioread==3.0.1
Babel==2.14.0
backoff==2.2.1
beautifulsoup4==4.12.3
bidict==0.22.1
billiard==4.2.1
binpacking==1.5.2
bitsandbytes==0.42.0
bleach==6.1.0
blinker==1.7.0
blis==0.7.11
boto3==1.35.36
botocore==1.35.36
bs4==0.0.2
cachetools==5.3.2
catalogue==2.0.10
celery==5.4.0
certifi==2023.11.17
cffi==1.16.0
chardet==4.0.0
charset-normalizer==3.3.2
click==8.1.7
click-didyoumean==0.3.1
click-plugins==1.1.1
click-repl==0.3.0
cloudpathlib==0.16.0
cloudpickle==3.0.0
cmake==3.28.1
colorama==0.4.6
comm==0.2.0
confection==0.1.4
configobj==5.0.9
contourpy==1.2.0
coqpit==0.0.17
coremltools==6.3.0
crcmod==1.7
cryptography==43.0.3
cuda-python==12.1.0
cupy-cuda12x==12.1.0
cycler==0.12.1
cymem==2.0.8
Cython==3.0.8
dacite==1.8.1
dalaipy==2.0.2
dataclasses-json==0.6.4
datasets==2.16.1
dbus-python==1.2.18
debugpy==1.8.0
decorator==4.4.2
defusedxml==0.7.1
Deprecated==1.2.14
dictdiffer==0.9.0
dill==0.3.7
dirtyjson==1.0.8
diskcache==5.6.3
distro==1.9.0
dnspython==2.3.0
docker==7.0.0
docker-pycreds==0.4.0
docopt==0.6.2
docstring_parser==0.16
doctr==1.9.0
donut-python==1.0.9
dpath==2.2.0
dulwich==0.22.3
dvc==3.56.0
dvc-data==3.16.6
dvc-http==2.32.0
dvc-objects==5.1.0
dvc-render==1.0.2
dvc-s3==3.2.0
dvc-studio-client==0.21.0
dvc-task==0.40.2
einops==0.7.0
email_validator==2.2.0
entrypoints==0.4
et-xmlfile==1.1.0
eval_type_backport==0.2.0
exceptiongroup==1.2.0
executing==2.0.1
faiss-gpu==1.7.2
fastapi==0.111.0
fastapi-cli==0.0.4
fastjsonschema==2.19.1
fastrlock==0.8.2
ffmpy==0.3.1
file-magic==0.4.0
filelock==3.13.1
fire==0.6.0
FlagEmbedding==1.2.5
Flask==3.0.1
Flask-Cors==4.0.0
flatten-dict==0.4.2
flufl.lock==8.1.0
fonttools==4.48.1
fqdn==1.5.1
frogmouth==0.9.2
frozenlist==1.4.1
fschat==0.2.36
fsspec==2024.10.0
funcy==2.0
future==1.0.0
fuzzywuzzy==0.18.0
gitdb==4.0.11
GitPython==3.1.41
google-ai-generativelanguage==0.4.0
google-api-core==2.16.1
google-auth==2.27.0
google-generativeai==0.3.2
googleapis-common-protos==1.62.0
gpg==1.15.1
gptcache==0.1.43
gradio==4.18.0
gradio_client==0.10.0
gradio_pdf==0.0.5
grandalf==0.8
graphene==3.3
graphql-core==3.2.3
graphql-relay==3.2.0
great-expectations==1.2.1
greenlet==3.0.3
grpcio==1.60.0
grpcio-status==1.60.0
gssapi==1.6.9
gto==1.7.1
guidance==0.1.10
gunicorn==21.2.0
h11==0.14.0
hangul-romanize==0.1.0
html2text==2020.1.16
httpcore==0.17.3
httptools==0.6.1
httpx==0.24.1
huggingface-hub==0.23.4
hydra-core==1.3.2
idna==3.6
imageio==2.25.1
imageio-ffmpeg==0.4.9
importlib-metadata==7.0.1
importlib-resources==6.1.1
iniparse==0.4
InstructorEmbedding==1.0.1
interegular==0.3.3
ipaclient==4.10.2
ipalib==4.10.2
ipaplatform==4.10.2
ipapython==4.10.2
ipykernel==6.28.0
ipython==8.18.1
ipython-genutils==0.2.0
ipywidgets==8.1.1
isoduration==20.11.0
iterative-telemetry==0.0.9
itsdangerous==2.1.2
jedi==0.19.1
jieba==0.42.1
Jinja2==3.1.3
jmespath==0.10.0
joblib==1.3.2
json5==0.9.14
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.21.0
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.1
jupyter_client==8.6.0
jupyter_core==5.5.1
jupyter_server==2.12.5
jupyter_server_terminals==0.5.1
jupyterlab==4.0.10
jupyterlab-widgets==3.0.9
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.2
jwcrypto==0.8
kiwisolver==1.4.5
kombu==5.4.2
langchain==0.1.7
langchain-community==0.0.20
langchain-core==0.1.23
langcodes==3.3.0
langsmith==0.0.87
lark==1.1.9
lazy_loader==0.4
libcomps==0.1.18
librosa==0.10.1
lightning-utilities==0.10.0
linkify-it-py==2.0.3
lit==17.0.6
llama-index==0.10.4
llama-index-agent-openai==0.1.1
llama-index-core==0.10.3
llama-index-embeddings-openai==0.1.1
llama-index-legacy==0.9.48
llama-index-llms-openai==0.1.1
llama-index-multi-modal-llms-openai==0.1.1
llama-index-program-openai==0.1.1
llama-index-question-gen-openai==0.1.1
llama-index-readers-file==0.1.3
llvmlite==0.42.0
lm-format-enforcer==0.10.1
lmdeploy==0.5.2.post1
lxml==5.1.0
Mako==1.3.3
Markdown==3.6
markdown-it-py==3.0.0
markdown2==2.4.13
MarkupSafe==2.1.4
marshmallow==3.20.2
matplotlib==3.8.2
matplotlib-inline==0.1.6
mdit-py-plugins==0.4.2
mdurl==0.1.2
memray==1.14.0
mistune==3.0.2
mlflow==2.11.3
mmengine-lite==0.10.4
modelscope==1.17.0
monotonic==1.6
moviepy==2.0.0.dev2
mpmath==1.3.0
ms-swift==2.2.5
msal==1.26.0
msgpack==1.0.8
multidict==6.0.4
multiprocess==0.70.15
munch==4.0.0
murmurhash==1.0.10
mutagen==1.47.0
mypy-extensions==1.0.0
nbclient==0.9.0
nbconvert==7.14.2
nbformat==5.9.2
nest-asyncio==1.5.8
netaddr==0.8.0
netifaces==0.10.6
networkx==3.2.1
nftables==0.1
nh3==0.2.15
ninja==1.11.1.1
nltk==3.8.1
notebook==7.0.6
notebook_shim==0.2.3
num2words==0.5.13
numba==0.59.0
numpy==1.26.3
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.555.43
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
olefile==0.46
omegaconf==2.3.0
openai==1.10.0
opencv-python==4.9.0.80
openpyxl==3.1.2
ordered-set==4.1.0
orjson==3.9.13
oss2==2.18.6
outlines==0.0.46
overrides==7.4.0
packaging==23.2
pandas==2.1.4
pandocfilters==1.5.0
parso==0.8.3
pathspec==0.12.1
patsy==0.5.6
pdfminer.six==20231228
peft==0.11.1
pexpect==4.8.0
pikepdf==2.16.1
pillow==10.2.0
platformdirs==3.11.0
ply==3.11
pooch==1.8.1
posthog==2.5.0
preshed==3.0.9
proglog==0.1.10
prometheus-client==0.19.0
prometheus-fastapi-instrumentator==7.0.0
prompt-toolkit==3.0.43
propcache==0.2.0
proto-plus==1.23.0
protobuf==3.20.3
psutil==5.9.7
ptyprocess==0.6.0
pure-eval==0.2.2
py-cpuinfo==9.0.0
pyairports==2.1.1
pyarrow==14.0.2
pyarrow-hotfix==0.6
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycountry==24.6.1
pycparser==2.20
pycryptodome==3.20.0
pydantic==2.5.3
pydantic_core==2.14.6
pydot==2.0.0
pydub==0.25.1
pyformlang==1.0.6
pygit2==1.15.1
Pygments==2.17.2
PyGObject==3.40.1
pygtrie==2.5.0
pyinotify==0.9.6
PyJWT==2.8.0
pymongo==4.10.1
PyMuPDF==1.23.22
PyMuPDFb==1.23.22
pynvml==11.5.0
pyparsing==3.1.1
pypdf==4.0.1
pypinyin==0.51.0
PySocks==1.7.1
pysrt==1.1.2
python-augeas==0.5.0
python-bidi==0.4.2
python-dateutil==2.8.2
python-dotenv==1.0.1
python-engineio==4.8.2
python-json-logger==2.0.7
python-ldap==3.4.3
python-multipart==0.0.7
python-socketio==5.11.0
python-yubico==1.3.3
pytorch-lightning==2.0.9.post0
pytz==2021.1
pyusb==1.0.2
PyWavelets==1.5.0
PyYAML==6.0.1
pyzmq==25.1.2
qrcode==6.1
qtconsole==5.5.1
QtPy==2.4.1
querystring-parser==1.2.4
ray==2.9.3
referencing==0.32.1
regex==2023.12.25
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.0
rouge==1.0.1
rpds-py==0.17.1
rpm==4.16.1.3
rsa==4.9
ruamel.yaml==0.18.5
ruamel.yaml.clib==0.2.8
ruff==0.2.1
s3fs==2024.10.0
s3transfer==0.10.3
safetensors==0.4.1
scikit-image==0.19.3
scikit-learn==1.2.2
scipy==1.10.1
scmrepo==3.3.8
sconf==0.2.5
seaborn==0.13.2
selinux==3.5
semantic-version==2.10.0
semver==3.0.2
Send2Trash==1.8.2
sentence-transformers==2.5.0
sentencepiece==0.1.99
sentry-sdk==1.39.2
sepolicy==3.5
setools==4.4.3
setproctitle==1.3.3
Shapely==1.8.5.post1
shellingham==1.5.4
shortuuid==1.0.13
shtab==1.7.1
simple-websocket==1.0.0
six==1.16.0
smart-open==6.4.0
smmap==5.0.1
sniffio==1.3.0
sos==4.6.0
soundfile==0.12.1
soupsieve==2.5
soxr==0.3.7
spacy==3.7.4
spacy-legacy==3.0.12
spacy-loggers==1.0.5
SQLAlchemy==2.0.27
sqlparse==0.4.4
sqltrie==0.11.1
srsly==2.4.8
SSSDConfig==2.9.1
stack-data==0.6.3
starlette==0.37.2
statsmodels==0.14.2
subscription-manager==1.29.38
svgwrite==1.4.3
sympy==1.12
systemd-python==234
tabulate==0.9.0
tenacity==8.2.3
tensorboard==2.16.2
tensorboard-data-server==0.7.2
termcolor==2.4.0
terminado==0.18.0
textual==0.43.2
thinc==8.2.3
threadpoolctl==3.2.0
tifffile==2024.1.30
tiktoken==0.7.0
timm==0.9.12
tinycss2==1.2.1
tokenizers==0.19.1
tomli==2.0.1
tomlkit==0.12.0
toolz==0.12.1
torch==2.3.0
torchaudio==2.1.0
torchmetrics==1.3.0
torchvision==0.18.0
tornado==6.4
tqdm==4.66.1
trainer==0.0.36
traitlets==5.14.2
transformers @ git+https://github.com/huggingface/transformers@409fcfdfccde77a14b7cc36972b774cabc371ae1
transformers-stream-generator==0.0.5
triton==2.3.0
trl==0.9.6
typer==0.12.3
types-python-dateutil==2.8.19.20240106
typing-inspect==0.9.0
typing_extensions==4.12.2
tyro==0.8.5
tzdata==2023.4
tzlocal==5.2
uc-micro-py==1.0.3
ujson==5.10.0
Unidecode==1.3.8
uri-template==1.3.0
urllib3==1.26.20
uvicorn==0.27.0.post1
uvloop==0.19.0
vine==5.1.0
virtualenv-clone==0.5.7
vllm==0.5.1
vllm-flash-attn==2.5.9
voluptuous==0.15.2
wandb==0.16.2
wasabi==1.1.2
watchfiles==0.21.0
wavedrom==2.0.3.post3
wcwidth==0.2.12
weasel==0.3.4
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
websockets==11.0.3
Werkzeug==3.0.1
widgetsnbextension==4.0.9
wrapt==1.16.0
wsproto==1.2.0
xattr==1.1.0
xdg==6.0.0
xformers==0.0.26.post1
xxhash==3.4.1
yapf==0.40.2
yarl==1.16.0
zc.lockfile==3.0.post1
zipp==3.17.0
zss==1.2.0

As for how I authenticate with the Git server, I am not sure. I can’t get an answer from my client, as they just say “You should use SSH”. But SSH does not work: I get `Connection timed out` when using SSH, whereas HTTPS works for cloning.

I am still talking with them, but the error does not make any sense to them.
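
On the authentication question, the traceback shows `dvc get` reaching the server through the pygit2 backend rather than the `git` executable, so settings that make a manual `git clone` succeed (a credential helper, a proxy, URL rewrites) may not be applied in the same way. One way to see which such settings exist is to filter the output of `git config --list`. The sketch below does that on a sample listing; the helper name `transport_settings` and every value in `sample` are hypothetical, not taken from the machine in question.

```python
from typing import List, Tuple

# Key prefixes that influence how Git reaches and authenticates to a server.
RELEVANT_PREFIXES = ("credential.", "http.", "url.")


def transport_settings(config_listing: str) -> List[Tuple[str, str]]:
    """Pick credential, HTTP, and URL-rewrite entries out of `git config --list` output."""
    settings = []
    for line in config_listing.splitlines():
        # Split on the first "=" only, since values may contain "=" themselves.
        key, _, value = line.partition("=")
        if key.startswith(RELEVANT_PREFIXES):
            settings.append((key, value))
    return settings


# Hypothetical listing, for illustration only.
sample = """\
user.name=Adrien
credential.helper=store
http.proxy=http://proxy.example.internal:3128
core.editor=vim
"""

for key, value in transport_settings(sample):
    print(f"{key} = {value}")
```

On a real machine the listing would come from `subprocess.run(["git", "config", "--list"], capture_output=True, text=True).stdout`. Any entries it reports are candidates to compare against what DVC's Git backend is actually using.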