`dvc import` fails with `Git failed to fetch ref from`

Hi,

I want to use dvc import to retrieve dvc-managed data from a remote git repository. The repository is located on a remote server. The data itself is located in an internal S3 remote. The S3 remote’s credentials are configured in the git repository, including the key file location. I can clone the git repo using git and use dvc pull to retrieve the data from the remote.

However, dvc import does not work as expected. It either does not seem to be able to access the remote repo itself or maybe the S3 remote (?). Short version (full output below):

user@machine:~/tmp/try_dvc_data_registry$ dvc import --verbose git@host:repo.git data
2024-03-15 20:27:02,456 DEBUG: v3.48.4 (pip), CPython 3.9.1 on Linux-6.1.0-17-amd64-x86_64-with-glibc2.36
2024-03-15 20:27:02,456 DEBUG: command: /home/user/tmp/try_dvc_data_registry/.venv/bin/dvc import --verbose git@host:repo.git data
2024-03-15 20:27:02,834 DEBUG: Removing output 'file.dill' of stage: 'file.dill.dvc'.
2024-03-15 20:27:02,834 DEBUG: Removing '/home/user/tmp/try_dvc_data_registry/file.dill'
Importing 'data (git@host:repo.git)' -> 'file.dill'
2024-03-15 20:27:02,836 DEBUG: Computed stage: 'file.dill.dvc' md5: 'd6176c15517db1426864eb7f2e39a46a'
2024-03-15 20:27:02,836 DEBUG: 'md5' of stage: 'file.dill.dvc' changed.
2024-03-15 20:27:02,836 DEBUG: Creating external repo git@host:repo.git@None
2024-03-15 20:27:02,836 DEBUG: erepo: git clone 'git@host:repo.git' to a temporary dir
2024-03-15 20:27:09,303 ERROR: failed to import 'data' from 'git@host:repo.git'. - Git failed to fetch ref from 'git@host:repo.git'
...
_pygit2.GitError: authentication required but no callback set
...
scmrepo.exceptions.SCMError: Git failed to fetch ref from 'git@host:repo.git'
...
dvc.scm.SCMError: Git failed to fetch ref from 'git@host:repo.git'

I thought it might be related to two earlier issues.

Both, however, appear to have been fixed long ago, and I am using a much newer version (3.48.4, see dvc doctor below). Interestingly, though, the second issue mentions that dvc==2.45.0 might not have this problem. And indeed, when I switch to that version, at least the git repo can be accessed: DVC now complains about disallowed keys that it can only have retrieved from the remote repository:

2024-03-15 20:53:23,812 ERROR: failed to import 'data/output/file.dill' from 'git@host:repo.git'. - '../../../dvc.lock' validation failed in revision '9290b7c': 30 errors: extra keys not allowed @ data['stages']['ltv']['deps'][0]['hash']

So unfortunately I cannot use this older version, as it is not compatible with the data repository that was created with a recent version.

However, this might hint that the issue is related to the linked ones, even though these have been thought fixed. Unless I am doing something horribly wrong, that is.

I’d truly appreciate any help, as we urgently need the import and related functionality. Thank you!

Best regards
Jonas


Dvc doctor

$ dvc doctor
DVC version: 3.48.4 (pip)
-------------------------
Platform: Python 3.9.1 on Linux-6.1.0-17-amd64-x86_64-with-glibc2.36
Subprojects:
  dvc_data = 3.14.1
  dvc_objects = 5.1.0
  dvc_render = 1.0.1
  dvc_task = 0.3.0
  scmrepo = 3.3.0
Supports:
  http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
  https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3)
Config:
  Global: /home/user/.config/dvc
  System: /etc/xdg/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/vda1
Repo: dvc, git

Full traceback

user@machine:~/tmp/try_dvc_data_registry$ dvc import --verbose git@host:repo.git data
2024-03-15 20:27:02,456 DEBUG: v3.48.4 (pip), CPython 3.9.1 on Linux-6.1.0-17-amd64-x86_64-with-glibc2.36
2024-03-15 20:27:02,456 DEBUG: command: /home/user/tmp/try_dvc_data_registry/.venv/bin/dvc import --verbose git@host:repo.git data
2024-03-15 20:27:02,834 DEBUG: Removing output 'file.dill' of stage: 'file.dill.dvc'.
2024-03-15 20:27:02,834 DEBUG: Removing '/home/user/tmp/try_dvc_data_registry/file.dill'
Importing 'data (git@host:repo.git)' -> 'file.dill'
2024-03-15 20:27:02,836 DEBUG: Computed stage: 'file.dill.dvc' md5: 'd6176c15517db1426864eb7f2e39a46a'
2024-03-15 20:27:02,836 DEBUG: 'md5' of stage: 'file.dill.dvc' changed.
2024-03-15 20:27:02,836 DEBUG: Creating external repo git@host:repo.git@None
2024-03-15 20:27:02,836 DEBUG: erepo: git clone 'git@host:repo.git' to a temporary dir
2024-03-15 20:27:09,303 ERROR: failed to import 'data' from 'git@host:repo.git'. - Git failed to fetch ref from 'git@host:repo.git'
Traceback (most recent call last):
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/flow.py", line 84, in reraise
    yield
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 698, in fetch_refspecs
    for head in remote.ls_remotes(callbacks=cb)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/pygit2/remotes.py", line 171, in ls_remotes
    self.connect(callbacks=callbacks, proxy=proxy)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/pygit2/remotes.py", line 120, in connect
    payload.check_error(err)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/pygit2/callbacks.py", line 99, in check_error
    check_error(error_code)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/pygit2/errors.py", line 65, in check_error
    raise GitError(message)
_pygit2.GitError: authentication required but no callback set

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/scm.py", line 59, in map_scm_exception
    yield
  File "/usr/local/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 23, in _external_repo
    path = _cached_clone(url, rev)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 134, in _cached_clone
    clone_path, shallow = _clone_default_branch(url, rev)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 47, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/flow.py", line 246, in wrap_with
    return call()
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 68, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 198, in _clone_default_branch
    git = clone(url, clone_path)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/scm.py", line 154, in clone
    fetch_all_exps(git, url, progress=pbar.update_git)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/experiments/utils.py", line 280, in fetch_all_exps
    scm.fetch_refspecs(url, refspecs, progress=progress, **kwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 359, in fetch_refspecs
    return self._fetch_refspecs(
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 307, in _backend_func
    result = func(*args, **kwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 703, in fetch_refspecs
    remote.fetch(
  File "/usr/local/lib/python3.9/contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/flow.py", line 88, in reraise
    raise into from e
scmrepo.exceptions.SCMError: Git failed to fetch ref from 'git@host:repo.git'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/commands/imp.py", line 15, in run
    self.repo.imp(
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/imp.py", line 44, in imp
    return self.imp_url(path, out=out, erepo=erepo, frozen=True, **kwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/scm_context.py", line 143, in run
    return method(repo, *args, **kw)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/imp_url.py", line 86, in imp_url
    stage.run(jobs=jobs, no_download=no_download)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 47, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/decorators.py", line 44, in rwlocked
    return call()
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 68, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 603, in run
    self._sync_import(dry, force, kwargs.get("jobs", None), no_download)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 47, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/decorators.py", line 44, in rwlocked
    return call()
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 68, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 640, in _sync_import
    sync_import(self, dry, force, jobs, no_download)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/imports.py", line 56, in sync_import
    stage.save_deps()
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 496, in save_deps
    dep.save()
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/dependency/repo.py", line 63, in save
    rev = self.fs.repo.get_rev()
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/fs/dvc.py", line 565, in repo
    return self.fs.repo
  File "/usr/local/lib/python3.9/functools.py", line 969, in __get__
    val = self.func(instance)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/fs/dvc.py", line 198, in repo
    repo = self._make_repo(**self._repo_kwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/fs/dvc.py", line 275, in _make_repo
    with Repo.open(uninitialized=True, **kwargs) as repo:
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/__init__.py", line 297, in open
    return open_repo(url, *args, **kwargs)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 60, in open_repo
    return _external_repo(url, *args, **kwargs)
  File "/usr/local/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.9/contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/scm.py", line 64, in map_scm_exception
    raise into  # noqa: B904
dvc.scm.SCMError: Git failed to fetch ref from 'git@host:repo.git'

2024-03-15 20:27:09,309 DEBUG: Analytics is enabled.
2024-03-15 20:27:09,376 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpflvigfpp', '-v']
2024-03-15 20:27:09,388 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpflvigfpp', '-v'] with pid 1003932

Update:

  • As the error message suggests, the issue is that the git repository cannot be accessed. When I clone the repository manually (no files in dvc cache) and run dvc import against that local clone, the S3 remote can be accessed.
  • Updating libgit2 to 1.5 did not solve the issue.

Hi. We use multiple backend implementations for Git operations, mostly dulwich and pygit2.

Looking at your stack trace, it seems the git repository was successfully cloned using dulwich.
But when dvc tried to fetch some references, it tried to use pygit2 and failed.

The pygit2 backend does not support ssh remotes, so it is expected to fail. But it should fall back to the dulwich backend, which does not seem to be happening. We try to fall back to dulwich if the url starts with git@ or has one of the git://, ssh://, or git+ssh:// schemes.

Could you please share what structure the url has? I believe you are using an SCP-style url that looks like git@ but has a different username(?).

Also please check if you can import with urls with either of git://, ssh://, or git+ssh:// schemes.
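
The fallback rule described above can be sketched roughly as follows. This is only an illustration of the rule as stated in this reply, not scmrepo's actual code; the function name `uses_dulwich_fallback` and the constant `FALLBACK_SCHEMES` are made up for this example.

```python
from urllib.parse import urlparse

# Schemes named in the reply above (hypothetical constant name).
FALLBACK_SCHEMES = {"git", "ssh", "git+ssh"}


def uses_dulwich_fallback(url: str) -> bool:
    """Return True if the url matches the fallback rule described above."""
    # SCP-style shorthand is only recognised when the user is literally "git".
    if url.startswith("git@"):
        return True
    # Otherwise the url needs an explicit git://, ssh:// or git+ssh:// scheme.
    return urlparse(url).scheme in FALLBACK_SCHEMES


print(uses_dulwich_fallback("git@host:repo.git"))
print(uses_dulwich_fallback("gitea@host:repo.git"))
print(uses_dulwich_fallback("ssh://gitea@host/repo.git"))
```

Under this rule, an SCP-style url with any username other than git would not trigger the fallback unless an explicit scheme is added.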


Hi!

Thanks for the hints! They did not solve my problem immediately, but prompted me to play with dulwich a bit, which ultimately led to the solution! :partying_face:

Two things:

  • As you mentioned, I added ssh:// to make dvc fall back to dulwich. This was required because my url actually starts with gitea@, not git@, so the fallback did not happen without it. So my url became ssh://gitea@git.something.org:FOO/myrepo.git.
  • This still did not work, giving gaierror: [Errno -2] Name or service not known. It turns out that dulwich cannot handle a colon between server address and path to the repo, as used in SSH urls I suppose. I changed the colon to a forward slash ssh://gitea@git.something.org/FOO/myrepo.git and that made it work!
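
For anyone who wants to script the two changes above (add ssh:// and replace the colon with a slash), here is a minimal sketch. The helper name `scp_to_ssh_url` and the regular expression are my own assumptions for illustration; they are not part of DVC, scmrepo, or dulwich.

```python
import re

# Matches SCP-style shorthand such as "user@host:path/to/repo.git".
# (Hypothetical pattern written for this example only.)
_SCP_STYLE = re.compile(r"^(?P<user>[^@/:]+)@(?P<host>[^:/]+):(?P<path>.+)$")


def scp_to_ssh_url(url: str) -> str:
    """Rewrite user@host:path as ssh://user@host/path; leave other urls alone."""
    if "://" in url:
        # Already has an explicit scheme, nothing to rewrite.
        return url
    match = _SCP_STYLE.match(url)
    if match is None:
        return url
    # Drop any leading slash so the result has exactly one after the host.
    path = match["path"].lstrip("/")
    return f"ssh://{match['user']}@{match['host']}/{path}"


print(scp_to_ssh_url("gitea@git.something.org:FOO/myrepo.git"))
```

The resulting url can then be passed to dvc import in place of the shorthand form. Note that this sketch treats relative and absolute SCP paths the same way, which is fine for typical git hosting servers but may not hold for a plain SSH host.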

Thanks a lot, you have helped me a great deal! :smiling_face:

Best regards
Jonas

Sorry, I forgot to mention that you have to replace the colon with / in ssh urls.

I have created a fix in iterative/scmrepo pull request #346, "support scp-style shorthand urls with users other than git@". It'd be great if you could test it out and see if it fixes the issue for you. :slight_smile:

To install from the PR, use the following command:

pip install "scmrepo @ git+https://github.com/iterative/scmrepo@refs/pull/346/merge"

Hi! Sorry for the delayed reply, and thanks a lot for the fix. I can't test it right now, but I'll come back to this as soon as possible and let you know.