Hi,
I want to use `dvc import` to retrieve dvc-managed data from a remote git repository. The repository is located on a remote server. The data itself is located in an internal S3 remote. The S3 remote’s credentials are configured in the git repository, including the key file location. I can clone the git repo using git and use `dvc pull` to retrieve the data from the remote.
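For reference, the manual route that works for me looks roughly like the sketch below. The destination directory and the helper names `manual_fetch_commands` and `manual_fetch` are placeholders of mine; only `git clone` and `dvc pull` are the actual commands I run.

```python
import subprocess


def manual_fetch_commands(repo_url, dest):
    """Return (command, working directory) pairs for the manual workflow."""
    return [
        # Step 1: clone the git repository with the system git client.
        (["git", "clone", repo_url, dest], None),
        # Step 2: inside the clone, let DVC fetch the data from its configured remote.
        (["dvc", "pull"], dest),
    ]


def manual_fetch(repo_url, dest):
    """Run both steps in order; raises CalledProcessError on the first failure."""
    for cmd, cwd in manual_fetch_commands(repo_url, dest):
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `manual_fetch("git@host:repo.git", "repo")` would correspond to what I do by hand, with `git@host:repo.git` being the anonymised URL used throughout this post.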
However, `dvc import` does not work as expected. It seems unable to access either the remote git repo itself or possibly the S3 remote (?). Short version (full output below):
user@machine:~/tmp/try_dvc_data_registry$ dvc import --verbose git@host:repo.git data
2024-03-15 20:27:02,456 DEBUG: v3.48.4 (pip), CPython 3.9.1 on Linux-6.1.0-17-amd64-x86_64-with-glibc2.36
2024-03-15 20:27:02,456 DEBUG: command: /home/user/tmp/try_dvc_data_registry/.venv/bin/dvc import --verbose git@host:repo.git data
2024-03-15 20:27:02,834 DEBUG: Removing output 'file.dill' of stage: 'file.dill.dvc'.
2024-03-15 20:27:02,834 DEBUG: Removing '/home/user/tmp/try_dvc_data_registry/file.dill'
Importing 'data (git@host:repo.git)' -> 'file.dill'
2024-03-15 20:27:02,836 DEBUG: Computed stage: 'file.dill.dvc' md5: 'd6176c15517db1426864eb7f2e39a46a'
2024-03-15 20:27:02,836 DEBUG: 'md5' of stage: 'file.dill.dvc' changed.
2024-03-15 20:27:02,836 DEBUG: Creating external repo git@host:repo.git@None
2024-03-15 20:27:02,836 DEBUG: erepo: git clone 'git@host:repo.git' to a temporary dir
2024-03-15 20:27:09,303 ERROR: failed to import 'data' from 'git@host:repo.git'. - Git failed to fetch ref from 'git@host:repo.git'
...
_pygit2.GitError: authentication required but no callback set
...
scmrepo.exceptions.SCMError: Git failed to fetch ref from 'git@host:repo.git'
...
dvc.scm.SCMError: Git failed to fetch ref from 'git@host:repo.git'
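To separate the two suspects (git access vs. the S3 remote), a check along these lines might help. It is only a sketch under my own assumptions: that the system `git` binary is on the PATH, and that SSH credentials (agent or key file) are what the failing fetch is missing, which I have not verified. The function names are mine.

```python
import os
import shutil
import subprocess


def ssh_environment_summary(environ=None):
    """Collect local facts that could matter for SSH authentication (no network access)."""
    environ = os.environ if environ is None else environ
    return {
        # An SSH agent, if one is running, is usually advertised through this variable.
        "ssh_auth_sock_set": bool(environ.get("SSH_AUTH_SOCK")),
        "git_on_path": shutil.which("git") is not None,
        "ssh_add_on_path": shutil.which("ssh-add") is not None,
    }


def system_git_can_reach(repo_url, timeout=30):
    """Return True if the system git client can list the branches of repo_url."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", repo_url],
            stdin=subprocess.DEVNULL,  # do not wait for interactive input
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

If `system_git_can_reach("git@host:repo.git")` returns `True` while `dvc import` still fails at the clone step, that would suggest the problem sits in the git fetch performed inside DVC rather than in the S3 remote.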
I thought it might be related to these issues, both of which appear to have been fixed long ago, and I am using a much newer version (3.48.4, see dvc doctor below). Interestingly, though, the second issue mentions that `dvc==2.45.0` might not have this problem. And indeed, when I switch to that version, at least the git repo can be accessed: DVC now complains about disallowed keys that it must have read from the remote repository:
2024-03-15 20:53:23,812 ERROR: failed to import 'data/output/file.dill' from 'git@host:repo.git'. - '../../../dvc.lock' validation failed in revision '9290b7c': 30 errors: extra keys not allowed @ data['stages']['ltv']['deps'][0]['hash']
So unfortunately I cannot use this older version, as it is not compatible with the data repository, which was created with a recent version.
However, this might hint that the issue is related to the linked ones, even though those were thought to be fixed. Unless I am doing something horribly wrong, that is.
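The validation error from 2.45.0 names a `hash` key inside a dependency entry of `dvc.lock`. A small sketch for listing where such keys occur in an already-parsed lock file follows. The sample dict is made up for illustration (only the stage name `ltv` and the `hash` key come from the error message), and `find_hash_keys` is a hypothetical helper of mine.

```python
def find_hash_keys(lock):
    """Return paths like 'stages.ltv.deps[0]' for entries that carry a 'hash' key."""
    found = []
    for stage_name, stage in lock.get("stages", {}).items():
        for section in ("deps", "outs"):
            for index, entry in enumerate(stage.get(section, [])):
                if "hash" in entry:
                    found.append(f"stages.{stage_name}.{section}[{index}]")
    return found


# Illustrative data only: command, paths, and checksums are invented placeholders.
sample_lock = {
    "stages": {
        "ltv": {
            "cmd": "python train.py",
            "deps": [{"path": "data/input.csv", "hash": "md5", "md5": "0" * 32}],
            "outs": [{"path": "data/output/file.dill", "md5": "1" * 32}],
        }
    },
}

print(find_hash_keys(sample_lock))  # ['stages.ltv.deps[0]']
```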
I’d truly appreciate any help, as we urgently need the import and related functionality. Thank you!
Best regards
Jonas
Dvc doctor
$ dvc doctor
DVC version: 3.48.4 (pip)
-------------------------
Platform: Python 3.9.1 on Linux-6.1.0-17-amd64-x86_64-with-glibc2.36
Subprojects:
    dvc_data = 3.14.1
    dvc_objects = 5.1.0
    dvc_render = 1.0.1
    dvc_task = 0.3.0
    scmrepo = 3.3.0
Supports:
    http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3)
Config:
    Global: /home/user/.config/dvc
    System: /etc/xdg/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/vda1
Repo: dvc, git
Full traceback
user@machine:~/tmp/try_dvc_data_registry$ dvc import --verbose git@host:repo.git data
2024-03-15 20:27:02,456 DEBUG: v3.48.4 (pip), CPython 3.9.1 on Linux-6.1.0-17-amd64-x86_64-with-glibc2.36
2024-03-15 20:27:02,456 DEBUG: command: /home/user/tmp/try_dvc_data_registry/.venv/bin/dvc import --verbose git@host:repo.git data
2024-03-15 20:27:02,834 DEBUG: Removing output 'file.dill' of stage: 'file.dill.dvc'.
2024-03-15 20:27:02,834 DEBUG: Removing '/home/user/tmp/try_dvc_data_registry/file.dill'
Importing 'data (git@host:repo.git)' -> 'file.dill'
2024-03-15 20:27:02,836 DEBUG: Computed stage: 'file.dill.dvc' md5: 'd6176c15517db1426864eb7f2e39a46a'
2024-03-15 20:27:02,836 DEBUG: 'md5' of stage: 'file.dill.dvc' changed.
2024-03-15 20:27:02,836 DEBUG: Creating external repo git@host:repo.git@None
2024-03-15 20:27:02,836 DEBUG: erepo: git clone 'git@host:repo.git' to a temporary dir
2024-03-15 20:27:09,303 ERROR: failed to import 'data' from 'git@host:repo.git'. - Git failed to fetch ref from 'git@host:repo.git'
Traceback (most recent call last):
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/flow.py", line 84, in reraise
yield
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 698, in fetch_refspecs
for head in remote.ls_remotes(callbacks=cb)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/pygit2/remotes.py", line 171, in ls_remotes
self.connect(callbacks=callbacks, proxy=proxy)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/pygit2/remotes.py", line 120, in connect
payload.check_error(err)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/pygit2/callbacks.py", line 99, in check_error
check_error(error_code)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/pygit2/errors.py", line 65, in check_error
raise GitError(message)
_pygit2.GitError: authentication required but no callback set
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/scm.py", line 59, in map_scm_exception
yield
File "/usr/local/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 23, in _external_repo
path = _cached_clone(url, rev)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 134, in _cached_clone
clone_path, shallow = _clone_default_branch(url, rev)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 47, in wrapper
return deco(call, *dargs, **dkwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/flow.py", line 246, in wrap_with
return call()
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 68, in __call__
return self._func(*self._args, **self._kwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 198, in _clone_default_branch
git = clone(url, clone_path)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/scm.py", line 154, in clone
fetch_all_exps(git, url, progress=pbar.update_git)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/experiments/utils.py", line 280, in fetch_all_exps
scm.fetch_refspecs(url, refspecs, progress=progress, **kwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 359, in fetch_refspecs
return self._fetch_refspecs(
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 307, in _backend_func
result = func(*args, **kwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 703, in fetch_refspecs
remote.fetch(
File "/usr/local/lib/python3.9/contextlib.py", line 135, in __exit__
self.gen.throw(type, value, traceback)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/flow.py", line 88, in reraise
raise into from e
scmrepo.exceptions.SCMError: Git failed to fetch ref from 'git@host:repo.git'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/commands/imp.py", line 15, in run
self.repo.imp(
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/imp.py", line 44, in imp
return self.imp_url(path, out=out, erepo=erepo, frozen=True, **kwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/scm_context.py", line 143, in run
return method(repo, *args, **kw)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/imp_url.py", line 86, in imp_url
stage.run(jobs=jobs, no_download=no_download)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 47, in wrapper
return deco(call, *dargs, **dkwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/decorators.py", line 44, in rwlocked
return call()
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 68, in __call__
return self._func(*self._args, **self._kwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 603, in run
self._sync_import(dry, force, kwargs.get("jobs", None), no_download)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 47, in wrapper
return deco(call, *dargs, **dkwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/decorators.py", line 44, in rwlocked
return call()
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/funcy/decorators.py", line 68, in __call__
return self._func(*self._args, **self._kwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 640, in _sync_import
sync_import(self, dry, force, jobs, no_download)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/imports.py", line 56, in sync_import
stage.save_deps()
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/stage/__init__.py", line 496, in save_deps
dep.save()
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/dependency/repo.py", line 63, in save
rev = self.fs.repo.get_rev()
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/fs/dvc.py", line 565, in repo
return self.fs.repo
File "/usr/local/lib/python3.9/functools.py", line 969, in __get__
val = self.func(instance)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/fs/dvc.py", line 198, in repo
repo = self._make_repo(**self._repo_kwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/fs/dvc.py", line 275, in _make_repo
with Repo.open(uninitialized=True, **kwargs) as repo:
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/__init__.py", line 297, in open
return open_repo(url, *args, **kwargs)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/repo/open_repo.py", line 60, in open_repo
return _external_repo(url, *args, **kwargs)
File "/usr/local/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/usr/local/lib/python3.9/contextlib.py", line 135, in __exit__
self.gen.throw(type, value, traceback)
File "/home/user/tmp/try_dvc_data_registry/.venv/lib/python3.9/site-packages/dvc/scm.py", line 64, in map_scm_exception
raise into # noqa: B904
dvc.scm.SCMError: Git failed to fetch ref from 'git@host:repo.git'
2024-03-15 20:27:09,309 DEBUG: Analytics is enabled.
2024-03-15 20:27:09,376 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpflvigfpp', '-v']
2024-03-15 20:27:09,388 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpflvigfpp', '-v'] with pid 1003932