Error with dvc import

Hi,

My team and I are evaluating dvc for use within our projects. All of our data and code are stored on a NAS which is mounted to several different machines. We are currently testing with two repositories located at:
/nas/projects/project1/code/user1/repo1
/nas/projects/project1/code/user2/repo2

And the shared cache is located at:
/nas/data/project1/dvc

Repo1 currently has two files being tracked by dvc:
data/test/file1.pq.gz
data/test/file2.pq.gz

We are trying to import these files from repo1 to repo2. From user2/repo2/data/test, we are running:
dvc import /nas/projects/project1/code/user1/repo1 data/test/file1.pq.gz

And we are receiving the following error:
ERROR: unexpected error - [Errno 2] No such file or directory: PosixPathInfo: '../../../../../../../../../../tmp/tmptxvv7xk0dvc-clone/data/test/file1.pq.gz'

We have tried this from multiple machines with both absolute and relative paths to the project. We have tried with DVC installed both in a venv and with conda. We receive the same error each time, although the number of "../" segments in the path sometimes changes between absolute and relative paths. The kicker is that I ran into this issue the other day and, on a hunch, decided to try it on another machine. It worked there, and when I went back to the original machine, the command also ran for file2 without issue. Then, today, we started receiving the error again while testing.
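To illustrate why the number of "../" segments can vary: a relative path from the working directory to a location under /tmp first has to climb to the filesystem root, so its length depends on how deep the working directory is. This is only a minimal Python sketch, assuming DVC builds the path with something equivalent to a POSIX relpath (I have not confirmed that); parent_steps is a hypothetical helper, not part of DVC.

```python
import posixpath


def parent_steps(start_dir, target):
    """Count the '..' segments in the relative path from start_dir to target."""
    rel = posixpath.relpath(target, start_dir)
    return rel.split("/").count("..")


# Temporary clone path taken from the error message in this thread.
clone_file = "/tmp/tmptxvv7xk0dvc-clone/data/test/file1.pq.gz"

# Running from the repo root versus from data/test inside it changes the depth.
for start in (
    "/nas/projects/project1/code/user2/repo2",
    "/nas/projects/project1/code/user2/repo2/data/test",
):
    print(start, "->", posixpath.relpath(clone_file, start))
```

The count equals the depth of the starting directory. The counts in the actual error messages need not match this sketch, since the real working directory may resolve to a deeper path than the ones written in this thread.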

I don’t know if this is a bug, a configuration issue, or some other error on our part, so any thoughts or insight into potential issues are greatly appreciated.

Thanks for the report! Would you mind sharing the outputs of dvc doctor and dvc import -v ...? I also wonder what differs between file1 and file2, so please also share the command that worked for you (I assume it is dvc import /nas/projects/project1/code/user1/repo1 data/test/file2.pq.gz). A list of the files in the repo would also help; you can get it with dvc list (https://dvc.org/doc/command-reference/list). Try it with --dvc-only to see only the tracked files and share the output (feel free to mask the actual filenames).

The command that you have for file2 is correct.

dvc list --dvc-only -R .:

data/test/file1.pq.gz
data/test/file2.pq.gz

dvc doctor:

DVC version: 2.1.0 (pip)
---------------------------------
Platform: Python 3.6.8 on Linux-3.10.0-1160.6.1.el7.x86_64-x86_64-with-centos-7.9.2009-Core
Supports: http, https
Cache types: hardlink, symlink
Cache directory: nfs4 on remote.storage.inside.network.com:/nas
Caches: local
Remotes: None
Workspace directory: nfs4 on remote.storage.inside.network.com:/nas
Repo: dvc, git

dvc import -v:

[user2@/nas/projects/project1/code/user2/dvc_test]$ env/bin/dvc import -v ../../user1/repo1/ data/test/file1.pq.gz
2021-05-24 13:43:44,940 DEBUG: Check for update is enabled.
2021-05-24 13:43:46,927 DEBUG: Removing output 'file1.pq.gz' of stage: 'file1.pq.gz.dvc'.
2021-05-24 13:43:46,927 DEBUG: Removing 'file1.pq.gz'
Importing 'data/test/file1.pq.gz (../../user1/repo1/)' -> 'file1.pq.gz'
2021-05-24 13:43:46,934 DEBUG: Computed stage: 'file1.pq.gz.dvc' md5: '7e562729f1306769a103b98c4705df2a'
2021-05-24 13:43:46,934 DEBUG: 'md5' of stage: 'file1.pq.gz.dvc' changed.
2021-05-24 13:43:46,936 DEBUG: Creating external repo ../../user1/repo1/@None
2021-05-24 13:43:46,937 DEBUG: erepo: git clone '../../user1/repo1/' to a temporary dir
2021-05-24 13:43:48,371 DEBUG: Checking if stage '/tmp/tmp3hsesjwmdvc-clone/data/test/file1.pq.gz' is in 'dvc.yaml'
2021-05-24 13:43:48,373 DEBUG: Assuming '/tmp/tmp3hsesjwmdvc-clone/data/test/file1.pq.gz' to be a stage inside 'dvc.yaml'
2021-05-24 13:43:48,428 ERROR: unexpected error - [Errno 2] No such file or directory: PosixPathInfo: '../../../../../../../../tmp/tmp3hsesjwmdvc-clone/data/test/file1.pq.gz'
------------------------------------------------------------
Traceback (most recent call last):
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/dvc/main.py", line 55, in main
    ret = cmd.run()
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/dvc/command/imp.py", line 22, in run
    jobs=self.args.jobs,
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/dvc/repo/imp.py", line 7, in imp
    path, out=out, fname=fname, erepo=erepo, frozen=True, **kwargs
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/dvc/repo/scm_context.py", line 14, in run
    return method(repo, *args, **kw)
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/dvc/repo/imp_url.py", line 80, in imp_url
    stage.run(jobs=jobs)
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/dvc/stage/decorators.py", line 36, in rwlocked
    return call()
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/dvc/stage/__init__.py", line 512, in run
    sync_import(self, dry, force, jobs)
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/dvc/stage/imports.py", line 47, in sync_import
    stage.deps[0].download(stage.outs[0], jobs=jobs)
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/dvc/dependency/repo.py", line 103, in download
    follow_subrepos=False,
  File "/nas/projects/project1/code/user2/dvc_test/env/lib64/python3.6/site-packages/dvc/objects/stage.py", line 180, in stage
    errno.ENOENT, os.strerror(errno.ENOENT), path_info
FileNotFoundError: [Errno 2] No such file or directory: PosixPathInfo: '../../../../../../../../tmp/tmp3hsesjwmdvc-clone/data/test/file1.pq.gz'
------------------------------------------------------------
2021-05-24 13:43:49,272 DEBUG: Version info for developers:
DVC version: 2.1.0 (pip)
---------------------------------
Platform: Python 3.6.8 on Linux-3.10.0-1160.6.1.el7.x86_64-x86_64-with-centos-7.9.2009-Core
Supports: http, https
Cache types: hardlink, symlink
Cache directory: nfs4 on remote.storage.inside.network.com:/nas
Caches: local
Remotes: None
Workspace directory: nfs4 on remote.storage.inside.network.com:/nas
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-05-24 13:43:49,276 DEBUG: Analytics is enabled.
2021-05-24 13:43:49,421 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpug76kojt']'
2021-05-24 13:43:49,422 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpug76kojt']'

I tried recreating the issue with two new repositories and could not reproduce it. I created this dvc_test repository and ran the following script to initialize it before running dvc import:

git init
python3 -m venv env
env/bin/pip install pip --upgrade
env/bin/pip install dvc
env/bin/dvc init
env/bin/dvc cache dir ../../../../data/dvc/
env/bin/dvc config cache.shared group

On a hunch, I tried activating the environment and running dvc import instead of env/bin/dvc import, and received the same error. dvc doctor in repo1 looks the same as in repo2, except that repo1 has version 2.0.5. The relative path in the PosixPathInfo climbs to the filesystem root and then descends into /tmp.

So if you try version 2.0.5, it works on both files in the original repo, right? If so, this looks like a bug that was introduced between 2.0.5 and 2.1.

No, the version does not seem to matter. The first time that it worked (between two instances of it not working), both were at 2.0.5. When I tried to reproduce the error with two new repos, both were using 2.1 and I could not reproduce. We were trying to import from 2.0.5 into 2.1, so I just upgraded the 2.0.5 repo to 2.1, but we’re still receiving the error.

We decided to recreate the repo today. We used dvc destroy and saw that 2.3.0 had been released, so we upgraded before running dvc init again. We then started receiving a new error: "unable to open database file". We ran through the same steps with repo2, with no success.

After that, I googled the new error and found "unable to open database on mapped network drive" (Issue #4420, iterative/dvc on GitHub), which then led to "dvc: QA nfs/cifs/overlay/etc" (Issue #5562, iterative/dvc on GitHub).

I'm not sure why the other error went away or why I couldn't reproduce it with other repos. I suspect it had to do with NFS, though. I'll keep this open until we test the fix for those issues and then report back here.
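For anyone who wants to check their own mount: "unable to open database file" is the message SQLite gives when it cannot open a database, so a quick probe is to create and write a small SQLite database in the directory in question. This is a rough sketch, assuming the DVC error comes from SQLite as the linked issues suggest; can_use_sqlite and the probe filename are made up for illustration, and passing this probe does not prove DVC will work on the mount.

```python
import os
import sqlite3


def can_use_sqlite(directory):
    """Try to create, write, and remove a small SQLite database in `directory`.

    Returns (ok, message), where message is the SQLite error text on failure.
    """
    path = os.path.join(directory, "sqlite_probe.db")  # hypothetical probe file
    try:
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS probe (x INTEGER)")
            conn.execute("INSERT INTO probe VALUES (1)")
            conn.commit()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        # Clean up the probe file if it was created.
        if os.path.exists(path):
            os.remove(path)
    return True, "ok"
```

For example, can_use_sqlite("/nas/data/project1/dvc") would probe the shared cache directory from this thread. Running it from each machine that mounts the NAS may help narrow down whether the problem is specific to one host.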
