Since v2, dvc pull does not pull imported files

This is an awkward bug to report: I’m unable to reproduce it with a toy example (despite creating some repos just for that purpose), and I cannot share the original setup, as it lives in private repos.

Perhaps it’s some configuration weirdness on my end - hence a forum post.

When using dvc>=2, upon running dvc pull (also dvc pull -R and similar), files that are dvc import-ed from other repos don’t actually get pulled. dvc immediately returns with Everything is up to date, without even cloning the repos that we should import from.

The only way to pull the files seems to be specifying each file individually - dvc pull some_dir/some_file.dvc.
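For anyone stuck on the same thing, the per-file workaround can be scripted. This is only a rough sketch, assuming `root` is the root of a single DVC repo and that `dvc` is on the PATH; the helper names `find_dvc_files` and `pull_each` are made up for illustration and are not part of DVC.

```python
import subprocess
from pathlib import Path

# Internal directories that should not contain user-facing .dvc files.
SKIP_DIRS = {".git", ".dvc"}


def find_dvc_files(root):
    """Return all *.dvc files under root (sorted), skipping .git/ and .dvc/."""
    root = Path(root)
    found = []
    for path in root.rglob("*.dvc"):
        parent_parts = path.relative_to(root).parts[:-1]
        if path.is_file() and not SKIP_DIRS.intersection(parent_parts):
            found.append(path)
    return sorted(found)


def pull_each(root, run=subprocess.run):
    """Run `dvc pull <file>` once per .dvc file; return the files that failed."""
    root = Path(root)
    failed = []
    for dvc_file in find_dvc_files(root):
        # Paths are passed relative to root, and dvc is started from root.
        cmd = ["dvc", "pull", str(dvc_file.relative_to(root))]
        if run(cmd, cwd=root).returncode != 0:
            failed.append(dvc_file)
    return failed
```

Calling `pull_each(".")` from the repo root would then issue one `dvc pull` per file. With several DVC repos in one git repo, it would presumably have to be called once per DVC repo root.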

When using dvc<2 in the same repos - everything gets pulled.

Note: this behavior is consistent for all of my teammates, in several different repos. Tested on Ubuntu and Mac. Effectively it stops us from upgrading to dvc v2, as we heavily rely on dvc import.

A couple of facts that might be relevant:

  • we usually have multiple DVC repos per git repo, in sub-directories
  • we’re using S3 and sometimes “local” remotes (the latter refer to different DVC repos in the same git repo)

I would really appreciate any suggestions for how to investigate further. As mentioned, simple imports in toy repos seem to work fine.

Thanks a lot!
Tomasz

Hi @tpietruszka

Are your .dvc files inside .gitignore’d directories?

In DVC 2.0, .dvc and dvc.yaml files are no longer allowed to be gitignored. This change was made for performance reasons: in 2.0+, DVC does not traverse gitignored directories at all when looking for files to pull.

The solution in this case is either to un-ignore those directories, or to add the appropriate ! exclusions for .dvc files to your .gitignore.
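One way to check this quickly is to ask git itself which .dvc and dvc.yaml files it treats as ignored. Below is a minimal sketch, assuming `git` is on the PATH and `repo_root` is the root of the git repo; the function name `ignored_dvc_files` is hypothetical. It reports git's own view of the ignore rules, so if a result looks surprising it is worth cross-checking with `git status --ignored`.

```python
import subprocess
from pathlib import Path


def ignored_dvc_files(repo_root, run=subprocess.run):
    """Return .dvc / dvc.yaml files under repo_root that git reports as ignored."""
    repo_root = Path(repo_root)
    candidates = [
        p for p in repo_root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(repo_root).parts
        and (p.suffix == ".dvc" or p.name == "dvc.yaml")
    ]
    ignored = []
    for path in sorted(candidates):
        rel = str(path.relative_to(repo_root))
        # `git check-ignore -q` exits with 0 if the path is ignored, 1 if not.
        result = run(["git", "check-ignore", "-q", rel], cwd=repo_root)
        if result.returncode == 0:
            ignored.append(rel)
    return ignored
```

An empty list from `ignored_dvc_files(".")` would suggest that, as far as git is concerned, none of those files are ignored.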

Thank you so much @pmrowla ! gitignore is indeed the core of the problem.

While the .dvc files are not actually gitignored, the gitignore file is a bit complex, so I guess DVC (dulwich?) doesn’t parse it quite correctly.

I was finally able to reproduce the problem in a toy example: the tpietruszka/dvc-import-bug repository on GitHub. There, dvc import should import a small png file, but with dvc>=2 it does not.

The gitignore looks like:

data/**
!data/**/
!data/**/*.dvc

(I use this pattern a lot - gitignore everything in a directory, but traverse subdirectories, and don’t ignore a specific extension)
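To make the intent of that pattern concrete, here is a small sketch that writes the same .gitignore plus two placeholder files into a scratch directory. The file names (`data/sub/image.png` and `data/sub/image.png.dvc`) and the helper `write_toy_layout` are made up for illustration; the intended outcome in the comments is just the intent described above, not a claim about what any particular tool reports.

```python
from pathlib import Path

# The three-line pattern from the post, written out as a .gitignore body.
GITIGNORE = "data/**\n!data/**/\n!data/**/*.dvc\n"

# Intended outcome: True = should be ignored, False = should stay tracked.
INTENDED = {
    "data/sub/image.png": True,       # data files are ignored
    "data/sub/image.png.dvc": False,  # .dvc files are not
}


def write_toy_layout(root):
    """Create the .gitignore and empty placeholder files under root."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / ".gitignore").write_text(GITIGNORE)
    created = []
    for rel in INTENDED:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")  # placeholder content only
        created.append(path)
    return created
```

After `git init` in that directory, `git check-ignore -v <path>` shows which pattern (if any) git matches for each file, which can then be compared with what DVC does.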

I guess this is effectively the same problem as iterative/dvc issue #5748 on GitHub, “dvc cannot handle gitignore patterns”? (Does it need to be fixed via dulwich, or by changing DVC’s git engine?)

Yes, it looks like it’s the same issue as the one you linked. It would be great if you could comment on the GH ticket and note that you are experiencing the same problem, as it helps us prioritize which bugs to address first.
