However, I’m trying to understand your scenario here and I don’t quite get it. You are trying to “download” data already tracked in your own project, and put it in the same location it already exists? That’s what dvc get . data/test_data --out data/ looks like.
Also, is there a data/test_data.dvc file in your project? Otherwise the command should fail, I believe. Maybe the error message is wrong and this is the problem. If so would you mind opening a bug report for that in https://github.com/iterative/dvc/issues?
p.s. please note we’ve officially released DVC 1.0! Your project looks like it’s still using 0.x — we highly recommend migrating.
6 months later, now that I’m more comfortable with dvc and bash, I can explain my problem better. At the highest level, I wanted a one-liner that downloads all .dvc files in a given sub-folder and I didn’t want to write a for-loop. I don’t want to do dvc pull, because I’m not going to be using the version-control features in a CI pipeline and don’t need a cache.
The original directory structure I posted wasn’t clear. Here is an improved illustration where each DVC file points to a full folder.
You would use Git to sync .dvc files though. I’ll assume you mean to get the actual data tracked by DVC.
I’m still confused about the . URL given to get. So the environment in CI is a copy of the DVC repo, but you’re just trying to avoid dvc pull? If so, a) you’re already using Git, so avoiding SCM doesn’t seem to be a concern (or I don’t get what the problem is); and b) similarly, “not needing a cache” shouldn’t be a concern? Think of the cache as an internal mechanism DVC uses during the download process, which is not a problem: the final result is that you get the data files you want placed where you want them.
BTW note that pull accepts targets so you can tell it to only download the data in that folder (not everything in the project).
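To illustrate passing targets to pull, here is a minimal sketch. It assumes you run it from the root of the cloned repo, that the .dvc files live under data/test_assets, and that their paths contain no whitespace. The helper names list_dvc_targets and pull_folder are made up for this example and are not part of DVC.

```shell
# Hypothetical helper: print every .dvc file under a folder, one per line.
list_dvc_targets() {
  find "$1" -type f -name '*.dvc' | sort
}

# Hypothetical wrapper: pull only the data tracked by those .dvc files.
pull_folder() {
  # Relies on word splitting, so paths must not contain whitespace.
  targets=$(list_dvc_targets "$1")
  if [ -z "$targets" ]; then
    echo "no .dvc files found under $1" >&2
    return 1
  fi
  dvc pull $targets
}

# Usage, from the root of the cloned repo:
#   pull_folder data/test_assets
```

If the .dvc files sit directly in that folder, letting the shell expand a wildcard (dvc pull data/test_assets/*.dvc) should be an equivalent one-liner with no explicit loop, and as far as I know dvc pull --recursive data/test_assets searches the directory for .dvc files for you.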
The only other path I can think of now would be to change the structure of your DVC repo. Instead of dvc adding each file in data/test_assets (resulting in multiple .dvc files) you can add the whole directory, so you can do this from anywhere in the CI (don’t even need to clone the repo):
$ dvc get https://<Git URL to DVC repo> data/test_assets --out data/
As for changes in DVC 1.x or 2.x (coming up very soon!), some commands are getting a new --glob option to accept wildcards but get doesn’t have it. Not sure if it’s planned but feel free to request that feature in our repo!
In hindsight, I don’t know why I was fixated on “not needing a cache”. I guess I just wanted my CI containers to consume as few resources (including disk space) as possible. However, I never checked my assumption that the cache even took up that much space in this case, so I’m just going to use dvc pull now.