Pull only files belonging to a specific remote?

Is it possible to pull only files belonging to specific remotes?

I have an example reference-data.dvc file to which I’ve added the remote field, with “reference-data” as the value. However, when I run dvc pull -r test-data, the data for reference-data.dvc is still pulled, even though that .dvc file is explicitly assigned to the “reference-data” remote. I believe this happens because the “test-data” remote also (wrongly) contains the data tracked by reference-data.dvc. Even so, I don’t want that pull to happen, because I’m only interested in the data belonging to “test-data”. How can I accomplish this? I want to pull all files belonging to a specific remote without having to find those .dvc files manually and pull them explicitly.

This question is spun off from “How to configure that specific files should go to different remotes with DVC?” on Stack Overflow.


Hey, this is being discussed in the ticket “pull: how to only download data from a specified remote?” (Issue #8298 in iterative/dvc on GitHub). Please come and chime in.

As @daavoo suggested, you could also try to use a custom script:

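import os

from dvc.repo import Repo

# Open the DVC repository in the current working directory.
repo = Repo()
# Collect repo-relative paths of the outputs assigned to the chosen remote.
outs = [
    os.path.relpath(out.fs_path, repo.root_dir)
    for out in repo.index.outs
    if out.remote == "storage"  # replace "storage" with your remote's name
]
# Pull only those outputs; allow_missing=True avoids failing on missing files.
repo.pull(outs, allow_missing=True)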
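
If you want to check the filtering step without touching a real repository, here is a rough sketch of the same selection logic pulled into a function. The name paths_for_remote and the stand-in objects are hypothetical and only for illustration; the sketch assumes, as in the script above, that each output exposes fs_path and remote attributes, which may differ between DVC versions.

```python
import os
from types import SimpleNamespace


def paths_for_remote(outs, remote_name, root_dir):
    """Return repo-relative paths of outputs whose `remote` equals remote_name.

    Hypothetical helper: `outs` is any iterable of objects with `fs_path`
    and `remote` attributes (assumed to match DVC's output objects).
    """
    return [
        os.path.relpath(out.fs_path, root_dir)
        for out in outs
        if getattr(out, "remote", None) == remote_name
    ]


# Stand-in objects mimicking only the two attributes used above.
root = os.path.join(os.sep, "repo")
fake_outs = [
    SimpleNamespace(fs_path=os.path.join(root, "reference-data"), remote="reference-data"),
    SimpleNamespace(fs_path=os.path.join(root, "test-data"), remote="test-data"),
    SimpleNamespace(fs_path=os.path.join(root, "misc"), remote=None),  # no remote field set
]

selected = paths_for_remote(fake_outs, "test-data", root)
print(selected)  # ['test-data']
```

With a real repository, the result of paths_for_remote(repo.index.outs, "test-data", repo.root_dir) would be passed to repo.pull(..., allow_missing=True), as in the script above. Outputs with no remote field are skipped, so they would need to be pulled separately.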