Hi all, I’m having trouble wrapping my head around how to combine dvc import and dvc pull.
I have two DVC projects. One is a data registry, and the other is a separate, standalone project that relies on some data in the registry. I’ll call them registry and project.
On Computer A, I used dvc import to import some data from registry into project. The two do not share a cache, so the imported data was copied into project’s cache. I also created some DVC pipeline stages and ran them, and their outputs were checked into DVC as well.
I then added a DVC Google Cloud Storage remote and ran dvc push from Computer A to push the data to the cloud. I also pushed the code and all DVC-related files to my git remote.
On Computer B I pulled the code (git pull) and then attempted to dvc pull. It pulled the outputs of the pipeline stages, but not the data that I had imported to project from registry on Computer A.
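For reference, here is roughly the sequence described above as a shell sketch. The registry URL, data path, remote name, and bucket are placeholders rather than my real values, dvc repro stands in for however the stages were actually run, and the git add/commit steps are left out. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: each command is printed, and executed only if DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Placeholder values (hypothetical, for illustration only).
REGISTRY_URL="https://example.com/registry.git"
DATA_PATH="data/raw"
BUCKET_URL="gs://example-bucket/dvc-store"

computer_a() {
    run dvc import "$REGISTRY_URL" "$DATA_PATH"   # import data from registry into project
    run dvc repro                                 # assumed way of running the pipeline stages
    run dvc remote add -d storage "$BUCKET_URL"   # add a default GCS remote named "storage"
    run dvc push                                  # upload cached data to the remote
    run git push                                  # publish code and DVC metadata files
}

computer_b() {
    run git pull                                  # fetch code and DVC metadata files
    run dvc pull                                  # fetch data tracked by DVC
}

computer_a
computer_b
```

The last step on Computer B is where the imported data goes missing for me, while the pipeline outputs arrive fine.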
Am I missing a part of the puzzle here? I expected this combination of git push + dvc push from Computer A along with git pull and dvc pull on Computer B to enable me to relatively seamlessly replicate the project across machines, but that doesn’t seem to be the case. Should I have added the registry data to project using dvc get instead of dvc import? Is there a way to ensure that data added using dvc import is also able to be pulled to other machines in the future?