How to automate dvc pull request for a single file?

We are using DVC for heavy AI/ML model files in our GitLab repository.
Let's say that, with the help of DVC, we can easily push a model file 'X' to the cloud, but to pull the same file on another server we have to run the command "dvc pull X".

Currently, we run "dvc pull X" manually every time we update file X. Since we don't want to pull all the updated files from the cloud, we need to specify 'X' in our dvc pull commands.

My question is: how can we automate this dvc pull in our CI YAML file for a single file X, if X is a variable holding the file name?

Hi @Somya,

Could you set the file name as an environment variable in the CI system? (Perhaps via GitLab CI/CD environment variables, if you're using GitLab.)

Then, if the env var is named FILE_NAME, you can run

$ dvc pull $FILE_NAME

(from the CI job script)
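
As a minimal sketch of that job-script step, assuming FILE_NAME is defined as a CI/CD variable (pull_one is a hypothetical helper name, not part of DVC), the script could check the variable before pulling:

```shell
#!/bin/sh
# Pull a single DVC-tracked file whose name comes from an environment variable.
# FILE_NAME is the variable name suggested above; pull_one is a hypothetical helper.
pull_one() {
    # Fail early with a clear message if the variable is unset or empty.
    if [ -z "${FILE_NAME:-}" ]; then
        echo "FILE_NAME is not set" >&2
        return 1
    fi
    # Quote the expansion so a name containing spaces stays a single target.
    dvc pull "$FILE_NAME"
}
```

Calling pull_one from the job's script section would then pull only that file and fail the job if the variable is missing.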

Depending on your use case though, you may want to check out the dvc get or dvc import commands: https://dvc.org/doc/command-reference/get, https://dvc.org/doc/command-reference/import.

Hi @jorgeorpinel,
Thanks for your reply.
Regarding the solution based on an environment variable: the file is pushed from server 1 and pulled on server 2. If we set the file name in an environment variable, we would be doing that on server 2, so my problem would still persist. How can I communicate this environment variable between two branches of my GitLab repository?

I will definitely check out dvc get and dvc import as suggested.

Thanks and Regards

@Somya

"How can I communicate this environment variable between 2 branches of my gitlab repository?"

Is it about communication between branches, or rather two different builds?
Are they run in parallel, or is one depending on another?

Hi @Paffciu,
My use case involves one feature branch, which is then merged into the "master" branch.
The feature branch is just a clone of the master branch on my local machine, from which I push my DVC file. The master branch is the deployment of the same repository on another server. I want to pull the DVC file there, from the master branch.
To answer your question: the master branch depends on the feature branch, and they do not run in parallel.

@Somya
And what does your workflow look like?

  1. You make some changes in the feature branch
  2. You push the changed data with dvc
  3. You push the git changes into the master branch in the core repo
  4. You update your master branch on server 2
  5. And you want to update only the stage that was updated in step 3.

Am I right so far?

Hi @Paffciu,
My workflow looks like the following:

  1. I make some changes in the feature branch
  2. I push the changed data with dvc
  3. I merge the feature branch into the master branch using GitLab's merge request feature
  4. The repository deployed on server 2 (on the master branch) git-pulls all the changes (continuous integration defined in the GitLab config YAML file)

But since I pushed a DVC file from my feature branch, I would like to pull that file from the master branch on server 2. A simple "dvc pull $filename" does the trick on server 2, but how can I automate this command if $filename is not fixed?
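
One possible way to avoid a fixed $filename, sketched under assumptions: if each model is tracked by its own .dvc file committed to git, the CI job on server 2 can ask git which .dvc files changed between two commits and pull only those. The helper names changed_dvc_files and pull_changed are hypothetical, and the two commit arguments are whatever range the job considers new (for example GitLab's predefined CI_COMMIT_BEFORE_SHA and CI_COMMIT_SHA, provided both commits are present in the job's clone). Files tracked through dvc.yaml stages rather than standalone .dvc files are not covered by this sketch.

```shell
#!/bin/sh
# List .dvc files that differ between two commits and pull only those.
# changed_dvc_files and pull_changed are hypothetical helper names.
changed_dvc_files() {
    # $1 = old commit, $2 = new commit.
    # --diff-filter=d skips deleted files, which have nothing to pull.
    git diff --name-only --diff-filter=d "$1" "$2" -- '*.dvc'
}

pull_changed() {
    # Read one path per line so names with spaces stay intact.
    changed_dvc_files "$1" "$2" | while IFS= read -r target; do
        # Redirect stdin so the command cannot consume the list of paths.
        dvc pull "$target" </dev/null
    done
}
```

In a GitLab job this might be invoked as pull_changed "$CI_COMMIT_BEFORE_SHA" "$CI_COMMIT_SHA"; when nothing under *.dvc changed, the loop simply does not run.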