In our repo, I’m implementing a PR check in GitHub to make sure that all dependencies in the pipeline have been pushed to the remote before the PR is allowed to merge.
The script that runs this check runs on a Jenkins server with the PR branch checked out. It does not download the DVC-tracked files (i.e., it does not run `dvc pull`), as these are many gigabytes and would take a long time to download and use a lot of disk space.
So far, I’ve implemented this check by having the Jenkins server run `dvc status -c --json` and then checking the output for any statuses other than “deleted” (which usually means the file is available remotely but does not exist locally).
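For reference, the check described above can be sketched roughly as follows. This is a minimal sketch, not the exact script: it assumes the JSON output is a flat mapping from path to a status string, and the function names `run_cloud_status` and `unpushed_entries` are hypothetical.

```python
import json
import subprocess


def run_cloud_status():
    """Run `dvc status -c --json` and return the parsed output.

    Assumption: the command prints a single JSON object on stdout.
    """
    result = subprocess.run(
        ["dvc", "status", "-c", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout or "{}")


def unpushed_entries(status):
    """Return the entries whose status is anything other than "deleted".

    Assumption: `status` maps each path to a status string such as
    "new" or "deleted".
    """
    return {path: state for path, state in status.items() if state != "deleted"}


def main():
    problems = unpushed_entries(run_cloud_status())
    for path, state in sorted(problems.items()):
        print(f"{state}: {path}")
    # A non-zero exit code fails the PR check.
    return 1 if problems else 0
```

On Jenkins, the script would end with something like `sys.exit(main())` so that any non-“deleted” status fails the build.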
This works great for dependencies tracked by DVC. However, in one case, a dependency was supposed to be tracked in git and was not tracked in DVC (i.e., it was not an output of a DVC pipeline stage and had not been added to DVC via `dvc add`). The coder forgot to add the file to git, but the test still passed, because `dvc status -c --json` simply returned “deleted” for that file.
Is there some way to catch this kind of error? In other words, is there a command I can run to automatically check whether there are any dependencies not tracked by DVC that are missing?
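To make the question concrete, the kind of check I have in mind could be sketched as below. This is not a built-in DVC command, only a rough illustration. It assumes the list of stage dependencies and the list of DVC-tracked outputs have already been collected (for example from `dvc.yaml` and the `.dvc` files) and that all paths are relative to the repo root. The function names `git_tracked_files` and `missing_git_deps` are hypothetical.

```python
import subprocess


def git_tracked_files(repo_root="."):
    """Return the set of paths tracked by git, relative to repo_root."""
    result = subprocess.run(
        ["git", "ls-files"],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=True,
    )
    return set(result.stdout.splitlines())


def missing_git_deps(deps, dvc_outs, git_files):
    """Return dependencies that are covered by neither DVC nor git.

    deps      -- dependency paths of the pipeline stages (assumed input)
    dvc_outs  -- paths DVC tracks: stage outputs and `dvc add`ed paths
    git_files -- paths tracked by git, e.g. from git_tracked_files()
    """
    outs = [out.rstrip("/") for out in dvc_outs]
    missing = []
    for dep in deps:
        path = dep.rstrip("/")
        # Covered by DVC if it is an output, or lives inside an output directory.
        in_dvc = any(path == out or path.startswith(out + "/") for out in outs)
        if in_dvc:
            continue
        # Covered by git if it is a tracked file, or a directory containing one.
        in_git = path in git_files or any(f.startswith(path + "/") for f in git_files)
        if not in_git:
            missing.append(dep)
    return missing
```

A dependency reported here is exactly the case above: not tracked by DVC, and also never added to git.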