Automatically checking for missing dependencies

In our repo, I’m implementing a PR check in GitHub to make sure that all dependencies in the pipeline have been pushed to the remote before a contributor is allowed to merge their PR.

The script that performs this check runs on a Jenkins server with the PR branch checked out. It does not download the DVC-tracked files (i.e., it does not run dvc pull), as these are many gigabytes and would take a long time to fetch and use a lot of disk space.

So far, I’ve been able to implement this check by having the Jenkins server run dvc status -c --json and then checking the output for any status other than “deleted” (which usually means the file is available remotely but does not exist locally).
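For reference, the check described above can be sketched roughly as follows. This is only a sketch under assumptions: it assumes dvc status -c --json prints a flat JSON object mapping each path to a status string, which may differ between DVC versions, and the function names (unexpected_statuses, run_cloud_status, main) are made up for illustration.

```python
import json
import subprocess


def unexpected_statuses(status, allowed=("deleted",)):
    """Return {path: state} for every entry whose state is not in `allowed`.

    ASSUMPTION: `status` is a flat mapping of path -> status string,
    as parsed from `dvc status -c --json`.
    """
    return {path: state for path, state in status.items() if state not in allowed}


def run_cloud_status():
    """Run `dvc status -c --json` in the current checkout and parse its output."""
    result = subprocess.run(
        ["dvc", "status", "-c", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Treat empty output as "nothing to report".
    return json.loads(result.stdout.strip() or "{}")


def main():
    """Return a process exit code: 0 if the check passes, 1 otherwise."""
    problems = unexpected_statuses(run_cloud_status())
    for path, state in sorted(problems.items()):
        print(f"{path}: {state}")
    return 1 if problems else 0
```

In the Jenkins step this would be invoked with something like sys.exit(main()), so that any status other than “deleted” fails the build.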

This works great for dependencies tracked by DVC. However, in one case, one of the dependencies was supposed to be tracked in git and was not tracked in DVC (i.e., it was not an output of a DVC pipeline stage and had not been added to DVC via dvc add). The coder forgot to add the file to git, but the check still passed, because dvc status -c --json simply returned “deleted” for that file.

Is there some way to catch this kind of error? In other words, is there a command I can run to automatically check whether there are any dependencies not tracked by DVC that are missing?

I think you should be able to use dvc status (without -c), check for missing deps, and then filter out the DVC-tracked ones somehow (maybe through dvc list).
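A rough sketch of that idea, with several assumptions: that dvc status --json returns an object mapping each stage to a list of entries, where an entry such as {"changed deps": {path: state}} marks a missing dependency with the state "deleted"; that dvc list . -R --dvc-only prints one DVC-tracked path per line; and that both commands report paths relative to the repo root. The function names are hypothetical, and the output formats should be verified against the DVC version in use.

```python
import json
import subprocess


def missing_deps(status):
    """Collect dependency paths that local `dvc status` reports as "deleted".

    ASSUMPTION: `status` maps stage name -> list of entries, where an entry
    is either a plain string or a dict like {"changed deps": {path: state}}.
    """
    missing = set()
    for entries in status.values():
        for entry in entries:
            if not isinstance(entry, dict):
                continue  # skip plain-string entries
            for path, state in entry.get("changed deps", {}).items():
                if state == "deleted":
                    missing.add(path)
    return missing


def is_dvc_tracked(path, dvc_tracked):
    """True if `path` is a DVC-tracked file or a directory containing one."""
    prefix = path.rstrip("/") + "/"
    return any(t == path or t.startswith(prefix) for t in dvc_tracked)


def untracked_missing_deps(status, dvc_tracked):
    """Missing deps that DVC does not track, so git should have provided them."""
    tracked = set(dvc_tracked)
    return sorted(p for p in missing_deps(status) if not is_dvc_tracked(p, tracked))


def run_lines(cmd):
    """Run a command and return its non-empty stdout lines."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def main():
    """Return 1 if any dependency is missing and not tracked by DVC, else 0."""
    status = json.loads("\n".join(run_lines(["dvc", "status", "--json"])) or "{}")
    tracked = run_lines(["dvc", "list", ".", "-R", "--dvc-only"])
    problems = untracked_missing_deps(status, tracked)
    for path in problems:
        print(f"missing and not tracked by DVC: {path}")
    return 1 if problems else 0
```

Anything this reports is a dependency that is absent from the checkout and is not known to DVC, which is exactly the forgotten git add case; it would complement the existing -c check rather than replace it.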

But indeed, the lack of a built-in solution is a problem. The simplest fix would probably be to introduce some kind of flag for dvc status that makes it check the remote as well when looking for missing files, so that you would ideally get everything you want from one command.