Declare Python version and package versions as dependencies

Hello!

I recently started using DVC and found the tutorials very useful and clear. However, I still have a question about a use case of dvc run.

I know that one can declare parameters and files as dependencies of a pipeline. However, in most cases the following are also fundamental dependencies:

  • the Python version
  • the versions of the Python packages

To give an example, running the same pipeline (= same files, same parameters) with Python 2.7 or Python 3.x can give different results. Likewise, the results may vary depending on whether TensorFlow 1.0 or 2.0 is loaded.

So, I would like the pipeline to be re-run whenever the Python version or the package versions have changed, even if everything else has remained the same.

How can I keep track of this kind of special dependency for my pipeline?

Thank you in advance!

Best,

—Francesco

Hi @fra-csl,

Yes, it would be nice if DVC could detect that code dependencies have changed and re-execute the necessary stages on dvc repro. But as you can imagine, DVC is platform agnostic (i.e. not just for Python code), so it would need to know countless programming languages in detail, recursively analyze every source file to build a list of package dependencies to track, know how to check their versions, etc.

This is not what DVC aims to solve at its core (although it’s an occasional feature request [1] [2] and partially under consideration, so feel free to chime in); rather, it aims to help with the data aspects of the pipeline.

That said, there is a relatively simple workaround for now: print the package names and versions to a text file (e.g. with pip freeze) and add that file as a dependency of the stage(s) in question :slightly_smiling_face:
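
As a rough sketch of that workaround (not a DVC feature), something like the script below could write the snapshot file. It assumes Python 3.8+ for importlib.metadata; the names environment_snapshot, write_snapshot and environment.txt are made up for illustration, and the first line with the interpreter version is an addition to what pip freeze would print.

```python
import sys
from importlib import metadata
from pathlib import Path


def environment_snapshot() -> str:
    """Return the Python version followed by sorted 'name==version' lines."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            packages.add(f"{name}=={dist.version}")
    python_line = "python==" + ".".join(map(str, sys.version_info[:3]))
    return "\n".join([python_line, *sorted(packages)]) + "\n"


def write_snapshot(path: str = "environment.txt") -> Path:
    """Write the snapshot to `path` (hypothetical default file name)."""
    target = Path(path)
    target.write_text(environment_snapshot(), encoding="utf-8")
    return target


if __name__ == "__main__":
    write_snapshot()
```

You would run it before dvc repro and declare the resulting file as a dependency of the stage, e.g. with the -d option of dvc run. As far as I know, DVC compares dependencies by content, so rewriting the file with identical content should not by itself trigger a re-run.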

Thank you for the clarification! I actually think we are going to use Dockerfiles to make sure that not only the Python packages have the right versions (pip freeze > requirements.txt) but that the whole environment is precisely the same. DVC will then make sure that all the data and parameter dependencies are consistent.

Thank you again for your support.
