Best Practices: DVC with cloud training and local evaluation

I have been searching for a way to integrate DVC pipelines with remote training and local evaluation during the development of an ML model. First, I would like to describe our use case as clearly as possible. I will improve the description if you have any questions.


We have one Git repository per project, where our code is stored. This includes code for training AI models, some Jupyter notebooks for data analysis, some notebooks for evaluation, and unit tests.

Our current process looks like the following:

  1. (Experimental) Data analysis
  2. Development of an AI model
  3. Training of the model
  4. Evaluation of the model results, normally within a notebook

Steps 2 - 4 might be repeated several times. Since our models are normally not trainable on a local machine, we use Google Cloud Vertex AI Custom Training, where you can basically run your training code inside a container. But you need to fulfill some specific requirements:

  1. Training code must be built as a Python package and uploaded to a bucket
  2. Data should also be available in a bucket
  3. Results are stored in a bucket.

We start the training from a local machine (this might change in the future, e.g. starting via GitHub Actions). To automate some of the steps above, we wrote a simple command line tool. It builds the Python package and, most importantly, queries the bucket locations of the files we track with DVC. We have the advantage that our DVC remote storage is always mounted into the container, so we only need to replace the local paths to the data files with the remote paths in the bucket. Before each training, we fully commit our code (Git) and data (DVC), so we can assign each training run in the cloud directly to a Git commit. In some cases, the training container also directly predicts the results for our validation and test sets, since we need to make use of the GPUs. These results are again stored in a bucket that is assigned to the training.
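As a rough illustration of the path replacement described above (this is not our actual tool), the sketch below turns the hash of a DVC-tracked file into its location inside a mounted remote. The function name, the mount point `/gcs/my-bucket/dvc` and the assumed remote layout (`files/md5/<first two characters>/<rest>` for DVC 3.x, `<first two characters>/<rest>` for older versions) are assumptions for the example. It only covers single files, not tracked directories.

```python
from pathlib import PurePosixPath


def remote_path_for_hash(md5: str, remote_root: str, dvc3_layout: bool = True) -> str:
    """Build the path of a content-addressed file inside a DVC remote.

    Assumption: the remote stores files under
    <root>/files/md5/<first 2 chars>/<rest> (DVC 3.x layout) or
    <root>/<first 2 chars>/<rest> (older layout).
    """
    if len(md5) < 3:
        raise ValueError("hash is too short to split into prefix and rest")
    root = PurePosixPath(remote_root)
    if dvc3_layout:
        root = root / "files" / "md5"
    # The first two characters form the directory, the rest the file name.
    return str(root / md5[:2] / md5[2:])


# Hypothetical mount point of the DVC remote inside the training container.
print(remote_path_for_hash("d41d8cd98f00b204e9800998ecf8427e", "/gcs/my-bucket/dvc"))
```

The hash itself would come from the `md5` field of the corresponding `.dvc` file. DVC's Python API also provides `dvc.api.get_url`, which returns the remote URL of a tracked file for a given repo and revision and may make the manual path construction unnecessary.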

Our evaluation of the model is usually experimental. Sometimes it is hard to determine a fixed score before training a model, so we develop multiple metrics and plots for the evaluation. At the moment, we manually download the training results from the bucket and add them to our repo with DVC. But this often happens several commits later (this is bad, but it happens). If we store the metrics and plots in the DVC format, that commit would not match the commit of the training. There is also the question of how to automatically query the predictions from the bucket.
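To make the commit mismatch concrete: one conceivable workaround would be to write the training commit into the metrics file itself, so that the link survives even if the evaluation is committed later. The following is only a sketch under that assumption. The function name, the `train_commit` key and the file path are hypothetical, and I have not checked how `dvc metrics show` treats the non-numeric field.

```python
import json
from pathlib import Path


def write_metrics(metrics: dict, train_commit: str, out_file: str) -> dict:
    """Write evaluation metrics to a JSON file together with the training commit.

    `train_commit` is the Git commit hash the cloud training was started from.
    """
    # Store the commit next to the metric values in one flat record.
    record = {"train_commit": train_commit, **metrics}
    path = Path(out_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return record


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    write_metrics({"accuracy": 0.9}, "abc123", "eval/metrics.json")
```

This does not solve the automatic download of the predictions. It only records which training run an evaluation belongs to.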


For now, I don't have any ideas how to connect the training step with the evaluation step. In an ideal world, I would create one DVC pipeline that runs all the steps. But I do not even have an idea how to connect DVC to the cloud training, and even less how to connect the evaluation to the training.

Do you have any ideas or suggestions on how to connect cloud training with DVC and track the metrics assigned to the training results?

I do not expect a perfect solution and would like to use this thread for discussion and exchange of ideas. Thanks for your help.