Once a model is trained, can the DAG be re-used for prediction?

jcrousse · September 23, 2019, 9:47am

Hi,

The examples I reviewed in the documentation seem to describe how to define, share and reproduce a model training pipeline. Once we are happy with our trained model and want to move it into production, what would be the recommended way to use DVC to ensure pipeline consistency between training and prediction?

I would like to re-use the DVC pipeline defined for training (feature engineering, processing, …) to ensure consistency and proper usage. On the other hand, the pipeline would also be somewhat different (each individual model script would “predict” instead of “train”).

Is there a recommended solution? Should I create additional variables to tell each script whether to train or predict? And what should be done with the metric at the end?

kupruser · September 24, 2019, 6:55am

Hi @jcrousse !

Thank you for your patience! We don’t have any recommended approach for that yet, but your idea with additional vars (I imagine it would be some env var, right? e.g. PREDICT=1 dvc repro) should work. That would work nicely if, for your pipeline, it is just a matter of flicking a switch to go from training to prediction. Otherwise it might get tricky, and you would have to build a separate pipeline or somehow figure out a way to make the current one work with the help of some additional flags. Are you mostly talking about using that in a kind of “production” setting? Or will it be something that you would want to keep in your project?
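To make the env var idea a bit more concrete, here is a minimal sketch of what a single stage script could look like. Everything in it is hypothetical and only for illustration: the function names, the toy "model" (just the mean of the targets) and the accepted values of PREDICT are assumptions, and none of it is a DVC API. The point is only that the feature engineering code is shared, and the switch decides what happens after it.

```python
import os

# Values of PREDICT that switch the stage to prediction (an assumption,
# pick whatever convention suits your project).
TRUTHY = {"1", "true", "yes"}


def is_predict_mode(env=None):
    """Return True when PREDICT is set to a truthy value."""
    env = os.environ if env is None else env
    return str(env.get("PREDICT", "")).strip().lower() in TRUTHY


def engineer_features(rows):
    """Shared step: identical in training and prediction."""
    return [[float(x), float(x) ** 2] for x in rows]


def train(features, targets):
    """Toy stand-in for model fitting: remember the mean target."""
    return {"mean": sum(targets) / len(targets), "n_features": len(features[0])}


def predict(model, features):
    """Toy stand-in for inference: one prediction per input row."""
    return [model["mean"] for _ in features]


def run_stage(rows, targets=None, model=None, env=None):
    """Run the shared feature step, then train or predict depending on PREDICT."""
    features = engineer_features(rows)
    if is_predict_mode(env):
        if model is None:
            raise ValueError("predict mode needs a trained model")
        return predict(model, features)
    if targets is None:
        raise ValueError("train mode needs targets")
    return train(features, targets)


if __name__ == "__main__":
    fitted = run_stage([1, 2, 3], targets=[2.0, 4.0, 6.0], env={})
    print(run_stage([10, 20], model=fitted, env={"PREDICT": "1"}))
```

One thing to watch out for: as far as I know, DVC decides whether a stage is up to date from its dependencies and command, not from environment variables, so after flipping PREDICT you may need to force the run (e.g. dvc repro --force). The metric step would also need the same kind of guard, since there is usually nothing to score against at prediction time.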

Thanks,
Ruslan

jcrousse · September 26, 2019, 9:34am

Thanks for the answer.
Yes, the use case is exactly to use the pipeline in production.
Once the model DAG becomes a bit complex, we would like to have only one DAG definition (the DVC one) and not reproduce a similar DAG outside of DVC.
Otherwise there is too much of a risk of introducing mistakes or inconsistencies between the two.
