Hello! I’ve started using DVC and I love it, thank you for your work!
I have a question about combining DVC and MLflow. I hope someone can help me with it.
I use DVC to build and run pipelines and to version data and models. I use MLflow to get an overview of all the experiments and to store and visualize their metrics and plots.
During the training and evaluation stages, I log metrics and plots to MLflow, along with the current git commit id for reproducibility.
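To make the logging step concrete, here is a minimal sketch of how a stage might attach the commit id to an MLflow run. The tag key `git_commit` and the helper names `current_git_commit`, `build_run_tags`, and `log_run` are made up for illustration and are not from my actual code; the sketch assumes the stage runs inside the git repository and that MLflow is installed.

```python
import subprocess


def current_git_commit(repo_dir="."):
    """Return the full hash of HEAD for the repository at repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def build_run_tags(commit):
    """Build the tags for the run; the key name is a hypothetical choice."""
    return {"git_commit": commit}


def log_run(tags):
    """Start an MLflow run and attach the given tags to it."""
    # Imported here so the helpers above work without MLflow installed.
    import mlflow

    with mlflow.start_run():
        mlflow.set_tags(tags)
```

Inside a training stage this could be called as `log_run(build_run_tags(current_git_commit()))`. The hash it records is whatever HEAD is when `dvc repro` runs, which is commit A in the scenario below.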
Let’s say I want to experiment with a different learning rate.
My steps are:
- I update the learning rate in params.yaml.
- git commit -m 'starting a new run with updated learning rate'
- dvc repro (a new MLflow run is created with the metrics and the current git commit id)
- I check the results and I like them, so I want to save the model.
- git add 'dvc.lock'
- git commit -m 'awesome learning rate'
So I have two commits here: commit A before the run and commit B after it.
Let’s say I check my MLflow dashboard and want to get the trained model from the last run.
That run is linked to commit A.
If I git checkout commit A, I won’t get the trained model until I run dvc repro again, because the updated dvc.lock is only part of commit B.
And I can’t log commit B to the MLflow run during training/evaluation, because commit B doesn’t exist yet at that point.
I can’t really see the whole picture. What is the proper way to do this?
Any ideas would be very helpful!
Thanks in advance