Thanks for taking the time to give feedback!
Problem 2
What do you mean by "just in the run cache"? Are you using only `-O` options for `dvc run`?
No, we’re using the `-o` option, but maybe I’ve misunderstood how DVC works here.
My understanding is:
The CI runner has successfully executed the DVC pipeline and now has the new outputs (trained model, processed data, …) as well as a new `dvc.lock` in its working directory.
To make these outputs accessible from the developer’s local machine, I see two options:
a) The CI runner does `ci-runner: dvc push --run-cache`, and afterwards I do `dev-local: dvc pull --run-cache && dvc repro`.
- Disadvantages:
  - AFAIK I can’t pull the run cache for just one stage (I need to pull all the preprocessed data as well as the trained model, even if I only want `dvc pull --run-cache model_training`)
  - Looking at a git branch, it’s not immediately obvious whether a successful `dvc repro` has been executed here (compared to a commit with a changed `dvc.lock`)
  - I couldn’t find anything in the garbage collection docs regarding how to clean the run cache, or which runs stay cached after cleaning
- Advantages:
  - No extra commit necessary: to check out the results of an experiment, I can pull the commit with which the experiment was started
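To make option a) concrete, here is roughly how I picture it as a script. This is only a sketch: the function names are made up for illustration, and it assumes a default DVC remote is already configured on both the CI runner and my machine.

```shell
# Sketch of option a): share results through the run cache only.
# Function names are hypothetical; a configured DVC remote is assumed.

# On the CI runner, after the pipeline has finished successfully.
ci_publish_run_cache() {
    dvc push --run-cache
}

# On the developer machine, checked out at the commit that started the run.
dev_restore_from_run_cache() {
    # Pull the data together with the run cache, then call `dvc repro`,
    # which (as far as I understand) can reuse cached runs instead of
    # executing the stages again.
    dvc pull --run-cache && dvc repro
}
```

Nothing is committed here, so the git history does not show whether the run ever happened.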
b) ci-runner: git add dvc.lock && git commit && git push && dvc push
- Disadvantages:
  - The CI runner needs to be configured so that it can commit to a branch
  - The “results” (the `dvc.lock`) are stored in a separate commit from the one which started the experiments (relevant for e.g. MLflow, which stores the commit hash of an experiment)
- Advantages:
  - The changed `dvc.lock` in a branch tells me that this experiment has been successfully executed
  - I can pull single DVC stages (`dvc pull model_training`) without downloading all preprocessed data
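For comparison, option b) could be sketched like this. Again, the function names and the commit message are placeholders I made up, and it assumes the CI runner has push rights on the branch and a DVC remote is configured.

```shell
# Sketch of option b): commit the updated dvc.lock from the CI runner.
# Function names and the commit message are hypothetical.

# On the CI runner, after the pipeline has finished successfully.
ci_commit_results() {
    git add dvc.lock &&
        git commit -m "Update dvc.lock after pipeline run" &&
        git push &&
        dvc push
}

# On the developer machine: get the new dvc.lock, then pull one stage only.
dev_fetch_model() {
    # model_training is the example stage name used in this post.
    git pull && dvc pull model_training
}
```

The extra commit is the price for being able to see the result in the branch and pull a single stage.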
For all these reasons, we opted for option b).
Problem 3
Your new solution (`dvc exp run {stage_name} --params epochs=3`) seems ideal, thanks for the hint! Together with the parameter-like dependencies, we can streamline our project a lot.