Hello,
I've been trying to use DVC to manage my experiments and like it very much so far, but I have encountered some rather unexpected behavior.
Any time I commit the code, the very first experiment I run afterwards cannot be applied to the workspace. The command "dvc exp run" proceeds without errors and the environment is modified accordingly. But when I change the environment and then try to "dvc exp apply" it back, no changes are made to the environment (again, with no errors or warnings). Applying any other experiment works as expected.
The minimal example is the following:
dvc.yaml:
stages:
  main:
    cmd: Rscript src/main.R
    params:
      - par.yaml:
    metrics:
      - out.yaml:
          cache: false
par.yaml:
a: 1
b: 1
c: 1
d: 1
main.R:
library(yaml)
par <- yaml.load_file("par.yaml")
out <- list(
  perf = par$a * 1 + par$b * 0.1 + par$c * 0.01 + par$d * 0.001
)
print(out)
write_yaml(out, "out.yaml")
However, I believe the issue may lie not in dvc.yaml but in my workflow, since the same thing happens when I create the environment with "dvc exp init" and when I clone the official example repo. Am I supposed to call some other command after a new commit to prime the environment for new experiments? I have not found anything in the reference manual.
Right now I'm sidestepping the issue by running the first experiment twice and disregarding the first run, but that is cumbersome for computationally costly experiments.
Thank you very much for your help.
But when I change the environment and then try to “dvc exp apply” it back
Could you please clarify what you are modifying when you "change the environment" before using dvc exp apply?
Of course. By that I meant modifying par.yaml, either manually or via the -S option, and rerunning the model to regenerate the outputs. In particular, consider the following (slightly modified) setup:
dvc.yaml:
stages:
  main:
    cmd: Rscript src/main.R
    deps:
      - src
    params:
      - par.yaml:
    metrics:
      - out.yaml:
          cache: false
main.R:
library(yaml)
par <- yaml.load_file("par.yaml")
b <- 9
out <- list(
perf = par$a * 1 + b * 0.1
)
print(out)
write_yaml(out, "out.yaml")
par.yaml:
a: 9
I do the following:
- Change the values of a (in par.yaml) and b (in main.R) to 1 to simulate working on both the code and the parameters.
- Commit everything.
- Run dvc exp run -n exp_a and observe that perf=1.1 is created as expected.
- Run dvc exp run -n exp_b -S "par.yaml:a=2" and observe that perf=2.1 is created as expected.
- Run dvc exp run -n exp_c -S "par.yaml:a=3" and observe that perf=3.1 is created as expected.
- Now I want to return to exp_a, so I call dvc exp apply exp_a. However, a in par.yaml stays at its current value, in this case 3. Interestingly, out.yaml does get restored to the correct value of 1.1.
- If I apply any other experiment, say exp_b or exp_c, everything works as expected and both par.yaml and out.yaml are restored.
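As a quick sanity check on the perf values quoted in these steps, here is a small Python sketch of the formula from main.R, with b fixed at 1. The function name perf and the experiments dict are only illustrative and are not part of the project.

```python
# Mirrors the formula in main.R: perf = par$a * 1 + b * 0.1
# `perf` and `experiments` are illustrative names, not part of the project.
def perf(a, b=1):
    return a * 1 + b * 0.1

# Value of `a` used by each experiment in the steps above.
experiments = {"exp_a": 1, "exp_b": 2, "exp_c": 3}

for name, a in experiments.items():
    print(name, round(perf(a), 3))
```

So after exp_c the workspace holds a=3 and perf=3.1, and a complete dvc exp apply exp_a should bring back a=1 together with perf=1.1.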
I also tried running another experiment (say exp_0) right before the commit, so that the commit is published with the correct value of out.yaml. In this case, neither par.yaml nor out.yaml is restored when calling dvc exp apply exp_a.
Am I using it right? Thank you.
I've confirmed that I can reproduce the issue, and this is a bug in DVC. To clarify, the changes from the first exp are applied except for files where the file in the experiment result is unchanged from the file in the initial git commit. So in your case, the changes for the output out.yaml are applied, but since par.yaml is unchanged (both the original commit and the experiment result contain par.yaml:a=2), exp apply is essentially ignoring the params file and does not modify it.
I've opened a GitHub issue for this bug, which you can subscribe to for further updates: "exp apply: git-tracked files which are unchanged from HEAD are not reset on apply" (Issue #8764, iterative/dvc).
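To make the described behavior concrete, here is a toy Python model of it. This is not DVC's actual implementation: the functions apply_buggy and apply_expected are hypothetical, files are modeled as plain dicts, and the values follow the steps in the question (a=1 committed, exp_c run last). The stale perf: 9.9 in the first commit is an assumption about what out.yaml contained at that point.

```python
# Toy model only: NOT DVC's implementation. All names here are hypothetical.
# A "tree" is a dict mapping file path -> file content.

def apply_buggy(workspace, head, experiment):
    """Copy only files whose experiment content differs from the HEAD commit."""
    result = dict(workspace)
    for path, content in experiment.items():
        if head.get(path) != content:
            result[path] = content
    return result

def apply_expected(workspace, head, experiment):
    """Reset every file recorded in the experiment, regardless of HEAD."""
    result = dict(workspace)
    result.update(experiment)
    return result

# Workspace after exp_c has been run.
workspace = {"par.yaml": "a: 3", "out.yaml": "perf: 3.1"}
exp_a = {"par.yaml": "a: 1", "out.yaml": "perf: 1.1"}
exp_b = {"par.yaml": "a: 2", "out.yaml": "perf: 2.1"}

# Case 1: commit made with a=1 but a stale out.yaml (assumed content).
head_stale = {"par.yaml": "a: 1", "out.yaml": "perf: 9.9"}
# Case 2: exp_0 was run before the commit, so out.yaml is already up to date.
head_fresh = {"par.yaml": "a: 1", "out.yaml": "perf: 1.1"}

print(apply_buggy(workspace, head_stale, exp_a))     # par.yaml is left at a: 3
print(apply_buggy(workspace, head_fresh, exp_a))     # nothing is restored
print(apply_buggy(workspace, head_stale, exp_b))     # both files restored
print(apply_expected(workspace, head_stale, exp_a))  # both files restored
```

In this model, the first experiment after a commit is the one most likely to share files with HEAD, which would be consistent with only that experiment being affected.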
Great, thank you for opening the issue.
Keep up the great work, it is a pleasure to use DVC.