I’m trying to better understand how params.yaml works and in particular how changes in it are tracked. @jorgeorpinel suggested here to track params.yaml using git. So far so good. Now, when inspecting a .dvc file generated by dvc run -p ... I see that the parameters from params.yaml are not associated with an MD5. This made me wonder: if I change a parameter in the YAML (and commit the change to git), how will dvc know about it? How will a dvc repro ... know that a parameter changed?
@drorata If you inspect the .dvc file, you will see that the chosen params are stored there too, along with their values.
So the detection is based on comparing the value stored in the .dvc file with the value stored in params.yaml.
That's how DVC knows to rerun the stage when params.yaml changes.
Take a look at this script:
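#!/bin/bash
# Start from a clean scratch repository.
rm -rf repo
mkdir repo
set -ex
pushd repo
git init --quiet
dvc init --quiet
echo data >> data
echo "lr: 0.1" >> params.yaml
dvc add data -q
# Create a stage that depends on the lr parameter from params.yaml.
dvc run -d data -o output -p lr "cat data >> output"
git add -A
git commit -am "init"
# Show the lr value recorded in the stage file.
grep -A1 params output.dvc
# Change lr from 0.1 to 0.2 (GNU sed syntax).
sed -i "s/0\.1/0.2/g" params.yaml
dvc repro output.dvc
# The stage file now records the new value.
grep -A1 params output.dvc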
If you run it, you will see that dvc notices the change in lr and reruns the command. After the run, it writes the new value of lr to output.dvc.
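To make the comparison idea concrete, here is a rough sketch in plain bash that mimics it without DVC. The file recorded.txt and the check_param function are made up for illustration: recorded.txt stands in for the params section of output.dvc, and this is not how DVC implements the check internally.

```bash
#!/bin/bash
# Illustration only: mimics the "recorded value vs. current value" check
# with plain files. No DVC commands are involved.
set -e
workdir=$(mktemp -d)
cd "$workdir"

echo "lr: 0.1" > params.yaml
# Hypothetical stand-in for the value stored in output.dvc after a run.
grep '^lr:' params.yaml > recorded.txt

# Compare the recorded value with the current one in params.yaml.
check_param() {
    if [ "$(grep '^lr:' params.yaml)" = "$(cat recorded.txt)" ]; then
        echo "unchanged"
    else
        echo "changed"
    fi
}

before=$(check_param)

# Edit the parameter, like the sed call in the script above.
echo "lr: 0.2" > params.yaml
after=$(check_param)

# Simulate the rerun: record the new value so the next check is clean.
grep '^lr:' params.yaml > recorded.txt
final=$(check_param)

echo "$before $after $final"
```

The point is only that no hash of params.yaml is needed: keeping the last-used value next to the stage is enough to tell whether a rerun is due.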
Thanks for the great explanation!