I set up a DVC data pipeline and used the dvc repro command to generate some .pkl files. Later I decided to regenerate the files (because of a code change whose dependency wasn’t captured in the dvc.yaml file), so I deleted the files manually from the directory. This was probably my first mistake.
Now when I run dvc repro, DVC refuses to re-run and reports that nothing has changed.
I thought dvc repro --force-downstream [target] would work, but it still reports “everything is up to date” and does not reproduce the stage.
How do I force dvc to re-run the stage to regenerate the .pkl file?
Would you mind sharing your dvc.yaml? Did you add the output pkl files as stage outputs?
Short answer is yes. Please see the train and test stages from my dvc.yaml below.
cmd: python3 code/train.py
- ./model/rbf_svm_3_.pkl #The model ID needs to be updated for each new model generated
cmd: python3 code/test.py
Thanks! The pipeline itself looks fine, but you might want to add code/train.py to the dependencies. That way, when you change train.py and run dvc repro, the training stage will be re-run since one of its dependencies changed.
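For illustration, here is a minimal sketch of what the train stage could look like with the script listed as a dependency. The stage name train is assumed from the description above, and any other deps or params from the original dvc.yaml are left out here, so treat it as a fragment to merge into the existing stage rather than a complete file.

```yaml
stages:
  train:                         # stage name assumed from "train and test stages"
    cmd: python3 code/train.py
    deps:
      - code/train.py            # tracking the script means edits to it invalidate the stage
    outs:
      - ./model/rbf_svm_3_.pkl
```

With the script tracked like this, a change to code/train.py should make dvc repro treat the stage as changed and re-run it.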
To force reproducing without adding train.py, you could use dvc repro -f.