dvc repro --force-downstream not working

Hi,

I set up a DVC data pipeline and used the dvc repro command to generate some .pkl files. Later I decided to regenerate the files (because of a code change whose dependency wasn’t captured in the dvc.yaml file), so I manually deleted the files from the directory. This is probably the first mistake I made.

Now when I run dvc repro, DVC refuses to re-run the stage and reports that nothing has changed.

I thought dvc repro --force-downstream [target] should work, but it’s still reporting “everything is up to date” and not reproducing the stage.

How do I force dvc to re-run the stage to regenerate the .pkl file?

Thanks,

Hi,
Would you mind sharing your dvc.yaml? Did you add the output .pkl files as stage outputs?

Hi,

Short answer is yes. Please see the train and test stages from my dvc.yaml below.

    training:
      cmd: python3 code/train.py
      params:
        - train
      deps:
        - ./data/SWT_transform/Hisar_test_data_firstOrder.npy
        - ./data/SWT_transform/Hisar_test_data_secondOrder.npy
        - ./data/SWT_transform/Hisar_test_data_total.npy
        - ./data/SWT_transform/Hisar_train_data_firstOrder.npy
        - ./data/SWT_transform/Hisar_train_data_secondOrder.npy
        - ./data/SWT_transform/Hisar_train_data_total.npy
      outs:
        - ./model/rbf_svm_3_.pkl  # The model ID needs to be updated for each new model generated
        - ./model/scaler3.bin

    test:
      cmd: python3 code/test.py
      params:
        - test
      deps:
        - ./model/rbf_svm_3_.pkl
        - ./model/scaler3.bin

Thanks! The pipeline itself looks fine, but you might want to add code/train.py to dependencies. That way, when you change train.py and run dvc repro, the training stage will be re-run since one of its dependencies changed.
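
As a rough sketch of that suggestion, the training stage could look something like the following. Everything here is copied from your stage except the added code/train.py entry. The top-level stages: key is assumed, since your snippet only showed the stages themselves. Adding code/test.py to the test stage in the same way would be a similar, optional change.

```yaml
stages:  # assumed top-level key; not shown in the shared snippet
  training:
    cmd: python3 code/train.py
    params:
      - train
    deps:
      - code/train.py  # added: editing the script now invalidates this stage
      - ./data/SWT_transform/Hisar_test_data_firstOrder.npy
      - ./data/SWT_transform/Hisar_test_data_secondOrder.npy
      - ./data/SWT_transform/Hisar_test_data_total.npy
      - ./data/SWT_transform/Hisar_train_data_firstOrder.npy
      - ./data/SWT_transform/Hisar_train_data_secondOrder.npy
      - ./data/SWT_transform/Hisar_train_data_total.npy
    outs:
      - ./model/rbf_svm_3_.pkl
      - ./model/scaler3.bin
```

With the script listed under deps, a plain dvc repro should pick up changes to train.py without any force flag.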

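To force the stage to run again without adding train.py, you could use dvc repro -f, which re-runs stages even when DVC finds no changes. As far as I know, --force-downstream only forces the stages that come after a stage DVC already considers changed, so it does nothing when no change is detected.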