Force running experiment from a specified stage

I have a pipeline that looks something like this:

prep_data --> train

train --> evaluate_test_data
train --> evaluate_holdout_data
train --> training_report

I am iterating on the train step. For instance, I added logging of feature importance (using log_plot) and logging of hyperparameters (using log_params).

I have run this experiment before, so the previously generated train data, test data and model are cached.

After making the changes, when I run the experiment using dvc exp run, it reuses the cached model and does not generate the hyperparameters or the plot, because the train step never runs.

I can see that dvc exp run has a -f option to force running the whole pipeline. Is there a way to force a run from a specific stage, so that only the train step and the steps downstream of it are reprocessed rather than the entire pipeline?

You can do that using the --downstream flag, which runs only the target stage and the stages after it. To rerun train and everything downstream of it, pass train as the target: dvc exp run dvc.yaml:train -f --downstream
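
To make "after (and including) the target" concrete, here is a minimal Python sketch of which stages that selection covers for the pipeline in the question. It is only an illustration of the idea, not DVC's implementation; the PIPELINE dictionary and the downstream_of helper are hypothetical names made up for this example.

```python
from collections import deque

# Stage graph from the question: each stage maps to the stages
# that consume its outputs. (Hypothetical structure for illustration.)
PIPELINE = {
    "prep_data": ["train"],
    "train": ["evaluate_test_data", "evaluate_holdout_data", "training_report"],
    "evaluate_test_data": [],
    "evaluate_holdout_data": [],
    "training_report": [],
}


def downstream_of(graph, target):
    """Return the target plus every stage reachable from it, breadth-first."""
    selected = [target]
    queue = deque([target])
    while queue:
        stage = queue.popleft()
        for child in graph[stage]:
            if child not in selected:
                selected.append(child)
                queue.append(child)
    return selected


# Starting from train selects train and its three dependents, not prep_data.
print(downstream_of(PIPELINE, "train"))

# Starting from a leaf stage selects only that stage.
print(downstream_of(PIPELINE, "evaluate_test_data"))
```

With train as the starting point, prep_data is left out, so its cached output would be reused. Starting from a leaf such as evaluate_test_data selects only that one stage, which is why the choice of target matters.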
