Force running experiment from a specified stage

hjoshi · July 7, 2024, 11:30pm

I have a pipeline that looks something like this:

prep_data --> train

train --> evaluate_test_data
train --> evaluate_holdout_data
train --> training_report

I am iterating on the train stage. For instance, I added logging of feature importance (using log_plot) and of hyperparameters (using log_params).

I have run this experiment before, so the previously generated training data, test data, and model are all cached.

After making these changes, when I run the experiment with dvc exp run, it reuses the cached model and does not generate the hyperparameters or the plot, because the train stage is skipped.

I can see that dvc exp run has a -f option to force-run the whole pipeline. However, is there a way to force a run from a specific stage, so that only the train stage and the stages downstream of it are reprocessed, rather than the entire pipeline?

andboy · July 22, 2024, 10:16am

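You can do that using the --downstream flag, which runs only the target stage and the stages after it: dvc exp run dvc.yaml:evaluate_test_data -f --downstream. In your pipeline, the target would be train rather than evaluate_test_data, since evaluate_test_data has nothing downstream of it.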
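To make the "after (and including) the target" selection concrete, here is a minimal Python sketch that walks the pipeline from the question and lists the stages a given target would select. It is only an illustration of the idea, not how DVC implements it; the `PIPELINE` dictionary and the `downstream_stages` helper are hypothetical names made up for this example.

```python
from collections import deque

# Edges copied from the pipeline in the question:
# each stage maps to the stages that depend on it.
PIPELINE = {
    "prep_data": ["train"],
    "train": ["evaluate_test_data", "evaluate_holdout_data", "training_report"],
    "evaluate_test_data": [],
    "evaluate_holdout_data": [],
    "training_report": [],
}


def downstream_stages(graph, target):
    """Return the target plus every stage reachable from it (breadth-first)."""
    selected = [target]
    queue = deque([target])
    while queue:
        stage = queue.popleft()
        for child in graph[stage]:
            if child not in selected:
                selected.append(child)
                queue.append(child)
    return selected


# Targeting train selects train and its three dependents; prep_data is left out.
print(downstream_stages(PIPELINE, "train"))
# Targeting evaluate_test_data selects only that stage.
print(downstream_stages(PIPELINE, "evaluate_test_data"))
```

With the graph from the question, targeting train covers train, evaluate_test_data, evaluate_holdout_data, and training_report, while prep_data keeps its cached outputs.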
