I would like to use DVC for a project that’s not ML but a sensitivity analysis of a number of parameters of a simulation model.
The DAG of my pipeline looks basically like the diagram below. There’s a single “command” file (the box at the top) in which all simulations are defined (a CSV file with one row per simulation). Each column in the 3x4 blocks represents the set of steps for one simulation (preprocessing, running, postprocessing). Finally, combined results are created from groups of simulations (the 3x4 blocks).
I want to make sure that, if the command file is changed for one simulation, only the steps for that simulation are redone. It would be OK if the first step (the arrows from the command file to the first row) always runs for all simulations, since it is a minor step. The following steps, however, should be rerun only for the affected simulations.
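To make that first step concrete, here is a minimal sketch of how the command file could be split into one small parameter file per simulation. The `sim_id` column, the function name `split_commands`, and the JSON output format are assumptions for illustration, not my actual setup.

```python
import csv
import io
import json
from pathlib import Path


def split_commands(csv_text, out_dir):
    """Write one JSON file per CSV row; return the ids whose file changed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    changed = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # 'sim_id' is a hypothetical column that identifies each simulation.
        target = out_dir / f"{row['sim_id']}.json"
        content = json.dumps(row, sort_keys=True, indent=2)
        # Only write when the content differs, so unchanged files stay untouched.
        if not target.exists() or target.read_text() != content:
            target.write_text(content)
            changed.append(row["sim_id"])
    return changed
```

The idea is that each simulation's later steps would depend only on its own parameter file, so editing one row of the CSV would change exactly one of these files.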
I could create a stage for every simulation and every step, but this is a hassle (in reality there are hundreds of simulations). I also want it to be easy to add more simulations by adding more rows to the command CSV file.
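For reference, the "one stage per simulation and step" approach would amount to generating something like the following from the CSV. This is only a sketch: the script names (`preprocess.py`, `run.py`, `postprocess.py`), the `params/` and `work/` paths, and the `sim_id` column are all hypothetical.

```python
import csv
import io

# Hypothetical step names; each is assumed to have a script "<step>.py".
STEPS = ["preprocess", "run", "postprocess"]


def build_stages(csv_text):
    """Return a dict of stage definitions, one per (simulation, step)."""
    stages = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        sim = row["sim_id"]  # hypothetical id column
        prev_out = f"params/{sim}.json"  # assumed per-simulation input file
        for step in STEPS:
            out = f"work/{sim}/{step}.out"
            stages[f"{step}_{sim}"] = {
                "cmd": f"python {step}.py {prev_out} {out}",
                "deps": [f"{step}.py", prev_out],
                "outs": [out],
            }
            # The next step of this simulation consumes this step's output.
            prev_out = out
    return stages
```

Each entry uses the `cmd`/`deps`/`outs` fields of a `dvc.yaml` stage, but the dict would still have to be dumped to YAML with a separate library and regenerated every time the CSV changes, which is the hassle I would like to avoid.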
Is there functionality in DVC to handle this kind of situation? Or is there a way to restructure my project to make things easier?