dvc.yaml for feature analysis

Hi everyone.

I’m a beginner with DVC. I would like to be able to analyse the features I generate.

I have some stages in my dvc.yaml file: unzip, prepare, featurize, train, evaluate.

Is it possible to add a stage between the featurize and train steps, for example a clusterization stage that runs parallel to the main line, as a new branch off the featurize step? Could I run this step only when it is needed, and not always? This step should generate some plots for analysis.

So the graph of stages should be like this:

[image: stage graph with a clusterization branch off featurize]

What is the best practice for using DVC in this case?

Yes, your pipeline does not have to be linear; stages can branch off one another. If you want to run only the part of the pipeline that leads to train and evaluate, you could run dvc exp run evaluate, which will run only the stages needed to produce the outputs of the evaluate stage. When you want to run clusterization, you could instead do dvc exp run clusterization.
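As a rough sketch of what this could look like in dvc.yaml: the stage names below come from your question, but every script name, path and output file is a made-up placeholder, so adjust them to your project. The unzip and prepare stages are left out for brevity. The point is that clusterization depends only on the output of featurize, so it forms its own branch.

```yaml
stages:
  featurize:
    # hypothetical script and paths
    cmd: python src/featurize.py data/prepared data/features
    deps:
      - src/featurize.py
      - data/prepared
    outs:
      - data/features

  train:
    cmd: python src/train.py data/features model.pkl
    deps:
      - src/train.py
      - data/features
    outs:
      - model.pkl

  evaluate:
    cmd: python src/evaluate.py model.pkl data/features metrics.json
    deps:
      - src/evaluate.py
      - model.pkl
      - data/features
    metrics:
      - metrics.json:
          cache: false

  # side branch: depends on featurize output only, nothing depends on it
  clusterization:
    cmd: python src/clusterize.py data/features plots/clusters
    deps:
      - src/clusterize.py
      - data/features
    outs:
      - plots/clusters
```

With a layout like this, dvc exp run evaluate should leave clusterization alone, because evaluate does not depend on it. Keep in mind that running dvc exp run with no target reproduces every stage in dvc.yaml, including clusterization, so name the target stage when you want to skip the branch. You can check the resulting graph with dvc dag.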
