Hi folks,
I have a data analysis problem that I'd like to use DVC for. I have many different datasets, each of which needs the same analysis stage applied to it. Each dataset is a folder containing a specific set of (large) files, and the same Python analysis code needs to run on each one.
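For concreteness, my layout looks roughly like this (the dataset names here are just placeholders):

```
data/
├── dataset_01/   # one dataset = one folder of large files
├── dataset_02/
└── ...
```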
Rather than running one monolithic script that loops over all the datasets, I'd like to set up a pipeline that applies the analysis script to each dataset as a separate stage (and possibly runs those stages in parallel). I've sketched what I have in mind below.
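In dvc.yaml terms, I'm imagining something like the following. This is only a sketch of what I want, not syntax I know to be valid; `analyze.py`, `data/`, and `results/` stand in for my actual script and folders:

```yaml
stages:
  analyze:
    foreach:            # one stage instance per dataset (is something like this possible?)
      - dataset_01
      - dataset_02
    do:
      cmd: python analyze.py data/${item} --out results/${item}
      deps:
        - analyze.py    # rerun a dataset's stage if the code changes
        - data/${item}  # ...or if that dataset's files change
      outs:
        - results/${item}
```

The key point is that changing one dataset (or the code) should only rerun the affected stages, not the whole loop.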
Is something like this even possible with DVC?
Thanks,
Steve