Best Practice - Test pipeline with smaller dataset?

I currently have a DVC pipeline that takes about 12 hours to run. I would like a way to test the entire pipeline on a small subset of my data, so that I can quickly verify each stage after a code change. Ideally, this would be an option on dvc repro. Failing that, I think I could configure my pipeline with separate “test” versions of each stage that run the same commands as their full counterparts but with different parameters, so they take less time to reproduce.

Is this the correct approach? Is there a cleaner way to do this? How have other people solved this problem?

Hi! There’s no single correct approach, but I and others have used a separate Git branch with a smaller dataset for debugging. If that’s similar to what you have in mind, it should not be a problem. Good luck!
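In case it helps, here is a minimal sketch of one way to produce the smaller dataset, assuming the data can be loaded as a list of records. The names `subsample`, `fraction`, and `seed` are made up for illustration and are not part of DVC. The fixed seed means the same subset is picked on every run, so results on the debugging branch stay comparable between code changes.

```python
import random


def subsample(rows, fraction, seed=0):
    """Return a reproducible random subset of rows, keeping the original order.

    `fraction` is the share of rows to keep, e.g. 0.01 for 1%.
    All names here are hypothetical; adapt them to your own pipeline.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rows = list(rows)
    if not rows:
        return []
    # Keep at least one row so downstream stages always have some input.
    k = max(1, round(len(rows) * fraction))
    # A dedicated Random instance avoids touching the global random state.
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in keep]


if __name__ == "__main__":
    full = [f"record-{i}" for i in range(1000)]
    small = subsample(full, fraction=0.01)
    print(len(small))
```

The output of something like this could be saved once and committed on the debugging branch in place of the full dataset. Alternatively, the fraction could be a value each stage reads from its parameters, which is closer to the "test" stage idea in the question.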