Best practice for dependencies


#1

In a pipeline, if folder1 is the output of stage1 and is used in stage2, then dvc will know about it if we run: dvc run -d folder1 -o folder2 python stage2.py

Equivalently, since stage1 creates a dvc file, we could run: dvc run -d stage1.dvc -o folder2 python stage2.py

Which is the recommended way?

Thanks!


#2

Hi @tmain!

If you specify stage1.dvc as a dependency of stage2.dvc, dvc will not be able to track the dependencies of stage1.dvc when you call dvc repro stage2.dvc. The DAG is built by tying together explicitly specified dependencies and outputs, and stage1 does not consider the stage1.dvc file an output of the stage, but rather a metafile describing it. So the recommended way is to explicitly specify the outputs of stage1 as dependencies of stage2.
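
To make the difference concrete, here is a rough sketch in Python. It is a toy model of matching dependencies to outputs, not dvc's actual implementation; the upstream_stages helper, the dictionary layout and the stage1.py dependency are all made up for the illustration.

```python
# Toy model only: this is NOT how dvc is implemented internally.
# It just shows why a dependency must match some stage's declared output.

def upstream_stages(target, stages):
    """Return the stages that must run before `target`, found by matching
    each dependency path to the stage that lists it as an output."""
    # Map every declared output path to the stage that produces it.
    producers = {out: name for name, spec in stages.items() for out in spec["outs"]}
    found, todo = [], [target]
    while todo:
        current = todo.pop()
        for dep in stages[current]["deps"]:
            producer = producers.get(dep)
            if producer is not None and producer not in found:
                found.append(producer)
                todo.append(producer)
    return found

# Case 1: stage2 depends on folder1, which stage1 declares as an output.
# (stage1.py is a hypothetical dependency of stage1.)
by_output = {
    "stage1.dvc": {"deps": ["stage1.py"], "outs": ["folder1"]},
    "stage2.dvc": {"deps": ["folder1"], "outs": ["folder2"]},
}

# Case 2: stage2 depends on the stage file itself, which no stage
# declares as an output, so nothing links the two stages.
by_stage_file = {
    "stage1.dvc": {"deps": ["stage1.py"], "outs": ["folder1"]},
    "stage2.dvc": {"deps": ["stage1.dvc"], "outs": ["folder2"]},
}

print(upstream_stages("stage2.dvc", by_output))      # ['stage1.dvc']
print(upstream_stages("stage2.dvc", by_stage_file))  # []
```

In the first case stage1 is found as an upstream stage of stage2, so its own dependencies can be checked too. In the second case the chain stops at stage2.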

Thanks,
Ruslan


#3

Thanks for the prompt clarification!