In a pipeline, if folder1 is the output of stage1 and is used in stage2, then dvc will know about the dependency if we run:
dvc run -d folder1 -o folder2 python stage2.py
Equivalently, since stage1 creates a dvc file, we could run:
dvc run -d stage1.dvc -o folder2 python stage2.py
What is the recommended way?
Hi @tmain!
If you specify stage1.dvc as a dependency of stage2.dvc, dvc will not be able to track the dependencies of stage1.dvc when you call dvc repro stage2.dvc. The DAG is built by tying the explicitly specified outputs of one stage to the explicitly specified dependencies of another, and dvc does not treat the stage1.dvc file as an output of the stage; it is a metafile describing the stage. So the recommended way is to explicitly specify the outputs of stage1 as dependencies of stage2.
Thanks for the prompt clarification!