dvc will always rerun callback stages. --outs-no-cache does not matter in this case.
The reason for this behavior is that callback stages were introduced to let users execute code that checks whether something has changed and feed that information into the pipeline.
Example: checking whether there are new entries in log storage, in order to trigger importing and processing them.
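As a rough illustration of the log-storage example, here is a minimal Python sketch of the kind of check a callback stage might run. The function name `find_new_entries`, the log directory, and the JSON state file are all hypothetical and not part of DVC. The sketch assumes that each log entry is a file in a directory, and that the state file is the stage's output.

```python
import json
from pathlib import Path


def find_new_entries(log_dir, state_file):
    """Return log file names not seen on a previous run and record them.

    Hypothetical helper: `log_dir` and `state_file` are illustrative names.
    """
    log_dir, state_file = Path(log_dir), Path(state_file)
    # Names recorded by earlier runs (empty on the very first run).
    seen = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    # Names currently present in the log storage directory.
    current = {p.name for p in log_dir.iterdir() if p.is_file()}
    new = sorted(current - seen)
    if new or not state_file.exists():
        # Rewrite the state file only when something changed, so its
        # content stays byte-identical on runs that find nothing new.
        state_file.write_text(json.dumps(sorted(seen | current)))
    return new
```

The idea is that the check itself runs on every repro, but the state file only changes when new entries appear, so stages that depend on that file should only need to rerun in that case.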
I would expect the default to be not rerunning stages that don’t have dependencies, and that a callback stage could be created by using --always-changed. I suppose I could add a dummy dependency to the get-corpus task, but that’s a bit clunky.
It seems that my use case won’t work with dvc pull without duplicating the data, but maybe making it work via dvc repro is the next best thing.