dvc will always rerun callback stages.
--outs-no-cache does not matter in this case.
The reason for this behavior is that callback stages were introduced to let users run code that checks whether something has changed and feed that information into the pipeline.
Example: checking if there are new entries in log storage to trigger importing and processing them.
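To make the log-storage example concrete, here is a rough sketch of what such a callback stage's command might run. Everything in it is hypothetical: the function names, the marker file, and the assumption that "log storage" is a local directory are mine, not part of dvc. The idea is that the script runs on every repro, but only rewrites its output file when the set of log files actually changed, so downstream stages that depend on that file would only see a change when there is something new to import.

```python
import hashlib
import os


def snapshot(log_dir):
    """Return a digest over the sorted file names and sizes in log_dir.

    Hypothetical helper: assumes log storage is a plain local directory.
    """
    digest = hashlib.sha256()
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if os.path.isfile(path):
            digest.update(f"{name}:{os.path.getsize(path)}\n".encode())
    return digest.hexdigest()


def write_marker(log_dir, marker_path):
    """Rewrite marker_path only if the snapshot changed.

    Returns True when the marker was (re)written, False otherwise.
    marker_path should live outside log_dir so it does not affect the snapshot.
    """
    new = snapshot(log_dir)
    old = None
    if os.path.exists(marker_path):
        with open(marker_path) as fh:
            old = fh.read().strip()
    if new == old:
        return False
    with open(marker_path, "w") as fh:
        fh.write(new + "\n")
    return True
```

The marker file would be the stage's output, and the import/processing stage would list it as a dependency. Comparing names and sizes is a cheap stand-in here; a real check might compare timestamps or query a remote store instead.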
I would have expected the default to be not rerunning stages that don't have dependencies, with a callback stage being something you opt into via --always-changed. I suppose I could add a dummy dependency to the get-corpus task, but that's a bit clunky.
It seems that my use case won't work with dvc pull without duplicating the data, but maybe making it work via dvc repro is the next best thing.