Add a remote directory without adding to the cache

dvc will always rerun callback stages.
--outs-no-cache does not matter in this case.
The reason for this behavior is that callback stages were introduced to let users execute code that checks whether something has changed and feed that information into the pipeline.
Example: checking if there are new entries in log storage to trigger importing and processing them.
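To make the log-storage example concrete, here is a minimal sketch of what such a callback check could look like. It is not DVC code: the function names, the `seen_file` bookkeeping file, and the directory layout are all hypothetical, and it assumes the "log storage" is just a local directory of files.

```python
from __future__ import annotations

from pathlib import Path


def find_new_entries(log_dir: Path, seen_file: Path) -> list[str]:
    """Return names of files in log_dir that are not yet listed in seen_file."""
    # seen_file is a hypothetical bookkeeping file: one processed name per line.
    seen = set(seen_file.read_text().splitlines()) if seen_file.exists() else set()
    current = sorted(p.name for p in log_dir.iterdir() if p.is_file())
    return [name for name in current if name not in seen]


def record_entries(seen_file: Path, names: list[str]) -> None:
    """Append newly processed entry names to seen_file."""
    with seen_file.open("a") as fh:
        for name in names:
            fh.write(name + "\n")
```

A callback stage would run something like `find_new_entries` on every repro and write its result to an output, so that downstream stages only do real work when that output changes.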

I would expect the default to be not rerunning stages that don’t have dependencies, and that a callback stage could be created by using --always-changed. I suppose I could add a dummy dependency to the get-corpus task, but that’s a bit clunky.
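To spell out the difference, here is a toy model of the two rules as I understand them from this thread. It is only an illustration, not how dvc is implemented: the function names are made up, and it ignores everything else that can trigger a rerun (for example missing or changed outputs).

```python
def reruns_today(has_deps: bool, deps_changed: bool) -> bool:
    """Behavior described above: a stage without dependencies always reruns."""
    if not has_deps:
        return True  # treated as a callback stage
    return deps_changed


def reruns_as_expected(has_deps: bool, deps_changed: bool,
                       always_changed: bool = False) -> bool:
    """What I expected: rerun only on an explicit flag or a changed dependency."""
    if always_changed:
        return True  # callback behavior is opt-in
    return has_deps and deps_changed
```

Under the second rule, a dependency-free stage like get-corpus would run once and then be left alone unless it was explicitly marked as always changed.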

It seems that my use case won’t work with dvc pull without duplicating the data, but maybe making it work via dvc repro is the next best thing.
