Hi, I have a script that takes hours to complete. I already ran it outside dvc and collected its output. Is there a way to add a stage for that script to the DAG without running it again, pointing it at the right inputs, outputs and dependencies? In other words, since the computation takes too much time, I would like to skip it and let dvc compute the hashes and complete the run without actually executing the script. Is that possible?
You could try the --no-exec option of dvc run (it does not execute the stage, it only creates or modifies the entry in dvc.yaml) and then use dvc commit to lock your files in.
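A minimal sketch of that suggestion, wrapped in a small helper. The helper name adopt_stage and the stage name, paths and command in the usage line are hypothetical placeholders, not taken from this thread; only dvc run --no-exec and dvc commit come from the answer above. Passing the stage name and -f to dvc commit is an assumption made here to limit the commit to one stage and avoid confirmation prompts.

```shell
# Register a stage whose outputs already exist, without executing its command.
# Usage: adopt_stage NAME DEP OUT CMD...
# (hypothetical helper; assumes a single dependency and a single output)
adopt_stage() {
    name=$1; dep=$2; out=$3
    shift 3
    # Write the stage to dvc.yaml only; the command in "$@" is not run.
    dvc run --no-exec -n "$name" -d "$dep" -o "$out" "$@" &&
    # Hash the existing dependencies and outputs, record them in dvc.lock
    # and store the outputs in the cache.
    dvc commit -f "$name"
}
```

For example, adopt_stage train data.csv model.pkl python train.py would register a stage called train. On recent DVC releases, where dvc run is no longer available, dvc stage add should play the same role, since it only writes the stage and never executes it.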
That looks like the solution, but I am using the multiple-stage generation feature and writing the dvc.yaml file directly. Is it possible to skip execution with dvc repro?
@mauro, you can just run dvc commit, which will generate the dvc.lock file and store the outputs in dvc's cache. This way, the stages should not run again on repro. Do verify it with dvc status first, though.
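The sequence described here can be sketched as follows, assuming the hand-written dvc.yaml and the outputs from the earlier manual run are already in place in the repository. The function name lock_existing_outputs is a hypothetical wrapper, and the -f flag and the final dvc repro --dry check are additions for illustration, not part of the answer above.

```shell
# Lock already-computed outputs into DVC, then check that nothing would rerun.
# (hypothetical helper; run it from the directory containing dvc.yaml)
lock_existing_outputs() {
    # Hash deps and outs of every stage, write dvc.lock, fill the cache.
    dvc commit -f &&
    # Should report that data and pipelines are up to date.
    dvc status &&
    # Dry run: only lists what repro would execute, without executing it.
    dvc repro --dry
}
```

If dvc status still lists a stage as changed after the commit, the paths declared in dvc.yaml probably do not match the files produced by the manual run, and dvc repro would execute that stage again.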
it worked, thanks @skshetry!