Here is my use case:
- I train a model automatically through some orchestration tool (CML, Airflow, etc.)
- Now I have a `weights.db` file, which is versioned by DVC alongside other intermediate data files
- I do a `dvc push --run-cache`, as I do not want to version with `dvc.lock` since I am not committing anything
-> Now there are several files in my S3 remote.
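For reference, the stage I push looks roughly like this (a simplified sketch; the command and the dependency/output paths are placeholders from my setup, only the stage name `generate_weights` matters below):

```
# dvc.yaml (illustrative sketch)
stages:
  generate_weights:
    cmd: python train.py        # placeholder training command
    deps:
      - train.py
      - data/train.csv          # placeholder input
    outs:
      - weights.db              # the artifact I actually need in prod
```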
-> Time to run my model in prod
- In a Docker container, I run `dvc pull --run-cache generate_weights` (the stage that created `weights.db`)
- DVC starts to pull all files from the remote: not only previous `weights.db` versions, but also caches from other stages.
This makes my approach unfeasible, as pulling every file from the remote takes a long time.
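Concretely, the prod step looks something like this (a sketch; the base image and remote configuration are assumptions, not my exact setup):

```
# Dockerfile sketch (illustrative)
FROM python:3.11-slim
RUN pip install "dvc[s3]"
WORKDIR /app
# Copy the DVC repo metadata; note there is no dvc.lock, since I never commit one
COPY .dvc/ .dvc/
COPY dvc.yaml .
# This is the step that ends up pulling far more than just weights.db
RUN dvc pull --run-cache generate_weights
```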
Why does `dvc pull --run-cache [target]` pull all files?
- Should I be doing it the way I am doing? I am trying to avoid CML/Git-based solutions.