DVC 1.0 release

Suor · June 23, 2020, 10:55am

I’ll go into the run cache in more detail:

dvc add still creates a pointer file for your data. You are still supposed to commit it to git, and it still contains a checksum reference to your data. In this sense dvc still versions your data with git.
Run cache works for dvc run/repro. When you run a command, you have a combination of deps (data files, code, and params) that produces some result. This result is saved to the run cache and recorded in the dvc.lock file. If you commit the lock file changes to git, it works the same as before.
You typically use the run cache by tweaking your code or params and rerunning a stage without intermediate commits. If you return to a combination of code, params, and data that you have already tried, dvc repro fetches the stage result from the cache instead of rerunning the stage command.
You don’t need S3 or any other remote cache to use the run cache. If you do have a remote, your local run cache is sent to and received from it along with the usual cache on dvc push/pull. This lets you and your teammates quickly reuse each other’s results on different machines. It also lets CI or any other cloud/remote/background job add precalculated runs to the run cache, which can be fetched quickly later, e.g. in your dev environment or on a production system.
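
To make the lookup idea concrete, here is a toy sketch in Python. It is not DVC's actual implementation or API: `stage_key`, `ToyRunCache`, and the `train` function are hypothetical names made up for illustration, and the key format is an assumption. It only shows the principle described above: the result is stored under a key derived from the command, the dependency contents, and the params, so returning to a combination you already tried skips the run.

```python
import hashlib
import json


def stage_key(cmd, deps, params):
    """Build one key from the command, dependency contents, and params.

    Hypothetical key format; DVC's real on-disk layout differs.
    """
    payload = {
        "cmd": cmd,
        # Hash file contents, so the key changes only when the data changes.
        "deps": {name: hashlib.sha256(data).hexdigest()
                 for name, data in sorted(deps.items())},
        "params": params,
    }
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


class ToyRunCache:
    """In-memory stand-in for a run cache (illustration only)."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # how many times a stage command really ran

    def repro(self, cmd, deps, params, run):
        key = stage_key(cmd, deps, params)
        if key not in self._results:
            # Unseen combination: execute the stage and remember its output.
            self.executions += 1
            self._results[key] = run(deps, params)
        return self._results[key]


def train(deps, params):
    # Placeholder for an expensive stage command.
    return "model(lr={}, bytes={})".format(params["lr"], len(deps["data.csv"]))


cache = ToyRunCache()
data = {"data.csv": b"a,b\n1,2\n"}

first = cache.repro("python train.py", data, {"lr": 0.1}, train)
cache.repro("python train.py", data, {"lr": 0.2}, train)          # new params: runs
again = cache.repro("python train.py", data, {"lr": 0.1}, train)  # seen before: cached

print(cache.executions)
```

In this sketch the third call does not execute `train` again, which mirrors dvc repro restoring a stage result instead of rerunning the command. Sharing through a remote would amount to copying the stored entries between machines.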
