DVC 1.0 release

Let me go into more detail on the run cache:

  1. dvc add still creates a pointer file to your data. You are still supposed to commit it to Git, and it still contains a checksum reference to your data. In this sense, DVC still versions your data with Git.
  2. The run cache works for dvc run and dvc repro. When you run a command, you have a combination of deps (data files, code and params) that produces some result. This result is saved both to the run cache and to the dvc.lock file. If you commit the changes to dvc.lock to Git, everything works the same as before.
  3. You typically use the run cache by tweaking your code or params and rerunning a stage without intermediate commits. If you return to a combination of code, params and data you have already tried, dvc repro fetches the stage result from the cache instead of rerunning the stage command.
  4. You don’t need S3 or any other remote to use the run cache as described so far. If you do have a remote, your local run cache is sent to and received from it along with the regular cache on dvc push/pull. This lets you and your teammates quickly reuse each other’s results on different machines. It also lets CI or any other cloud, remote or background job add precomputed runs to the run cache, which can later be fetched quickly, e.g. in your dev environment or on a production system.
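To make points 2 to 4 concrete, here is a toy Python model of the idea. It is a sketch under stated assumptions, not DVC’s implementation: the key derivation (MD5 over the command, each dependency’s content and the JSON-serialized params), the `RunCache` class with its `repro`/`push`/`pull` methods, and the `train` stage are all hypothetical, and real DVC keeps run-cache entries on disk and in remote storage rather than in a dict.

```python
import hashlib
import json
from typing import Callable, Dict

Files = Dict[str, bytes]  # path -> file content (data files and code alike)
Params = Dict[str, object]


def run_key(cmd: str, deps: Files, params: Params) -> str:
    """Hypothetical key: hash the command, every dep's content and the params."""
    h = hashlib.md5()
    h.update(cmd.encode())
    for path in sorted(deps):  # sorted so dict order does not change the key
        h.update(path.encode())
        h.update(hashlib.md5(deps[path]).hexdigest().encode())
    h.update(json.dumps(params, sort_keys=True).encode())
    return h.hexdigest()


class RunCache:
    """Toy run cache: maps a run key to the outputs that run produced."""

    def __init__(self) -> None:
        self.entries: Dict[str, Files] = {}
        self.executions = 0  # how many times a stage command actually ran

    def repro(self, cmd: str, deps: Files, params: Params,
              execute: Callable[[Files, Params], Files]) -> Files:
        key = run_key(cmd, deps, params)
        if key in self.entries:
            return self.entries[key]  # combination seen before: reuse result
        outs = execute(deps, params)  # new combination: run the command
        self.executions += 1
        self.entries[key] = outs
        return outs

    def push(self, remote: "RunCache") -> None:
        remote.entries.update(self.entries)  # send local runs to the remote

    def pull(self, remote: "RunCache") -> None:
        self.entries.update(remote.entries)  # receive runs others pushed


def train(deps: Files, params: Params) -> Files:
    # Hypothetical stage: the "model" is just the data plus the learning rate.
    return {"model.pkl": deps["data.csv"] + str(params["lr"]).encode()}


deps = {"data.csv": b"1,2,3", "train.py": b"print('v1')"}
mine = RunCache()
mine.repro("python train.py", deps, {"lr": 0.1}, train)
mine.repro("python train.py", deps, {"lr": 0.2}, train)
mine.repro("python train.py", deps, {"lr": 0.1}, train)  # tried before: cache hit
print(mine.executions)  # 2

# Sharing through a remote: the teammate never runs the command.
remote, teammate = RunCache(), RunCache()
mine.push(remote)
teammate.pull(remote)
teammate.repro("python train.py", deps, {"lr": 0.2}, train)
print(teammate.executions)  # 0
```

Because train.py’s content is part of the key, editing the code invalidates a cached run just as changing lr does. Returning to an earlier combination of code, params and data, not to an earlier commit, is what produces a cache hit.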