DVC PyCon 2019 and Updates v0.19 - v0.35

We will be talking about DVC at PyCon 2019 in Cleveland :snake: :tada:! Please stop by our booth on Saturday, May 4th. We will be happy to chat!

We’ve launched a DVC Patreon campaign; it’s one way to support the project if you like it.

Now, let’s highlight the changes (not counting bug fixes and minor improvements) we have made in the last few months:

  • :label: We received a lot of feedback that using Git branches is not always the best way to manage experiments. We have added support for Git tags (support for Git commits is coming). The new -T or --all-tags option is supported by all DVC commands that support -a or --all-branches.

  • :open_book: The “get started” guide has been simplified (e.g. it now uses tags instead of branches) and extended. We have also prepared a GitHub DVC project that follows the sequence of steps in the guide. You can now download the whole project and reproduce all the models.

  • The dvc diff command has been introduced. It shows summary statistics for a directory or file under DVC control: how many files were added, deleted, or modified, and how the total size changed:

(HEAD)$ tree images
images
├── color.png
└── grey.png

(HEAD^)$ tree images
images
└── grey.png

$ dvc diff -t images HEAD^1

diff for 'images'
-images with md5 ad0a6adcd409cae3263b28487064e1f2.dir
+images with md5 283215dface0d41291482330324632fc.dir

1 file not changed, 0 files modified, 1 file added, 0 files deleted, size was increased by 15.3 MB

  • We’ve introduced the dvc commit command and the --no-commit flag for dvc run, dvc repro, and dvc add. Together they help you avoid uncontrolled cache growth and spare you some dvc repro runs. In the future, we plan to make “do-not-cache-my-data” the default mode for dvc run, dvc add, and dvc repro.

  • Better support for SSH remotes (data storage): config options to set the port, key file, timeout, password, etc., plus improved stability and Windows support! We have also introduced HTTP remotes, which can be used for external dependencies and as a read-only cache.

  • Control over where DVC files are located in your project: place them wherever you want with the -f option, supported by all relevant commands (dvc add, dvc run, and dvc import).

  • :slightly_smiling_face: A lot of UI improvements, from finally fixing the nasty issue with the Windows terminal printing lots of garbage symbols, to progress bars for checkouts, better metrics output, and lots of smaller things.

  • :zap: Performance optimizations. The most notable one is the migration from a plain JSON file to the embedded SQLite engine for caching file and directory checksums. Another is improved performance, stability, and general user experience for the commands that navigate tags or branches (all commands that accept -a/--all-branches or -T/--all-tags).
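
To make the dvc diff summary line from the example above more concrete, here is a minimal Python sketch of how such statistics could be computed. It assumes each version of a directory is represented as a mapping from relative file path to an (md5, size) pair. The function name diff_summary and this data layout are hypothetical illustrations and are not DVC's internal API.

```python
from typing import Dict, Tuple

# Hypothetical representation: relative path -> (md5 checksum, size in bytes).
Listing = Dict[str, Tuple[str, int]]


def diff_summary(old: Listing, new: Listing) -> Dict[str, int]:
    """Count unchanged/modified/added/deleted files and the size change.

    A sketch of the kind of statistics `dvc diff` reports, not DVC's code.
    """
    old_paths, new_paths = set(old), set(new)
    common = old_paths & new_paths
    # A file present in both versions counts as modified if its md5 differs.
    modified = sum(1 for p in common if old[p][0] != new[p][0])
    old_size = sum(size for _, size in old.values())
    new_size = sum(size for _, size in new.values())
    return {
        "not_changed": len(common) - modified,
        "modified": modified,
        "added": len(new_paths - old_paths),
        "deleted": len(old_paths - new_paths),
        "size_delta": new_size - old_size,
    }


if __name__ == "__main__":
    # Toy data mirroring the example: HEAD^ has grey.png, HEAD adds color.png.
    # The checksums and sizes are made up for illustration.
    head_prev = {"grey.png": ("aaa", 1_000)}
    head = {"grey.png": ("aaa", 1_000), "color.png": ("bbb", 15_300_000)}
    print(diff_summary(head_prev, head))
    # {'not_changed': 1, 'modified': 0, 'added': 1, 'deleted': 0, 'size_delta': 15300000}
```

Comparing checksums rather than file contents keeps the comparison cheap: only the two listings are needed, not the data itself.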
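
The idea behind caching checksums in an embedded SQLite database can be sketched in a few lines of Python. A file's md5 is recomputed only when its modification time or size differs from the values stored for that path. The class name ChecksumCache and the table layout are hypothetical and do not reflect DVC's actual state database. The sketch uses only the standard sqlite3 and hashlib modules.

```python
import hashlib
import os
import sqlite3
import tempfile


class ChecksumCache:
    """Hypothetical sketch: cache md5 checksums in an embedded SQLite DB."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS state ("
            " path TEXT PRIMARY KEY, mtime_ns INTEGER, size INTEGER, md5 TEXT)"
        )
        self.hits = 0  # number of lookups answered from the database

    @staticmethod
    def _md5(path: str) -> str:
        # Hash the file in 1 MiB chunks so large files are not read at once.
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def get(self, path: str) -> str:
        st = os.stat(path)
        row = self.conn.execute(
            "SELECT mtime_ns, size, md5 FROM state WHERE path = ?", (path,)
        ).fetchone()
        # Simplification: trust the cached value if mtime and size match.
        if row is not None and row[:2] == (st.st_mtime_ns, st.st_size):
            self.hits += 1
            return row[2]
        md5 = self._md5(path)
        self.conn.execute(
            "INSERT OR REPLACE INTO state VALUES (?, ?, ?, ?)",
            (path, st.st_mtime_ns, st.st_size, md5),
        )
        self.conn.commit()
        return md5


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = os.path.join(tmp, "data.txt")
        with open(p, "w") as f:
            f.write("hello")
        cache = ChecksumCache()
        first = cache.get(p)
        second = cache.get(p)  # served from SQLite, no rehashing
        print(first == second, cache.hits)  # True 1
```

Compared with rewriting one large JSON file on every update, an embedded database lets each lookup or update touch a single row.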

There are new DVC integrations and plugins available:

Don’t hesitate to like/star the DVC repository if you haven’t yet. We are waiting for your feedback!