DVC compared with Git LFS for storage and versioning only

Thanks @nickrsan!

More context here https://discordapp.com/channels/485586884165107732/563406153334128681/629825409097138186

The obvious one - GitHub still has a 2 GB file size limit with Git LFS - https://help.github.com/en/articles/about-git-large-file-storage

Second, yes - caching and better, more explicit data management, such as pulling or pushing only part of the data

Third, advanced data management: DVC uses reflinks, hardlinks, etc. to make checkout very fast; push/pull use parallelism to upload/download data (I’m not sure about LFS); and a shared cache can save space when multiple projects use the same data and/or multiple people use the same machine
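To make the second and third points a bit more concrete, here is a rough sketch of the DVC CLI calls involved, written as a small Python helper. The cache directory, the `.dvc` target, and the job count are made-up values for illustration, and the helper names are mine, not part of DVC.

```python
import shlex
import subprocess

# Hypothetical values for illustration only; substitute your own.
SHARED_CACHE = "/mnt/shared/dvc-cache"
TARGET = "data/images.dvc"


def dvc_commands(cache_dir=SHARED_CACHE, target=TARGET, jobs=8):
    """Return the DVC CLI calls as argv lists, without running them."""
    return [
        # Prefer reflinks, then hardlinks, and fall back to plain copies.
        ["dvc", "config", "cache.type", "reflink,hardlink,copy"],
        # Point this project at a cache directory shared by projects/users.
        ["dvc", "cache", "dir", cache_dir],
        # Keep cache files group-writable so several users can share them.
        ["dvc", "config", "cache.shared", "group"],
        # Pull only one tracked target, with several parallel jobs.
        ["dvc", "pull", "--jobs", str(jobs), target],
    ]


def run_all(commands):
    """Run each command; must be called inside an initialized DVC repo."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    for argv in dvc_commands():
        print(shlex.join(argv))
```

Whether reflinks actually get used depends on the filesystem, which is why the link types are listed in order of preference with `copy` as the fallback.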

Fourth, features like dvc import / dvc get and the dvc.api Python interface are a really great way to reuse data or model files. One use case is a data registry: a GitHub repo with all the history, etc.
Another, with dvc get and dvc.api, is model deployment
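As a minimal sketch of the deployment idea: the snippet below reads a model file straight from a registry repo at a pinned revision via `dvc.api.read`, and lists the roughly equivalent CLI calls. The repo URL, file path, and tag are hypothetical placeholders, and the helper names are just for illustration.

```python
# Hypothetical registry repo, file path, and Git tag; substitute your own.
REGISTRY_REPO = "https://github.com/example/data-registry"
MODEL_PATH = "models/model.pkl"
MODEL_REV = "v1.0"


def load_model_bytes(path=MODEL_PATH, repo=REGISTRY_REPO, rev=MODEL_REV):
    """Return the raw bytes of a DVC-tracked file at a given Git revision."""
    # Imported lazily so this module still loads where DVC is not installed.
    import dvc.api

    return dvc.api.read(path, repo=repo, rev=rev, mode="rb")


def cli_equivalents(path=MODEL_PATH, repo=REGISTRY_REPO, rev=MODEL_REV):
    """Return the matching CLI calls as argv lists, without running them."""
    return [
        # One-off download; nothing is tracked in the current project.
        ["dvc", "get", repo, path, "--rev", rev],
        # Download and record the source in a .dvc file in this project.
        ["dvc", "import", repo, path, "--rev", rev],
    ]
```

The difference that matters for reuse: `dvc get` just fetches the file, while `dvc import` also remembers where it came from, so `dvc update` can bring in a newer version later.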

And those differences are only about data management :slight_smile: without even touching the pipelines and metrics parts (which I actually also consider part of the data management layer).
