DVC compared with GitLFS for storage and versioning only

shcheklein · October 5, 2019, 12:49am

More context here https://discordapp.com/channels/485586884165107732/563406153334128681/629825409097138186

The obvious one - GitHub still has a 2 GB per-file size limit with Git LFS - https://help.github.com/en/articles/about-git-large-file-storage

Second, yes - caching and better, more explicit data management, e.g. you can pull/push only part of the data.
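
To make "partially" a bit more concrete: `dvc pull` and `dvc push` take optional targets, so you can fetch just one tracked file or directory instead of everything. A minimal sketch that only builds the command line (it does not run DVC); the helper name `dvc_pull_command`, the target `data/images.dvc` and the remote name `storage` are made up for illustration:

```python
import shlex
from typing import List, Optional, Sequence


def dvc_pull_command(
    targets: Sequence[str] = (),
    remote: Optional[str] = None,
    jobs: Optional[int] = None,
) -> List[str]:
    """Build a `dvc pull` command limited to the given targets.

    Hypothetical helper: with no targets, DVC pulls everything
    referenced in the workspace.
    """
    cmd = ["dvc", "pull"]
    if remote is not None:
        cmd += ["--remote", remote]   # which configured remote to read from
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]  # number of parallel download jobs
    cmd += list(targets)              # only these tracked files/directories
    return cmd


if __name__ == "__main__":
    # Example values are assumptions, not taken from a real project.
    print(shlex.join(dvc_pull_command(["data/images.dvc"], remote="storage", jobs=4)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` inside a DVC project.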

Third, advanced data management - DVC can use reflinks, hardlinks, etc. to make checkout very fast, push/pull transfer data in parallel (I’m not sure whether LFS does this), and a shared cache saves space when multiple projects use the same data and/or multiple people use the same machine.
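
A toy illustration of why link-based checkout from a shared cache is fast and saves space. This is not DVC's actual code, just a sketch of the idea under simplifying assumptions: files are stored once in a content-addressed cache (keyed by SHA-256 here), and "checkout" hardlinks them into each project, falling back to a copy where hardlinks are not possible. All function names are hypothetical; reflinks are not shown because they need filesystem support.

```python
import hashlib
import os
import shutil
from pathlib import Path


def cache_path(cache_dir: Path, digest: str) -> Path:
    # Split the digest into a 2-character directory plus the rest.
    return cache_dir / digest[:2] / digest[2:]


def add_to_cache(src: Path, cache_dir: Path) -> str:
    """Store a file in the cache under its content hash and return the hash."""
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    cached = cache_path(cache_dir, digest)
    if not cached.exists():  # identical content is stored only once
        cached.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, cached)
    return digest


def checkout(digest: str, cache_dir: Path, dest: Path) -> str:
    """Materialize a cached file at dest; return the method that was used."""
    cached = cache_path(cache_dir, digest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if dest.exists():
        dest.unlink()
    try:
        os.link(cached, dest)  # no data is copied, both names share one inode
        return "hardlink"
    except OSError:
        shutil.copy2(cached, dest)  # e.g. cache and workspace on different devices
        return "copy"
```

Two projects that check out the same digest from one cache directory end up sharing a single copy on disk. The flip side is that editing a hardlinked file in place would also change the cached copy.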

Fourth, features like dvc import / dvc get and the dvc.api Python interface - a really great way to reuse data or model files. One use case is a data registry: a GitHub repo that holds the data with all of its history, etc.
Another is model deployment with dvc get and dvc.api.
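
For the deployment case, a rough sketch of what reading a model through `dvc.api` can look like. `dvc.api.open` and `dvc.api.get_url` are part of the Python API; the wrapper names, the repo URL, the file path and the tag in the usage comment are placeholders I made up:

```python
from typing import Optional


def read_tracked_bytes(path: str, repo: str, rev: Optional[str] = None) -> bytes:
    """Read a DVC-tracked file from a Git repo at a given revision."""
    import dvc.api  # imported lazily so this module loads without DVC installed

    # rev can be a branch, tag or commit; None means the default branch.
    with dvc.api.open(path, repo=repo, rev=rev, mode="rb") as f:
        return f.read()


def tracked_url(path: str, repo: str, rev: Optional[str] = None) -> str:
    """Return the remote storage URL where the tracked file lives."""
    import dvc.api

    return dvc.api.get_url(path, repo=repo, rev=rev)


# Hypothetical usage (placeholder repo, path and tag):
# weights = read_tracked_bytes(
#     "models/model.pkl",
#     repo="https://github.com/example/data-registry",
#     rev="v1.0",
# )
```

Pinning `rev` to a tag is what makes the registry idea work: the serving code keeps reading the same model version until the tag it points to is changed deliberately.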

And those differences are only about data management, not touching the pipelines and metrics parts (which I actually also consider part of the data management layer).
