Thanks @nickrsan!
More context here https://discordapp.com/channels/485586884165107732/563406153334128681/629825409097138186
The obvious one - GitHub still has a 2GB file size limit with Git LFS - https://help.github.com/en/articles/about-git-large-file-storage
Second, yes - caching and better/explicit data management, like pulling/pushing data partially
Third, advanced data management - DVC utilizes reflinks, hardlinks, etc. to make checkouts very fast; push/pull use parallelism to upload/download data (I'm not sure whether LFS does); and there's the ability to use a shared cache to save space when multiple projects use the same data and/or multiple people use the same machine
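For instance, a minimal sketch of how the shared cache and link types can be configured (the cache path here is just a placeholder for a directory shared between users/projects):

```shell
# Point this project's cache at a shared directory (placeholder path)
dvc cache dir /shared/dvc-cache

# Prefer reflinks, fall back to hardlinks, then symlinks, then copies,
# so checkouts are fast and data is not duplicated on disk
dvc config cache.type "reflink,hardlink,symlink,copy"
```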
Fourth, features like `dvc import` / `dvc get` and the `dvc.api` Python interface - a really great way to reuse data or model files. Use cases include a data registry, for example via a GitHub repo with all the history, etc.
Or, with `dvc get` and `dvc.api`, model deployment.
And those differences are only about data management, not touching the pipelines and metrics parts (which I actually also consider part of the data management layer).