I have a VM on which my local git repo lives, and I have installed dvc on the same machine. When I add data to dvc, it goes into the dvc cache, and when I commit and git push, the same data goes to the git repo as well. Is my understanding correct? If so, there will be two copies of the data, and the size will keep increasing as the data grows. I am not using any remote repo for data.
Thanks @dashohoxha , that looks great!
@writetoneeraj No, data won't go into your git repo, only tiny metafiles (DVC-files), so there won't be any duplication. E.g. if you run dvc add data, then data.dvc (a tiny YAML metafile) will be stored by git, but the data itself will be stored by dvc. So on git push only data.dvc will be uploaded, and the data itself will stay on your machine.