How is the cache maintained locally for large data?

Hi,
I have a VM that holds my local Git repo, and I have installed DVC on the same machine. When I add data to DVC, it is stored in the DVC cache, and when I run git commit and git push, the same data goes into the Git repo as well. Is my understanding correct? If so, there will be two copies of the data, and the size will keep increasing as the data grows. I am not using any remote repo for the data.

@writetoneeraj Could you please have a look at this tutorial: https://katacoda.com/dvc ? It explains how DVC manages the data. If you still have any questions please come back and ask.

Thanks @dashohoxha, that looks great!

@writetoneeraj No, the data won’t go into your Git repo, only tiny metafiles (DVC-files), so no duplication happens. For example, if you run dvc add data, then data.dvc (a tiny YAML metafile) is tracked by Git, while the data itself is stored by DVC. On git push, only data.dvc is uploaded; the data itself stays on your machine.
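
To make the split concrete, here is a toy Python sketch of the idea. It is not DVC's actual code: toy_add, demo, and the cache folder name are hypothetical, and the metafile is written as JSON only to stay within the standard library (real DVC-files are YAML). The point is just that the cache holds the full content keyed by its hash, while the metafile that Git would track is a few dozen bytes.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def toy_add(data_file, cache_dir):
    """Toy stand-in for `dvc add` (hypothetical, not DVC's implementation)."""
    content = data_file.read_bytes()
    digest = hashlib.md5(content).hexdigest()

    # The full content goes into the cache, named after its hash.
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():
        cached.write_bytes(content)

    # Only this small pointer file would be committed to Git.
    meta = data_file.with_name(data_file.name + ".dvc")
    meta.write_text(json.dumps({"md5": digest, "path": data_file.name}))
    return meta


def demo():
    """Return (size of the data file, size of the metafile) in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "data"
        data.write_bytes(b"x" * 1_000_000)  # about 1 MB of dummy data
        meta = toy_add(data, root / "cache")
        return data.stat().st_size, meta.stat().st_size


if __name__ == "__main__":
    data_size, meta_size = demo()
    print(f"data: {data_size} bytes, metafile: {meta_size} bytes")
```

In this sketch the data file is about 1 MB, while the metafile contains only a hash and a path, which mirrors why git push stays small.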

P.S. @writetoneeraj, the official getting-started guide is Get Started with DVC. Your question is covered in the Add Files chapter.

@dashohoxha This tutorial is really good.
