Dear Colleagues,
We use DVC in our workgroup with a shared storage and a shared cache across projects and users, since our projects often take the same datasets as input.
While doing housekeeping on the DVC storage and cache, I noticed that we have accumulated a vast number of small files in the runs subfolder. As I understand it, this subfolder keeps track of all stage runs across all repos and experiments, which can add up to many files over time.
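To put a number on this, the files can be counted with a few lines of Python. This is only a minimal sketch: the path below is a placeholder, the helper name count_files is made up for illustration, and it assumes the runs folder is an ordinary directory on a local or mounted file system.

```python
import os


def count_files(root):
    """Return (number of files, total size in bytes) found under root."""
    n_files = 0
    n_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            n_files += 1
            # lstat does not follow symlinks, so broken links do not raise.
            n_bytes += os.lstat(os.path.join(dirpath, name)).st_size
    return n_files, n_bytes


if __name__ == "__main__":
    # Placeholder path (an assumption): point this at your own runs folder.
    runs_dir = "/path/to/shared/dvc-cache/runs"
    if os.path.isdir(runs_dir):
        files, size = count_files(runs_dir)
        print(f"{files} files, {size} bytes in {runs_dir}")
    else:
        print(f"{runs_dir} not found; adjust the path")
```

Running the same function on the storage and on the cache shows how much of the total file count the runs folder accounts for.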
I see the benefit of having the run-cache to avoid repeating experiments that have already been processed. However, I don't think we need the run-cache in long-term storage. There we only want to store data that is explicitly referenced by a dvc.lock file in a Git repo (at some revision). Therefore we now use the --no-run-cache option when running dvc push.
Here are my questions:
(1) Is it safe to remove the runs folder from our storage entirely, if our work is always committed to a dvc.lock file?
(2) Is it possible to permanently configure a repo so that dvc push and dvc pull use the --no-run-cache option, e.g. in .dvc/config?
Your thoughts are very much appreciated!
Best,
Max