How to deal with dvc run-cache?

Dear Colleagues,

We use DVC in our workgroup with shared storage and a shared cache across projects and users, as we often share the same datasets as input for our projects.
While doing housekeeping on the DVC storage and cache, I noticed that we have accumulated a vast number of small files in the runs subfolder. As I understand it, this subfolder keeps track of all stage runs across all repos and experiments, and this can add up to many files over time.
I see the benefit of having the run-cache to avoid repeating experiments that have already been processed. However, I think we don't need the run-cache for long-term storage. There we only want to store data that is explicitly referenced by a dvc.lock file in a git repo (at some revision). Therefore we now use the --no-run-cache option when using dvc push.
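
For anyone who wants to check how much has piled up, here is a minimal sketch that counts the files and bytes below a runs folder. The function name and the example path are hypothetical and not part of DVC; point it at wherever the runs folder lives in your own cache or storage.

```python
import os
from pathlib import Path


def summarize_runs_dir(runs_dir):
    """Return (file_count, total_bytes) for everything below runs_dir."""
    file_count = 0
    total_bytes = 0
    # Walk the whole tree, since the small files sit in nested subfolders.
    for root, _dirs, files in os.walk(runs_dir):
        for name in files:
            file_count += 1
            total_bytes += (Path(root) / name).stat().st_size
    return file_count, total_bytes


if __name__ == "__main__":
    # Hypothetical location; adjust to your shared cache or storage.
    runs = Path("/path/to/shared/dvc-cache/runs")
    if runs.is_dir():
        count, size = summarize_runs_dir(runs)
        print(f"{count} files, {size} bytes under {runs}")
```

This only reads file metadata, so it should be harmless to run against a shared cache.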

Here are my questions:
(1) Is it safe to remove the runs folder from our storage entirely, if our work is always committed to a dvc.lock file?
(2) Is it possible to permanently configure a repo to use dvc push and dvc pull with the --no-run-cache option, e.g. in .dvc/config?

Your thoughts are very much appreciated!
Best,
Max

Yes, it should be safe to remove it.

I think the default is to not push / pull run cache:

Default is --no-run-cache.

Could you confirm that?

Thank you for your reply! I renamed the runs folder and, so far, have not noticed that anything is missing.
As for the default settings: thank you for pointing that out. You're right, it says so in the docs.
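
A rough sketch of the "rename first, delete later" approach, in case someone wants to script it. The function name and the runs.bak-<suffix> naming are my own assumptions, not a DVC convention; it also assumes the storage is a plain directory on a local or mounted filesystem.

```python
from datetime import datetime
from pathlib import Path


def set_aside_runs_dir(storage_root, suffix=None):
    """Rename <storage_root>/runs to runs.bak-<suffix>.

    Returns the new path, or None if there is no runs folder.
    """
    runs = Path(storage_root) / "runs"
    if not runs.is_dir():
        return None
    # Default suffix is a timestamp, so repeated runs do not collide.
    suffix = suffix or datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = runs.with_name(f"runs.bak-{suffix}")
    if backup.exists():
        raise FileExistsError(backup)
    # A rename is easy to undo: move the folder back if anything is missing.
    runs.rename(backup)
    return backup
```

If nothing turns out to be missing after a while, the backup folder can be deleted by hand.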
