First off, thanks for a great tool! While we don’t use the pipelines very much, my team does use DVC to store our data: several datasets of hundreds of GBs each.
I often work with one dataset at a time, usually for several weeks. It would be nice to be able to completely clear the datasets I’m not currently using from my local machine. We use a monorepo for everything, and if I understand dvc gc correctly, the reason my cache doesn’t get cleared is that the .dvc files for all datasets are present in the branch head.
Since there are some datasets I almost never use, it would be nice to be able to clear them completely from the local cache, and then, once I need one, I’ll take my punishment and wait for it to download from the remote using dvc pull. My question, I guess, is whether there is a nice way of doing this that I’m missing. Currently I’ve resorted to manually deleting everything in the cache folder every few months to start fresh and pull what I need. It works, but it doesn’t feel like the right way to go about it.
All the best!