How to permanently stop tracking a file/folder

I’m having issues with DVC after I added a folder to it and then attempted to remove it. Specifically, I was originally tracking notebooks using git, and I decided to see if the experience was better using DVC; in the end I decided to go back to using git.

I started with dvc add /src/notebooks, which added the entire folder to DVC. This appears to copy all the files into the cache (renaming each one to its hash) and to create symlinks to the cached copies.

I couldn’t find any way of deleting it (dvc remove appears to remove stages, not files/folders), so to revert the add I removed all the references to /src/notebooks that I could find and added the folder back to git.

Now every time I check out files from git, they get replaced with the original symlinks to the old files that are still in the DVC cache.

In the end I had to track down each and every symlink and manually remove it from the cache.
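Tracking down those links can at least be scripted. The sketch below is an illustration, not a DVC command: find_cache_links is a hypothetical helper that lists every symlink under a directory whose target lies inside a given cache directory. It assumes the links store absolute target paths; relative links would need to be resolved first.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): print every symlink under
# directory $1 whose literal target path lies inside cache directory $2.
# Assumes the links store absolute targets (relative ones are not resolved).
find_cache_links() {
    prefix=${2%/}                       # tolerate a trailing slash on the cache path
    find "$1" -type l | while IFS= read -r link; do
        dest=$(readlink "$link")        # the link's stored target, unresolved
        case $dest in
            "$prefix"/*) printf '%s\n' "$link" ;;
        esac
    done
}
```

For example, find_cache_links src/notebooks /path/to/global/cache (a placeholder path) prints the notebooks that are still links into the cache rather than regular files.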

What’s the proper way to remove a file/folder from DVC? Having it automatically relink files that happen to have the same names as files that used to be tracked causes all sorts of issues. Even adding src/notebooks to .dvcignore didn’t work.

Also note that I’ve read "Deleting a DVC tracked file from history", but the recommended command

dvc gc --workspace

had no effect; it reported:

WARNING: This will remove all cache except items used in the workspace of the current repo.
Are you sure you want to proceed? [y/n]: y
No unused ‘repo’ cache to remove.
No unused ‘local’ cache to remove.

Also, it doesn’t seem to be a solution in my case: I have a global cache, and I’m working in a branch.

Hi David,

here is what I do to remove a file that was previously added with dvc add, e.g. data/table1.csv:

dvc unprotect data/table1.csv 
rm data/table1.csv.dvc

Then remove the table1.csv entry from .gitignore (or rather data/.gitignore) so that the file shows up in git again, and commit the changes to git (the removed .dvc file, the updated .gitignore, and the file itself).
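The steps above can be sketched as a small shell script. The function names dvc_untrack and remove_gitignore_entry are hypothetical, and the script assumes DVC recorded the target in the parent folder's .gitignore as a line of the form /<name> (e.g. /table1.csv in data/.gitignore). Stripping a trailing slash lets the same function handle a folder such as src/notebooks.

```shell
#!/bin/sh
# Hypothetical helpers sketching the manual steps above.
# Assumption: `dvc add <target>` wrote "/<name>" into <parent>/.gitignore.

# Delete the exact line $2 from ignore file $1, if the file exists.
remove_gitignore_entry() {
    file=$1
    entry=$2
    [ -f "$file" ] || return 0
    tmp="$file.tmp.$$"
    # -v: keep non-matching lines, -x: whole-line match, -F: fixed string
    grep -vxF -- "$entry" "$file" > "$tmp"
    status=$?
    # grep exits 1 when no lines are left; only >1 is a real error
    if [ "$status" -gt 1 ]; then
        rm -f "$tmp"
        return 1
    fi
    mv "$tmp" "$file"
}

# Stop tracking a file or folder with DVC and stage it for git instead.
dvc_untrack() {
    target=${1%/}                      # src/notebooks/ -> src/notebooks
    parent=$(dirname "$target")
    name=$(basename "$target")

    # Replace links into the cache with real copies of the data
    dvc unprotect "$target" || return 1
    # Delete the .dvc file and stage its removal
    git rm --quiet "$target.dvc" || return 1
    # Un-ignore the data so git can see it again
    remove_gitignore_entry "$parent/.gitignore" "/$name" || return 1
    [ -f "$parent/.gitignore" ] && git add -- "$parent/.gitignore"
    git add -- "$target"
}
```

For example, run dvc_untrack data/table1.csv and then git commit. The function only stages the changes, so you can review them with git status before committing.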

I hope this helps. As a guideline for what to track in git and what in DVC, I recommend tracking files in git when you want to follow human-readable changes. Large data files, where you would not track the content of the changes but only whether they changed, may be best tracked with DVC.
Best,
Maximilian
