Deleting files from a directory data set


#1

Hi,

I’m currently using dvc to track a directory containing the files of my database. I added it with a single “dvc add” call, so a single entry appears in the “.dvc” file. I would like to delete some files from the directory without having to remove and re-add the whole directory, but I can’t find information on how to do this. Is there a safe way to do it while keeping the soon-to-be-deleted files in history, so I can get them back if I move to an older commit?

Thanks


#2

Hi @elvira,
Sure. Just remove the files you don’t need, run dvc add on the directory again (this regenerates the dvc file; make sure its name matches the old one), and commit the changed dvc file:

$ ls dir
users.csv       clicks.tsv          apps.csv
$ dvc add dir
$ git add dir.dvc .gitignore
$ git commit -m 'Store clickstream data'
# Changing dir:
$ rm dir/apps.csv
$ dvc add dir
$ git add dir.dvc
$ git commit -m 'Remove application data file'
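
If you do this often, the "changing dir" steps above can be wrapped in a small shell function. This is only a sketch: the name dvc_drop_file is made up for illustration, and it assumes you run it from the repository root and that the directory was added with "dvc add <dir>", so its dvc file is <dir>.dvc.

```shell
# Hypothetical helper (not part of dvc): delete one file from a
# dvc-tracked directory and record the change in git.
# Assumption: run from the repo root, and <dir>.dvc already exists.
dvc_drop_file() {
    dir=$1
    file=$2
    # Refuse to continue if the file is not there, so we never
    # create an empty commit by accident.
    if [ ! -f "$dir/$file" ]; then
        echo "no such file: $dir/$file" >&2
        return 1
    fi
    rm -- "$dir/$file" || return 1
    # Re-adding the directory rewrites <dir>.dvc with a new checksum.
    dvc add "$dir" || return 1
    git add "$dir.dvc" || return 1
    git commit -m "Remove $file from $dir" || return 1
}
```

With the example above, "dvc_drop_file dir apps.csv" would replace the rm, dvc add, git add and git commit lines.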

Checkout should work just fine:

$ ls dir
users.csv       clicks.tsv
$ git checkout HEAD^    # create a branch if you are going to work there: -b step_back
Note: checking out 'HEAD^'.

You are in 'detached HEAD' state.
...
$ dvc checkout
Checking out '{'scheme': 'local', 'path': '/Users/dmitry/src/dvc_tests/dir'}' with cache '95414424c3787480d7a3083352fa1136.dir'.
Linking directory 'dir'.
$ ls dir
users.csv       clicks.tsv          apps.csv    # <-- apps.csv file is back!
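
If you only want the old contents of the directory back without moving HEAD, a possible alternative is to check out just the old dvc file and let dvc sync the directory to it. This is a sketch, not the only way to do it: dvc_restore_dir is a made-up name, and it assumes the dvc file is <dir>.dvc and that the old data is still in the dvc cache.

```shell
# Hypothetical helper (not part of dvc): bring a tracked directory
# back to the state recorded at a given git revision.
# Assumption: the dvc file for <dir> is <dir>.dvc, and the old data
# is still available in the dvc cache.
dvc_restore_dir() {
    rev=$1
    dir=$2
    # Take only the dvc file from the old revision; HEAD stays put.
    git checkout "$rev" -- "$dir.dvc" || return 1
    # Make the working directory match the checked-out dvc file.
    dvc checkout "$dir.dvc" || return 1
}
```

For example, "dvc_restore_dir HEAD^ dir" should bring apps.csv back. The old dir.dvc is left staged in git, so commit it if you want to keep that state.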

Please let me know if you have any other questions.