I am trying out DVC for a proof of concept of one of my ML pipelines. I see that the dvc add command keeps track of changes in data files. How do I revert to an older version of my data files using the DVC CLI?


After changing the file content, you can return to any committed version of your data file using a combination of two commands:

  1. git checkout COMMIT, which reverts all code and all DVC metafiles (*.dvc)
  2. dvc checkout, which then restores the corresponding data files from your cache.

For example, to go back to the previous commit:

$ git checkout HEAD~1
$ dvc checkout

Checking out a commit directly like this leaves Git in the “detached HEAD” state. To avoid that, create a branch when you jump to an old commit:

$ git checkout -b original_dataset HEAD~1
$ dvc checkout
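
If you only want one data file back and would rather stay on your current branch, the same two steps can be limited to a single .dvc file. Here is a minimal sketch, assuming the file is tracked by a metafile such as data.csv.dvc; the function name restore_data_version and the file name are hypothetical, made up for illustration:

```shell
# Hypothetical helper: restore one DVC-tracked file to the version
# recorded at a given Git commit, without switching branches.
restore_data_version() {
    commit="$1"      # e.g. HEAD~1 or a commit hash
    dvc_file="$2"    # e.g. data.csv.dvc (assumed name of the metafile)

    # Step 1: take only the .dvc metafile from the old commit.
    # Step 2: let DVC sync the data file in the workspace with that metafile.
    git checkout "$commit" -- "$dvc_file" && dvc checkout "$dvc_file"
}
```

You would call it as restore_data_version HEAD~1 data.csv.dvc. The old metafile then shows up as a change in git status, so you can commit it if you want to keep the older data version on this branch.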

PS: I’d recommend not modifying any data file that was added with dvc add file.txt. Instead, remove the file with dvc remove file.txt.dvc and then add it again.

