Get older version of data files


I am trying out dvc for a PoC of one of my ML pipelines. I see that the dvc add command keeps track of changes in data files. How do I revert to an older version of the data files using the dvc CLI?


After changing the file content, you can return to any committed version of your data file using a combination of two commands:

  1. git checkout COMMIT, which reverts all code and all dvc metafiles (*.dvc)
  2. dvc checkout, which then restores all the corresponding data files from your cache.

For example, to get the previous commit:

$ git checkout HEAD~1
$ dvc checkout

Checking out an old commit this way leaves Git in the “detached HEAD” state. To avoid that, create a branch when you jump to the old commit:

$ git checkout -b original_dataset HEAD~1
$ dvc checkout
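
The two steps above can be wrapped in a small shell helper so they are never run apart. This is only a sketch: switch_data_version is a hypothetical name, not a dvc command, and it assumes you run it from inside a Git repository where dvc is already initialized.

```shell
# Hypothetical helper (not part of dvc): check out a Git revision,
# then sync the data files with the *.dvc metafiles of that revision.
switch_data_version() {
    rev="${1:-}"
    if [ -z "$rev" ]; then
        echo "usage: switch_data_version <git-revision>" >&2
        return 1
    fi
    # Step 1: restore code and *.dvc metafiles from the given revision.
    git checkout "$rev" || return 1
    # Step 2: restore the data files those metafiles point to from the cache.
    dvc checkout
}

# Example (revision names are placeholders):
#   switch_data_version HEAD~1    # go back one commit
#   switch_data_version master    # return to the tip of a branch
```

Stopping when git checkout fails keeps dvc checkout from syncing the data against a revision you did not intend.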

PS: I’d recommend not modifying in place any data file that was added with dvc add file.txt. Instead, remove it with dvc remove file.txt.dvc and then add the file again.

Hm, this is not working. I did

$ git checkout HEAD~1
$ dvc checkout

But the local files still contain the last edits.

Hey @rmbzmb,
This thread is quite old, but the suggestion is still valid.

Keep in mind that in order to restore a previous version, you need to check out the Git revision that contains the file version you’re interested in. You can find the revisions in which a .dvc file changed by running git log <yourfilename.dvc>.
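
As a rough sketch of that lookup, the helper below lists only the commits that touched one .dvc file. The name dvc_file_history and the file name in the example are placeholders I made up for illustration; the git log options used (--oneline and the -- path separator) are standard Git.

```shell
# Hypothetical helper: list the commits in which a given .dvc file changed.
dvc_file_history() {
    dvc_file="${1:-}"
    if [ -z "$dvc_file" ]; then
        echo "usage: dvc_file_history <file.dvc>" >&2
        return 1
    fi
    # --oneline prints one abbreviated hash and subject per commit;
    # "--" tells git that what follows is a path, not a revision.
    git log --oneline -- "$dvc_file"
}

# Example (data.csv.dvc is a placeholder):
#   dvc_file_history data.csv.dvc
```

If the .dvc file is identical in HEAD~1 and HEAD, running git checkout HEAD~1 followed by dvc checkout gives you the same data version as before, so pick a commit from this list instead.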

Please create a new issue with more details about your setup if you cannot get this to work.

Just in case someone is still looking for this: passing the .dvc file to the checkout works for me:

git checkout <old_commit> <file_name>.dvc
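
The command above only restores the .dvc metafile, so the data file itself still has to be synced with dvc checkout (unless a Git hook already does that in your setup). A minimal sketch of the full per-file sequence follows; restore_data_file is a hypothetical helper name, and the arguments in the example are placeholders.

```shell
# Hypothetical helper: restore one tracked data file to the version
# recorded in an older commit, leaving the rest of the repo untouched.
restore_data_file() {
    commit="${1:-}"
    dvc_file="${2:-}"
    if [ -z "$commit" ] || [ -z "$dvc_file" ]; then
        echo "usage: restore_data_file <old_commit> <file.dvc>" >&2
        return 1
    fi
    # Take only the .dvc metafile from the old commit; HEAD does not move.
    git checkout "$commit" -- "$dvc_file" || return 1
    # Sync the data file with the restored metafile.
    dvc checkout "$dvc_file"
}

# Example (both arguments are placeholders):
#   restore_data_file HEAD~3 data.csv.dvc
```

Because git checkout with a path also stages the restored .dvc file, commit it afterwards if you want to keep the older data version.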