Migration tutorials

Is there a tutorial on how to migrate from Git LFS to DVC?

I don’t think so, but that would be great to have, in case you want to write one!

I can think of 2 strategies:

i. Brute-force approach:
Remove Git-LFS (or even Git itself) completely and start over with DVC.

ii. Conversion:

  1. Translate the Git-LFS pointer files into .dvc files (empty hash values)
  2. Download all the data from the Git-LFS server into the workspace
  3. Run dvc commit to fill in the hashes, then git rm the pointer files and git add/commit the .dvc files.
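
For step 1, it may help to see what a Git-LFS pointer file holds: a few "key value" lines, namely version, oid sha256:<hex> and size <bytes>. The .dvc files record an MD5 hash rather than SHA-256, so the oid cannot simply be copied over, which is presumably why the hashes start out empty. A minimal sketch that reads a pointer file; the name lfs_pointer_info is made up for illustration and is not part of git-lfs or dvc:

```shell
# Print the SHA-256 oid and the byte size recorded in a Git-LFS pointer file.
# lfs_pointer_info is a hypothetical helper, not a git-lfs or dvc command.
lfs_pointer_info() {
    awk '
        $1 == "oid"  { sub(/^sha256:/, "", $2); oid = $2 }
        $1 == "size" { size = $2 }
        END {
            if (oid == "" || size == "") exit 1   # not a pointer file
            print oid, size
        }
    ' "$1"
}
```

Calling lfs_pointer_info path/to/file on a checkout that still contains pointers prints the oid and size on one line, and returns a non-zero status for anything that does not look like a pointer.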

I just went through the process of doing this. Luckily, the commit history of my repo wasn’t very deep, but I was able to migrate the file in each commit from lfs to dvc. The process went roughly like this:

  1. git log --oneline --follow -- path/to/file - Determine which commits touched the file
  2. Rebase/edit those commits, then for each commit:
  3. git rm -r --cached path/to/file - Remove the file from git, but keep the local copy
  4. dvc add path/to/file - Tell dvc to track the file
  5. git add path/to/file.dvc - Tell git to track the dvc file
  6. git commit --amend - Replace lfs with dvc in the previous commit (skip if resolving a conflict as part of the rebase)
  7. git rebase --continue - Repeat however many times necessary
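
Steps 3 to 6 above could be wrapped in a small shell function and run each time the rebase stops at an edit. This is only a sketch: lfs_to_dvc_step is a made-up name, and --no-edit is my addition so the original commit message is kept.

```shell
# Run once per commit while the rebase is stopped at an "edit" step.
# lfs_to_dvc_step is a hypothetical name; the commands are the ones listed above.
lfs_to_dvc_step() {
    path=$1
    git rm -r --cached "$path" &&   # stop tracking in Git, keep the local copy
    dvc add "$path" &&              # let DVC track it; writes "$path.dvc"
    git add "$path.dvc" &&          # track the small .dvc file in Git
    git commit --amend --no-edit    # fold the change into the commit being edited
}
```

Usage would be lfs_to_dvc_step path/to/file followed by git rebase --continue. Note that dvc add also writes a .gitignore entry next to the file; the steps above don't mention it, but adding that .gitignore to the commit is probably what you want.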

Additionally, I needed to rebase/edit the commit that introduced lfs by removing the entry it added to .gitattributes, running dvc init and any other setup required/desired, and amending those changes to the commit.
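
The entries that git lfs track writes to .gitattributes look like "*.bin filter=lfs diff=lfs merge=lfs -text". One way to drop them while keeping any other attributes is sketched below; strip_lfs_attributes is a hypothetical helper, and it assumes every LFS rule carries filter=lfs on its line.

```shell
# Drop every Git-LFS rule from a .gitattributes file, keeping other attributes.
# strip_lfs_attributes is a hypothetical helper; it rewrites the file in place.
strip_lfs_attributes() {
    file=${1:-.gitattributes}
    [ -f "$file" ] || return 1
    tmp=$(mktemp) || return 1
    # grep exits 1 when no lines are left to print, which is fine here
    grep -v 'filter=lfs' "$file" > "$tmp" || true
    cat "$tmp" > "$file"            # overwrite contents, keep file permissions
    rm -f "$tmp"
}
```

Running git lfs untrack with each tracked pattern should have the same effect, one pattern at a time.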

Finally, I ran dvc push --all-commits and git push -f, and I was done!

Hey there,
I couldn’t find a specific tutorial on migrating from Git LFS to DVC either. However, you can follow these general steps:

  • Install DVC: Begin by installing DVC and setting up your repository.
  • Download Git LFS Files: Download all large files tracked by Git LFS into your local working copy (git lfs pull does this).
  • Track Files with DVC: Stop tracking these files in Git (git rm --cached), then add them to DVC tracking using dvc add.
  • Push to Remote Storage: Push the tracked files to your chosen remote storage (e.g., AWS S3) using dvc push.
  • Update Workflow: Modify your workflow to use DVC commands instead of Git LFS for managing large files.
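
As a rough sketch of these steps for the current commit only (no history rewriting): the function name migrate_head_to_dvc and the remote name "storage" are made up, the remote URL is whatever storage you control, and it assumes git, git-lfs and dvc are installed and DVC has not been initialised in the repo yet.

```shell
# One-shot migration of the files tracked by Git LFS at the current commit.
# migrate_head_to_dvc and the remote name "storage" are made up for this sketch.
migrate_head_to_dvc() {
    remote_url=$1                              # e.g. an s3:// URL you control
    dvc init || return 1                       # set up DVC inside the Git repo
    dvc remote add -d storage "$remote_url" || return 1
    git lfs pull || return 1                   # fetch the real file contents
    files=$(git lfs ls-files -n) || return 1   # paths of LFS-tracked files
    printf '%s\n' "$files" | while IFS= read -r f; do
        [ -n "$f" ] || continue
        # untrack in Git first, otherwise dvc add refuses the file
        git rm --cached "$f" && dvc add "$f" || exit 1
    done || return 1
    dvc push                                   # upload the data to the remote
}
```

After it finishes you would still need to remove the LFS rules from .gitattributes, git add the new .dvc and .gitignore files along with the .dvc directory, and commit.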

Remember to adjust these steps based on your specific needs and repository setup. If you encounter any issues, seek assistance from the DVC community or consult their documentation for more detailed guidance.

Hello,
In my view, you can achieve this by first removing LFS tracking, committing the actual data files to your Git repository, and then setting up DVC to manage the versioning and storage of these files.