Are there any tools or guides for rewriting a git repo containing binaries in the git history to dvc based storage?
I can delete the existing binary files and use dvc and S3 to store them but would like to be able to rewrite the git history.
Personally, I would use git rebase -i <references> and manually modify each of the commits.
You might want to take a look at a tool like BFG Repo-Cleaner by rtyley. I think this Stack Overflow question also covers a few other options: "How to remove/delete a large file from commit history in the Git repository?"
Thanks for your suggestions. BFG has been superseded by git filter-repo.
I’ve used BFG and it was a pain compared to git filter-repo, so I strongly suggest that everyone use git filter-repo.
git filter-repo can be extended, so there is probably a way to use it to find large blobs matching a file path or file pattern, remove those blobs from the history, add the files to dvc, and move on. But I haven't done that yet.
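
For the "find large blobs matching a file path or pattern" part, here is a rough sketch that only uses plain git plumbing (git rev-list and git cat-file), not git filter-repo itself. The function name find_large_blobs and its parameters are made up for illustration, and the glob match relies on Python's fnmatch, so treat it as a starting point rather than a finished tool.

```python
import fnmatch
import subprocess


def find_large_blobs(repo, pattern="*", min_bytes=1_000_000):
    """Return (size, path) pairs for blobs anywhere in history.

    Hypothetical helper: `repo` is a path to a git repository,
    `pattern` is an fnmatch-style glob matched against the full path,
    `min_bytes` is the smallest blob size to report.
    """
    # List every object reachable from any ref, with the path it was stored under.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout

    # Ask git for the type and size of each object in one batch call.
    # %(rest) echoes back whatever followed the object id (the path).
    details = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout

    found = set()
    for line in details.splitlines():
        objtype, size, *rest = line.split(" ", 2)
        path = rest[0] if rest else ""
        if objtype != "blob" or int(size) < min_bytes:
            continue
        if fnmatch.fnmatch(path, pattern):
            found.add((int(size), path))

    # Largest blobs first.
    return sorted(found, reverse=True)
```

Something like find_large_blobs(".", "*.bin", 10_000_000) should list candidates even if the files were deleted in later commits. From there, the paths could be handed to git filter-repo (for example --path-glob with --invert-paths, or --strip-blobs-bigger-than) and the current copies of the files tracked with dvc add. I haven't tested that end-to-end workflow, and it only puts the latest version of each file into dvc, not the older revisions.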