I’m a bit of a noob, and I’m trying to understand how dvc install
changes the steps introduced in Dr. O’Brien’s introductory videos.
Question 1
The final example on this page of the docs shows a source file stored in a DVC repository. When that source file changes, the example runs dvc repro to regenerate the data, and after that everything is somehow up to date.
I would think that:
- source would be saved to git
- running modified source with repro would result in a changed dataset that would need to be added, committed, and pushed.
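To make this concrete, here is a rough sketch of the sequence I would expect, assuming the pipeline is defined in dvc.yaml (so dvc repro records output hashes in dvc.lock). The script path src/prepare.py, the commit message, and the run/DRY_RUN helper are all made up for illustration; none of this comes from the docs example.

```shell
# Sketch only: src/prepare.py and the commit message are hypothetical.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  printf '+ %s\n' "$*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

update_after_source_change() {
  run git add src/prepare.py     # the source file itself is tracked by git
  run dvc repro                  # re-run stages whose dependencies changed
  run git add dvc.lock           # assumed: dvc.lock holds the new output hashes
  run git commit -m "Update prepare step and data"
  run dvc push                   # upload the regenerated data to the DVC remote
  run git push                   # share the source and the updated dvc.lock
}
```

Calling update_after_source_change with DRY_RUN=1 only prints the six commands. If I understand dvc install correctly, its pre-push hook may make the explicit dvc push redundant.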
Question 2
When I’m working on my project and I run git commit, dvc status is run and (let’s assume) shows me that some of my dvc-managed files have changed. Am I correct in these commands (from memory):
# add all modified files in data folder
dvc add data
dvc push
# make sure the files changed by dvc make sense
git status
git commit -a -m "dvc modified"
git push
This seems pretty convoluted, so if there’s a better way, I would love to know. If not, I’m not looking a gift horse in the mouth.
Question 3
Okay, a bonus: what about merging a branch into main? Say the hashes recorded in the *.dvc files don’t match. Is it just a matter of always selecting the branch data over main in the conflicted .dvc files (unless something has gone really wrong)?
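In command form, what I have in mind is something like the sketch below, run on main after git merge reports a conflict in a .dvc file. The file name data.dvc and the run/DRY_RUN helper are hypothetical, and I am assuming that "theirs" refers to the branch being merged in.

```shell
# Sketch only: resolve a conflicted .dvc file by taking the branch's version.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  printf '+ %s\n' "$*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

take_branch_data() {
  dvc_file="$1"                          # e.g. data.dvc (hypothetical name)
  run git checkout --theirs "$dvc_file"  # keep the merged-in branch's .dvc file
  run git add "$dvc_file"                # mark the conflict as resolved
  run dvc pull "$dvc_file"               # fetch and check out the matching data
  run git commit --no-edit               # conclude the merge
}
```

For example, take_branch_data data.dvc would resolve a single conflicted file; with DRY_RUN=1 it only prints the four commands.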
Thanks!