This example in the docs seems a bit odd, and a basic question

I’m a bit of a noob, and I’m trying to understand how dvc install changes the steps introduced in Dr. O’Brien’s introductory videos.

Question 1

The final example on this page of the docs shows a source file stored in a dvc repository. When that file changes, the example says to run dvc repro to regenerate the data, and then everything is somehow up to date.

I would think that:

  • the source would be saved to git
  • running the modified source with dvc repro would produce a changed dataset, which would then need to be added, committed, and pushed.

Question 2

When I’m working on my project and run git commit, dvc status runs and (let’s assume) shows me that some of my dvc-managed files have changed. Are these the right commands to run next (from memory)?

# add all modified files in the data folder
dvc add data
dvc push
# make sure the files changed by dvc make sense
git status
git commit -a -m "dvc modified"
git push

This seems pretty convoluted, so if there’s a better way, I’d love to know. If not, I’m not looking a gift horse in the mouth :slightly_smiling_face:
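
If this really is the intended workflow, I would probably wrap it in a small shell function, roughly like the sketch below. It is only a sketch under my own assumptions: sync_data and run are names I made up, it assumes the data folder sits at the repo root and is tracked with dvc add (so a data.dvc file and a .gitignore entry exist next to it), and by default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: prints each command; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run is a made-up helper: echo the command, then execute it unless dry-running.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# sync_data is a made-up helper name.
# $1: folder tracked with `dvc add` (assumed to be at the repo root)
# $2: git commit message
sync_data() {
    target="$1"
    message="$2"
    run dvc add "$target"                  # re-hash the folder, update <target>.dvc
    run dvc push                           # upload the new data to the DVC remote
    run git add "$target.dvc" .gitignore   # stage only the small pointer files
    run git commit -m "$message"
    run git push
}

sync_data data "dvc modified"
```

If I read the dvc install docs correctly, the pre-push hook already runs dvc push before git push, so the explicit dvc push here may be redundant once the hooks are installed.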

Question 3

Okay, a bonus: what about merging a branch into main? Say the signatures (hashes) in the *.dvc files don’t match. Is it just a matter of always picking the branch’s data over main’s in the conflicted .dvc files (unless something has gone really wrong)?
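
To make the question concrete, this is roughly what I imagine "always pick the branch" would look like while a git merge into main is stopped on a conflict. Again, only a sketch: take_branch_version and run are made-up names, data.dvc is a hypothetical conflicted file, the final dvc checkout is my guess at how to bring the workspace data in line with the merged .dvc files, and by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints each command; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run is a made-up helper: echo the command, then execute it unless dry-running.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# take_branch_version is a made-up helper name.
# Call it with the conflicted .dvc files while `git merge <branch>` is in progress.
take_branch_version() {
    for f in "$@"; do
        # During a merge, "theirs" is the branch being merged into the current one.
        run git checkout --theirs -- "$f"
        run git add "$f"
    done
    run git commit --no-edit   # conclude the merge with the default message
    run dvc checkout           # assumption: sync workspace data to the merged .dvc files
}

take_branch_version data.dvc
```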

Thanks!

The discussion continues here: This example in the docs seems a bit odd, and a basic question · iterative/dvc · Discussion #9263 · GitHub