Odd merge behavior

I have a git-tracked folder with a dvc-tracked data folder inside it.
The dvc git hooks are installed, so git actions trigger them.
I created a branch, did some work, and created two folders with files inside data. I ran dvc add on the data folder, then committed everything and pushed.
Then I did a git switch to main.
Then a git merge of the branch.
The git merge was fine, no conflicts.
But the data folder now contains the two new folders and none of the files from the branch.

Looking at the contents of data.dvc, nfiles shows only two more files, rather than the 240 files and 2 folders I added.

What am I missing?
Thanks

Here’s a demo transcript. I’m only listing one of the 682k+ files. I’ve already merged, so with further attempts to merge git says everything is up to date.

(ng) 
FCD on visualizations via 🅒 ng
❯ ll data/eeg_images/classified_view_of_full_montage/hts-hts_arthur_part1_raw_214350.png
-rw-r--r-- 1 john john 269K Aug  1 09:15 data/eeg_images/classified_view_of_full_montage/hts-hts_arthur_part1_raw_214350.png
(ng) 
FCD on visualizations via 🅒 ng
❯ git switch main          
Switched to branch 'main'
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)
M       data/                                                                                                                                 
(ng) 
FCD on main [⇑] via 🅒 ng took 44s
❯ ll data/eeg_images/classified_view_of_full_montage
total 0
drwxr-xr-x 1 john john   0 Aug  1 09:22 .
drwxr-xr-x 1 john john 118 Jul 31 17:47 ..
(ng) 
FCD on main [⇑] via 🅒 ng
❯ dvc checkout
(ng) 
FCD on main [⇑] via 🅒 ng took 26s
❯ ll data/eeg_images/classified_view_of_full_montage
total 0
drwxr-xr-x 1 john john   0 Aug  1 09:22 .
drwxr-xr-x 1 john john 118 Jul 31 17:47 ..
(ng) 
FCD on main [⇑] via 🅒 ng
❯ cat data.dvc
outs:
- md5: eceee4a3233ba7469e55c3b1331f7267.dir
  size: 113370032087
  nfiles: 682398
  path: data
(ng) 
FCD on main [⇑] via 🅒 ng
❯ git switch visualizations
Switched to branch 'visualizations'
Your branch is up to date with 'origin/visualizations'.
M       data/                                                                                                                                 
(ng) 
FCD on visualizations via 🅒 ng took 43s
❯ ll data/eeg_images/classified_view_of_full_montage/hts-hts_arthur_part1_raw_214350.png
-rw-r--r-- 1 john john 269K Aug  1 09:25 data/eeg_images/classified_view_of_full_montage/hts-hts_arthur_part1_raw_214350.png
(ng) 
FCD on visualizations via 🅒 ng
❯ dvc status  
Data and pipelines are up to date.                                                                                                            
(ng) 
FCD on visualizations via 🅒 ng took 19s
❯ cat data.dvc
outs:
- md5: bf79340da05889953b1f0c4152babe94.dir
  size: 113430468835
  nfiles: 682640
  path: data
(ng) 
FCD on visualizations via 🅒 ng
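
As a side check, the nfiles values in the two data.dvc versions above (682398 on main, 682640 on visualizations, a difference of 242) can be compared without switching branches. A minimal sketch, assuming it is run inside the repository; nfiles_of and nfiles_diff are hypothetical helper names, not dvc commands:

```shell
# Print the nfiles value from a .dvc file read on stdin.
nfiles_of() {
  awk '$1 == "nfiles:" { print $2; exit }'
}

# Hypothetical helper: difference in nfiles for one .dvc file
# between two git revisions (other minus base).
nfiles_diff() {
  file="$1"
  base="$2"
  other="$3"
  # "git show REV:PATH" prints the file as it exists in that revision.
  a=$(git show "${base}:${file}" | nfiles_of)
  b=$(git show "${other}:${file}" | nfiles_of)
  echo $(( b - a ))
}

# Usage (inside the repository):
#   nfiles_diff data.dvc main visualizations
```

This only reads the git-tracked .dvc files; it says nothing about whether the data in the workspace matches them, which is what dvc status reports.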

I ran dvc doctor and saw that my dvc was out of date:

You are using dvc version 2.58.2; however, version 3.10.1 is available.

I updated, and that might have fixed the issue.

(ng) 
FCD on visualizations via 🅒 ng took 4s
❯ ll data/eeg_images/classified_view_of_full_montage/hts-hts_arthur_part1_raw_214350.png
-rw-r--r-- 1 john john 269K Aug  1 09:25 data/eeg_images/classified_view_of_full_montage/hts-hts_arthur_part1_raw_214350.png
(ng) 
FCD on visualizations via 🅒 ng
❯ git switch main          
Switched to branch 'main'
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)
Computing md5 for a large file '/home/john/Work/company/FCD/data/inception.mdl/variables/variables.data-00000-of-00001'. This is only done once.
ERROR: Can`t remove the following unsaved files without confirmation. Use `--force` to force.                                                 
/home/john/Work/company/FCD/data/inception.mdl/saved_model.pb                                                                               
(ng) 
FCD on main [⇑] via 🅒 ng took 9m50s
❯ ll data/eeg_images/classified_view_of_full_montage/hts-hts_arthur_part1_raw_214350.png
-rw-r--r-- 1 john john 269K Aug  1 09:25 data/eeg_images/classified_view_of_full_montage/hts-hts_arthur_part1_raw_214350.png
(ng) 

I could be wrong, but with the hooks installed, git switch main automatically runs a dvc checkout against the unmerged .dvc file. When you then run git merge, your .dvc file is updated with the new file count, but the actual data is not. Running dvc checkout after git merge should fix the issue, since it checks out the version of the data that corresponds to the merged .dvc file.
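
As a rough sketch of that sequence (merge_and_sync is a made-up helper name, and this assumes the merged data is already in your local dvc cache):

```shell
# Hypothetical helper: merge a branch, then sync the dvc-tracked workspace.
merge_and_sync() {
  branch="$1"
  # The merge updates data.dvc (a git-tracked file) but not the data itself.
  git merge "$branch" || return 1
  # Bring the workspace in line with the merged .dvc files.
  dvc checkout || return 1
  # Report anything that is still out of sync.
  dvc status
}

# Usage (on main, inside the repository):
#   merge_and_sync visualizations
```

If the files are not in the local cache, a dvc pull would be needed to fetch them from the remote first, assuming one is configured.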


Thanks for the reply, @farhanhubble.
The new version of dvc may have fixed it, but I’m not sure. I haven’t merged a branch since then (I don’t think).
Will keep your tips in mind. Thanks!