Saving repo snapshot

Running an experiment with dvc exp run <target> is fast and works correctly. It only picks up the dependencies that the target actually needs.

The documentation contains the following line: “When called with no arguments, this is equivalent to dvc repro followed by dvc exp save.”

I decided to check how exactly dvc exp save works and found that I can’t reproduce the behavior of dvc exp run with dvc repro + dvc exp save. The problem is the following.

Given this pipeline layout:

$ tree
.
├── dir_a
│   └── dvc.yaml
└── dir_b
    └── dvc.yaml

dvc exp save dir_a/dvc.yaml tries to load the data from dir_b/dvc.yaml, which leads to an error:

$ dvc exp save ribs/pipelines/02_seg/dvc.yaml -v                                             
2024-07-22 20:50:21,067 DEBUG: v3.51.2 (pip), CPython 3.10.12 on Linux-6.8.0-35-generic-x86_64-with-glibc2.35                                               
2024-07-22 20:50:21,067 DEBUG: command: /home/ermolaev/projects/radml/venv/bin/dvc exp save ribs/pipelines/02_seg/dvc.yaml -v                               
2024-07-22 20:50:21,253 DEBUG: Saving workspace in /home/ermolaev/projects/radml/cvl-cvisionrad-ml                                                          
2024-07-22 20:50:23,847 DEBUG: Computed stage: 'infiltration/data/full_datasets/infltr_0124_covid_tn_merged/h5/train.dvc' md5: 'None'                       
2024-07-22 20:50:23,850 DEBUG: Saving information to 'infiltration/data/full_datasets/infltr_0124_covid_tn_merged/h5/train.dvc'.                            
2024-07-22 20:50:23,853 DEBUG: Computed stage: 'infiltration/data/full_datasets/infltr_0124_covid_tn_merged/h5/test.dvc' md5: 'None'                        
2024-07-22 20:50:23,854 DEBUG: Saving information to 'infiltration/data/full_datasets/infltr_0124_covid_tn_merged/h5/test.dvc'.                             
2024-07-22 20:50:23,858 DEBUG: Computed stage: 'infiltration/data/full_datasets/tbrc_1123_dh/raw.zip.dvc' md5: 'None'                                       
2024-07-22 20:50:23,859 DEBUG: Saving information to 'infiltration/data/full_datasets/tbrc_1123_dh/raw.zip.dvc'.                                            
2024-07-22 20:50:23,862 DEBUG: Computed stage: 'infiltration/data/full_datasets/srcd_1123_dh/raw.zip.dvc' md5: 'None'                                       
2024-07-22 20:50:23,863 DEBUG: Saving information to 'infiltration/data/full_datasets/srcd_1123_dh/raw.zip.dvc'.                                            
2024-07-22 20:50:23,866 DEBUG: Computed stage: 'infiltration/data/full_datasets/infltr_1223_tn/raw.tar.gz.dvc' md5: 'None'                                  
2024-07-22 20:50:23,867 DEBUG: Saving information to 'infiltration/data/full_datasets/infltr_1223_tn/raw.tar.gz.dvc'.                                       
2024-07-22 20:50:23,870 DEBUG: Computed stage: 'infiltration/data/full_datasets/infltr_1223_tn/h5/train.dvc' md5: 'None'
<...>
2024-07-22 20:50:24,092 DEBUG: Saving information to 'mae/data/full_datasets/MosMed_2023/part5/datasets/val.dvc'.
2024-07-22 20:50:24,367 ERROR: failed to save experiment - output 'mae/data/full_datasets/MosMed_2023/part1/raw/mosmed.tar' is not a file or directory
Traceback (most recent call last):
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/commands/experiments/save.py", line 16, in run
    ref = self.repo.experiments.save(
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/experiments/__init__.py", line 359, in save
    return save(self.repo, *args, **kwargs)
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/experiments/save.py", line 32, in save
    entry = repo.experiments.new(queue=queue, name=name, force=force)
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/experiments/__init__.py", line 218, in new
    return queue.put(*args, **kwargs)
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/experiments/queue/workspace.py", line 38, in put
    return self._stash_exp(*args, **kwargs)
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/experiments/queue/base.py", line 325, in _stash_exp
    self._stash_commit_deps(*args, **kwargs)
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/experiments/queue/base.py", line 384, in _stash_commit_deps
    self.repo.commit(
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/commit.py", line 65, in commit
    stage.save(allow_missing=allow_missing)
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/stage/__init__.py", line 495, in save
    self.save_outs(allow_missing=allow_missing)
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/stage/__init__.py", line 533, in save_outs
    out.save()
  File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/output.py", line 686, in save
    raise self.IsNotFileOrDirError(self)
dvc.output.OutputIsNotFileOrDirError: output 'mae/data/full_datasets/MosMed_2023/part1/raw/mosmed.tar' is not a file or directory

2024-07-22 20:50:24,374 DEBUG: Analytics is enabled.
2024-07-22 20:50:24,413 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpv1l4uqw_', '-v']
2024-07-22 20:50:24,418 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpv1l4uqw_', '-v'] with pid 25481

The error happens because I have broken links: some files were removed from the cache, but the links pointing to them are still in the workspace. There is no convenient tool to remove all broken links, and I don’t care about them anyway. What I don’t understand is why dvc exp save tries to read them when I told the command to work with a specific target only. So my question is: how can I make dvc exp save work granularly?
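
Since there seems to be no built-in tool for this, here is a minimal sketch of how the broken links could be listed with plain Python. It is not part of DVC: the function name find_broken_symlinks and the decision to skip the .git and .dvc directories are assumptions made only for illustration.

```python
import os


def find_broken_symlinks(root):
    """Return sorted paths of symlinks under `root` whose target is missing."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: the internal .git and .dvc directories are not of interest.
        dirnames[:] = [d for d in dirnames if d not in {".git", ".dvc"}]
        # os.walk lists symlinks to existing directories in dirnames and all
        # other symlinks (including broken ones) in filenames, so check both.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() looks at the link itself; exists() follows the link
            # and returns False when the target is gone.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

Calling find_broken_symlinks(".") from the repository root returns the list for review; after checking it, each entry could be removed with os.unlink.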

UPD: I removed all the broken symlinks and it worked, but I still can’t understand why dvc exp run was fine with them while dvc exp save is not.

One more strange thing: after dvc exp save dvc.yaml I got the following changes in other DVC files:

diff --git a/mae/data/full_datasets/MosMed_2023/part2/datasets/log.txt.dvc b/mae/data/full_datasets/MosMed_2023/part2/datasets/log.txt.dvc
index 4a4e6ada..642b0835 100644
--- a/mae/data/full_datasets/MosMed_2023/part2/datasets/log.txt.dvc
+++ b/mae/data/full_datasets/MosMed_2023/part2/datasets/log.txt.dvc
@@ -1,5 +1,4 @@
 outs:
 - md5: dbc8c9a80d7c0f640c69d2a9db931d1a
   size: 14613421
-  isexec: true
   path: log.txt

Why did this command change the executable flag of this file? The diff removes isexec: true, yet it’s just a .txt file with a generation log.
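
To see which state is the real one, the permission bits of the file in the workspace can be inspected directly. The sketch below assumes that isexec in a .dvc file mirrors the execute permission bits of the file, which is an assumption and not something confirmed in this thread; the helper name is_executable is hypothetical.

```python
import os
import stat


def is_executable(path):
    """Return True if any execute permission bit is set on `path`."""
    mode = os.stat(path).st_mode
    # Assumption: isexec corresponds to the owner/group/other execute bits.
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
```

If is_executable("log.txt") returns False in the directory of the .dvc file, the new metadata matches the file on disk and the old isexec: true entry was the stale one.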