I have the following pipeline:
stages:
  generate_h5:
    foreach: ${datasets}
    do:
      cmd: >-
        unzip -q -n ${DS_ROOT}/${item.name}.zip -d ${DS_ROOT} &&
        python ds_gen/gen_h5.py
        --input ${DS_ROOT}/${item.name}
        --ds-type ${item.type}
        --belief-cancer
        --out ${PPL_DIR}/h5/${item.name}
        --log-file ${PPL_DIR}/logs/${item.name}.txt
        --log-level-for-file 'TRACE'
        --force
        --lungs-bs ${LUNGS_BS}
        --cancer-bs ${CANCER_BS}
      deps:
        - ds_gen/gen_h5.py
        - package_ext/radml_infiltr_ext/raw_ds/__covid_ds.py
        - package_ext/radml_infiltr_ext/raw_ds/__infiltration_ds.py
        - package_ext/radml_infiltr_ext/raw_ds/__non_infiltration_ds.py
        - ${DS_ROOT}/${item.name}.zip
      outs:
        - ${PPL_DIR}/h5/${item.name}
        - ${PPL_DIR}/logs/${item.name}.txt
      wdir: ${WDIR}
  extract_props:
    foreach: ${datasets}
    do:
      cmd: >-
        python ds_gen/extract_props.py
        --input ${PPL_DIR}/h5/${item.name}
        --out ${PPL_DIR}/props/${item.name}.h5
      deps:
        - ds_gen/extract_props.py
        - package_ext/radml_infiltr_ext/__prop_extractors.py
        - ${PPL_DIR}/h5/${item.name}
      outs:
        - ${PPL_DIR}/props/${item.name}.h5
      wdir: ${WDIR}
  analyze_props:
    foreach: ${datasets}
    do:
      cmd: >-
        python ds_gen/analyze_props.py
        ${PPL_DIR}/props/${item.name}.h5
        --exp ${exp}
      deps:
        - ds_gen/analyze_props.py
        - ${PPL_DIR}/props/${item.name}.h5
      wdir: ${WDIR}
The pipeline finished running recently and everything is up to date:
$ dvc status dvc.yaml -v
2024-11-23 14:43:05,264 DEBUG: v3.55.2 (pip), CPython 3.10.12 on Linux-6.8.0-35-generic-x86_64-with-glibc2.35
2024-11-23 14:43:05,264 DEBUG: command: /home/ermolaev/projects/radml_backup/venv/bin/dvc status dvc.yaml -v
2024-11-23 14:43:06,574 DEBUG: built tree 'object 49dfe42a3b62b2b0b224ec01af6012df.dir'
2024-11-23 14:43:06,589 DEBUG: built tree 'object 9d1d302acaea3c6bda4674aa4a266025.dir'
2024-11-23 14:43:06,602 DEBUG: built tree 'object 3a9fdb2b2e42ab255c819046aebeb75d.dir'
2024-11-23 14:43:06,631 DEBUG: built tree 'object 06f7eb33ed9d66a7e59b6b845b4443f2.dir'
2024-11-23 14:43:06,678 DEBUG: built tree 'object e4bee7723db87c4f844766f892100b38.dir'
2024-11-23 14:43:06,720 DEBUG: built tree 'object 1133884ea3711b5f8327e4926f0e3f79.dir'
2024-11-23 14:43:06,751 DEBUG: built tree 'object 392c1a2949b0432b54eca066bb7d246d.dir'
2024-11-23 14:43:06,763 DEBUG: built tree 'object 1f7a4caad4547d5c08d100a374a4e527.dir'
2024-11-23 14:43:06,768 DEBUG: built tree 'object 49dfe42a3b62b2b0b224ec01af6012df.dir'
2024-11-23 14:43:06,773 DEBUG: built tree 'object 9d1d302acaea3c6bda4674aa4a266025.dir'
2024-11-23 14:43:06,778 DEBUG: built tree 'object 3a9fdb2b2e42ab255c819046aebeb75d.dir'
2024-11-23 14:43:06,783 DEBUG: built tree 'object 06f7eb33ed9d66a7e59b6b845b4443f2.dir'
2024-11-23 14:43:06,788 DEBUG: built tree 'object e4bee7723db87c4f844766f892100b38.dir'
2024-11-23 14:43:06,793 DEBUG: built tree 'object 1133884ea3711b5f8327e4926f0e3f79.dir'
2024-11-23 14:43:06,798 DEBUG: built tree 'object 392c1a2949b0432b54eca066bb7d246d.dir'
2024-11-23 14:43:06,803 DEBUG: built tree 'object 1f7a4caad4547d5c08d100a374a4e527.dir'
2024-11-23 14:43:07,546 DEBUG: Lockfile '../02_seg/dvc.lock' needs to be updated.
2024-11-23 14:43:08,264 DEBUG: Lockfile for '../../../ribs/pipelines/06_cage_sgm/dvc.yaml' not found
2024-11-23 14:43:09,289 DEBUG: Lockfile for '../../../ribs/pipelines/05_unf_dt/dvc.yaml' not found
2024-11-23 14:43:09,572 DEBUG: Lockfile for '../../../emphysema/pipelines/01_binary_qa/dvc.yaml' not found
2024-11-23 14:43:12,624 DEBUG: Lockfile for '../../../pcfat/pipelines/02_seg/dvc.yaml' not found
Data and pipelines are up to date.
2024-11-23 14:43:15,857 DEBUG: Analytics is enabled.
2024-11-23 14:43:16,067 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpt1nyql2h', '-v']
2024-11-23 14:43:16,102 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpt1nyql2h', '-v'] with pid 2955525
I wanted to freeze the generation stage, but the command failed with a strange error:
$ dvc freeze dvc.yaml:generate_h5 -v
2024-11-23 14:43:53,999 DEBUG: v3.55.2 (pip), CPython 3.10.12 on Linux-6.8.0-35-generic-x86_64-with-glibc2.35
2024-11-23 14:43:54,001 DEBUG: command: /home/ermolaev/projects/radml_backup/venv/bin/dvc freeze dvc.yaml:generate_h5 -v
2024-11-23 14:43:55,031 ERROR: failed to freeze 'dvc.yaml:generate_h5' - Stage 'generate_h5' not found inside 'dvc.yaml' file
Traceback (most recent call last):
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/stage/loader.py", line 134, in __getitem__
    resolved_data = self.resolver.resolve_one(name)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/parsing/__init__.py", line 198, in resolve_one
    raise EntryNotFound(f"Could not find '{name}'")
dvc.parsing.EntryNotFound: Could not find 'generate_h5'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/commands/freeze.py", line 15, in _run
    func(target)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/repo/freeze.py", line 19, in freeze
    return _set(repo, target, True)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/repo/freeze.py", line 11, in _set
    stage = repo.stage.get_target(target)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/repo/stage.py", line 217, in get_target
    return self.load_one(path=path, name=name)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/repo/stage.py", line 301, in load_one
    return stages[name]
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/stage/loader.py", line 136, in __getitem__
    raise StageNotFound(self.dvcfile, name) # noqa: B904
dvc.stage.exceptions.StageNotFound: Stage 'generate_h5' not found inside 'dvc.yaml' file
2024-11-23 14:43:55,039 DEBUG: Analytics is enabled.
2024-11-23 14:43:55,198 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpfkdoyszp', '-v']
2024-11-23 14:43:55,220 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpfkdoyszp', '-v'] with pid 2955573
I suspected that the stage name must match the names stored in the dvc.lock file, so I added a print of all stage names at dvc/repo/stage.py, line 301, in load_one. The list of stages is the following: ['generate_h5@0', 'generate_h5@1', 'generate_h5@2', 'generate_h5@3', 'generate_h5@4', 'generate_h5@5', 'generate_h5@6', 'generate_h5@7', 'extract_props@0', 'extract_props@1', 'extract_props@2', 'extract_props@3', 'extract_props@4', 'extract_props@5', 'extract_props@6', 'extract_props@7', 'analyze_props@0', 'analyze_props@1', 'analyze_props@2', 'analyze_props@3', 'analyze_props@4', 'analyze_props@5', 'analyze_props@6', 'analyze_props@7'].
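To illustrate the naming, here is a minimal sketch of how I understand these names to be formed when foreach iterates over a list of dictionaries: the template name, then @, then the item index. The function expand_foreach and the sample datasets list are hypothetical stand-ins of mine, not DVC code or my real data:

```python
def expand_foreach(stage_name, items):
    """Return the generated stage names for a list-valued foreach.

    Assumption: items are dictionaries, so each stage is named by its index.
    """
    return [f"{stage_name}@{index}" for index, _ in enumerate(items)]


# Hypothetical stand-in for ${datasets}: eight entries with name and type keys.
datasets = [{"name": f"ds{i}", "type": "covid"} for i in range(8)]

names = expand_foreach("generate_h5", datasets)
print(names)

# The bare template name is not among the generated names,
# which matches the "Stage 'generate_h5' not found" error.
print("generate_h5" in names)
```

So a lookup by the bare name generate_h5 has nothing to match; only the indexed names exist after expansion.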
I then tried to freeze generate_h5@0, but that also fails, this time with a different error:
$ dvc freeze dvc.yaml:generate_h5@0 -v
2024-11-23 14:45:31,532 DEBUG: v3.55.2 (pip), CPython 3.10.12 on Linux-6.8.0-35-generic-x86_64-with-glibc2.35
2024-11-23 14:45:31,532 DEBUG: command: /home/ermolaev/projects/radml_backup/venv/bin/dvc freeze dvc.yaml:generate_h5@0 -v
['generate_h5@0', 'generate_h5@1', 'generate_h5@2', 'generate_h5@3', 'generate_h5@4', 'generate_h5@5', 'generate_h5@6', 'generate_h5@7', 'extract_props@0', 'extract_props@1', 'extract_props@2', 'extract_props@3', 'extract_props@4', 'extract_props@5', 'extract_props@6', 'extract_props@7', 'analyze_props@0', 'analyze_props@1', 'analyze_props@2', 'analyze_props@3', 'analyze_props@4', 'analyze_props@5', 'analyze_props@6', 'analyze_props@7']
2024-11-23 14:45:31,833 ERROR: failed to freeze 'dvc.yaml:generate_h5@0' - cannot dump a parametrized stage: 'generate_h5@0'
Traceback (most recent call last):
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/commands/freeze.py", line 15, in _run
    func(target)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/repo/freeze.py", line 19, in freeze
    return _set(repo, target, True)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/repo/freeze.py", line 13, in _set
    stage.dump(update_lock=False)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/stage/__init__.py", line 787, in dump
    self.dvcfile.dump(self, **kwargs)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/dvcfile.py", line 238, in dump
    self._dump_pipeline_file(stage)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/dvcfile.py", line 272, in _dump_pipeline_file
    self._check_if_parametrized(stage)
  File "/home/ermolaev/projects/radml_backup/venv/lib/python3.10/site-packages/dvc/dvcfile.py", line 269, in _check_if_parametrized
    raise ParametrizedDumpError(f"cannot {action} a parametrized {stage}")
dvc.dvcfile.ParametrizedDumpError: cannot dump a parametrized stage: 'generate_h5@0'
2024-11-23 14:45:31,836 DEBUG: Analytics is enabled.
2024-11-23 14:45:31,869 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpa1qc4446', '-v']
2024-11-23 14:45:31,879 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpa1qc4446', '-v'] with pid 2955729
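Reading the traceback, freeze loads the stage and then calls stage.dump, which goes through _check_if_parametrized and refuses to write the stage back to dvc.yaml. My guess is that this is because generate_h5@0 has no entry of its own in the file, only the shared template. Below is a toy model of that behaviour as I understand it. It is not DVC's implementation: the freeze function, its arguments and the "@" check are my own simplifications, and only the exception name and message are taken from the log above:

```python
class ParametrizedDumpError(Exception):
    """Toy stand-in for the DVC exception of the same name."""


def freeze(stage_name, template_names):
    """Toy model: mark a stage as frozen unless it was generated from a template.

    template_names is the set of stage names that use foreach in dvc.yaml.
    """
    base = stage_name.split("@", 1)[0]
    if "@" in stage_name and base in template_names:
        # A generated stage has no entry of its own to write `frozen` into.
        raise ParametrizedDumpError(
            f"cannot dump a parametrized stage: '{stage_name}'"
        )
    return {"stage": stage_name, "frozen": True}


templates = {"generate_h5", "extract_props", "analyze_props"}

try:
    freeze("generate_h5@0", templates)
except ParametrizedDumpError as exc:
    message = str(exc)
    print(message)
```

If this reading is right, both attempts are bound to fail: the bare name is not a loadable stage, and the indexed name cannot be written back.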
So, how is this supposed to work? Are templated (foreach) stages not supported by dvc freeze? If not, why, and when will support for this case be implemented?