I have found a way to partly achieve what I want, but it's not fully working. It may still be worth sharing as a starting point.
Regarding the naming of my stages: in the params.yaml file, instead of having a list of stages, I could have a dict like this:
create_dataset_list:
  model_1:
    vg: league-of-legends
    stat: champions
    script: standard_dataset_creation.py
    dataset_yaml: trainset.yaml
    folder_images: trainset_images
    training_tweaks: create_trainset.py
    output: trainset.h5
This seems to do the trick, as it appears in the output of dvc status.
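As a minimal sketch of how a stage could consume this dict: the stage name create_dataset and the command-line flags below are made up for illustration, and only the keys come from the params.yaml snippet above.

```yaml
# dvc.yaml (sketch; stage name and CLI flags are hypothetical)
stages:
  create_dataset:
    foreach: ${create_dataset_list}   # iterate over the dict from params.yaml
    do:
      # ${key} is the dict key (e.g. model_1); ${item} is the value under it
      cmd: python ${item.script} --vg ${item.vg} --stat ${item.stat} --out ${item.output}
      deps:
        - ${item.script}
      outs:
        - ${item.output}
```

When foreach iterates over a dict, DVC names each generated stage after the key, so this one would be listed as create_dataset@model_1.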
As for splitting the params.yaml file and grouping the parametrizations of the stages by model, I found this approach. In dvc.yaml I can add:
vars:
  - model_1_params.yaml
This way, parameters are looked up in this file in addition to the default params.yaml. The idea is to add one such file per model. DVC does seem to read the file, but I am now running into a problem. If I format my model_1_params.yaml file as shown below, it throws the following error: cannot redefine 'create_dataset_list' from 'model_1_params.yaml' as it already exists in 'params.yaml'.
# model_1_params.yaml
create_dataset_list:
  model_1:
    vg: league-of-legends
    stat: champions
    script: standard_dataset_creation.py
    dataset_yaml: trainset.yaml
    folder_images: trainset_images
    training_tweaks: create_trainset.py
    output: trainset.h5
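One untested guess: the message reads as if params.yaml still defines create_dataset_list itself, and DVC refuses to let a file listed under vars redefine it. If that is the cause, keeping the key in only one file might avoid the clash. A sketch of that layout, assuming nothing else in params.yaml relies on the key:

```yaml
# params.yaml (assumed layout): create_dataset_list is removed from this file;
# only settings shared by all models would stay here.

# model_1_params.yaml: the only file that defines create_dataset_list
create_dataset_list:
  model_1:
    vg: league-of-legends
    stat: champions
    script: standard_dataset_creation.py
    dataset_yaml: trainset.yaml
    folder_images: trainset_images
    training_tweaks: create_trainset.py
    output: trainset.h5
```

Whether several per-model files can each add their own entry under the same top-level key is something I have not verified.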
I feel I am close to a working solution. Do you have any ideas?