Hi @yisaienkov !
There is no out-of-the-box integration between DVC Pipelines and Hydra yet. We are currently exploring what improvements could be made in that regard. Don’t hesitate to share your opinions or requests in that discussion or in new issues.
What can be recommended as best practice right now depends on how you are using Hydra within the stages, so any additional info about your pipeline would be helpful. Regardless, here are some thoughts, assuming a basic Hydra app like the one used in the tutorial:
Regarding “hydra outputs”, I would override hydra.output_subdir to None and use DVC outputs as you would usually do without Hydra. You don’t really need the date-based subfolder versioning that Hydra provides, as DVC+Git will do the proper versioning for you.
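As a rough sketch, the override could live in the app's primary Hydra config instead of being passed on the command line every time. The file name conf/config.yaml is an assumption based on the tutorial layout, and the hydra.run.dir line is an extra suggestion that goes beyond the override mentioned above, so drop it if you want to keep Hydra's default run directory:

```yaml
# conf/config.yaml (assumed name of the primary config)
hydra:
  # Do not write the .hydra/ subfolder with the resolved config copies
  output_subdir: null
  run:
    # Assumption: run in the current directory rather than in
    # Hydra's date-based outputs/ folder
    dir: .
```

The same values can also be passed as regular Hydra overrides, for example hydra.output_subdir=null appended to the command.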
Regarding “hydra config”, I would suggest tracking the parent config directory as a DVC dependency.
Putting it all together for this sample application, the dvc.yaml would be:

    stages:
      my_app:
        cmd: python my_app.py ${hydra_args}
        deps:
          - conf
        outs:
          - my_app_output

And params.yaml:

    hydra_args: "+db=postgresql db.timeout=20"

You would just run this with dvc repro and modify params.yaml to pass other args to Hydra.
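For illustration, an edited params.yaml might look like the sketch below. It assumes the tutorial's conf/db group also contains a mysql.yaml option, which is not shown in this thread, so adjust the values to whatever your own config group provides:

```yaml
# params.yaml (hypothetical alternative arguments)
# Assumes conf/db/mysql.yaml exists, as in the Hydra tutorial layout
hydra_args: "+db=mysql"
```

The string is substituted as-is into the stage's cmd, so anything that is valid on the Hydra command line can go here.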
As said at the beginning, these are just some workarounds for a very simple app. If you are willing to share more details, we can discuss your use case and see what else can be done.