Thanks for the links, they address exactly the questions I had! Unfortunately, there are no solutions yet.
I’ve used a number of ad-hoc solutions, but I have not been satisfied with any of them.
For every output file, I would like to be able to find out exactly how it was produced, i.e. the version of the code (git commit), the hyperparameters (e.g. command-line parameters to the executables), and the data (I have several “datasets” that go through identical pipelines). I experimented with long file and directory names (called “experiment-as-a-folder” in issue #2532) that include the values of all parameters (e.g. summaryFigures/dataA_numclusters5_epochs10/). Much of this information is already in the DVC file, but the workflow is not “clicking” for me yet.
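To make the provenance idea concrete, here is a rough sketch in plain Python of one ad-hoc approach: writing a small JSON "sidecar" file next to each output that records the git commit, the dataset name, and the parameters. This is independent of DVC and is not how DVC stores this information; the function names and the `.provenance.json` suffix are made up for illustration.

```python
import json
import subprocess
import sys
from pathlib import Path


def current_git_commit():
    """Return the current commit hash, or None if git is unavailable
    or the working directory is not inside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return None
    return result.stdout.strip()


def write_provenance(output_path, dataset, params):
    """Write <output_path>.provenance.json (a hypothetical naming scheme)
    describing how the output file was produced."""
    output_path = Path(output_path)
    record = {
        "output": output_path.name,
        "git_commit": current_git_commit(),
        "dataset": dataset,
        "params": params,
        "argv": sys.argv,  # the command line of the producing script
    }
    sidecar = output_path.with_name(output_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

A script that produces a figure could then call something like `write_provenance("summaryFigures/fig.png", "dataA", {"numclusters": 5, "epochs": 10})` right after saving the figure, which keeps the directory names short while the parameters stay recoverable.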
I think I’d like to separate a generic workflow/pipeline/DAG definition from each of its “instances” (with the hashes and all).
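As a sketch of what I mean by that separation (plain Python, not DVC's API; the class names, the `cluster.py` script, and the `runs/` layout are all hypothetical): the template holds only the command with placeholders, and each instance binds it to a dataset and parameters and derives a short hash to identify the run.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class StageTemplate:
    """Generic stage definition: a name and a command with {placeholders}."""
    name: str
    command: str


@dataclass
class StageInstance:
    """One concrete run of a template for a given dataset and parameters."""
    template: StageTemplate
    dataset: str
    params: dict

    @property
    def run_id(self):
        # Hash a canonical JSON form so identical inputs give the same id.
        key = json.dumps(
            {"stage": self.template.name,
             "dataset": self.dataset,
             "params": self.params},
            sort_keys=True,
        )
        return hashlib.sha1(key.encode("utf-8")).hexdigest()[:10]

    @property
    def outdir(self):
        return f"runs/{self.template.name}/{self.run_id}"

    def render_command(self):
        # Fill the template's placeholders with this instance's values.
        return self.template.command.format(
            dataset=self.dataset, outdir=self.outdir, **self.params
        )


cluster = StageTemplate(
    name="cluster",
    command=("python cluster.py --data {dataset} "
             "--numclusters {numclusters} --epochs {epochs} --out {outdir}"),
)
run_a = StageInstance(cluster, "dataA", {"numclusters": 5, "epochs": 10})
run_b = StageInstance(cluster, "dataB", {"numclusters": 5, "epochs": 10})
print(run_a.render_command())
print(run_b.render_command())
```

Here the same `cluster` template is reused for both datasets, and only the instances carry the hashes and concrete paths.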
I will watch those two issues on GitHub.
Thanks again!
Alex