Best practice for hyperparameter sweeps


I wonder what the best practice is for hyperparameter optimization, e.g. running a parameter sweep that captures both the models and the metrics. I’d like to try 10 different values for the number of clusters (and a few other parameters), and then plot figures visualizing each clustering.

Do you just execute dvc run in a loop from a script? Does that script then essentially become the “Makefile”?

Also, do you create a new branch for each experiment, or just a new folder with a unique name?
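To make the “dvc run in a loop” idea concrete, here is a minimal sketch of a sweep script. The script name (cluster.py), data path, stage-file names, and output layout are all hypothetical, just for illustration:

```python
# A minimal sketch of driving `dvc run` from a sweep script.
# cluster.py, data/input.csv, and the results/ layout are made-up
# names -- adjust to your own pipeline.

def sweep_commands(n_clusters_values):
    """Build one `dvc run` command per value of the num-clusters parameter."""
    commands = []
    for k in n_clusters_values:
        out_dir = f"results/k{k}"
        commands.append([
            "dvc", "run",
            "-d", "cluster.py",          # code dependency
            "-d", "data/input.csv",      # data dependency
            "-o", out_dir,               # tracked output per experiment
            "-f", f"cluster_k{k}.dvc",   # one stage file per experiment
            f"python cluster.py --num-clusters {k} --out {out_dir}",
        ])
    return commands

# 10 values for the number of clusters, as in the question
commands = sweep_commands(range(2, 12))
```

Each built command can then be executed with subprocess.run(cmd, check=True), so the Python script effectively plays the role of the “Makefile”.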



Hi @alex,
We have some GitHub issues related to this subject. Here is one related to your question.
Also here:

Would you mind sharing your use case and describing how you would expect such a workflow to behave? The more feedback we gather, the faster we can develop an initial draft of how to proceed on this subject.


Thanks for the links; they address exactly the questions I had! Unfortunately, no solutions yet 🙂

I’ve used a number of ad-hoc solutions, and I was not satisfied with any of them.

For every output file I would like to be able to find exactly how it was produced, i.e. the version of the code (git commit), the hyperparameters (e.g. command-line parameters to the executables), and the data (I have several “datasets” that go through identical pipelines). I experimented with long file and directory names (called “experiment-as-a-folder” in issue #2532), including the values of all parameters in the name (e.g. summaryFigures/dataA_numclusters5_epochs10/). Much of this information is already in the DVC file, but the workflow is not “clicking” for me yet.
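The “experiment-as-a-folder” naming I used can be sketched as a small helper that encodes the dataset name and hyperparameter values into a directory name; the base directory and parameter keys below are just examples:

```python
# Sketch of the "experiment-as-a-folder" naming scheme: encode the
# dataset and hyperparameter values into the output directory name,
# so each folder records how it was produced.

def experiment_dir(dataset, params, base="summaryFigures"):
    """Build a folder name like summaryFigures/dataA_numclusters5_epochs10."""
    parts = [dataset] + [f"{k}{v}" for k, v in params.items()]
    return f"{base}/{'_'.join(parts)}"
```

This only captures the parameters, of course; the code version (git commit) and data hashes still live in git and in the DVC files.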

I think I’d like to separate a generic workflow/pipeline/DAG definition from each of its “instances” (with the hashes and all).
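The separation described above could look roughly like this: one generic definition of the pipeline stages, rendered into a concrete instance per parameter set. The stage commands and parameter names are made up for illustration:

```python
# Rough sketch: a generic pipeline definition (templates) plus a
# function that renders a concrete "instance" for one parameter set.
# Stage commands and parameter names are hypothetical.

PIPELINE_TEMPLATE = [
    "python cluster.py --num-clusters {num_clusters} --out {out_dir}",
    "python plot.py --in {out_dir} --fig {out_dir}/figures",
]

def instantiate(params):
    """Fill the generic stage templates with one set of parameter values."""
    return [stage.format(**params) for stage in PIPELINE_TEMPLATE]
```

Each rendered instance would then get its own DVC stage files, with the hashes recorded per instance rather than in the generic definition.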

I will watch those two issues on GitHub.

Thanks again!


Hi Alex!

This is a good insight! It would be great if you could share it in the issue tracker (under any of the issues @Paffciu mentioned).
