DVC on HPC with CML and large(r) number of experiments

Hi @RiCk, there’s not a documented workflow for this scenario today, but I’d be curious to discuss it with you.

DVC has some support for grid search through dvc exp run. With that as a starting point, I can imagine a couple of ways to run the search on CML:

  1. Put the grid search into your CML workflow, so that the workflow runs something like dvc exp run -S 'train.min_split=2,8,64' -S 'train.n_est=100,200' --queue; dvc queue start. This might be limiting, since changing the search parameters would mean editing the workflow file.
  2. Queue experiments locally and push each one to trigger its own CML job. You could run dvc exp run -S 'train.min_split=2,8,64' -S 'train.n_est=100,200' --queue locally and then push each queued experiment to its own branch, so that each branch triggers a CML job. This would currently require a bash script to parse the queued experiment names and push them to branches, but if it works well for you, I’m sure we could make it easier to do within DVC.
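
To give a feel for the scale of option 2, here is a rough sketch of how the grid expands into one entry per combination, which is also the number of branches and CML jobs you would end up with. It only prints a plan and does not touch git or DVC. The function name plan_grid and the grid/ branch prefix are made up for illustration, and how each combination actually gets committed to its branch is left open.

```shell
#!/bin/sh
# Sketch only: expand the grid from the post into one line per combination.
# The "grid/" branch prefix and the name plan_grid are hypothetical.

MIN_SPLITS="2 8 64"   # values for train.min_split, taken from the post
N_ESTS="100 200"      # values for train.n_est, taken from the post

# Print "<branch name><TAB><single-combination command>" for every pair.
# Nothing is executed; this is a dry-run plan.
plan_grid() {
  for min_split in $MIN_SPLITS; do
    for n_est in $N_ESTS; do
      branch="grid/min_split-${min_split}-n_est-${n_est}"
      printf '%s\tdvc exp run -S train.min_split=%s -S train.n_est=%s\n' \
        "$branch" "$min_split" "$n_est"
    done
  done
}

plan_grid
```

With the values above this prints six lines (3 values of train.min_split times 2 values of train.n_est), so a larger grid quickly turns into a lot of branches and CML runs.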