CML + DVC / GitHub Actions / hyperparameter tuning

Hi folks

Looking for advice.

We’re trying to use manually triggered GitHub Actions workflows to train models and, of course, do hyperparameter tuning. We want to use CML to run in the cloud and DVC to track everything.
In GitHub actions, you can define inputs and then GitHub provides the UI to specify them.
We want to keep the structure of the DVC file as intact as possible, with hyperparameters specified in a YAML file. Is there an easy way to take the GitHub Actions inputs, update params.yaml accordingly, commit the change, and use cml-runner to run the pipeline?
A simple solution is to pass the hyperparameters as arguments to the Python script, but then we would miss some useful DVC tracking functionality.

Cheers

Hi @mcosta

In GitHub actions, you can define inputs and then GitHub provides the UI to specify them.

Are you trying to build your own GitHub Action?

The most effective ways that we have seen so far are:

  • just commit params.yaml to the repo, so that every change is tracked by Git
  • we have also seen a bot-like setup that uses the issue_comment event, so that your comments act as the bot interface: the workflow reads the parameters from the comment and updates the params. The drawback is that your parameters are not really tracked in Git.
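
To make the bot idea a bit more concrete, here is a minimal sketch of pulling hyperparameters out of a comment body. The `/train` command and the `key=value` syntax are hypothetical conventions invented for this example, and `parse_train_comment` is an illustrative name, not part of CML or DVC. In an issue_comment workflow the body would come from `github.event.comment.body`.

```python
def parse_train_comment(body, command="/train"):
    """Return key=value overrides from the first line starting with `command`.

    Returns None when the comment contains no such command line.
    Values are kept as strings; converting them is left to the caller.
    """
    for line in body.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != command:
            continue
        params = {}
        for token in tokens[1:]:
            key, sep, value = token.partition("=")
            if sep and key:  # silently skip tokens that are not key=value
                params[key] = value
        return params
    return None
```

A comment such as `/train lr=0.01 epochs=5` would then give a dict that a later step could write into the params file before running the pipeline.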

Maybe if you can share a bit more about what you need to build, we can be more helpful.


@davidgortega

The idea is to use a custom workflow triggered by workflow_dispatch (manual) and use the GitHub UI to specify hyperparameters.
We have params.yaml in the repo.
The idea can be summarized in the image below. Imagine that we just want to change the learning rate.
It’d be good to change params.yaml based on this input, commit, and then use CML to run in the cloud (Azure, AWS, etc.).
I’m not sure, though, if GitHub will allow us to do this.

Are you planning to use the workflow_dispatch event with custom inputs to specify the hyperparameters for each run? You could do something like this:

on:
  workflow_dispatch:
    inputs:
      epochs:
        required: true
        description: Number of epochs
        default: 70
      layers:
        required: true
        description: Number of layers
        default: 9
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-cml@v1
      - run: >-
          cml-runner
          --cloud=aws
          --cloud-region=us-west
          --cloud-type=t2.micro
          --labels=cloud
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  train:
    needs: deploy
    runs-on: [self-hosted, cloud]
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-dvc@v1
      - uses: iterative/setup-cml@v1
      - run: |
          echo "$PARAMETERS" > params.json

          dvc ···
          
          cml-send-comment /dev/stdin <<-END
            ## Hyperparameters
            
            $(dvc params diff HEAD --show-md --targets params.json)
            
            _You can merge $(cml-pr params.json) to save the new parameters._
          END
        env:
          PARAMETERS: ${{ toJSON(github.event.inputs) }}
          GITHUB_TOKEN: ${{ github.token }}
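
One caveat with the workflow above: values in `github.event.inputs` arrive as strings, so params.json would contain `"70"` rather than `70`. Below is a minimal sketch of a helper that could replace the `echo` line, assuming the parameters live in params.json and that the script is saved under a hypothetical name such as update_params.py. `coerce` and `merge_params` are illustrative names, not part of DVC or CML.

```python
import json
import os
from pathlib import Path


def coerce(value):
    """Turn a string such as "70", "0.01" or "true" into a number or bool."""
    if not isinstance(value, str):
        return value
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return value  # plain text such as "adam" stays a string
    # Only accept scalars; anything else (lists, null, ...) stays a string.
    return parsed if isinstance(parsed, (bool, int, float)) else value


def merge_params(path, inputs):
    """Merge coerced workflow inputs into an existing JSON params file."""
    path = Path(path)
    params = json.loads(path.read_text()) if path.exists() else {}
    params.update({key: coerce(value) for key, value in inputs.items()})
    path.write_text(json.dumps(params, indent=2) + "\n")
    return params


if __name__ == "__main__" and "PARAMETERS" in os.environ:
    # PARAMETERS is the env variable set from toJSON(github.event.inputs).
    merge_params("params.json", json.loads(os.environ["PARAMETERS"]) or {})
```

Because existing keys are merged rather than overwritten wholesale, parameters that are not exposed as workflow inputs keep the values already committed in the repo.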

@0x2b3bfa0

This looks cool!
Will try it in a couple of hours.

Tkx!
