CML / GitHub Actions / AWS

Guys,

I’m trying to run cml-runner on AWS via GitHub Actions.
The script below launches an instance that we can see in the AWS console.
The result can be summarized as:

Since we have no information about what is going on, it is hard to guess at the cause. Again, any suggestion for resolving it would be appreciated.

Note: This is related to Question 812


name: Test
on: [push]
jobs:
  deploy:
    runs-on: [ubuntu-latest]
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-cml@v1
      - run: |
          cml-runner \
          --cloud=aws \
          --cloud-region=eu-west-3 \
          --cloud-type=t2.micro \
          --labels=cml-runner
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  train:
    needs: deploy
    runs-on: [self-hosted, cml-runner]
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-dvc@v1
      - uses: iterative/setup-cml@v1
      - run: |
          echo "Hello!"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Should be

runs-on: [self-hosted, cml-runner]

(updated in the OP)

I just tried to create a local g3.4xlarge EC2 instance and it looks like AWS is rejecting it. From my understanding of what we see in the AWS console, this should be allowed. If we check the quota details, we have:

If we run this:

on: [push]
jobs:
  deploy:
    runs-on: [ubuntu-latest]
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-cml@v1
      - run: |
          cml-runner \
          --cloud=aws \
          --cloud-region=eu-west-1 \
          --cloud-type=g3.4xlarge \
          --cloud-hdd-size 64 \
          --cloud-spot \
          --labels=cml-runner
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  train:
    needs: deploy
    runs-on: [self-hosted, cml-runner]
    container:
      image: docker://dvcorg/cml
      options: --gpus all
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-dvc@v1
      - uses: iterative/setup-cml@v1
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - run: |
          apt-get update -y
          pip install -r requirements.txt

We have the following error:

Error: Failed creating the machine: Not able to decode: MaxSpotInstanceCountExceeded: Max spot instance count exceeded
│ 	status code: 400, request id: eeb4b582-1d03-463e-b3c9-409e279cc9ef
│ 
│   with iterative_cml_runner.runner,
│   on main.tf line 14, in resource "iterative_cml_runner" "runner":
│   14: resource "iterative_cml_runner" "runner" {

Is this an AWS configuration issue or something with cml-runner?

Cheers

It looks like this is an AWS issue. Are you able to manually create an instance from the cloud console?

Hi @0x2b3bfa0

Sorry for the delay, but I can only work on this task at night (my time).
The spot quota was set to 4 vCPUs and g3.4xlarge requires 16. I changed the instance type to g3s.xlarge, which has 4 vCPUs, and it worked.
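
For anyone hitting the same MaxSpotInstanceCountExceeded error, here is a minimal sketch of the deploy step after that change. It assumes the rest of the workflow stays exactly as posted above; the only intended difference is the value of --cloud-type. Every continued line of the command ends with a backslash so that the shell treats it as a single cml-runner invocation.

```yaml
      - run: |
          # g3s.xlarge needs 4 vCPUs, which fits a 4 vCPU spot quota;
          # g3.4xlarge needs 16 and is rejected under that quota.
          cml-runner \
          --cloud=aws \
          --cloud-region=eu-west-1 \
          --cloud-type=g3s.xlarge \
          --cloud-hdd-size 64 \
          --cloud-spot \
          --labels=cml-runner
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```

If a larger instance type is needed, the alternative would be to request a higher spot quota from AWS rather than changing the instance type.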

As always, thank you for your support!

You’re welcome, @mcosta!