CML self hosted runners on demand with GPUs

Thank you for this tutorial.
However, I ran into a problem when starting the runner.
The docker run command needs an additional command appended for the runner to start properly.
The full example is shown below:

docker run --name myrunner -d --gpus all \
    -e RUNNER_IDLE_TIMEOUT=1800 \
    -e RUNNER_LABELS=cml,gpu \
    -e RUNNER_REPO=$my_repo_url \
    -e repo_token=$my_repo_token \
    iterativeai/cml:0-dvc2-base1-gpu runner
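
Before starting the runner, it can help to confirm that a container from this image actually sees the host GPUs. The following is a minimal sketch, assuming the NVIDIA Container Toolkit is installed on the host. The function name check_gpu_in_container is hypothetical (it is not part of CML), and --entrypoint is used so the check does not depend on how the image starts by default.

```shell
# Hypothetical helper (not part of CML): run nvidia-smi in a throwaway
# container from the same image to confirm that the GPUs are visible.
check_gpu_in_container() {
  image="${1:-iterativeai/cml:0-dvc2-base1-gpu}"
  # --rm removes the container afterwards; --entrypoint bypasses the
  # image's default start-up command so that only nvidia-smi runs.
  docker run --rm --gpus all --entrypoint nvidia-smi "$image"
}
```

Calling check_gpu_in_container with no arguments uses the image from the command above. If it prints the usual nvidia-smi table, the --gpus all flag is working on this host.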

Thanks @bmabir17 for pointing this out (blog/cml: check for GPU · Issue #3408 · iterative/dvc.org · GitHub)!

Thanks for the blog post.
I have a small question: can we also create a self-hosted runner on a CPU machine?

Moreover, when I tried to run the pipeline via GitLab CI, I could not install the required Python dependencies (pip install -r requirements). I got a warning message followed by an error.

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.python.org', port=443): Read timed out. (read timeout=15)")': /simple/joblib/

ERROR: No matching distribution found for joblib==1.2.0

I thought it was a network issue, but it was not. I also tried recreating the Docker image, but the same problem persisted. Can somebody help me fix this?

Can we also create a self-hosted runner on a CPU machine?

Of course! Are you using cml runner in conjunction with the --cloud option?
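
For reference, a CPU-only cloud runner could be launched roughly as follows. This is a sketch under assumptions: AWS as the provider, the us-west region, the t2.micro instance type and the cml-cpu label are all illustrative choices, the wrapper function launch_cpu_runner is hypothetical, and cloud credentials are expected to be set in the environment already. No GPU is requested because no GPU option is passed.

```shell
# Hypothetical wrapper: launch a CPU-only CML runner on a cloud instance.
# $1 = repository URL, $2 = personal access token.
launch_cpu_runner() {
  # Provider, region, instance type and label are illustrative values.
  cml runner launch \
    --cloud=aws \
    --cloud-region=us-west \
    --cloud-type=t2.micro \
    --labels=cml-cpu \
    --repo="$1" \
    --token="$2"
}
```

It would be called as launch_cpu_runner "$REPOSITORY_URL" "$PERSONAL_ACCESS_TOKEN", and CI jobs would then target the runner through the cml-cpu label.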

Moreover, when I tried to run the pipeline via GitLab CI/CD, I could not install required python dependencies […]

Sounds very similar to iterative/cml#1324 (comment); can you log in to the self-hosted runner and try running the pip install command manually? It looks like a connectivity issue.
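
The manual check can be sketched as below. The assumptions are that curl is available on the runner host and that https://pypi.org/simple is the package index in use. Both function names are hypothetical, and the timeout and retry values are arbitrary. The log above shows a 15-second read timeout, so a longer timeout helps tell a slow connection apart from a blocked one.

```shell
# Hypothetical helpers for checking PyPI connectivity on the runner host.

# Print the HTTP status code the package index returns for one package.
# A 200 means HTTPS requests to the index work from this shell.
pypi_status() {
  package="$1"
  curl --silent --show-error --output /dev/null \
       --max-time 30 --write-out '%{http_code}\n' \
       "https://pypi.org/simple/${package}/"
}

# Retry the failing install with a longer timeout and more retries
# than the 15-second read timeout seen in the log.
pip_install_patiently() {
  pip install --timeout 60 --retries 10 \
      --index-url https://pypi.org/simple "$@"
}
```

For example, pypi_status joblib followed by pip_install_patiently joblib==1.2.0. Running both as the same user and in the same environment as the CI job is worthwhile, since a job may see different proxy or network settings than an interactive shell. A successful ping only shows that ICMP traffic gets through, not that HTTPS on port 443 does.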

@0x2b3bfa0 Thanks for the reply

I am using a cloud instance, which I access over SSH, as a self-hosted runner. I installed the CML runner on that machine by following this guide (Install as a Package) and executed the following command:

cml runner launch \
  --repo="$REPOSITORY_URL" \
  --token="$PERSONAL_ACCESS_TOKEN" \
  --labels="cml-gpu" \
  --idle-timeout="never"  # or "3min", "1h", etc..

The above command started the CML runner successfully. However, the job then failed (with the warning messages mentioned above) to install requirements.txt or any PyPI packages requested via GitLab CI.

Yes, I tried installing the packages manually on the machine, and it worked. I also checked the internet connection with ping google.com, and it is working.

In addition, I tried the Docker approach, running the CML DVC container as described in the blog post above.

sudo docker run --name RUNNER_NAME \
    -e RUNNER_IDLE_TIMEOUT=100h \
    -e RUNNER_LABELS=cml,cpu \
    -e RUNNER_REPO=REPO_URL \
    -e repo_token=REPO_TOKEN \
    dvcorg/cml-py3:latest

But it gave me this error:

{"level":"error","status":"terminated"}
Error: REPO_TOKEN does not have enough permissions to access workflow API
    at CML.repo_token_check (/cml/src/cml.js:236:13)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
        Destroy scheduled: 30 seconds remaining.
        No TF resource found

I tried to resolve this by creating a new access token with all the required permissions, but nothing has worked so far.
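
One way to narrow this down is to ask the forge what the token is actually allowed to do. The sketch below assumes the repository is hosted on GitLab (as the earlier posts suggest), that the token is a personal access token, and that the instance is recent enough to offer the personal_access_tokens/self REST endpoint. The gitlab.com default and the function name gitlab_token_info are assumptions for illustration only.

```shell
# Hypothetical helper: print what GitLab reports about a personal access
# token, including its "scopes" array, so the scopes can be checked by eye.
gitlab_token_info() {
  token="$1"
  host="${2:-https://gitlab.com}"   # assumption: change for self-managed GitLab
  # --fail makes curl exit non-zero on HTTP errors such as 401 Unauthorized.
  curl --silent --show-error --fail \
       --header "PRIVATE-TOKEN: ${token}" \
       "${host}/api/v4/personal_access_tokens/self"
}
```

Calling gitlab_token_info "$REPO_TOKEN" should print a JSON document. If the request fails, or the listed scopes do not include the ones the CML documentation asks for, the token itself is the likely culprit rather than the container.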