DVC + GitHub Actions + GCP Storage

I’m trying to trigger a pipeline that runs DVC and downloads data from GCP Storage, but the GitHub Actions log shows the following error:

ERROR: unexpected error - Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object., 401

I think this happens when the Service Account doesn’t have the right permissions, but the one I’m using has the Storage Object Viewer role, which grants the permission I need.

Here is part of my pipeline file

- name: Setup Cloud SDK
  uses: google-github-actions/setup-gcloud@v0.2.0
  with:
    project_id: ${{ secrets.GCP_PROJECT }}
    service_account_key: ${{ secrets.GCP_KEY }}
    export_default_credentials: true

- name: CML Run
  shell: bash
  env:
    repo_token: ${{ secrets.GITHUB_TOKEN }}
    GOOGLE_APPLICATION_CREDENTIALS: ${{ secrets.GCP_KEY }}
  run: |
    # run-cache and reproduce pipeline
    dvc remote add -d -f myremote gs://myproject/
    dvc pull mypath/data.csv.zip.dvc
    dvc repro -m
    
    # Report metrics
    echo "## Metrics" >> report.md
    git fetch --prune
    dvc metrics diff main --show-md >> report.md
    
    # Publish confusion matrix diff
    echo -e "## Plots\n### Confusion Matrix" >> report.md
    cml-publish $PWD/mypath/reports/confusion-matrix.png --md >> report.md
    cml-send-comment report.md

Can you please share the verbose traceback from the command that raises that error? It is also possible that the Google Storage backend we use doesn’t recognize the login method and falls back to anonymous authentication, so setting credentialpath to the token file might work (via dvc remote modify --local).

Sure! Here is the traceback

Run # run-cache and reproduce pipeline
Setting 'mlops-talks' as a default remote.
_request non-retriable exception: Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object., 401
Traceback (most recent call last):
  File "/home/runner/.local/lib/python3.8/site-packages/gcsfs/retry.py", line 110, in retry_request
    return await func(*args, **kwargs)
  File "/home/runner/.local/lib/python3.8/site-packages/gcsfs/core.py", line 332, in _request
    validate_response(status, contents, path)
  File "/home/runner/.local/lib/python3.8/site-packages/gcsfs/retry.py", line 97, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object., 401
ERROR: unexpected error - Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object., 401

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
Error: Process completed with exit code 255.

It seems gcsfs may not be recognizing your credentials and is falling back to anonymous login. Can you set credentialpath to your service account file and test it again?

Sure! I did this but here is another error:

ERROR: unrecognized arguments: *** *** *** *** *** *** *** *** *** *** ***
usage: dvc remote modify [-h] [--global | --system | --project | --local]
                         [-q | -v] [-u]
                         name option [value]

Here is the command I ran with credentialpath:

dvc remote modify --local myremote credentialpath $GOOGLE_APPLICATION_CREDENTIALS

credentialpath takes the path where the credentials file is stored. From what I understand, $GOOGLE_APPLICATION_CREDENTIALS holds the contents of that file, so you should probably write it to a temporary file (e.g. /tmp/creds.json) and then pass that file's path as the argument.

Well, I’m doing this in GitHub Actions, which is why I’m using $GOOGLE_APPLICATION_CREDENTIALS: it is a secret holding the contents of that JSON file. Can I configure the credentials using GitHub secrets?
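
A secret should still work here if its contents are written to a file first. Below is a minimal sketch of the temporary-file suggestion, not a confirmed fix. GCP_KEY_JSON and write_gcp_creds are hypothetical names chosen for illustration; the variable would be mapped from secrets.GCP_KEY in the step's env block. Passing the raw JSON unquoted is also a likely cause of the "unrecognized arguments" error, since the shell splits it into many words.

```shell
#!/usr/bin/env bash
# Write the service account JSON held in an environment variable to a file
# and print the path of that file.
# Assumption: GCP_KEY_JSON (hypothetical name) is set in the step's env, e.g.
#   env:
#     GCP_KEY_JSON: ${{ secrets.GCP_KEY }}
write_gcp_creds() {
  # Default to the runner's temp directory, or /tmp outside GitHub Actions.
  local target="${1:-${RUNNER_TEMP:-/tmp}/gcp-creds.json}"
  # umask 077 makes the new file readable by the current user only.
  ( umask 077 && printf '%s' "${GCP_KEY_JSON:?GCP_KEY_JSON is not set}" > "$target" ) || return 1
  printf '%s\n' "$target"
}

# Usage inside the "CML Run" step, after "dvc remote add" (needs dvc on PATH):
#   creds_file="$(write_gcp_creds)"
#   dvc remote modify --local myremote credentialpath "$creds_file"
#   export GOOGLE_APPLICATION_CREDENTIALS="$creds_file"
```

The key point is that both credentialpath and GOOGLE_APPLICATION_CREDENTIALS expect a path to the JSON file rather than the JSON itself, so the path is quoted and passed as a single argument.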
