Configure: DVC + CircleCI + GDRIVE

I am trying to configure DVC to pull artifacts stored on Google Drive from a CircleCI build. I followed the instructions in the docs, plus the hints in this discussion question and this GitHub issue, but something still seems to be missing.

Here is the excerpt of the CircleCI configuration being used:

- run:
     name: Data checkout
     command: |
        dvc remote modify storage --local gdrive_use_service_account true;
        dvc remote modify storage --local gdrive_service_account_json_file_path /dev/null;
        dvc remote modify storage --local gdrive_service_account_user_email <my@service.iam.gserviceaccount.com>;
        dvc pull --recursive -r storage --verbose;
     environment:
        GDRIVE_CREDENTIALS_DATA: $GDRIVE_CREDENTIALS_DATA

The $GDRIVE_CREDENTIALS_DATA environment variable is configured in CircleCI to hold the JSON of the Google service account that has access to my Google Drive. However, the CircleCI runner reports the following error, as if the environment variable were never read:

0% 0/1 [00:00<?, ?file/s{'info': ''}]
Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?client_id=710796635688-iivsgbgsb6uv1fap6635dhvuei09o66c.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.appdata&access_type=offline&response_type=code&approval_prompt=force

Everything is up to date.              
ERROR: failed to pull data from the cloud - GDrive remote auth failed with credentials in 'GDRIVE_CREDENTIALS_DATA'.
Backup first, remove or fix them, and run DVC again.
It should do auth again and refresh the credentials.
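
A quick sanity check that could be run in the same step, just before dvc pull, is sketched below. It reports whether the variable is visible to the process without printing the secret itself. The helper name describe_env_var is made up for illustration. The "literal placeholder" case is an assumption worth ruling out, not something confirmed in this thread: it covers the possibility that the environment: mapping passes the text $GDRIVE_CREDENTIALS_DATA through unexpanded.

```python
import os


def describe_env_var(name, environ=None):
    """Describe the state of an environment variable without revealing its value."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        return "unset"
    if value.strip() == "":
        return "empty"
    if value.strip() in ("$" + name, "${" + name + "}"):
        # The variable contains its own name, so it was never expanded.
        return "literal placeholder"
    return "set (%d characters)" % len(value)


if __name__ == "__main__":
    print("GDRIVE_CREDENTIALS_DATA:", describe_env_var("GDRIVE_CREDENTIALS_DATA"))
```

If this reports anything other than "set", the problem would be on the CircleCI side rather than in the DVC remote settings.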

Things I am unsure about:

  1. Is GDRIVE_CREDENTIALS_DATA the right name for the environment variable that DVC looks for?
  2. Which JSON content should this environment variable have?
    • The contents of .dvc/tmp/gdrive-user-credentials.json?
    • A Google service account JSON looking like this?
{
  "type": "service_account",
  "project_id": "<my_project>",
  "private_key_id": "<key_id>",
  "private_key": "<key_hash>",
  "client_email": "<my@service.iam.gserviceaccount.com>",
  "client_id": "<client_id>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs"
}
    • Or a Google client secret JSON looking like this?
{
  "installed": {
    "client_id": "<google_client_id>",
    "project_id": "<google_project_id>",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_secret": "<client_secret>",
    "redirect_uris": [
      "urn:ietf:wg:oauth:2.0:oob",
      "http://localhost"
    ]
  }
}
  3. Is the first-time Google browser authentication step still required for CI machines?
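
To tell which of the JSON shapes listed under question 2 the variable actually holds, a small classifier along these lines could be run in the CI step. It is only a sketch: classify_credentials is a hypothetical helper, the "web" key and the access_token / refresh_token keys are assumptions about how OAuth client secrets and stored user credentials are usually laid out, and it says nothing about which shape DVC expects.

```python
import json
import os


def classify_credentials(raw):
    """Guess which kind of Google credentials JSON the string `raw` contains."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        # Covers None, unexpanded placeholders, and malformed JSON.
        return "not valid JSON"
    if not isinstance(data, dict):
        return "unknown"
    if data.get("type") == "service_account":
        return "service account key"
    if "installed" in data or "web" in data:
        # Assumption: client secret files wrap their fields in one of these keys.
        return "OAuth client secret"
    if "refresh_token" in data or "access_token" in data:
        # Assumption: stored user credentials carry OAuth tokens at the top level.
        return "user credentials"
    return "unknown"


if __name__ == "__main__":
    print(classify_credentials(os.environ.get("GDRIVE_CREDENTIALS_DATA")))
```

A result of "not valid JSON" would point at the variable's contents (or at the variable not being passed through at all) rather than at the choice of credentials type.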

Many thanks

I guess you have hit a bug. You can try setting any value for gdrive_service_account_json_file_path and running it again.

Yes, that GitHub issue is one of the links I was looking at, because it reports a workaround for the bug. You’ll note I followed the same DVC setup, but I am still not able to get DVC to pull data, so I must be missing something…

The problem is that my CI machine still goes through GDrive’s OAuth interactive mechanism to authenticate via the browser, and the GDrive service account credentials passed through the environment variable seem to be either not read or not set properly by me.

Answers to any of my 3 questions would help narrow down things.