CML + GitHub Actions + Google Drive / Service Account

Hi,

What is the correct way to use GitHub secrets with a Google Drive key (JSON file)?
We are getting the following error:

ERROR: failed to pull data from the cloud - To use service account, set gdrive_service_account_json_file_path, and optionally gdrive_service_account_user_email in DVC config

Local execution is configured by

dvc remote modify myremote gdrive_use_service_account true

and

dvc remote modify myremote --local \
             gdrive_service_account_json_file_path path/to/file.json
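
For reference, after these two commands the DVC config files should look roughly like this (the remote URL is just a placeholder):

# .dvc/config (committed)
['remote "myremote"']
    url = gdrive://<folder-id>
    gdrive_use_service_account = true

# .dvc/config.local (not committed)
['remote "myremote"']
    gdrive_service_account_json_file_path = path/to/file.json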

Cheers

:wave: Hello, @mcosta!

You can create a repository secret with the contents of the .dvc/tmp/gdrive-user-credentials.json file that DVC is going to generate locally after the first successful authentication.

Then, you can expose it in your workflow through the following environment variable, as suggested in the documentation:

steps:
  - run: dvc pull
    env:
      GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
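
If you have the GitHub CLI available, one way to create that secret from the command line is, for example:

gh secret set GDRIVE_CREDENTIALS_DATA < .dvc/tmp/gdrive-user-credentials.json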

Hi @0x2b3bfa0

That was our first attempt. It did not work, though.
Note that our config has gdrive_use_service_account = true

Can you please confirm that the GDRIVE_CREDENTIALS_DATA variable has been populated with the contents of the .dvc/tmp/gdrive-user-credentials.json file and not with the contents of the service account JSON file issued by Google Cloud Platform?

If this doesn’t solve your issue, can you please post the output of dvc doctor and dvc pull --verbose --remote myremote?

@0x2b3bfa0

On our first attempt we added the contents of the JSON file downloaded directly from Google Cloud Platform. Then we did exactly what is mentioned in the documentation and updated the secret with the contents of the local file (.dvc/tmp/gdrive-user-credentials.json).
The error was always the same.

The YAML file looks like this:

name: model-training
on: [push]
jobs:
  run:
    runs-on: [ubuntu-latest]
    container: docker://dvcorg/cml:0-dvc2-base1
    steps:
      - uses: actions/checkout@v2
      - name: Test
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
        run: |
          # Install requirements
          pip install -r Requirements.txt

          dvc remote modify filterout --local gdrive_user_credentials_file .dvc/tmp/gdrive-user-credentials.json

The snippet above is missing this part:

dvc pull -r filterout --run-cache

dvc repro

Can you please attach a copy of the error message, preferably when running your dvc commands with the --verbose option?

What did you intend to achieve with this command? DVC should automatically pick up the credential contents from the GDRIVE_CREDENTIALS_DATA environment variable.

Can you please try to run the workflow after commenting out this line?
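
If it helps, you can also check locally that DVC picks the credentials up from the environment variable alone, with something along these lines (replace myremote with your remote name):

export GDRIVE_CREDENTIALS_DATA="$(cat .dvc/tmp/gdrive-user-credentials.json)"
dvc pull --remote myremote --verbose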

@0x2b3bfa0

Sorry, that line was a leftover from a forgotten trial-and-error attempt…

After removing it, adding --verbose to dvc pull and forcing a new push, we have the following:

Note: I was not able to copy/paste the logs; the forum would not let me, saying that new users are only allowed to add 2 links…

I’ve been able to reproduce the issue. :see_no_evil:

The following steps work as expected, but they diverge quite a bit from what I recall from previous tests:

on: push
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: iterative/setup-dvc@v1
      - run: |
          git init
          dvc init
          
          dvc remote add origin gdrive://root
          dvc remote modify origin --local gdrive_use_service_account true
          dvc remote modify origin --local gdrive_service_account_json_file_path /dev/null
          
          date > file
          dvc add file
          dvc push --remote origin
        env:
          GDRIVE_CREDENTIALS_DATA: ${{ secrets.ORIGINAL_SERVICE_ACCOUNT_JSON }}
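
For an existing repository, the relevant part of the snippet above boils down to configuring the remote like this (myremote is a placeholder for your remote name) and putting the raw service account JSON into the secret:

dvc remote modify myremote gdrive_use_service_account true
dvc remote modify myremote --local gdrive_service_account_json_file_path /dev/null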

@0x2b3bfa0

Tried to follow your script. Removed the
container: docker://dvcorg/cml:0-dvc2-base1
line and added

- uses: iterative/setup-dvc@v1

With this change we get an error earlier on, in pip install -r Requirements.txt:

ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'Requirements.txt'

@0x2b3bfa0

After looking at https://github.com/iterative/cml#using-cml-with-dvc

and at your demo code, we managed to get it pulling data. We created the secret with the Google service account key and used it as you did.

Although it is listed in the requirements file, we had to pip install dvc[gdrive] after installing what is in Requirements.txt.

dvc repro fails:

ERROR: failed to reproduce 'dvc.yaml': [Errno 2] No such file or directory: PosixPathInfo: 'data/Documents'

Theoretically, this should exist. For example, in the logs there are entries like:

2021-06-17 21:54:54,452 DEBUG: Created 'copy': .dvc/cache/0c/048c861ad12cd2bb19e031019c446f → data/documents/train/Document/01.jpg

Script Used

name: model-training
on: [push] # Trigger - push event
jobs:
  run:
    runs-on: [ubuntu-latest]
    # container: docker://dvcorg/cml:0-dvc2-base1
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-cml@v1
      - name: Model Diff
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GDRIVE_CREDENTIALS_DATA: ${{ secrets.ORIGINAL_SERVICE_ACCOUNT_JSON }}
        run: |
          pip install -r Requirements.txt
          pip install dvc[gdrive]
          dvc remote modify filterout --local gdrive_service_account_json_file_path /dev/null
          dvc pull -r filterout --verbose
          dvc repro

Hmm, it looks like something is case-sensitive…

data/documents vs data/Documents

Awesome! You should be able to use GitHub Actions to replace all the useful packages shipped with the Docker images if you want to:

steps:
  - uses: actions/setup-python@v2
  - uses: iterative/setup-dvc@v1
  - uses: iterative/setup-cml@v1

Please keep in mind that the vast majority of Linux distributions rely on case-sensitive filesystems: a file named requirements.txt can’t be accessed with any other case combination, like Requirements.txt as in your answer.

Can you please check if this file begins with an uppercase letter as in your message?
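
One quick way to check the exact name that git is tracking, just as a suggestion:

git ls-files | grep -i requirements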

If you use our containers or the iterative/setup-dvc@v1 action, you shouldn't need/want to install DVC manually into your Python environment. We ship them with all the backends, including [gdrive], for convenience.

Again, it looks like data/Documents is not the same as data/documents due to case sensitivity.

Sorry, I noticed your message only after posting the same thing. As you guessed, everything is case-sensitive on almost every sane Linux system: even if you're accustomed to working on case-insensitive filesystems locally, you should always try to take that into account for portability. :sweat_smile:


Note: I had to consolidate all my messages in a single place because new users can’t post more than three consecutive replies.

@0x2b3bfa0

The case-sensitivity issues are solved.
It is not issue-free, though.

Our dvc.yaml has a command that looks like:

cmd: python src/filterout_train.py -hp …/docs_params.yaml

Then the following error occurs:

python: can't open file 'src/filterout_train.py': [Errno 2] No such file or directory

Both Windows and Mac machines run this smoothly, so the default working directory is probably defined differently here.
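
One way to check this could be to print the working directory in the workflow right before dvc repro, for example:

run: |
  pwd
  ls -la src
  dvc repro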

Details of the error:

Running stage 'train_documents':
> python src/filterout_train.py -hp …/docs_params.yaml
python: can't open file 'src/filterout_train.py': [Errno 2] No such file or directory
2021-06-17 22:51:28,195 ERROR: failed to reproduce 'dvc.yaml': failed to run: python src/filterout_train.py -hp …/docs_params.yaml, exited with 2

A silly problem with capitalization. We changed the filename to lowercase assuming the change would be picked up by git push, but since our local filesystem is case-insensitive, git detected no changes. Because the change was made together with several others, we didn't notice…
It's working!
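
For future reference, a case-only rename can be recorded explicitly through git so that it gets committed even on a case-insensitive filesystem, for example:

git mv Requirements.txt requirements.txt
git commit -m "Rename requirements file to lowercase"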


It would be nice to have a reference YAML file somewhere.
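
For future readers, a reference workflow based on what ended up working in this thread could look roughly like this (the remote name, requirements file, and secret name are specific to this project):

name: model-training
on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - uses: iterative/setup-dvc@v1
      - uses: iterative/setup-cml@v1
      - name: Train
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GDRIVE_CREDENTIALS_DATA: ${{ secrets.ORIGINAL_SERVICE_ACCOUNT_JSON }}
        run: |
          pip install -r requirements.txt
          dvc remote modify filterout --local gdrive_service_account_json_file_path /dev/null
          dvc pull -r filterout
          dvc repro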