Hi, I was wondering how to set up the GitHub Actions YAML file so that it pulls the data and the model from Google Drive. Maybe I need to make the Google Drive folder public, or add my Google Drive credentials? Right now, the sanity-check step fails using the workflow file from the video.
test.yaml:
name: auto-testing
on: [push]
jobs:
  run:
    runs-on: [ubuntu-latest]
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: sanity-check
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Your ML workflow goes here
          pip install -r requirements.txt
          python test.py
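Note that this workflow only runs python test.py; there is no dvc pull step, so nothing DVC-tracked exists in the runner's workspace when the test runs. A minimal sketch of the missing step (remote authentication still has to be solved separately):

pip install -r requirements.txt
dvc pull        # fetch DVC-tracked data and models from the remote
python test.py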
If you want to combine DVC with GitHub Actions to achieve some CI automation, you might want to take a look at our other product, CML. Its documentation provides some info on how to set this up.
However, in my case, that led to the following error:
dvc pull
ERROR: configuration error - GDrive remote auth failed with credentials in '.../.dvc/tmp/gdrive-credentials.json'.
Backup first, remove or fix them, and run again.
It should do auth again and refresh the credentials.
Details:: '_module'
Learn more about configuration settings at <https://man.dvc.org/remote/modify>.
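The message itself suggests a local recovery path: back up and remove the cached credentials so DVC re-runs authentication. A minimal sketch, using the file path from the error above:

mv .dvc/tmp/gdrive-credentials.json .dvc/tmp/gdrive-credentials.json.bak   # back up the cached credentials first
dvc pull   # DVC should start the auth flow again and write fresh credentials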
Does that mean I have to use the JSON file (or its content) in both the local setup and on GitHub? Can I not use the JSON file locally and GDRIVE_CREDENTIALS_DATA on GitHub at the same time?
I tried it now. Locally the JSON file works; on GitHub, I get the same error message as before.
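For reference, the two mechanisms shouldn't conflict: when GDRIVE_CREDENTIALS_DATA is set, DVC should use it in preference to the on-disk credentials file. One way to test the secret's value locally before relying on it in CI (the JSON path is a placeholder for wherever your credentials live):

export GDRIVE_CREDENTIALS_DATA="$(cat path/to/gdrive-credentials.json)"
dvc pull   # should authenticate from the environment variable alone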
Is there an example yaml file available somewhere?
name: auto-testing
on: [push]
jobs:
  run:
    runs-on: [ubuntu-latest]
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: sanity-check
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
        run: |
          # Your ML workflow goes here
          pip install -r requirements.txt
          dvc pull data
          dvc repro
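One way to narrow this down is to confirm the secret actually reaches the step. A debugging sketch for the run block above; it prints only the length of the variable, never its value:

echo "GDRIVE_CREDENTIALS_DATA is ${#GDRIVE_CREDENTIALS_DATA} characters long"   # 0 means the secret never arrived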
The data pull still fails with the same error message.
WARNING: You are using pip version 21.1; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
/usr/local/lib/python3.6/dist-packages/pycaret/loggers/mlflow_logger.py:14: FutureWarning: MLflow support for Python 3.6 is deprecated and will be dropped in an upcoming release. At that point, existing Python 3.6 workflows that use MLflow will continue to work without modification, but Python 3.6 users will no longer get access to the latest MLflow features and bugfixes. We recommend that you upgrade to Python 3.7 or newer.
Pycaret: 2.3.10
import mlflow
Traceback (most recent call last):
  File "src/test.py", line 55, in <module>
    (x_train, y_train), (x_test, y_test) = mdl.load_data()
  File "src/test.py", line 44, in load_data
    x_train, y_train = self.read_images_labels(self.training_images_filepath, self.training_labels_filepath)
  File "src/test.py", line 22, in read_images_labels
    with open(labels_filepath, 'rb') as file:
FileNotFoundError: [Errno 2] No such file or directory: 'data/MINST/train/train-labels-idx1-ubyte'
Error: Process completed with exit code 1.
Is it enough to pull the parent directory of all the data files, or do I need to pull them all individually, specifying all the file names?
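For reference, pulling the parent directory should be enough: if the directory is tracked as a single DVC output, targeting it fetches every file underneath, and dvc pull with no arguments fetches everything the repo references. A sketch, assuming the directory is tracked by a single data.dvc file:

dvc pull          # pull everything tracked in the repo
dvc pull data     # or target just the data directory; all files under it come along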
Yes, I am using the JSON file only on my local computer and on a remote workstation. On GitHub, I copy-pasted the content of the JSON file into a GitHub secret named GDRIVE_CREDENTIALS_DATA, but it is still not working.
It seems DVC on GitHub is ignoring GDRIVE_CREDENTIALS_DATA, even though it is set.
ERROR: failed to pull data from the cloud - To use service account, set gdrive_service_account_json_file_path, and optionally gdrive_service_account_user_email in DVC config
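That last error reads as though DVC is now interpreting the pasted JSON as a service account. If a service account is really what you intend, the options named in the error would be set along these lines (the remote name myremote and the JSON path are placeholders):

dvc remote modify myremote gdrive_use_service_account true
dvc remote modify --local myremote gdrive_service_account_json_file_path path/to/service-account.json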