All of the AWS CLI commands that access the same bucket work. However, when it runs the dvc pull command, I get access denied errors, even though it should be using the same credentials.
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
Just before that, it says it’s “Preparing to collect status from ‘duolingo-dvc/det-grade’”.
In case it’s relevant, the role’s permissions to the S3 bucket are defined as follows:
data "aws_iam_policy_document" "standard-batch-job-role" {
# S3 read access to related buckets
statement {
actions = [
"s3:Get*",
"s3:List*",
]
resources = [
data.aws_s3_bucket.duolingo-dvc.arn,
"${data.aws_s3_bucket.duolingo-dvc.arn}/*",
]
effect = "Allow"
}
}
AWS doesn’t make it easy to copy the full stack trace, but here is a screenshot:
AWS’s documentation for accessing the credentials within the AWS Batch container can be found here. I’m pretty sure I’ve pulled these correctly, as otherwise the aws s3 commands would fail, which they do not. I verified this by setting the AWS_ environment variables to incorrect values, and the aws s3 commands do fail in that case.
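For reference, here is a minimal sketch of what that credential lookup looks like in my setup, assuming the standard ECS/Batch container credentials endpoint at 169.254.170.2 described in the AWS docs (the script itself is illustrative, not my exact code):

```python
# Minimal sketch: fetch the role credentials from the container
# credentials endpoint that AWS Batch injects, then export them as the
# usual AWS_ environment variables. Field names follow the documented
# JSON response (AccessKeyId, SecretAccessKey, Token).
import json
import os
import urllib.request

relative_uri = os.environ["AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"]
with urllib.request.urlopen(f"http://169.254.170.2{relative_uri}") as resp:
    creds = json.load(resp)

# With these exported, the AWS CLI, DVC, and s3fs should all be using
# the same temporary role session.
os.environ["AWS_ACCESS_KEY_ID"] = creds["AccessKeyId"]
os.environ["AWS_SECRET_ACCESS_KEY"] = creds["SecretAccessKey"]
os.environ["AWS_SESSION_TOKEN"] = creds["Token"]
```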
From the docs (remote add)
Make sure you have the following permissions enabled:
s3:ListBucket
s3:GetObject
s3:PutObject
s3:DeleteObject.
This enables the S3 API methods that are performed by DVC (list_objects_v2 or list_objects, head_object, upload_file, download_file, delete_object, copy).
The role already has s3:ListBucket and s3:GetObject, and it’s failing on list_objects_v2. It should only need s3:PutObject and s3:DeleteObject if I were doing a dvc push or something else that mutates the remote store, which I am not.
For the sake of thoroughness, I added DeleteObject and PutObject permissions to the role. As expected, I still get the same error when it tries to call ListObjectsV2.
I’ve traced this down to an issue with s3fs. Even though the access key ID, secret access key, and session token are all being passed into s3fs, that library’s call to ListObjectsV2 fails, while aws s3 ls works fine when run inside the same container. I can even reproduce it with my own Python script that calls s3fs.S3FileSystem(key=..., secret=..., token=...).ls('duolingo-dvc'); it fails when run inside the container but works locally (see the sketch below). I’m not sure where to look next.
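The reproduction script is essentially this (a minimal sketch; it assumes the role credentials have already been exported as the usual AWS_ environment variables):

```python
# Minimal reproduction of the s3fs failure, using the same temporary
# credentials that the AWS CLI picks up inside the container.
import os
import s3fs

fs = s3fs.S3FileSystem(
    key=os.environ["AWS_ACCESS_KEY_ID"],
    secret=os.environ["AWS_SECRET_ACCESS_KEY"],
    token=os.environ["AWS_SESSION_TOKEN"],
)

# Inside the Batch container this fails with the same AccessDenied error
# on ListObjectsV2; locally, with the same credentials, it lists the
# bucket contents without issue.
print(fs.ls("duolingo-dvc"))
```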
Any updates on this? I am currently facing the same issue running dvc pull against an S3 remote within a GitLab CI/CD pipeline. I use the same IAM role locally and in the pipeline, but the error only shows up in the CI/CD pipeline.