I have a SageMaker Studio terminal session where I’m trying to pull DVC-controlled data. The git clone works, but when I run “dvc pull” I get this error:
unexpected error - Access Denied: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Running with -v doesn’t add much insight, other than suggesting the failure surfaces somewhere in the aiobotocore module:
2023-10-04 13:48:00,541 ERROR: unexpected error - Access Denied: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/s3fs/core.py", line 113, in _error_wrapper
    return await func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/aiobotocore/client.py", line 383, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Platform: Python 3.9.15 on Linux-4.14.322-246.539.amzn2.x86_64-x86_64-with-glibc2.26
dvc_data = 2.16.4
dvc_objects = 1.0.1
dvc_render = 0.6.0
dvc_task = 0.3.0
scmrepo = 1.3.1
http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
s3 (s3fs = 2023.9.2, boto3 = 1.28.17)
I know the underlying IAM role is not the problem, as I can use the AWS CLI to pull down files. I can also use s3fs directly to download files.
The SageMaker Studio app is connected to a VPC, and S3 is reached through VPC Endpoints (VPCe). However, due to security configuration outside of my control, the VPCe for S3 is on the regional endpoint and NOT the global endpoint.
Is there a way to see the actual endpoint DVC is trying to use when making an S3 call? Any other ideas on how to troubleshoot what’s going on?
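
One possible way to investigate, sketched here under assumptions rather than as a confirmed fix: raise the log level of the libraries that appear in the traceback (botocore, aiobotocore, s3fs) so their request logging becomes visible, and compare the endpoint a default boto3 client resolves to against the regional endpoint the VPCe allows. The helper names below are hypothetical, the URL pattern assumes a standard commercial AWS region, and whether DVC's own logging setup lets these debug messages through when it is run in the same process is untested.

```python
import logging


def regional_s3_endpoint(region: str) -> str:
    """Build the regional S3 endpoint URL (assumes a standard commercial region)."""
    return f"https://s3.{region}.amazonaws.com"


def enable_aws_debug_logging() -> None:
    """Raise the log level of the libraries that sit between DVC and S3."""
    # basicConfig only installs a handler if the root logger has none yet.
    logging.basicConfig(level=logging.INFO)
    for name in ("botocore", "aiobotocore", "s3fs"):
        logging.getLogger(name).setLevel(logging.DEBUG)


def default_client_endpoint(region=None) -> str:
    """Return the endpoint URL that a default boto3 S3 client resolves to."""
    # Imported lazily so the helpers above can be used without boto3 installed.
    import boto3

    return boto3.client("s3", region_name=region).meta.endpoint_url


if __name__ == "__main__":
    enable_aws_debug_logging()
    # "us-east-1" is a placeholder; substitute the region of the bucket.
    print("expected regional endpoint:", regional_s3_endpoint("us-east-1"))
    try:
        print("boto3 default endpoint:   ", default_client_endpoint())
    except ImportError:
        print("boto3 is not installed in this environment")
```

If the two endpoints differ, DVC's S3 remote accepts endpointurl and region settings (for example, dvc remote modify myremote endpointurl followed by the URL, where myremote is a placeholder remote name), which may be worth trying with the regional URL.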