DVC external outputs and dependencies: KeyError: 'x-amz-bucket-region'

stages:
  gather_pool_list:
    cmd: python pipeline_scripts/gather_pool_list.py
      --data_dir ${general.data_dir}
      ${gather_pool_list}
    deps:
      - pipeline_scripts/gather_pool_list.py
      - code_data_collecting/data_fetcher.py
    outs:
      - pipeline_data/gather_pool_list/pools.csv
      - s3://general-info.pool.csv:
          cache: false

dvc repro -s gather_pool_list -v

raises:
2025-02-12 12:16:32,455 ERROR: failed to reproduce 'gather_pool_list': 'x-amz-bucket-region'

  File "C:\Users\grshn.conda\envs\torch_python39\lib\site-packages\s3fs\core.py", line 359, in get_s3
    return await self._s3creator.get_bucket_client(bucket)
  File "C:\Users\grshn.conda\envs\torch_python39\lib\site-packages\s3fs\utils.py", line 53, in get_bucket_client
    region = response["ResponseMetadata"]["HTTPHeaders"]["x-amz-bucket-region"]
KeyError: 'x-amz-bucket-region'

I’m using Yandex Object Storage, which is S3-compatible, not Amazon S3.

dvc status

2025-02-12 12:17:34,663 ERROR: unexpected error - 'x-amz-bucket-region'
  File "C:\Users\grshn.conda\envs\torch_python39\lib\site-packages\s3fs\core.py", line 359, in get_s3
    return await self._s3creator.get_bucket_client(bucket)
  File "C:\Users\grshn.conda\envs\torch_python39\lib\site-packages\s3fs\utils.py", line 53, in get_bucket_client
    region = response["ResponseMetadata"]["HTTPHeaders"]["x-amz-bucket-region"]
KeyError: 'x-amz-bucket-region'

From the same environment I can access the files through s3fs directly:
import s3fs
s3 = s3fs.S3FileSystem(anon=False)
s3.ls('general-info')
returns:
['general-info/pools.csv']

The AWS CLI also works.
How can I use an external output? I also tried using S3 dependencies and got the same error.

Are you using some S3-compatible storage by chance?

It is most likely triggered by the issue addressed in fsspec/s3fs Pull Request #929, "fix(region cache): x-amz-bucket-region can be missing for s3 compatible", which I opened recently.
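To illustrate the failure mode: the traceback shows the region being read with direct indexing, which raises KeyError when an S3-compatible service omits the x-amz-bucket-region header. Below is a minimal sketch of a tolerant lookup. The function name bucket_region, the default argument, and the sample response dicts are hypothetical, and this is not the actual change made in the s3fs PR.

```python
from typing import Any, Mapping, Optional


def bucket_region(response: Mapping[str, Any], default: Optional[str] = None) -> Optional[str]:
    """Return the bucket region from a response dict, or `default` if the header is absent."""
    headers = response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
    # S3-compatible services may omit this header, so use .get() instead of
    # headers["x-amz-bucket-region"], which is what raises the KeyError.
    return headers.get("x-amz-bucket-region", default)


# Hypothetical responses, shaped like the dict indexed in the traceback.
aws_like = {"ResponseMetadata": {"HTTPHeaders": {"x-amz-bucket-region": "eu-west-1"}}}
compatible = {"ResponseMetadata": {"HTTPHeaders": {"server": "example"}}}

print(bucket_region(aws_like))
print(bucket_region(compatible, default="us-east-1"))
```

With direct indexing, the second response would fail in the same way as the dvc repro and dvc status runs in the question.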

How can I use external output?

External outputs are deprecated btw. I would not recommend using them.

From the yandex.cloud documentation:
The Object Storage API is partially compatible with the AWS S3 API, so you can use tools built for S3.

Compatibility with the Amazon S3 API

To manage Object Storage, you can use tools that are compatible with Amazon S3, including the API, CLI, WinSCP, Java SDK, or Python SDK.
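For the Python side, tools built for S3 usually need to be pointed at the provider's endpoint explicitly. Here is a rough sketch of building the options for s3fs.S3FileSystem. The helper name s3_compatible_options is hypothetical, and the endpoint URL and region are assumed values that should be checked against the provider's documentation. This only shows the configuration and is not claimed to resolve the KeyError above.

```python
from typing import Any, Dict, Optional


def s3_compatible_options(endpoint_url: str, region_name: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for s3fs.S3FileSystem aimed at a non-AWS endpoint."""
    options: Dict[str, Any] = {"anon": False, "endpoint_url": endpoint_url}
    if region_name is not None:
        # client_kwargs is forwarded to the underlying botocore client.
        options["client_kwargs"] = {"region_name": region_name}
    return options


# Assumed values; verify them against your provider's documentation.
options = s3_compatible_options("https://storage.yandexcloud.net", "ru-central1")
print(options)

# With s3fs installed and credentials configured, this would be used as:
#   import s3fs
#   fs = s3fs.S3FileSystem(**options)
#   fs.ls("general-info")
```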