Hi,
I encountered an error when using dvc push
with a local MinIO (S3-compatible) server on a NAS. I have a pipeline that pushes new data weekly, and it has worked for months without any issue. Now, suddenly, the pipeline fails every time, raising a read timeout
after hanging for a while at this step:
0% Querying cache in 'dvc-backup/files/md5'| |72/294912 [00:19<03:08, 1564.25fil
Here are a few things I tried:
- increasing the read and connect timeouts: it still hangs at step 72, whatever the timeout value
- updating DVC from 3.49.0 to 3.59.2: no change
Here is the verbose output once the timeout has been triggered:
2025-06-02 13:10:30,391 ERROR: unexpected error - Read timeout on endpoint URL: "http://n07b.internal.cyclair.fr:9020/dvc-backup?list-type=2&prefix=files%2Fmd5%2F09%2F&delimiter=&encoding-type=url"
Traceback (most recent call last):
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/httpsession.py", line 222, in send
response = await session.request(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiohttp/client.py", line 760, in _request
resp = await handler(req)
^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiohttp/client.py", line 738, in _connect_and_send_request
await resp.start(conn)
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 512, in start
message, payload = await protocol.read() # type: ignore[union-attr]
^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiohttp/streams.py", line 672, in read
await self._waiter
aiohttp.client_exceptions.SocketTimeoutError: Timeout on reading data from socket
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/cli/__init__.py", line 211, in main
ret = cmd.do_run()
^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/cli/command.py", line 27, in do_run
return self.run()
^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/commands/status.py", line 53, in run
st = self.repo.status(
^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/repo/status.py", line 124, in status
return _cloud_status(
^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/repo/status.py", line 98, in _cloud_status
status_info = self.cloud.status(obj_ids, jobs, remote=remote)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/data_cloud.py", line 323, in status
o, m, n, d = self._status(default_objs, jobs=jobs, odb=odb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/data_cloud.py", line 343, in _status
return compare_status(
^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_data/hashfile/status.py", line 179, in compare_status
dest_exists, dest_missing = status(
^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_data/hashfile/status.py", line 151, in status
exists.update(odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/db.py", line 454, in oids_exist
return list(oids & set(wrap_iter(remote_oids, callback)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/db.py", line 35, in wrap_iter
for index, item in enumerate(iterable, start=1):
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/db.py", line 346, in _list_oids_traverse
yield from self._list_oids(prefixes=traverse_prefixes, jobs=jobs)
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/db.py", line 250, in _list_oids
for path in self._list_prefixes(prefixes=prefixes, jobs=jobs):
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/db.py", line 225, in _list_prefixes
yield from self.fs.find(paths, batch_size=jobs, prefix=prefix)
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 832, in find
for result in fut.result():
^^^^^^^^^^^^
File "/usr/lib/python3.11/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/executors.py", line 88, in batch_coros
result = fut.result()
^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/s3fs/core.py", line 880, in _find
out = await self._lsdir(path, delimiter="", prefix=prefix, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/s3fs/core.py", line 755, in _lsdir
async for c in self._iterdir(
File "/home/devcyclair/.local/lib/python3.11/site-packages/s3fs/core.py", line 805, in _iterdir
async for i in it:
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/paginate.py", line 30, in __anext__
response = await self._make_request(current_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/client.py", line 394, in _make_api_call
http, parsed_response = await self._make_request(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/client.py", line 420, in _make_request
return await self._endpoint.make_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/endpoint.py", line 113, in _send_request
while await self._needs_retry(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/endpoint.py", line 273, in _needs_retry
responses = await self._event_emitter.emit(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/hooks.py", line 68, in _emit
response = await resolve_awaitable(handler(**kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/_helpers.py", line 6, in resolve_awaitable
return await obj
^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/retryhandler.py", line 107, in _call
if await resolve_awaitable(self._checker(**checker_kwargs)):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/_helpers.py", line 6, in resolve_awaitable
return await obj
^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/retryhandler.py", line 126, in _call
should_retry = await self._should_retry(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/retryhandler.py", line 165, in _should_retry
return await resolve_awaitable(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/_helpers.py", line 6, in resolve_awaitable
return await obj
^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/retryhandler.py", line 174, in _call
checker(attempt_number, response, caught_exception)
File "/home/devcyclair/.local/lib/python3.11/site-packages/botocore/retryhandler.py", line 247, in __call__
return self._check_caught_exception(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/botocore/retryhandler.py", line 416, in _check_caught_exception
raise caught_exception
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/endpoint.py", line 194, in _do_get_response
http_response = await self._send(request)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/endpoint.py", line 296, in _send
return await self.http_session.send(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/httpsession.py", line 270, in send
raise ReadTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "http://n07b.internal.cyclair.fr:9020/dvc-backup?list-type=2&prefix=files%2Fmd5%2F09%2F&delimiter=&encoding-type=url"
2025-06-02 13:10:30,464 DEBUG: link type reflink is not available ([Errno 13] Permission denied: '/workspaces/.qGOFObVUT_-XoUALVKf-LQ.tmp')
2025-06-02 13:10:30,464 DEBUG: Removing '/workspaces/.qGOFObVUT_-XoUALVKf-LQ.tmp'
2025-06-02 13:10:30,464 DEBUG: link type hardlink is not available ([Errno 95] no more link types left to try out)
2025-06-02 13:10:30,464 DEBUG: Removing '/workspaces/.qGOFObVUT_-XoUALVKf-LQ.tmp'
2025-06-02 13:10:30,464 DEBUG: link type symlink is not available ([Errno 13] Permission denied: '/workspaces/dvc-backup/.dvc/cache/files/md5/.xSFMoCjcFZEfHg5MmiZR8A.tmp' -> '/workspaces/.qGOFObVUT_-XoUALVKf-LQ.tmp')
2025-06-02 13:10:30,464 DEBUG: Removing '/workspaces/.qGOFObVUT_-XoUALVKf-LQ.tmp'
2025-06-02 13:10:30,464 DEBUG: Removing '/workspaces/dvc-backup/.dvc/cache/files/md5/.xSFMoCjcFZEfHg5MmiZR8A.tmp'
2025-06-02 13:10:30,467 DEBUG: Version info for developers:
DVC version: 3.59.2 (pip)
-------------------------
Platform: Python 3.11.12 on Linux-6.8.0-59-generic-x86_64-with-glibc2.35
Subprojects:
dvc_data = 3.16.10
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.40.2
scmrepo = 3.3.11
Supports:
http (aiohttp = 3.12.4, aiohttp-retry = 2.9.1),
https (aiohttp = 3.12.4, aiohttp-retry = 2.9.1),
s3 (s3fs = 2025.5.1, boto3 = 1.37.3)
Config:
Global: /home/devcyclair/.config/dvc
System: /etc/xdg/dvc
Cache types:
Cache directory: ext4 on /dev/sdb1
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/sdb1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/44c5755e78e24bc3e72ce1055ef5830e
I am still unsure what happens at step 72 that causes the cache query to hang.
Could it be a specific file? How could I get more details about what is going on at this step?
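One detail can be read from the error itself: the URL is an S3 ListObjectsV2 request (list-type=2), so decoding its query string shows what was being listed when the read timed out. A minimal sketch using only the Python standard library; the URL is copied from the log above and the variable names are illustrative:

```python
from urllib.parse import parse_qs, urlsplit

# URL copied verbatim from the ReadTimeoutError message in the log.
url = (
    "http://n07b.internal.cyclair.fr:9020/dvc-backup"
    "?list-type=2&prefix=files%2Fmd5%2F09%2F&delimiter=&encoding-type=url"
)

parts = urlsplit(url)
# keep_blank_values=True so the empty "delimiter" parameter is not dropped.
query = parse_qs(parts.query, keep_blank_values=True)

bucket = parts.path.lstrip("/")   # first path segment is the bucket name
prefix = query["prefix"][0]       # percent-decoded key prefix being listed

print("bucket:", bucket)
print("prefix:", prefix)
print("delimiter:", repr(query["delimiter"][0]))
```

The decoded prefix is a whole directory of the remote cache (files/md5/09/) rather than a single object key, which suggests the hang is in listing that prefix and not in reading one particular file.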
Could the problem be related to MinIO itself (I am able to use dvc fetch at multiple commits)?
Could you help me debug it and understand what is going wrong?
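To narrow this down, the same listing could be reproduced outside DVC. Below is a rough sketch, not DVC's own code: it assumes boto3 is installed and that credentials for the MinIO server come from the usual boto3 sources (environment variables or config files). The helper names make_client and time_prefix_listing are made up for this example; the endpoint, bucket and prefix in the usage comment are taken from the error message.

```python
import time


def make_client(endpoint_url, read_timeout=60, connect_timeout=10):
    """Build an S3 client pointed at the MinIO endpoint (hypothetical helper)."""
    # Imported here so the timing helper below can be used without boto3.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        config=Config(
            read_timeout=read_timeout,
            connect_timeout=connect_timeout,
            retries={"max_attempts": 0},  # no retries, so a hang surfaces quickly
        ),
    )


def time_prefix_listing(client, bucket, prefix, page_size=1000):
    """List one prefix page by page and record how long each page took."""
    paginator = client.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=bucket, Prefix=prefix, PaginationConfig={"PageSize": page_size}
    )
    timings = []
    started = time.monotonic()
    for number, page in enumerate(pages, start=1):
        elapsed = time.monotonic() - started
        key_count = len(page.get("Contents", []))
        timings.append((number, key_count, elapsed))
        print(f"page {number}: {key_count} keys in {elapsed:.2f}s")
        started = time.monotonic()
    return timings


# Example usage against the server from the log (needs network access):
# client = make_client("http://n07b.internal.cyclair.fr:9020")
# time_prefix_listing(client, "dvc-backup", "files/md5/09/")
```

If this also hangs on files/md5/09/ while other prefixes (for example files/md5/08/) return quickly, the problem is more likely on the MinIO or NAS side than in DVC.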
Thank you!