dvc push hangs indefinitely in the middle of querying cache

Hi,
I encountered an error when using dvc push with a local MinIO (S3-compatible) instance on a NAS. I have a pipeline that pushes new data weekly, and it has worked for months without any issue. Now the pipeline suddenly fails systematically, raising a read timeout after hanging for a while at this step:

 0% Querying cache in 'dvc-backup/files/md5'|                             |72/294912 [00:19<03:08, 1564.25fil

Here are a few things I tried:

  • increasing the read timeout and connect timeout → it still hangs at step 72, whatever the timeout value
  • updating dvc from 3.49.0 to 3.59.2 → no change

Here is the verbose output once the timeout has been triggered:

2025-06-02 13:10:30,391 ERROR: unexpected error - Read timeout on endpoint URL: "http://n07b.internal.cyclair.fr:9020/dvc-backup?list-type=2&prefix=files%2Fmd5%2F09%2F&delimiter=&encoding-type=url"
Traceback (most recent call last):
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/httpsession.py", line 222, in send
    response = await session.request(
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiohttp/client.py", line 760, in _request
    resp = await handler(req)
           ^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiohttp/client.py", line 738, in _connect_and_send_request
    await resp.start(conn)
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 512, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiohttp/streams.py", line 672, in read
    await self._waiter
aiohttp.client_exceptions.SocketTimeoutError: Timeout on reading data from socket

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
          ^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
           ^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/commands/status.py", line 53, in run
    st = self.repo.status(
         ^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/repo/status.py", line 124, in status
    return _cloud_status(
           ^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/repo/status.py", line 98, in _cloud_status
    status_info = self.cloud.status(obj_ids, jobs, remote=remote)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/data_cloud.py", line 323, in status
    o, m, n, d = self._status(default_objs, jobs=jobs, odb=odb)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc/data_cloud.py", line 343, in _status
    return compare_status(
           ^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_data/hashfile/status.py", line 179, in compare_status
    dest_exists, dest_missing = status(
                                ^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_data/hashfile/status.py", line 151, in status
    exists.update(odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/db.py", line 454, in oids_exist
    return list(oids & set(wrap_iter(remote_oids, callback)))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/db.py", line 35, in wrap_iter
    for index, item in enumerate(iterable, start=1):
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/db.py", line 346, in _list_oids_traverse
    yield from self._list_oids(prefixes=traverse_prefixes, jobs=jobs)
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/db.py", line 250, in _list_oids
    for path in self._list_prefixes(prefixes=prefixes, jobs=jobs):
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/db.py", line 225, in _list_prefixes
    yield from self.fs.find(paths, batch_size=jobs, prefix=prefix)
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/fs/base.py", line 832, in find
    for result in fut.result():
                  ^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/devcyclair/.local/lib/python3.11/site-packages/dvc_objects/executors.py", line 88, in batch_coros
    result = fut.result()
             ^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/s3fs/core.py", line 880, in _find
    out = await self._lsdir(path, delimiter="", prefix=prefix, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/s3fs/core.py", line 755, in _lsdir
    async for c in self._iterdir(
  File "/home/devcyclair/.local/lib/python3.11/site-packages/s3fs/core.py", line 805, in _iterdir
    async for i in it:
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/paginate.py", line 30, in __anext__
    response = await self._make_request(current_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/client.py", line 394, in _make_api_call
    http, parsed_response = await self._make_request(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/client.py", line 420, in _make_request
    return await self._endpoint.make_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/endpoint.py", line 113, in _send_request
    while await self._needs_retry(
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/endpoint.py", line 273, in _needs_retry
    responses = await self._event_emitter.emit(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/hooks.py", line 68, in _emit
    response = await resolve_awaitable(handler(**kwargs))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/_helpers.py", line 6, in resolve_awaitable
    return await obj
           ^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/retryhandler.py", line 107, in _call
    if await resolve_awaitable(self._checker(**checker_kwargs)):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/_helpers.py", line 6, in resolve_awaitable
    return await obj
           ^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/retryhandler.py", line 126, in _call
    should_retry = await self._should_retry(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/retryhandler.py", line 165, in _should_retry
    return await resolve_awaitable(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/_helpers.py", line 6, in resolve_awaitable
    return await obj
           ^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/retryhandler.py", line 174, in _call
    checker(attempt_number, response, caught_exception)
  File "/home/devcyclair/.local/lib/python3.11/site-packages/botocore/retryhandler.py", line 247, in __call__
    return self._check_caught_exception(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/botocore/retryhandler.py", line 416, in _check_caught_exception
    raise caught_exception
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/endpoint.py", line 194, in _do_get_response
    http_response = await self._send(request)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/endpoint.py", line 296, in _send
    return await self.http_session.send(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/devcyclair/.local/lib/python3.11/site-packages/aiobotocore/httpsession.py", line 270, in send
    raise ReadTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "http://n07b.internal.cyclair.fr:9020/dvc-backup?list-type=2&prefix=files%2Fmd5%2F09%2F&delimiter=&encoding-type=url"

2025-06-02 13:10:30,464 DEBUG: link type reflink is not available ([Errno 13] Permission denied: '/workspaces/.qGOFObVUT_-XoUALVKf-LQ.tmp')
2025-06-02 13:10:30,464 DEBUG: Removing '/workspaces/.qGOFObVUT_-XoUALVKf-LQ.tmp'
2025-06-02 13:10:30,464 DEBUG: link type hardlink is not available ([Errno 95] no more link types left to try out)
2025-06-02 13:10:30,464 DEBUG: Removing '/workspaces/.qGOFObVUT_-XoUALVKf-LQ.tmp'
2025-06-02 13:10:30,464 DEBUG: link type symlink is not available ([Errno 13] Permission denied: '/workspaces/dvc-backup/.dvc/cache/files/md5/.xSFMoCjcFZEfHg5MmiZR8A.tmp' -> '/workspaces/.qGOFObVUT_-XoUALVKf-LQ.tmp')
2025-06-02 13:10:30,464 DEBUG: Removing '/workspaces/.qGOFObVUT_-XoUALVKf-LQ.tmp'
2025-06-02 13:10:30,464 DEBUG: Removing '/workspaces/dvc-backup/.dvc/cache/files/md5/.xSFMoCjcFZEfHg5MmiZR8A.tmp'
2025-06-02 13:10:30,467 DEBUG: Version info for developers:
DVC version: 3.59.2 (pip)
-------------------------
Platform: Python 3.11.12 on Linux-6.8.0-59-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.16.10
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.40.2
        scmrepo = 3.3.11
Supports:
        http (aiohttp = 3.12.4, aiohttp-retry = 2.9.1),
        https (aiohttp = 3.12.4, aiohttp-retry = 2.9.1),
        s3 (s3fs = 2025.5.1, boto3 = 1.37.3)
Config:
        Global: /home/devcyclair/.config/dvc
        System: /etc/xdg/dvc
Cache types: 
Cache directory: ext4 on /dev/sdb1
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/sdb1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/44c5755e78e24bc3e72ce1055ef5830e

I am still unsure what happens at step 72 that causes the cache query to hang.
A specific file, maybe? How could I get more details about what is going on at this step?
Could the problem be related to MinIO (I am able to use dvc fetch at multiple commit versions)?
Could you help me debug it and understand what is going wrong? :slightly_smiling_face: :pray:

Thank you!
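
One way to get more detail about the stuck step might be to replay the listing request outside DVC. The failing URL in the traceback is an S3 ListObjectsV2 call on the prefix files/md5/09/, so the sketch below times the same kind of call for each prefix using boto3 directly. It is only a rough diagnostic under several assumptions: the function names are made up for this example, credentials are expected to come from the usual boto3 environment variables or config files, and the two-hex-character prefix layout is inferred from the single prefix visible in the error, not from DVC's documentation.

```python
import time


def md5_prefixes(base="files/md5/"):
    """Return 256 prefixes, files/md5/00/ through files/md5/ff/ (assumed layout)."""
    return [f"{base}{i:02x}/" for i in range(256)]


def time_listing(client, bucket, prefix):
    """List every key under `prefix`; return (key_count, elapsed_seconds)."""
    start = time.monotonic()
    count = 0
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        count += len(page.get("Contents", []))
        if not page.get("IsTruncated"):
            break
        # Follow pagination the same way the S3 API expects.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
    return count, time.monotonic() - start


def check_remote(endpoint_url, bucket, read_timeout=30):
    """Time the listing of each prefix and report the ones that fail."""
    # Imported here so the helpers above can be used without boto3.
    import boto3
    from botocore.config import Config

    config = Config(
        read_timeout=read_timeout,
        connect_timeout=10,
        retries={"max_attempts": 0},  # fail fast instead of retrying silently
    )
    client = boto3.client("s3", endpoint_url=endpoint_url, config=config)
    for prefix in md5_prefixes():
        try:
            count, seconds = time_listing(client, bucket, prefix)
            print(f"{prefix}: {count} keys in {seconds:.2f}s")
        except Exception as exc:  # e.g. botocore.exceptions.ReadTimeoutError
            print(f"{prefix}: FAILED ({type(exc).__name__}: {exc})")
```

Calling check_remote("http://n07b.internal.cyclair.fr:9020", "dvc-backup") from a Python shell should show whether files/md5/09/ is the only prefix that times out. If it also hangs here, the problem is likely on the MinIO side (or on the NAS underneath it) and not in DVC itself.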

Hi,

If this worked before and stopped working, it could be that a dependency was updated or MinIO itself changed. We haven’t made any major changes to the dvc push/pull logic recently.

That said, here are a few things you can try:

  1. The read timeout issue might be caused by dvc sending more requests than the server can handle. It’s not clear why this behavior only started now, but you can try limiting the number of threads dvc uses with the --jobs flag, for example:

    dvc push --jobs 4
    
  2. It’s worth checking whether MinIO was updated recently.

  3. One likely cause could be an update in a dependency, particularly s3fs. You can try downgrading it to an older version to see if that resolves the issue.

  4. If you’re using uv, you can install DVC and its dependencies as they were on a specific date using the --exclude-newer flag. For example:

    uv pip install "dvc[s3]" --exclude-newer 2025-02-05
    

    If that works, you can diff the installed dependencies and find out what the culprit is. :slight_smile:
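
The dependency comparison mentioned in the last step can be sketched in a few lines of Python. This assumes you have saved the output of pip freeze (or uv pip freeze) from the working and the failing environment into two text files; the function names and file names are hypothetical, and only plain name==version lines are compared.

```python
def parse_freeze(text):
    """Parse `pip freeze` style text into {package_name: version}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not pinned with '=='.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.lower()] = version
    return packages


def diff_freezes(old_text, new_text):
    """Return sorted (name, old_version, new_version) for every difference."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    changes = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes


def report(old_path, new_path):
    """Print the differences between two saved freeze files."""
    with open(old_path) as old_file, open(new_path) as new_file:
        changes = diff_freezes(old_file.read(), new_file.read())
    for name, before, after in changes:
        print(f"{name}: {before or 'absent'} -> {after or 'absent'}")
```

For example, report("working.txt", "broken.txt") would print one line per package whose version differs. Packages such as s3fs, aiobotocore, botocore, and aiohttp, which all appear in the traceback, are probably the first ones to look at.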