Dvc-push raises ERROR: unexpected error - Read timeout on endpoint URL

Hi,

I encountered an error when using dvc-push on a local minio (s3 compatible) on a nas . This command has worked before but is suddenly broken, without any change in the code and dependencies versions.

Here is the versbose output :

$ dvc push -v
2024-11-15 17:00:30,892 DEBUG: v3.49.0 (pip), CPython 3.10.12 on Linux-6.8.0-47-generic-x86_64-with-glibc2.35
2024-11-15 17:00:30,892 DEBUG: command: /home/devcyclair/.local/bin/dvc push -v
Collecting                                                                                                                                                                                                                       |0.00 [00:00,    ?entry/s]
2024-11-15 17:00:31,964 DEBUG: Preparing to transfer data from '/media/cold_storage/dvc-backup/.dvc/cache/files/md5' to 's3://dvc-backup/files/md5'
2024-11-15 17:00:31,964 DEBUG: Preparing to collect status from 'dvc-backup/files/md5'
2024-11-15 17:00:31,982 DEBUG: Collecting status from 'dvc-backup/files/md5'
2024-11-15 17:00:32,275 DEBUG: Querying 2 oids via object_exists
2024-11-15 17:00:32,438 DEBUG: Querying 1 oids via object_exists                                                                                                                                                                                           
2024-11-15 17:00:32,956 DEBUG: Estimated remote size: 57344 files                                                                                                                                                                                          
2024-11-15 17:00:32,956 DEBUG: Querying 35222 oids via traverse                                                                                                                                                                                            
Pushing                                                                                                                                                                                                                                                    
2024-11-15 17:02:22,419 ERROR: unexpected error - Read timeout on endpoint URL: "http://local-nas-adress.fr:9020/dvc-backup?list-type=2&prefix=files%2Fmd5%2F58%2F&delimiter=&encoding-type=url"                                                      
Traceback (most recent call last):
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/httpsession.py", line 223, in send
    response = await session.request(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiohttp/client.py", line 605, in _request
    await resp.start(conn)
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 976, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiohttp/streams.py", line 640, in read
    await self._waiter
aiohttp.client_exceptions.ServerTimeoutError: Timeout on reading data from socket

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 64, in run
    processed_files_count = self.repo.push(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc/repo/push.py", line 140, in push
    push_transferred, push_failed = ipush(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc_data/index/push.py", line 76, in push
    result = transfer(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc_data/hashfile/transfer.py", line 203, in transfer
    status = compare_status(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc_data/hashfile/status.py", line 179, in compare_status
    dest_exists, dest_missing = status(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc_data/hashfile/status.py", line 151, in status
    exists.update(odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback))
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc_objects/db.py", line 454, in oids_exist
    return list(oids & set(wrap_iter(remote_oids, callback)))
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc_objects/db.py", line 35, in wrap_iter
    for index, item in enumerate(iterable, start=1):
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc_objects/db.py", line 346, in _list_oids_traverse
    yield from self._list_oids(prefixes=traverse_prefixes, jobs=jobs)
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc_objects/db.py", line 250, in _list_oids
    for path in self._list_prefixes(prefixes=prefixes, jobs=jobs):
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc_objects/db.py", line 225, in _list_prefixes
    yield from self.fs.find(paths, batch_size=jobs, prefix=prefix)
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 832, in find
    for result in fut.result():
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/devcyclair/.local/lib/python3.10/site-packages/dvc_objects/executors.py", line 88, in batch_coros
    result = fut.result()
  File "/home/devcyclair/.local/lib/python3.10/site-packages/s3fs/core.py", line 845, in _find
    out = await self._lsdir(path, delimiter="", prefix=prefix, **kwargs)
  File "/home/devcyclair/.local/lib/python3.10/site-packages/s3fs/core.py", line 720, in _lsdir
    async for c in self._iterdir(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/s3fs/core.py", line 770, in _iterdir
    async for i in it:
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/paginate.py", line 30, in __anext__
    response = await self._make_request(current_kwargs)
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/client.py", line 388, in _make_api_call
    http, parsed_response = await self._make_request(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/client.py", line 416, in _make_request
    return await self._endpoint.make_request(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/endpoint.py", line 102, in _send_request
    while await self._needs_retry(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/endpoint.py", line 264, in _needs_retry
    responses = await self._event_emitter.emit(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/hooks.py", line 66, in _emit
    response = await resolve_awaitable(handler(**kwargs))
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/_helpers.py", line 15, in resolve_awaitable
    return await obj
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/retryhandler.py", line 107, in _call
    if await resolve_awaitable(self._checker(**checker_kwargs)):
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/_helpers.py", line 15, in resolve_awaitable
    return await obj
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/retryhandler.py", line 126, in _call
    should_retry = await self._should_retry(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/retryhandler.py", line 165, in _should_retry
    return await resolve_awaitable(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/_helpers.py", line 15, in resolve_awaitable
    return await obj
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/retryhandler.py", line 174, in _call
    checker(attempt_number, response, caught_exception)
  File "/home/devcyclair/.local/lib/python3.10/site-packages/botocore/retryhandler.py", line 247, in __call__
    return self._check_caught_exception(
  File "/home/devcyclair/.local/lib/python3.10/site-packages/botocore/retryhandler.py", line 416, in _check_caught_exception
    raise caught_exception
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/endpoint.py", line 183, in _do_get_response
    http_response = await self._send(request)
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/endpoint.py", line 287, in _send
    return await self.http_session.send(request)
  File "/home/devcyclair/.local/lib/python3.10/site-packages/aiobotocore/httpsession.py", line 262, in send
    raise ReadTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ReadTimeoutError: Read timeout on endpoint URL: "http://n07b.internal.cyclair.fr:9020/dvc-backup?list-type=2&prefix=files%2Fmd5%2F58%2F&delimiter=&encoding-type=url"

2024-11-15 17:02:22,466 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2024-11-15 17:02:22,466 DEBUG: Removing '/media/cold_storage/.w3XFN3zXNxwLtRQBpuagyA.tmp'
2024-11-15 17:02:22,466 DEBUG: Removing '/media/cold_storage/.w3XFN3zXNxwLtRQBpuagyA.tmp'
2024-11-15 17:02:22,467 DEBUG: Removing '/media/cold_storage/.w3XFN3zXNxwLtRQBpuagyA.tmp'
2024-11-15 17:02:22,467 DEBUG: Removing '/media/cold_storage/dvc-backup/.dvc/cache/files/md5/.GXxpZO04Pjwu5ogRbG_dJA.tmp'
2024-11-15 17:02:22,478 DEBUG: Version info for developers:
DVC version: 3.49.0 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.8.0-47-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.15.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.1
Supports:
        http (aiohttp = 3.9.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.4, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.3.1, boto3 = 1.34.69)
Config:
        Global: /home/devcyclair/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb1
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/sdb1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/b863b51c08c72b278e2405c62a6e3a6e

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-11-15 17:02:22,480 DEBUG: Analytics is enabled.
2024-11-15 17:02:22,513 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpmj99hq2y', '-v']
2024-11-15 17:02:22,526 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpmj99hq2y', '-v'] with pid 3649880
sys:1: RuntimeWarning: coroutine 'S3FileSystem._find' was never awaited
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

Here is the output of dvc doctor :

$ dvc doctor
DVC version: 3.49.0 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.8.0-47-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.15.1
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.1
Supports:
        http (aiohttp = 3.9.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.4, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.3.1, boto3 = 1.34.69)
Config:
        Global: /home/devcyclair/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb1
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/sdb1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/b863b51c08c72b278e2405c62a6e3a6e

Could you help me debugging it and understand what is going wrong ? :slightly_smiling_face: :pray:

Thank you !

1 Like

Was it one off or does it still happen to you folks? Seems to be a regular socket timeout issue. You could try to manipulate with read_timeout / connect_timeout values here - Amazon S3

It happened systematically for a specific commit (and not for the previous ones). I didn’t try to change the read_timeout or connect_timeout but it seems to be solved (for now) by using dvc push --jobs 1 instead of dvc push. Could it be related to the amount of files to be pushed ?