'dvc push' of many small files to AWS S3 causes a timeout error

Hi,
I have a problem pushing a large number of small files (~2000 files, a few hundred KB each) to S3 with the ‘dvc push’ command from a local machine (Ubuntu).

Stable wired internet connection (~50Mbps measured speed), latest DVC (2.6.4).

During the ‘querying cache…’ phase, progress slows down at 85% and the command eventually fails with an ‘unexpected error’ (FSTimeoutError).

I tried increasing the maximum number of open files (system-wide, by changing /etc/sysctl.conf), but it had no effect.
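Side note on the open-files limit: on Linux the system-wide setting and the per-process limit are separate, so changing one does not necessarily change the other. A minimal sketch for checking the limit that applies to the current process, using Python's standard `resource` module (the helper name `open_file_limits` is just for illustration, and nothing here implies this limit is the cause of the timeout):

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is the one actually enforced for this process.
    print(f"soft limit: {soft}, hard limit: {hard}")
```

Run it from the same shell that runs ‘dvc push’ to see the limit the command would inherit.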

I wondered whether it would be worth changing a timeout value (if there is one), but I couldn’t find any information about it on the dvc-remote page.

Could someone advise how to solve this issue?

Thanks!

Hi @yokohama! Could you please share the output of dvc doctor and dvc push -v?

Are you using AWS S3 directly or an S3-compatible storage? I’m asking because there was a recent issue with a timeout error related to SeaweedFS.


Hi, daavoo! Thanks for the quick answer!
As for AWS, I’ll have to check whether it’s direct or the other option you mentioned.
Also, the other thread really does look like my case at first glance… I’ll read it more closely in the meantime.
Here are the logs:

-----------------

14:06 $ dvc push --jobs 1 FANCY_DVC_FILE.dvc -v
2021-09-01 14:11:18,710 DEBUG: Preparing to transfer data from '../../../.dvc/cache' to 's3://FANCY_S3_PATH'
2021-09-01 14:11:18,710 DEBUG: Preparing to collect status from 's3://FANCY_S3_PATH'
2021-09-01 14:11:18,712 DEBUG: Collecting status from 's3://FANCY_S3_PATH'
2021-09-01 14:11:18,712 DEBUG: Querying 1 hashes via object_exists
2021-09-01 14:11:20,202 DEBUG: list_hashes() returned max '122.0703125' hashes, skipping remaining results                                
2021-09-01 14:11:20,202 DEBUG: Estimated remote size: 503808 files                                                                          
2021-09-01 14:11:20,202 DEBUG: Large remote ('2162' hashes < '2519.04' traverse weight), using object_exists for remaining hashes           
2021-09-01 14:11:20,202 DEBUG: Querying 2162 hashes via object_exists
2021-09-01 14:13:29,659 ERROR: unexpected error                                                                                             
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/command/data_sync.py", line 57, in run
    processed_files_count = self.repo.push(
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/repo/push.py", line 48, in push
    pushed += self.cloud.push(obj_ids, jobs, remote=remote, odb=odb)
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/data_cloud.py", line 85, in push
    return transfer(
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/objects/transfer.py", line 221, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/objects/status.py", line 160, in compare_status
    dest_exists, dest_missing = status(
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/objects/status.py", line 132, in status
    odb.hashes_exist(hashes, name=str(odb.path_info), **kwargs)
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/objects/db/base.py", line 521, in hashes_exist
    return list(hashes & remote_hashes) + self.list_hashes_exists(
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/objects/db/base.py", line 448, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/objects/db/base.py", line 439, in exists_with_progress
    ret = self.fs.exists(path_info)
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 93, in exists
    return self.fs.exists(self._with_bucket(path_info))
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/fsspec/asyn.py", line 88, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/USER_NAME/.local/lib/python3.8/site-packages/fsspec/asyn.py", line 67, in sync
    raise FSTimeoutError
fsspec.exceptions.FSTimeoutError
------------------------------------------------------------
2021-09-01 14:13:29,692 DEBUG: Version info for developers:
DVC version: 2.6.4 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.11.0-27-generic-x86_64-with-glibc2.29
Supports:
	http (requests = 2.22.0),
	https (requests = 2.22.0),
	s3 (s3fs = 2021.6.1, boto3 = 1.17.49)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/vgubuntu-root
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/mapper/vgubuntu-root
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-09-01 14:13:29,694 DEBUG: Analytics is enabled.
2021-09-01 14:13:29,715 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmparniyeu6']'
2021-09-01 14:13:29,717 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmparniyeu6']'
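
For anyone reading the debug output above, the numbers in the `list_hashes()`, `Estimated remote size`, and `traverse weight` lines fit together arithmetically. Below is a minimal sketch that reproduces them. The constants and the sampled count of 123 are assumptions inferred from the logged values, not quoted from DVC's source, and the function names are hypothetical.

```python
# All constants are ASSUMPTIONS chosen because they reproduce the logged numbers.
THRESHOLD_SIZE = 500_000   # assumed size above which the weight multiplier applies
TOTAL_PREFIXES = 16 ** 3   # assumed: remote sampled under one 3-hex-digit prefix
PAGE_SIZE = 1000           # assumed number of objects per listing page
WEIGHT_MULTIPLIER = 5      # assumed penalty for traversing a large remote


def max_listed_hashes():
    """Upper bound on hashes listed under the sampled prefix (log: 122.0703125)."""
    return THRESHOLD_SIZE / TOTAL_PREFIXES


def estimate_remote_size(sampled_count):
    """Extrapolate the total remote size from the count found under one prefix."""
    return TOTAL_PREFIXES * sampled_count


def traverse_weight(remote_size):
    """Weighted cost of listing the whole remote, in pages."""
    pages = remote_size / PAGE_SIZE
    if remote_size > THRESHOLD_SIZE:
        return pages * WEIGHT_MULTIPLIER
    return pages


def uses_object_exists(n_hashes, remote_size):
    """True when checking each hash individually is estimated to be cheaper."""
    return n_hashes < traverse_weight(remote_size)


if __name__ == "__main__":
    size = estimate_remote_size(123)  # 123 is inferred: 503808 / 4096
    print(max_listed_hashes(), size, traverse_weight(size))
    print(uses_object_exists(2162, size))
```

Under these assumptions, 2162 hashes is below the traverse weight of about 2519.04, so each hash is checked individually. That matches the "Querying 2162 hashes via object_exists" line, which is the phase where the traceback ends in FSTimeoutError.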