Change the data chunk size in the cache directory


I’ve started to use DVC recently and am experiencing slow upload speeds to my remote storage.
I suspect this is related to the large number of data chunks in the cache directory that are to be uploaded.
As far as I can see, those chunks are 1 MB each.
Is there a way to change that value so I have fewer, but bigger, blocks of data?

Many thanks in advance for your help!

This is usually determined by the underlying library used for accessing the remote storage API and is not configurable in DVC. Could you please run dvc doctor and post the output here?

Ah, I see! Thank you for your reply!

Sure, here you go:

 -> dvc doctor
DVC version: 2.51.0 (pip)
Platform: Python 3.9.16 on Linux-5.4.0-135-generic-x86_64-with-glibc2.27
Subprojects:
        dvc_data = 0.44.1
        dvc_objects = 0.21.1
        dvc_render = 0.3.1
        dvc_task = 0.2.0
        scmrepo = 0.1.16
Supports:
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        ssh (sshfs = 2023.1.0),
        webdav (webdav4 = 0.9.8),
        webdavs (webdav4 = 0.9.8)
Cache directory: ext4 on /dev/mapper/ubuntu--vg-root
Caches: local
Remotes: webdavs
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-root
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/cd00bdc44282bd9bb00d035a4a91b52f

It looks like you’re using a WebDAV remote, and in this case DVC doesn’t support setting the chunk size for WebDAV file transfers. Feel free to open a feature request for this in our GitHub repo.

I think the block size is 2 MB, the same as for all fsspec filesystems. But I suspect the problem is not chunking but the number of concurrent transfers, which defaults to 4 * the number of logical CPUs.
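To get a feel for what that default works out to on a given machine, here is a small sketch that mirrors the "4 * logical CPUs" rule described above (an illustration of the arithmetic, assuming Python's os.cpu_count() reports the logical CPU count; the exact value DVC computes internally may differ):

```python
import os

# Default number of concurrent transfer jobs, per the rule above:
# 4 * the number of logical CPUs on the machine.
logical_cpus = os.cpu_count() or 1  # os.cpu_count() can return None
default_jobs = 4 * logical_cpus
print(f"{logical_cpus} logical CPUs -> default of {default_jobs} concurrent jobs")
```

On an 8-core machine this would come out to 32 concurrent transfers, which can easily saturate a slower WebDAV server.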

You can try setting jobs in the config for the remote, or passing the --jobs <n> flag to dvc push.

dvc remote modify <remote_name> jobs 4

See remote modify.
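For reference, after running that command the remote's section in .dvc/config should contain the new setting. A sketch of what it would look like, with a hypothetical remote name and a placeholder URL (your actual values will differ):

```ini
# .dvc/config -- "myremote" and the URL below are placeholders
['remote "myremote"']
    url = webdavs://example.com/dvc-storage
    jobs = 4
```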


dvc push --jobs 4

See push --jobs.

Thank you for your help, I’ll look into that!