Hi all,
We’re currently evaluating DVC for a rather complex use case, and would appreciate any insights or recommendations.
Our context:
We’re working with a large repository (~2 TB) that includes a huge number of files (over 1000), ranging in size from 1 MB to 100 GB, organized like:
db/datasets/ds1
db/mixtures/mx1
...
The repo is used by multiple developers (mostly on Windows) and also by automated processes (mostly Linux) — scheduled tasks, CI pipelines, etc.
Up until now, we’ve been using SVN, which handled this somehow (not great, but it worked). We’re now migrating to DVC for better versioning, history, and overall developer experience.
Our DVC setup:
- Data storage: S3-compatible (MinIO)
- DVC remote: s3://datasets/dvc
- Cache: shared cache hosted on one Linux server (XFS + noatime)
- Access methods we’ve tried: NFS, SMB, and rclone-mounted S3
- Cache config:

    [core]
        analytics = false
        remote = s3
    [cache]
        dir = Y:/
        shared = group
        type = symlink
    [remote "s3"]
        url = s3://datasets/dvc
        endpointurl = https://s3.somedomain.com
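To make the cross-platform part concrete: the Windows clients reach the shared cache through a mapped drive (Y:), while the Linux machines reach the same directory through whichever mount we are testing (NFS, SMB, or rclone), so the cache dir is essentially the only per-platform difference. Roughly like this (the Linux mount path below is illustrative, not our real one):

    # Windows clients: shared cache mapped as drive Y:
    [cache]
        dir = Y:/
        shared = group
        type = symlink

    # Linux clients: the same directory, mounted locally (path illustrative)
    [cache]
        dir = /mnt/dvc-cache
        shared = group
        type = symlink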
The problems we’re facing:
- Cache performance is very poor, especially over NFS. Interaction with DVC (e.g. checkout, pull) is painfully slow.
- Windows developers are hitting errors like:

    File "C:\Python313\Lib\site-packages\dvc_data\hashfile\db\local.py", line 117, in protect
      os.chmod(path, self.CACHE_MODE)
    OSError: [WinError 6] The handle is invalid: 'Y:/files\\md5\\ec\\352218c5b3676a3cd594034188759f'
My (possibly naive) intuition is that DVC is trying to “protect” the cache file with Windows-specific file permissions, which fails on a Linux-hosted SMB/NFS share. Not sure if that’s the real cause.
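To illustrate what I think is going on, here is a minimal sketch based purely on the traceback (the 0o444 mode and the object path are my assumptions, not DVC’s actual code), run from a Windows client against the mapped share:

    import os

    # What the "protect" step in the traceback appears to do: after linking a
    # file into the cache, mark the cache copy read-only so the workspace
    # symlink can't silently modify it. 0o444 is an assumption on my part;
    # on Windows, os.chmod() can only toggle the read-only attribute anyway.
    CACHE_MODE = 0o444

    # Hypothetical object path on the mapped share (Y: is our shared cache drive).
    path = r"Y:\files\md5\ec\352218c5b3676a3cd594034188759f"

    try:
        os.chmod(path, CACHE_MODE)  # the call that raises WinError 6 for us
    except OSError as exc:
        print(f"chmod failed on the network share: {exc}")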
I’m aware of the unprotect option, but it doesn’t seem like a good idea if we want to keep the data consistent.
What we’re trying to achieve:
- Efficient shared cache across platforms
- Reliable versioning and history tracking
- Good user experience for developers (especially on Windows)
- Minimal duplication of cache (disk space is a concern)
Any advice or best practices would be greatly appreciated — particularly:
- Is it realistic to share DVC cache across platforms via SMB/NFS?
- Any recommendations for Windows users to avoid the WinError 6 issue?
- Overall, I have a feeling we are doing something completely wrong. What does an actual workflow for a distributed team look like in real life?
Technical details
dvc doctor output from a Windows client:
DVC version: 3.60.0 (choco)
---------------------------
Platform: Python 3.13.5 on Windows-2022Server-10.0.20348-SP0
Subprojects:
dvc_data = 3.16.10
dvc_objects = 5.1.1
dvc_render = 1.0.2
dvc_task = 0.40.2
scmrepo = 3.3.11
Supports:
azure (adlfs = 2024.12.0, knack = 0.12.0, azure-identity = 1.23.0),
gdrive (pydrive2 = 1.21.3),
gs (gcsfs = 2025.5.1),
http (aiohttp = 3.12.13, aiohttp-retry = 2.9.1),
https (aiohttp = 3.12.13, aiohttp-retry = 2.9.1),
oss (ossfs = 2025.5.0),
s3 (s3fs = 2025.5.1, boto3 = 1.38.27),
ssh (sshfs = 2025.2.0)
Config:
Global: C:\Users\gitlab-runner\AppData\Local\iterative\dvc
System: C:\ProgramData\iterative\dvc
Cache types: symlink
Cache directory: NTFS on Y:\
Caches: local
Remotes: s3
Workspace directory: NTFS on E:\
Repo: dvc, git
dvc doctor output from the Linux machine where we’re testing the solution:
DVC version: 3.60.1 (deb)
-------------------------
Platform: Python 3.12.11 on Linux-6.1.0-37-amd64-x86_64-with-glibc2.36
Subprojects:
Supports:
azure (adlfs = 2024.12.0, knack = 0.12.0, azure-identity = 1.23.0),
gdrive (pydrive2 = 1.21.3),
gs (gcsfs = 2025.5.1),
hdfs (fsspec = 2025.5.1, pyarrow = 20.0.0),
http (aiohttp = 3.12.12, aiohttp-retry = 2.9.1),
https (aiohttp = 3.12.12, aiohttp-retry = 2.9.1),
oss (ossfs = 2025.5.0),
s3 (s3fs = 2025.5.1, boto3 = 1.37.3),
ssh (sshfs = 2025.2.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2025.5.1)
Config:
Global: /root/.config/dvc
System: /etc/xdg/dvc
Thanks in advance!