Shared cache on NFS is slow

I have multiple users who can each launch their own AWS instance for data analysis and then shut it down when they are done. Rather than rebuilding the DVC cache on each of these instances every time one starts up, we want a persistent shared cache.

So I’ve created a shared DVC cache on EFS (AWS’s managed NFS) with symlinks in the workspace, but the problem is that `dvc fetch` and `dvc checkout` are now quite slow for certain branches that have thousands of files tracked by DVC. First, for a `dvc fetch`, even if nothing needs to be downloaded from the remote, just “Querying cache…” can take up to 8 minutes. Then `dvc checkout` can take 6 more minutes. Presumably this comes down to NFS communication overhead when listing files in the cache (the progress bar reports rates like “110 files/sec”).
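
For context, the cache is configured roughly like this on each instance (the EFS mount path below is just a placeholder):

```
# Point DVC at the shared cache directory on the EFS mount (path is an example)
dvc cache dir /mnt/efs/dvc-cache

# Link files from the cache into the workspace as symlinks instead of copies
dvc config cache.type symlink

# Let multiple users share the cache directory
dvc config cache.shared group
```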

Is there any way to speed up a shared cache stored on NFS, especially with thousands of tracked files?

Could you please get a cProfile dump of your command? You can capture it by adding `--cprofile --cprofile-dump my.prof` to the command.
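
For example (the `.prof` file names here are just examples):

```
dvc fetch --cprofile --cprofile-dump fetch.prof
dvc checkout --cprofile --cprofile-dump checkout.prof
```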

Could any private information end up in these dump files (passwords, AWS keys, file or variable contents, etc.)? I am a little hesitant to post the full files because of that.

That said, I analyzed the files with snakeviz and can tell you that for `dvc fetch` the vast majority of the time was spent in `hashes_exist > ... > ~:0(<built-in method posix.stat>)`.

Same for `dvc checkout`: it all comes down to `posix.stat`.
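
In case it’s useful, this is roughly how I inspected the dumps (file name is just an example):

```
pip install snakeviz
snakeviz fetch.prof   # opens an interactive call-tree view in the browser
```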

Guess it is related to this one? https://github.com/iterative/dvc/issues/5562

Yes, it looks like it. I’ll look forward to any improvements that can be made on the DVC side.

For those using Amazon EFS, I recommend the General Purpose performance mode (rather than Max I/O); it has lower latency and so appears to speed up the `stat()` calls.
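
If you want to check which mode an existing filesystem uses, something like this with the AWS CLI should work (a sketch; as far as I know the performance mode is fixed when the filesystem is created):

```
# List EFS filesystems with their performance mode (generalPurpose or maxIO)
aws efs describe-file-systems \
    --query 'FileSystems[].{Id:FileSystemId,Mode:PerformanceMode}'
```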