DVC checkout takes a long time

Hello,
I’m testing DVC in a workspace (a DVC project) on a GPU server that has a NAS mounted over NFS.

The dataset is about a million images (~40 GB) and is stored on the NAS.

I run the following command in my workspace:

 dvc add "dataset path in NAS" -o .   (the DVC cache dir is already set to a path on the NAS)

This takes about 2 hours. (I wonder whether that is normal.)

The problem is that when I run ‘dvc checkout’ in another workspace to check that it works, it takes about 1 hour. (I also ran ‘dvc config cache.type symlink’.)

Even when I create symlinks to the original dataset myself using os.symlink in Python, it doesn’t take this long. (I’d guess a few minutes.)
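A minimal sketch of that kind of manual comparison, for anyone who wants to time it on their own mount. The function names link_tree and timed_link_tree are hypothetical and not part of DVC, and the sketch only creates links; it does none of the checking that DVC may perform during checkout, so the two timings are not strictly equivalent.

```python
import os
import time
from pathlib import Path


def link_tree(src_root, dst_root):
    """Mirror src_root under dst_root, creating one symlink per file.

    Returns the number of symlinks created. Raises FileExistsError if a
    link path already exists, so dst_root should start out empty.
    """
    src_root = Path(src_root).resolve()
    dst_root = Path(dst_root)
    count = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Recreate the same relative directory layout under dst_root.
        rel = Path(dirpath).relative_to(src_root)
        target_dir = dst_root / rel
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            # Each link points at the absolute path of the original file.
            os.symlink(Path(dirpath) / name, target_dir / name)
            count += 1
    return count


def timed_link_tree(src_root, dst_root):
    """Run link_tree and return (links_created, elapsed_seconds)."""
    start = time.perf_counter()
    count = link_tree(src_root, dst_root)
    return count, time.perf_counter() - start
```

Calling timed_link_tree with the dataset directory on the NAS and an empty directory in the workspace gives a rough baseline to compare against the ‘dvc checkout’ time.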

Is there a way to make ‘dvc checkout’ faster?

Hi!

Unfortunately, certain operations such as exists() and stat() can be extremely slow on an NFS filesystem. We have plans to improve performance; see this issue for more information: