@uzair
So, I’ve been playing around and it seems that, indeed, handling a lot of small files is still painful.
I ran my own tests on a 140 MB dataset with 70k files:

- DVC with the default jobs value (4 × CPU count): 1100 s
- plain `aws s3 cp --recursive`: 3000 s
Even though DVC spends a lot of time acquiring the lock in the first case, the transfer is still much faster than it would be with a reduced number of jobs.
So my suggestion is to play around with the number of jobs (`dvc push --jobs {X}`) - that might help to some extent. Regretfully, pushing a lot of small files is still painfully slow.
One workaround might be to pack the files into a single archive, but that comes at the cost of cache size whenever we update the dataset, since the whole archive gets a new hash and has to be stored again.
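To illustrate the packing idea, here is a minimal sketch using Python's stdlib `tarfile` (the function name and layout are just for illustration, not anything DVC provides). You would pack the directory once, track the single `.tar` with DVC, and push that instead of 70k individual objects:

```python
import os
import tarfile


def pack_dataset(src_dir: str, tar_path: str) -> int:
    """Pack every file under src_dir into one uncompressed tar.

    Returns the number of files packed. An uncompressed tar is
    deliberate: it keeps packing fast, and compression can be
    left to the remote/transport if desired.
    """
    count = 0
    with tarfile.open(tar_path, "w") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to src_dir so the archive
                # unpacks cleanly anywhere.
                tar.add(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count
```

The trade-off mentioned above applies: changing even one file means re-hashing and re-uploading the whole archive, so this only pays off for datasets that change rarely or wholesale.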
@uzair please tell me if any of these solutions could help you. You can also chime in on the original issue about directory optimizations and share your problem: https://github.com/iterative/dvc/issues/1970. We might need to reconsider the current state of the optimizations and think about whether there is something more to be done.