Same remote for multiple repos


#1

Hi

I was wondering how the remote mechanism works.
Assume I have two different DVC repos that use the same base dataset.
Can I use the same remote for both of them, so that the dataset is not stored twice?

Strictly speaking, this is not possible because of hash collisions, is it?

Regards
Matthias


#2

Hi @ynop !

Thank you for a great question! Sure, you can do that. DVC identifies files by a hash of their content, so if a file has the hash 123456 in one project, it will have the same hash in another project and only needs to be stored in the remote once. Strictly speaking, hash collisions are always possible because of the nature of hashes, but, as in many other scenarios, they are very unlikely in our case: the chance of a collision grows with the number of distinct files, and since data files tend to be large, there are relatively few of them compared with, for example, the objects git hashes. So you are safe using the same remote to store data for different projects.
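To illustrate why sharing works, here is a minimal toy sketch of a content-addressed store, not DVC's actual implementation. It assumes MD5 digests (which DVC uses for its content hashes) and a simplified layout where each object lives under a path derived from its hash; the class `ContentAddressedRemote` and the helper `collision_probability` are hypothetical names made up for this example. Pushing the same bytes from two "projects" yields the same hash and a single stored object, and the birthday-bound estimate shows how small the collision chance is even for a million files (on the order of 10^-27 for a 128-bit hash).

```python
import hashlib
import math
import tempfile
from pathlib import Path


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


class ContentAddressedRemote:
    """Hypothetical toy stand-in for a shared remote: objects are keyed by hash."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def path_for(self, digest: str) -> Path:
        # Assumed layout: 2-character directory, rest of the digest as filename.
        return self.root / digest[:2] / digest[2:]

    def push(self, data: bytes) -> str:
        digest = md5_of(data)
        target = self.path_for(digest)
        if not target.exists():  # identical content already stored: skip upload
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
        return digest

    def object_count(self) -> int:
        # Count stored objects (files) under the remote root.
        return sum(1 for p in self.root.rglob("*") if p.is_file())


def collision_probability(n_files: int, bits: int = 128) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-n_files * (n_files - 1) / 2.0 ** (bits + 1))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        remote = ContentAddressedRemote(Path(tmp))
        dataset = b"shared base dataset"
        h_a = remote.push(dataset)  # pushed from project A
        h_b = remote.push(dataset)  # pushed from project B
        print(h_a == h_b)  # True
        print(remote.object_count())  # 1
    print(collision_probability(10**6))
```

Because the storage key depends only on the content, the remote does not need to know which project a file came from; that is what makes one remote safe to share.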

Thanks,
Ruslan