Manage data from one dvc folder with colleagues

Thank you for creating this convenient data versioning tool.

We’re a team of several developers working with a large amount of image data. My question is how to handle conflicts when multiple people manage versions in a single DVC folder.

For example, suppose there are 1,000 data items in total. Developer 1 is training a model using only items 1 to 300, and developer 2 then updates the data folder to use items 200 to 500. Developer 1 will lose their data.

We’ve also considered having each of us download the data into our own workspace, but that isn’t feasible because the dataset is too large. So it seems that multiple people will have to manage versions in one data folder.

Does DVC support a solution for this kind of problem?

Hi @injo. How is your DVC cache set up?

Does DVC support a solution for this kind of problem?

You might find the use case "Fast and Secure Data Caching Hub" interesting, along with the associated guide "How to Share a Cache Among Projects".
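
To give a rough idea of why a shared cache helps here, below is a toy Python sketch of a content-addressed store. It is only an illustration, not DVC's actual code: the names `SharedCache`, `snapshot` and `checkout` are made up for this example, and it assumes each developer keeps their own list of files (a version) while the file contents live once in a common cache.

```python
import hashlib


class SharedCache:
    """Toy content-addressed store: one copy per unique file content."""

    def __init__(self):
        self.objects = {}  # content hash -> file bytes

    def add(self, content: bytes) -> str:
        digest = hashlib.md5(content).hexdigest()
        # Storing the same content twice is a no-op.
        self.objects.setdefault(digest, content)
        return digest


def snapshot(cache, files):
    """Record a dataset version as {filename: hash}; contents go to the cache."""
    return {name: cache.add(data) for name, data in files.items()}


def checkout(cache, version):
    """Rebuild a workspace view from a recorded version."""
    return {name: cache.objects[digest] for name, digest in version.items()}


# Hypothetical dataset of 1000 items, numbered 1..1000.
images = {f"img_{i:04d}.png": f"pixels-{i}".encode() for i in range(1, 1001)}
names = list(images)

cache = SharedCache()
# Developer 1 versions items 1..300, developer 2 versions items 200..500.
dev1_version = snapshot(cache, {n: images[n] for n in names[0:300]})
dev2_version = snapshot(cache, {n: images[n] for n in names[199:500]})

# Developer 2's version does not remove anything developer 1 recorded.
dev1_workspace = checkout(cache, dev1_version)
print(len(dev1_workspace), len(dev2_version), len(cache.objects))
```

In this toy model, the overlapping items are stored only once, and either version can be restored at any time because adding a new version never deletes cached contents. In DVC itself, the cache location is set with `dvc cache dir`; the guide mentioned above covers the details.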

For example, suppose there are 1,000 data items in total. Developer 1 is training a model using only items 1 to 300, and developer 2 then updates the data folder to use items 200 to 500. Developer 1 will lose their data.

@injo could you clarify this, please? Why would developer 1 lose their data?