Hello community,
I am new to data version control. All the examples I see talk about how to update the remote if you update the dataset in the data folder in your local working environment. My question is the following:
Suppose I have a remote with 100 images and I pull. That command copies those files to my local machine. How do I correctly version those 100 images that live in a remote folder?
Then, let’s suppose someone adds 100 more images to the remote, for a total of 200. How do I indicate that this is the second version of the dataset?
If the data were local, it would be more or less clear. However, I’m not sure how to do it when the cycle starts on the remote.
Best regards and thank you very much!
DVC is heavily integrated with version control systems such as Git, so when you add some data and push it to the remote, you also need to commit the resulting change to the tracked .dvc file. When the other person adds new data, they do the same and update the .dvc file. This is the basic versioning mechanism: when you check out a previous revision and pull, you see only the files from that revision.
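To make that cycle concrete, here is a minimal Python sketch that only assembles the git and dvc commands for one versioning step and for going back to an earlier revision. The data path `data/images`, the commit message, and the helper names are made up for illustration, and by default nothing is executed (the commands are only printed).

```python
import shlex
import subprocess
from pathlib import PurePosixPath


def snapshot_commands(data_path, message):
    """Commands that record the current state of data_path as one dataset version."""
    path = PurePosixPath(data_path)
    dvc_file = f"{path}.dvc"                     # pointer file written by `dvc add`
    gitignore = str(path.parent / ".gitignore")  # `dvc add` also updates this file
    return [
        ["dvc", "add", str(path)],               # hash the data, write the .dvc file
        ["git", "add", dvc_file, gitignore],     # Git tracks the pointer, not the images
        ["git", "commit", "-m", message],        # this commit is the dataset version
        ["dvc", "push"],                         # upload the data to the DVC remote
    ]


def restore_commands(revision):
    """Commands that bring back the dataset as it was at a given Git revision."""
    return [
        ["git", "checkout", revision],           # restores the older .dvc file
        ["dvc", "pull"],                         # fetches the data that file points to
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical path, message, and revision, printed as a dry run.
    run(snapshot_commands("data/images", "Dataset v1: 100 images"))
    run(restore_commands("HEAD~1"))
```

Each Git commit that changes the .dvc file is one version of the dataset, so the second batch of 100 images is simply a second pass through `snapshot_commands` with a new commit message.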
I don’t think I have explained my question properly.
The question is this: if 100 images have already been copied to the remote when I start the project (without any previous DVC process), does it make sense to connect to a remote that already has data?
The second part concerns new data: if 100 more images are copied to the remote directly (without dvc push), how do I manage those 100 untracked files?
Thanks for the answer
I see. DVC doesn’t just copy the files as they are and manage them in place; it uses its own storage structure (which is an implementation detail). So it would be better to designate a different location as the DVC remote, and then let DVC add those files using the to-remote functionality. This way, DVC transfers the data from the storage where it is currently kept unversioned into the remote storage that DVC manages, and creates a .dvc file for further tracking. See: add (the dvc add command reference)
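As a rough illustration of the to-remote route, the Python sketch below only builds and prints the commands; it does not run them. The source URL `s3://raw-bucket/images`, the output name `images`, and the remote name `storage` are placeholders, and the remote is assumed to have been configured already with `dvc remote add`.

```python
import shlex


def to_remote_commands(source_url, out, remote=None):
    """Commands that move unversioned data into a DVC remote and start tracking it."""
    add = ["dvc", "add", source_url, "--to-remote", "-o", out]
    if remote is not None:
        add += ["-r", remote]  # choose a specific DVC remote instead of the default
    return [
        add,                                  # transfers the data and writes <out>.dvc
        ["git", "add", f"{out}.dvc"],         # version the pointer file with Git
        ["git", "commit", "-m", f"Track {out} from {source_url}"],
    ]


if __name__ == "__main__":
    # Placeholder source location, output name, and remote name.
    for argv in to_remote_commands("s3://raw-bucket/images", "images", remote="storage"):
        print(shlex.join(argv))
```

When 100 more images land in the unversioned location, repeating the same steps should produce an updated images.dvc, and committing that change is what marks the second version of the dataset.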
Thank you very much for your reply.
Yes, that’s what I was thinking. Have the files uploaded to a different site and then transfer them with push to a remote versioned one.