Archive / Share a snapshot of a DVC remote

Dear DVC team,

Thank you for the great work and for coming up with an agnostic approach for DVC!

It seems that a) one could create a tar.gz of the directory used as a DVC remote and b) that someone else could unpack this directory somewhere else and use it as a local remote.
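To make the idea concrete, here is a minimal sketch of both steps in Python, using only the standard library. The function names and the paths in the usage note are hypothetical, and the sketch assumes the remote is a plain directory that can be read directly (for an SSH remote, the directory on the server).

```python
import tarfile
from pathlib import Path


def archive_remote(remote_dir: str, archive_path: str) -> None:
    """Pack the directory used as a DVC remote into a .tar.gz file."""
    remote = Path(remote_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores only the directory's own name, not its absolute path.
        tar.add(remote, arcname=remote.name)


def unpack_remote(archive_path: str, target_dir: str) -> Path:
    """Unpack the archive and return the path of the restored directory."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # The first member is the top-level directory added by archive_remote.
        top_level = tar.getnames()[0].split("/")[0]
        # Only unpack archives from a source you trust.
        tar.extractall(target)
    return target / top_level
```

Someone receiving the archive could then, presumably, register the unpacked directory with `dvc remote add -d snapshot /path/to/unpacked/dir` and run `dvc pull`, provided they also have the matching Git repository.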

Is there any counter-argument for distributing an archive of a DVC remote?

I have seen that this seems fine according to the thread "Copying a dvc repository", but that was in another context (filesystem specifics).

Why archive a DVC remote? The idea is to publicly share a snapshot of a DVC remote.

One way could be to dvc push to a dedicated public remote whenever someone wants to create a snapshot. However, this implies two things one might not want. First, it requires a dedicated public server. Second, it prevents the snapshot from having a DOI.

Would you have a better suggestion than distributing an archive to share a snapshot of a DVC remote?

Thanks,
Pierre-Alexandre

Hello @pafonta!
Thank you for sharing your thoughts on DVC.

As to the point you are making:
What is your use case? What kind of remote are you using?
It seems to me that in many cases snapshotting the remote or the cache could be achieved by zipping or tarring the cache/remote directory. Do you think we need a special command for that?

Hello @Paffciu!

Thank you for your prompt reply.

Great!

In this case, this is an SSH remote.

The use case is from academia. In a nutshell: one uses DVC for an ML project that has public milestones. People should be able to reproduce the experiments / models for each milestone, but in between, changes should stay private. Also, each snapshot should get a DOI so that it can accompany, for example, a scientific paper. I hope this clarifies things.

Regarding a special DVC command for snapshots, I guess the use case above is common in academia or will become common within the next 1-3 years. Also, for other types of remote or for extremely large tracked data and models, it could be tricky to do by hand. So I think such a feature could come in handy for DVC. I hope it helps make DVC even greater.

Have a good day!

@pafonta
Thank you very much for clearing this up.
Your use case makes perfect sense, and the need to be able to “officially” prepare a snapshot sounds reasonable when assigning a DOI. Could I ask you to create a feature request for that on our GitHub? We try to keep development discussions there so that we have a single place for the discussion history.

@Paffciu If it could help the community, here is the GitHub issue.
