Hey there, I’m a totally new user to DVC, so please keep that in mind when answering my question. I have multiple variants of the same type of model (for example, let’s say a network that has learned to do NLP for English sentences, and a separate network that has learned to do it for French sentences). In my example, as I plan to do language comprehension for more languages, more model variants will need to be added (let’s say I plan to have one model variant for each of the hundreds or thousands of languages and dialects I want to handle in the future). Let’s say I want to use a model versioning system like DVC to help with this (I am purely interested in the model versioning side of things for the moment, not data versioning). How would I go about this? As far as I can tell, DVC by default expects that one repo (i.e. a GCS directory) would be used to store the different versions of all the models. In my toy example, I would be interested in having a separate GCS directory for each language (i.e. all the model versions for English would be stored in one directory, and all the model versions for French would be stored in another). Is there an easy way to do this?
@abiswas, usually the workflow around DVC is to work locally in your project repo, run dvc add <stuff> to track your files, and then set up a remote in GCS (or another storage provider) and push them.
DVC will create a cache, which is content-addressable, so files are stored under names derived from their hashes rather than in the directory and file layout you might expect (similar to the .git folder, if you look inside it).
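
To make the idea of a content-addressable cache concrete, here is a minimal sketch in plain shell. It is a simplified illustration, not DVC's actual implementation: the cache directory name, the file name, and the two-character prefix layout are assumptions chosen for the example, and it assumes the md5sum tool from GNU coreutils is available.

```shell
# Simplified illustration of content-addressable storage; this is not DVC's code.
workdir=$(mktemp -d)
printf 'model weights v1\n' > "$workdir/model.bin"

# Hash the file content; the hash, not the file name, decides where the copy lives.
hash=$(md5sum "$workdir/model.bin" | cut -d' ' -f1)

# Hypothetical layout: the first two hex characters name a subdirectory,
# and the remaining characters name the stored file.
prefix=$(printf '%s' "$hash" | cut -c1-2)
rest=$(printf '%s' "$hash" | cut -c3-)
mkdir -p "$workdir/cache/$prefix"
cp "$workdir/model.bin" "$workdir/cache/$prefix/$rest"

# List what ended up in the cache: a hash-shaped path, not "model.bin".
find "$workdir/cache" -type f
```

Because the path depends only on the content, two files with identical bytes map to the same cache entry, and any change to the content produces a different path.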
If you need a different version of a dataset, or need to pull datasets onto another machine, dvc pull or dvc checkout will help.
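
The workflow described above could look roughly like the following sketch. The bucket name, the remote name (storage), and the model path are hypothetical placeholders, and the sketch assumes Git and DVC with GCS support are installed, GCS credentials are configured, and the commands are run inside an existing Git repository that already contains models/english.pt.

```shell
# Sketch only: bucket name, remote name, and file paths are hypothetical.

# Initialise DVC once inside the existing Git repository.
dvc init
git commit -m "Initialise DVC"

# Track a model file; DVC writes models/english.pt.dvc and a .gitignore entry.
dvc add models/english.pt
git add models/english.pt.dvc models/.gitignore
git commit -m "Track English model with DVC"

# Register a GCS location as the default remote and upload the cached data.
dvc remote add -d storage gs://my-bucket/dvc-store
git add .dvc/config
git commit -m "Configure DVC remote"
dvc push

# On another machine, after cloning the Git repository, download the tracked files.
dvc pull

# After switching to an older Git revision, sync the workspace files
# with the .dvc files from that revision.
git checkout HEAD~1
dvc checkout
```

Git keeps the small .dvc pointer files under version control, while the model files themselves live in the DVC cache and on the remote.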
We do support external (remote) files as well, but we usually don’t recommend that unless there’s a strong requirement to do so. Could you please share how you have set up the directory and the remotes? Thanks.