How to deal with variants of the same model (not versions)

abiswas · February 8, 2021, 9:19pm

Hey there, I’m a totally new user to dvc, so please keep that in mind when answering my question. I have multiple variants of the same type of model (for example, let’s say a network that has learned to do NLP for English sentences, and a separate network that has learned to do it for French sentences). As I plan to do language comprehension for more languages, more model variants will need to be added (let’s say 1 model variant for each of the hundreds/thousands of languages/dialects I want to handle in the future).

Let’s say I want a model versioning system like dvc to help with this (I am purely interested in the model versioning side of things for the moment, not data versioning). How would I go about this? As far as I can tell, dvc by default expects that 1 repo (i.e. a GCS directory) would be used to store the different versions of all the models. In my toy example, I would like a separate GCS directory for each language (i.e. all the model versions for English would get stored in 1 directory, and all the model versions for French in another). Is there an easy way to do this?

skshetry · February 9, 2021, 10:35am

@abiswas, the usual DVC workflow is to work locally in your project repo, run dvc add <stuff> to track your files, then set up a remote in GCS (or another storage provider) and push them there.
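As a rough sketch of that workflow, the snippet below only builds the command lines for one model variant and does not run anything. The helper names (dvc_workflow, run), the model path, the remote name and the bucket URL are all hypothetical. Using one remote per language is an assumption about how this could map onto the question, not something confirmed in this thread.

```python
import subprocess
from typing import List


def dvc_workflow(model_path: str, remote_name: str, remote_url: str) -> List[List[str]]:
    """Return the DVC commands for: track a file, add a remote, push to it.

    Hypothetical helper: it only assembles argument lists, nothing is executed.
    """
    return [
        # Track the model file; DVC writes a small <model_path>.dvc file next to it.
        ["dvc", "add", model_path],
        # Register a named remote (for example a GCS path).
        ["dvc", "remote", "add", remote_name, remote_url],
        # Push only this tracked file to the chosen remote.
        ["dvc", "push", "-r", remote_name, model_path + ".dvc"],
    ]


def run(commands: List[List[str]]) -> None:
    """Execute the commands in order; requires dvc to be installed."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Example values are placeholders, not real paths or buckets.
english = dvc_workflow("models/en/model.pt", "en", "gs://example-bucket/en")
```

Calling run(english) inside an initialized Git + DVC repo would execute the three steps; the generated .dvc file is what you would then commit to Git.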

DVC will create a cache, which is content-addressable, so it is not laid out as the directories and files you might expect but as hashes (similar to the .git folder, if you look inside it).
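To make "content-addressable" concrete, here is a toy illustration of the idea, not DVC's actual implementation: a file is stored under a path derived from the hash of its contents, so identical contents always land in the same place. The function names and the two-character directory split are assumptions made for this sketch.

```python
import hashlib
from pathlib import Path


def content_hash(data: bytes) -> str:
    """Hex digest of the file contents (MD5, used here only as an identifier)."""
    return hashlib.md5(data).hexdigest()


def add_to_cache(cache_dir: Path, file_path: Path) -> str:
    """Copy a file into a toy content-addressable cache and return its hash.

    The stored path depends only on the contents, never on the original name.
    """
    data = file_path.read_bytes()
    digest = content_hash(data)
    # Assumed layout for this sketch: <cache>/<first 2 hex chars>/<remaining chars>
    target = cache_dir / digest[:2] / digest[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.write_bytes(data)
    return digest
```

Two files with the same bytes produce one cache entry, which is why the cache (and the remote it is pushed to) does not mirror your project's directory names.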

If you need a different version of a dataset, or need to pull datasets onto another machine, dvc pull or dvc checkout will help.

We do support external (remote) files as well, but we usually don’t recommend that unless there’s a strong requirement to do so. Could you please share how you have set up the directory and the remotes? Thanks.
