Is it possible to version files independently?

hugoehlinger · February 3, 2023, 10:02pm

Hello DVC community!

I have a tricky use case and I’m not sure this is feasible with DVC.
My project features several datasets that I would like to version independently. I would like to keep track of:

dataset 1 with its own set of tags
dataset 2 with its own set of tags
dataset 3 with its own set of tags
so that I can run a script with any combination of versions of the three datasets (e.g. dataset 1 v1, dataset 2 v2, dataset 3 v3, or dataset 1 v3, dataset 2 v2, dataset 3 v3).

Is this use case feasible with DVC?

ronan · February 7, 2023, 12:31pm

Hello @hugoehlinger !
DVC can’t easily handle independently-versioned datasets within the same project. I guess the best way would be to have a data preparation stage that has the dataset versions as params and pulls the right combination of data from elsewhere (possibly auxiliary DVC repos containing only one dataset each).
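
To make that suggestion a bit more concrete, here is a rough sketch of what such a data preparation step could look like. It assumes each dataset lives in its own auxiliary DVC repo, tagged with Git tags such as v1, v2, v3, and it uses `dvc get` with `--rev` to fetch the requested version. The repo URLs, the `data` path inside each repo, and the function names are all made up for illustration. In a real pipeline the `versions` dict would probably be read from params.yaml rather than hard-coded.

```python
import subprocess

# Hypothetical registry: one auxiliary DVC repo per dataset (placeholder URLs).
DATASET_REPOS = {
    "dataset1": "https://example.com/org/dataset1.git",
    "dataset2": "https://example.com/org/dataset2.git",
    "dataset3": "https://example.com/org/dataset3.git",
}


def build_get_command(name, rev, path="data", out_dir="data"):
    """Build the `dvc get` command that fetches `path` from the dataset's
    repo at Git revision `rev` (a tag, branch, or commit) into out_dir/name.
    The `data` path inside each repo is an assumption."""
    return [
        "dvc", "get", DATASET_REPOS[name], path,
        "--rev", rev,
        "-o", f"{out_dir}/{name}",
    ]


def fetch_datasets(versions, dry_run=True):
    """Build one command per dataset; run them only when dry_run is False."""
    commands = [build_get_command(name, rev) for name, rev in versions.items()]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises if dvc exits non-zero
    return commands


# Any combination of versions can be requested independently.
versions = {"dataset1": "v1", "dataset2": "v2", "dataset3": "v3"}
commands = fetch_datasets(versions)  # dry run: nothing is downloaded
for cmd in commands:
    print(" ".join(cmd))
```

Calling `fetch_datasets(versions, dry_run=False)` would actually run the downloads, which requires DVC to be installed and the repos to exist. Since each dataset's tags live in its own repo, changing one version in `versions` does not affect the other two.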

hugoehlinger · February 15, 2023, 9:18pm

Hello @ronan ! Thanks for answering me and giving me this tip
Best,
