Best practices with data registry

Chisuikafuku · April 20, 2023, 5:12pm

Hi!

I’m fairly new to the concept of a data registry and I’m starting to set one up for my company. I’d appreciate your opinion on good practices for the following points:

Should I put code in my data registry repository, e.g. utility functions that clean up the data when it is received (removing duplicates, etc.), or should I store that code in a separate repo and dedicate the data registry repository to data versioning only?
Does it make sense to store multiple datasets in one repo? For example, if we want to tag our datasets, it can quickly become a mess. Wouldn’t it be better to have one repo per dataset?
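
To make the first question concrete, here is a minimal sketch of the kind of clean-up utility meant above. The function name `remove_duplicates` and the record layout (hashable rows such as tuples) are hypothetical and only for illustration; the real code may look different.

```python
from typing import Hashable, Iterable, TypeVar

T = TypeVar("T", bound=Hashable)


def remove_duplicates(records: Iterable[T]) -> list[T]:
    """Drop exact duplicate records, keeping the first occurrence of each.

    Hypothetical helper: assumes every record is hashable (e.g. a tuple).
    """
    seen: set[T] = set()
    unique: list[T] = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


# Example rows as they might arrive before being versioned (made-up data).
raw_rows = [("alice", 1), ("bob", 2), ("alice", 1)]
clean_rows = remove_duplicates(raw_rows)
```

The question is whether helpers like this should live next to the versioned data or in their own code repository.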

Thanks in advance!
