Hi!
I’m fairly new to the concept of a data registry and I’m starting to set one up for my company. I would like your opinion on good practices for the following points:
- Should I put code in my data registry repository, e.g. utility functions that clean up the data when it is received (removing duplicates, etc.), or should I store that code in a separate repo and dedicate the data registry repository to data versioning only?
- Does it make sense to store multiple datasets in one repo? For example, if we want to tag each dataset, the tags can quickly become a mess. Wouldn't it be better to have one repo per dataset?
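
To make the first question concrete, the cleanup utilities I have in mind are small functions along these lines. This is only an illustrative sketch: `drop_duplicate_rows` is a hypothetical name, and it assumes each record is a flat dict with hashable values.

```python
from typing import Any, Dict, Iterable, List


def drop_duplicate_rows(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return rows with exact duplicates removed, keeping the first occurrence.

    Hypothetical helper for illustration; assumes flat dicts with hashable values.
    """
    seen = set()
    unique: List[Dict[str, Any]] = []
    for row in rows:
        # Sort the items so that key order does not affect the comparison.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


if __name__ == "__main__":
    raw = [
        {"id": 1, "name": "a"},
        {"id": 1, "name": "a"},  # exact duplicate of the first record
        {"id": 2, "name": "b"},
    ]
    print(drop_duplicate_rows(raw))
```

The question is whether code like this should live next to the versioned data or in its own repo.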
Thanks in advance!