Best practices with data registry

Hi!

I’m fairly new to the concept of a data registry and I’m starting to set one up for my company. I would like your opinion on good practices for the following points:

  1. Should I put code in my data registry repository, e.g. utility functions that clean up the data when it is received (removing duplicates, etc.), or should I store that code in a separate repo and dedicate my data registry repository to data versioning only?

  2. Does it make sense to store multiple datasets in one repo? For example, if we want to tag our datasets, it can quickly become a mess. Wouldn’t it be better to have one repo per dataset?

Thanks in advance!