Hi DVC Community
I would be interested in how others manage labels/annotations.
Assume I have one large dataset managed by DVC, which is further processed in multiple ML projects. The dataset consists of documents, where each file can have multiple labels and sublabels. Two files may have the same label but different sublabels, so a hierarchical folder structure is not suitable for my use case. Labels and sublabels may change over time, but the content of the files does not change.
A possible solution that I have in mind would be to have all files versioned within a single folder and keep track of the labels in a separate CSV. The CSV would track each filename and its corresponding labels. But this would require additional effort to keep the data and the CSV in sync when files are added, deleted or updated.
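To make the sync concern concrete, here is a minimal sketch of the kind of check I imagine running before each commit. It only uses the Python standard library and is not a DVC feature; the function name `find_label_drift`, the `filename` column and the flat data folder are all assumptions on my side for illustration.

```python
import csv
from pathlib import Path


def find_label_drift(data_dir, labels_csv, filename_column="filename"):
    """Compare files on disk with the rows of the label CSV.

    Returns a tuple (unlabeled, orphaned):
      unlabeled - files present in data_dir but missing from the CSV
      orphaned  - CSV entries that point to files no longer in data_dir

    Assumptions (hypothetical layout): data_dir is a flat folder and the
    CSV has a header row with a column named by filename_column.
    """
    # Only regular files directly inside data_dir are considered.
    on_disk = {p.name for p in Path(data_dir).iterdir() if p.is_file()}

    # A set is used so that several rows per file (one per label) are fine.
    with open(labels_csv, newline="", encoding="utf-8") as handle:
        in_csv = {row[filename_column] for row in csv.DictReader(handle)}

    return sorted(on_disk - in_csv), sorted(in_csv - on_disk)
```

Calling something like `find_label_drift("data", "labels.csv")` would then report files that still need labels and CSV rows left over from deleted files, but it would have to be run by hand or from a hook, which is exactly the extra effort I would like to avoid.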
Are there similar use cases or possible alternative solutions to this?
Thanks in advance for any advice.