Standardized metadata fields #995
-
Notes
To be handled in david-allison/manx-corpus-search#132 (keep the
Some of the notes were from before we had Kelly's dictionary integrated, or a 'corrected text' functionality. Long-term, I expect a number will go.
I'd support adding initials for the notes. I see this as lower-priority, but check with the group, and if it's important I'll get to it.

Metadata
Agreed, and briefly discussed in our meeting on 2023-01-25.
Given the principles above, I'd welcome an initial formalisation of what's required. I'd recommend tackling this via incremental improvements, as those typically take much less time for a group to come to a consensus on.

Tags
Prototype: https://manxcorpus.com/Tags
Rob suggested these tags off the top of his head.

Thanks! Give me a ping when the group have come to a consensus and I'll get it done.
-
It would be good to establish what metadata fields should exist across the corpus, what form the entries should take, and what they refer to. This would enable easier searching of metadata in future, make it easier to implement new features, allow contributors to standardize metadata when submitting texts, and help users access and use this information. The metadata is a really important part of the usefulness of a corpus.
At present there is some variation in field names and entries; for example, there is “transcription”, “transcribed”, and “transcribed & translated”. There’s sometimes a date with the name in these fields, sometimes not. There’s also “translator” and “translated”.
Ideally the metadata would show what kind of transcription has been made: diplomatic, normalized, typos corrected etc. It should show who made it and when.
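To make that concrete, here is a rough sketch in Python of how the existing variants could be folded into a small set of fields. The canonical names (`transcribed_by`, `translated_by`), the `Credit` record, and the `normalize_fields` helper are all made up for illustration and are not part of the current corpus code; the group would need to agree the real names.

```python
from dataclasses import dataclass
from typing import Optional

# Variant field names seen in the corpus today, mapped to hypothetical
# canonical names. A combined field fans out to two canonical fields.
FIELD_ALIASES = {
    "transcription": ("transcribed_by",),
    "transcribed": ("transcribed_by",),
    "transcribed & translated": ("transcribed_by", "translated_by"),
    "translator": ("translated_by",),
    "translated": ("translated_by",),
}

# Kinds of transcription mentioned above; the final list is for the group to decide.
TRANSCRIPTION_KINDS = {"diplomatic", "normalized", "typos corrected"}


@dataclass
class Credit:
    """Who made a transcription or translation, when, and of what kind."""
    name: str
    date: Optional[str] = None  # None when no date was recorded
    kind: Optional[str] = None  # one of TRANSCRIPTION_KINDS, if known

    def __post_init__(self) -> None:
        if self.kind is not None and self.kind not in TRANSCRIPTION_KINDS:
            raise ValueError(f"unknown transcription kind: {self.kind!r}")


def normalize_fields(raw: dict) -> dict:
    """Rename known variant field names; leave unrecognised fields untouched."""
    result = {}
    for key, value in raw.items():
        canonical = FIELD_ALIASES.get(key.strip().lower())
        if canonical is None:
            result[key] = value
            continue
        for name in canonical:
            result[name] = value
    return result
```

Splitting the name from the date (so that “sometimes a date, sometimes not” becomes an explicit optional value) would still need a manual pass or a separate parser; this sketch only deals with the field names.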
It would also be good to pin down what is an acceptable entry for “type”. For example, there is “song”, “bannag”, “Carval”, and “Traditionary Ballad”. I can’t see any usefulness for “traditionary ballad” (as there’s only one of them) or “bannag”.
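One way to enforce this, once a list is agreed, would be a simple check against a controlled vocabulary. The sketch below is Python and purely illustrative: the contents of `ALLOWED_TYPES` are a placeholder I have assumed for the example, not a proposal for the final list, and `unexpected_types` is a hypothetical helper.

```python
# Placeholder vocabulary, assumed for illustration only; the real list
# would be whatever the group agrees on.
ALLOWED_TYPES = {"song", "carval"}


def is_allowed_type(value: str) -> bool:
    """Compare case-insensitively so "Carval" and "carval" count as the same entry."""
    return value.strip().lower() in ALLOWED_TYPES


def unexpected_types(values: list[str]) -> list[str]:
    """Return the distinct "type" entries that fall outside the agreed list."""
    return sorted({v for v in values if not is_allowed_type(v)})
```

Running `unexpected_types` over every text's “type” entry would give a short list of values to either add to the vocabulary or map onto an existing entry.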
Where there are notes it is useful to know to whom they are due. Some texts have a lot of linguistic notes without saying who has written them. (On that subject it’s my opinion that linguistic notes should not appear inline at all as they clutter the page and are outside of the scope of a searchable corpus.)
“Source” should have a standardised format.
I’m sure others will have more suggestions about how this aspect of the project can be improved.