Question about Unimorph Labeling Standards


Stephen Bothwell

Mar 5, 2025, 6:19:47 PM
to unimorph
Dear All,

Hello! I am currently doing some research that involves examining tokenization in language models using morphological metrics, and I have been attempting to use Unimorph as a resource in the process. In doing so, I have tried to parse Unimorph files in various languages and map each morphological feature to its dimension as defined in Sylak-Glassman 2016. However, I have often encountered labels that fall outside of that standard.

For example, languages like Azerbaijani, Latin, and Turkish contain the LOC (locative) case tag across a variety of forms. Sylak-Glassman 2016 does not include the locative as part of Unimorph's standard, and this tag is not mentioned in any publications on Unimorph to my knowledge. (I believe a similar issue for this and other labels was raised in a previous thread.)
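
To make the problem concrete, here is a minimal sketch of the kind of check involved. It assumes the usual three-column, tab-separated layout (lemma, inflected form, feature bundle joined by ";"). The tag table is only a small illustrative subset written for this example, not the full schema, and the function name and sample rows are hypothetical.

```python
from collections import Counter

# Small illustrative subset of a tag-to-dimension table (an assumption
# for this sketch; the real schema defines many more tags and dimensions).
TAG_TO_DIMENSION = {
    "N": "Part of Speech", "V": "Part of Speech", "ADJ": "Part of Speech",
    "NOM": "Case", "ACC": "Case", "GEN": "Case", "DAT": "Case",
    "SG": "Number", "PL": "Number",
    "PRS": "Tense", "PST": "Tense", "FUT": "Tense",
}


def unknown_tags(lines, tag_to_dimension=TAG_TO_DIMENSION):
    """Count tags that have no entry in the tag-to-dimension table.

    `lines` is any iterable of strings, e.g. an open file. Rows that do
    not have exactly three tab-separated fields are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip malformed rows
        _lemma, _form, bundle = fields
        for tag in bundle.split(";"):
            tag = tag.strip()
            if tag and tag not in tag_to_dimension:
                counts[tag] += 1
    return counts


# Hypothetical sample rows, written by hand for illustration only.
sample = [
    "ev\tevde\tN;LOC;SG\n",
    "ev\tevler\tN;NOM;PL\n",
]
print(unknown_tags(sample))
```

With a table like this, LOC is reported as unmapped, while N, SG, NOM, and PL resolve to their dimensions. Passing an open file handle instead of the sample list would apply the same check to a whole language file.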

Are there any updated standards for Unimorph's annotation scheme beyond Sylak-Glassman 2016? I have not been able to find any, nor have I found validation tools like those described in the Unimorph 3.0 paper.

Thank you for your time, and I hope that you have a great day!

Sincerely,
Stephen Bothwell
Ph.D. Candidate
Department of Computer Science and Engineering
University of Notre Dame