POS annotation


محمد علي

Nov 2, 2023, 3:23:56 PM
to inception-users
I would like to annotate Arabic text with a custom part-of-speech (POS) tagset, which I have already created under the 'Tagsets' settings. I followed these steps: New Project > Standard Project > Settings > Layers > Part of Speech > Granularity (I was unable to change it to character level) > XPOS > my new tagset. In Arabic, a single token can consist of several morphemes, for example a verb with an attached object pronoun: كتبه ("he wrote it") = كتبـ + ـه. I want to annotate كتبـ as a verb and ـه as a pronoun, but I cannot split one token into several morphemes so that each gets its own POS tag. Is there a solution?

Richard Eckart de Castilho

Nov 2, 2023, 3:26:05 PM
to inception-users
Hi,


Instead of using the built-in POS layer, you could create your own custom layer for part-of-speech tagging. On that layer, you can set the granularity to character level. The drawback is that you won't be able to export your data in CoNLL formats and the like; you can then only export as CAS XMI, CAS JSON (or WebAnno TSV).

-- Richard
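To illustrate what character-level granularity makes possible, here is a minimal Python sketch (not INCEpTION code) that splits one token into morpheme spans given as [begin, end) character offsets, the kind of spans a character-granularity custom layer can hold. The names `MorphemeSpan` and `split_token` are hypothetical, and the tags `VERB` and `PRON` stand in for whatever the custom tagset defines. Note that the tatweel (ـ) in كتبـ and ـه is only a display aid; the stored token كتبه is four letters, so the verb covers offsets 0 to 3 and the pronoun 3 to 4. Offsets here are Python string indices; UIMA CAS offsets (as in a CAS XMI export) count Java UTF-16 code units, which should coincide for Arabic letters since they lie in the Basic Multilingual Plane.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorphemeSpan:
    """Hypothetical character-level annotation: [begin, end) offsets plus a tag."""
    begin: int
    end: int
    tag: str


def split_token(token: str, token_begin: int, boundaries: list[int],
                tags: list[str]) -> list[MorphemeSpan]:
    """Split a token into morpheme spans.

    `boundaries` are cut positions relative to the start of the token;
    `token_begin` is the token's offset in the whole document text.
    """
    cuts = [0, *boundaries, len(token)]
    if len(tags) != len(cuts) - 1:
        raise ValueError("need exactly one tag per morpheme")
    # Cuts must be strictly increasing and lie inside the token.
    if any(a >= b for a, b in zip(cuts, cuts[1:])):
        raise ValueError("boundaries must be strictly increasing and inside the token")
    return [MorphemeSpan(token_begin + b, token_begin + e, t)
            for b, e, t in zip(cuts, cuts[1:], tags)]


text = "كتبه"  # ك ت ب ه: four code points, no tatweel in the stored text
spans = split_token(text, 0, [3], ["VERB", "PRON"])
for s in spans:
    print(s.begin, s.end, s.tag, text[s.begin:s.end])
```

The sketch keeps the token untouched and only records sub-token offsets, which mirrors the idea of annotating below token level rather than re-tokenizing the text.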