Ambiguous examples in the data


Tillmann Dönicke

Jul 25, 2022, 11:46:19 AM
to The 1st Shared Task on Multilingual Clause-level Morphology 2022
Dear organizers,


I worked with the German data a lot for the analysis task, and observed that many examples are ambiguous. For example, the pronoun “ihm” is always ambiguous between “DAT(3,SG,MASC)” and “DAT(3,SG,NEUT)”. Therefore, all sentences that contain it are ambiguous as well, e.g.:


hattet ihr euch ihm nicht angepasst?

- anpassen IND;PST;PRF;NOM(2,PL);ACC(2,PL);DAT(3,SG,NEUT);NEG;Q <- gold

- anpassen IND;PST;PRF;NOM(2,PL);ACC(2,PL);DAT(3,SG,MASC);NEG;Q <- not gold, but also correct
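
To make the “ihm” case concrete, here is a minimal Python sketch that expands one analysis into its masculine/neuter twin. It covers only the dative third-person singular case described above; the function name `dative_gender_variants` is hypothetical and not part of any shared-task tooling.

```python
import re

# Matches a non-reflexive dative 3rd person singular argument, MASC or NEUT.
# Both genders surface as "ihm", so the two tags are interchangeable here.
_DAT_3SG = re.compile(r"DAT\(3,SG,(MASC|NEUT)\)")


def dative_gender_variants(features):
    """Return the feature string plus its MASC/NEUT twin, if it has one.

    `features` is the tag part of an analysis, e.g.
    "IND;PST;PRF;NOM(2,PL);ACC(2,PL);DAT(3,SG,NEUT);NEG;Q".
    """
    if not _DAT_3SG.search(features):
        # No "ihm"-type argument: nothing to expand.
        return {features}
    return {
        _DAT_3SG.sub("DAT(3,SG,%s)" % gender, features)
        for gender in ("MASC", "NEUT")
    }
```

For the example above, the function returns a set with both the gold analysis and the equally correct alternative.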


There are other ambiguities as well, e.g. the verb forms for “SBJV;LGSPEC1” and “IND;PST;LGSPEC1” are often identical.

In the data, there seems to be no pattern as to which analysis is chosen as gold. For example, we have both:


hattet ihr euch ihm nicht angepasst?

- anpassen IND;PST;PRF;NOM(2,PL);ACC(2,PL);DAT(3,SG,NEUT);NEG;Q <- gold

- anpassen IND;PST;PRF;NOM(2,PL);ACC(2,PL);DAT(3,SG,MASC);NEG;Q <- not gold, but also correct


würden sie uns ihm nicht anpassen?

- anpassen SBJV;NOM(3,PL);ACC(1,PL);DAT(3,SG,MASC);NEG;Q <- gold

- anpassen SBJV;NOM(3,PL);ACC(1,PL);DAT(3,SG,NEUT);NEG;Q <- not gold, but also correct


or


sie passten ihn sich an.

- anpassen SBJV;LGSPEC1;NOM(3,PL);ACC(3,SG,MASC);DAT(3,PL,RFLX) <- gold

- anpassen IND;PST;LGSPEC1;NOM(3,PL);ACC(3,SG,MASC);DAT(3,PL,RFLX) <- not gold, but also correct


sie passten mich an.

- anpassen IND;PST;LGSPEC1;NOM(3,PL);ACC(1,SG) <- gold

- anpassen SBJV;LGSPEC1;NOM(3,PL);ACC(1,SG) <- not gold, but also correct


in the development set.
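
One way to check for a pattern is to tally which reading the gold annotations pick. The sketch below is only an illustration: it takes the gold feature strings as a plain iterable (how they are read from the data files is left open), and the function name `tally_ambiguous_choices` is hypothetical. Note that the LGSPEC1 tally counts every such analysis, although only some of the verb forms are actually identical.

```python
import re
from collections import Counter

_DAT_3SG = re.compile(r"DAT\(3,SG,(MASC|NEUT)\)")


def tally_ambiguous_choices(gold_features):
    """Count which reading the gold picks for the two ambiguities above."""
    counts = Counter()
    for feats in gold_features:
        tags = feats.split(";")
        # "ihm": masculine vs. neuter dative.
        match = _DAT_3SG.search(feats)
        if match:
            counts["ihm:" + match.group(1)] += 1
        # LGSPEC1: subjunctive vs. indicative past.
        if "LGSPEC1" in tags:
            if "SBJV" in tags:
                counts["LGSPEC1:SBJV"] += 1
            elif "IND" in tags and "PST" in tags:
                counts["LGSPEC1:IND;PST"] += 1
    return counts
```

Roughly balanced counts would be consistent with the gold choice being arbitrary, though they would not prove it.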


I have a system that can predict all possible analyses per input. It's really frustrating that its performance in the shared task does not depend on whether these analyses are correct, but rather on how “lucky” it is when it selects one of them for evaluation. There are hundreds of clearly ambiguous examples. Should it be like that? Will there be a solution for this issue?


@all: Did anyone observe this for the other languages as well?


Best

Tillmann

omer goldman

Jul 26, 2022, 1:21:08 PM
to Tillmann Dönicke, The 1st Shared Task on Multilingual Clause-level Morphology 2022
Hi Tillmann,

Thanks for raising this issue. This ambiguity should definitely not affect the numerical evaluation of any system.
We are working on a fix, and it should be ready in time for testing the systems. Each system will then be evaluated against the correct analysis closest to its output.
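
As a rough illustration of what "closest correct analysis" could mean, here is a minimal Python sketch. It is not the official scorer: the distance used (the number of features on which two analyses differ) and the names `closest_gold` and `exact_match` are assumptions made for this example only.

```python
def _feature_set(features):
    """Split a tag string such as "IND;PST;NOM(3,PL)" into a set of features."""
    return set(features.split(";"))


def closest_gold(predicted, acceptable):
    """Pick the acceptable analysis that differs least from the prediction.

    Distance is the size of the symmetric difference between the two
    feature sets (an assumption of this sketch). Ties go to the analysis
    listed first in `acceptable`.
    """
    pred = _feature_set(predicted)
    return min(acceptable, key=lambda gold: len(pred ^ _feature_set(gold)))


def exact_match(predicted, acceptable):
    """True if the prediction equals some acceptable analysis, ignoring feature order."""
    best = closest_gold(predicted, acceptable)
    return _feature_set(predicted) == _feature_set(best)
```

Under this scheme a system that outputs either the MASC or the NEUT reading of "ihm" is scored as correct, and partial-credit metrics are computed against whichever reading it came closest to.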

Thanks again,
The organizers

On Monday, Jul 25, 2022 at 18:46, Tillmann Dönicke <tillmann...@gmail.com> wrote: