Some issues in the data

Emre Can Açıkgöz
Jul 26, 2022, 2:59:55 PM
to The 1st Shared Task on Multilingual Clause-level Morphology 2022
Dear Organizers,

Our team is building systems for the Shared Task, and we have run into some issues with the data.
  • The first issue concerns the Russian data. A number of samples (the examples below are from the dev set) appear to be ungrammatical or meaningless. For example:

                  (ungrammatical)

                  блокировать; IND;PRS;NOM(3,SG,FEM);ACC(3,SG,MASC);DAT(3,SG,NEUT);INS(3,SG,MASC);

                  она блокирует его ему ему.

                  or

                  (meaningless)

                  блокировать;    IND;PRS;NOM(3,SG,MASC);NEG;Q;ACC(2,PL);AT+ABL(2,SG);INS(3,SG,MASC);

                  не блокирует ли он вас от тебя ему?
  • The second issue concerns how model outputs are evaluated for languages with relatively free word order, such as Turkish or Russian.

                  For example, in Russian:


                  блокировать;    IND;PST;NOM(1,SG,NEUT);Q;ACC(3,SG,MASC);AT+ABL(3,PL);INS(RFLX)    


                  could be rendered in both of these ways:

                  блокировало ли я его от них собой?

                  and

                  блокировало ли я его собой от них?    


                  or for Turkish:

                  Türkçeleştirmek;    INFR;PST;PRSP;NOM(2,PL);NEG;Q;ACC(1,PL)


                  could be rendered in both of these ways:

                  bizi Türkçeleştirmiş olmayacak mıydınız?

                  and

                  Türkçeleştirmiş olmayacak mıydınız bizi?

How would such outputs be evaluated? We would appreciate any clarification.
Thank you,

omer goldman

Jul 28, 2022, 9:08:22 AM
to Emre Can Açıkgöz, The 1st Shared Task on Multilingual Clause-level Morphology 2022
Hi,

Thanks for letting us know about these issues.

After consulting with our Russian and Turkish annotators, it seems there was indeed a problem in generating the Russian data. Specifically, incorrect pronouns were used whenever a pronoun in the instrumental case was needed. We are correcting the problem and will upload a corrected version of the Russian data in the coming days.

Regarding the second issue of alternative word order, this year's task is limited in scope to a single word order that was judged as canonical by our annotators. This order is supposed to be uniformly applied to all examples, so that's the order the systems should output.
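To make the consequence concrete: if outputs are scored by exact string match against the single canonical order (an assumption; the official scorer is not described in this thread), then at most one of the two Turkish variants from the first message can count as correct. The minimal Python sketch below uses hypothetical helper names (`exact_match`, `bag_of_words_match`) and contrasts exact match with an order-insensitive comparison that the task does not use.

```python
import re


def normalize(sentence: str) -> str:
    # Collapse runs of whitespace so stray spaces do not affect the comparison.
    return " ".join(sentence.split())


def exact_match(prediction: str, reference: str) -> bool:
    # Assumed scoring: a prediction counts only if it equals the reference string.
    return normalize(prediction) == normalize(reference)


def bag_of_words_match(prediction: str, reference: str) -> bool:
    # Order-insensitive contrast (hypothetical, not the task metric):
    # compare the multiset of word tokens, ignoring punctuation such as "?".
    def tokens(s: str) -> list[str]:
        return sorted(re.findall(r"\w+", s.lower()))
    return tokens(prediction) == tokens(reference)


# The two Turkish renderings from the first message; which one is canonical
# is decided by the annotators and is not stated here.
variant_a = "bizi Türkçeleştirmiş olmayacak mıydınız?"
variant_b = "Türkçeleştirmiş olmayacak mıydınız bizi?"

print(exact_match(variant_a, variant_b))         # False
print(bag_of_words_match(variant_a, variant_b))  # True
```

Under this assumption, only the output in the annotators' canonical order scores; a system producing the other, equally grammatical order would be marked wrong.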

Thanks again,

On Tue, Jul 26, 2022 at 21:59, 'Emre Can Açıkgöz' via The 1st Shared Task on Multilingual Clause-level Morphology 2022 <participants-mc...@googlegroups.com> wrote: