Adjudication Proposal for Round 1 and Round 2 ToughTable WD

Raphaël Troncy

Sep 7, 2022, 2:42:07 AM
to Sem-Tab Challenge
Dear organizers,

We would like to question a number of annotations provided in the ground truth that we believe could be revisited.

In Round 1, HardTable:
 - for the table SUJBVAXP, the CEA annotation of (row=3,col=1) is currently Q414375, which is a disambiguation page, while we believe that Q1968338 should be the correct annotation.

In Round 2, ToughTable-WD:
 - for the table 1C9LFOKN, the CTA annotation of (col=1) is Q11028 (information) while we believe that Q35657 (U.S. state) should be the correct type;
 - for the table 8QA9EYPI, the CTA annotation of (col=0) is Q11028 again while we believe that Q5 (or Q82955) would be a correct type;
 - for the table LC4VF1A9, the CTA annotation of (col=0) is Q7048977 (non-physical entity) while we believe that Q5 (human) would be more appropriate;
 - for the table PRDTMM8A, the CEA annotation of (row=51,col=6) is Q142 (France) while we believe that Q159 (Russia) should be the proper entity;
 - for the table W1858N3I, the CEA annotation of (row=336,col=2) is Q8023663 (glossary of video game terms) while we believe that Q25397095 (sandbox game) should be the correct entity.
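
To make the proposals above easier to check against a local copy of the ground truth, they can be written down as data. This is only a minimal sketch in Python: the `Proposal` record, the `apply_proposals` helper, and the in-memory `gold` dictionary are hypothetical names chosen for illustration and do not reflect the actual layout of the SemTab ground-truth files.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Proposal:
    """One proposed change to the gold standard (hypothetical record)."""
    table: str
    task: str                  # "CEA" or "CTA"
    row: Optional[int]         # None for CTA, which annotates a whole column
    col: int
    current: str               # QID currently in the ground truth
    proposed: Tuple[str, ...]  # one or more QIDs proposed instead


PROPOSALS = [
    Proposal("SUJBVAXP", "CEA", 3, 1, "Q414375", ("Q1968338",)),
    Proposal("1C9LFOKN", "CTA", None, 1, "Q11028", ("Q35657",)),
    Proposal("8QA9EYPI", "CTA", None, 0, "Q11028", ("Q5", "Q82955")),
    Proposal("LC4VF1A9", "CTA", None, 0, "Q7048977", ("Q5",)),
    Proposal("PRDTMM8A", "CEA", 51, 6, "Q142", ("Q159",)),
    Proposal("W1858N3I", "CEA", 336, 2, "Q8023663", ("Q25397095",)),
]


def apply_proposals(gold, proposals):
    """Return (updated_gold, skipped).

    `gold` is assumed to map (table, task, row, col) to a set of accepted QIDs.
    A proposal is applied only if its `current` QID is actually present for
    that key; otherwise it is returned in `skipped` for manual review.
    """
    updated = dict(gold)
    skipped = []
    for p in proposals:
        key = (p.table, p.task, p.row, p.col)
        if p.current in updated.get(key, set()):
            updated[key] = set(p.proposed)
        else:
            skipped.append(p)
    return updated, skipped
```

Reporting the skipped proposals rather than applying them blindly guards against the case where the local ground truth already differs from the version discussed here.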

Finally, to follow up on the thread started by Marco at https://groups.google.com/g/sem-tab-challenge/c/7mUGlL7sFQ8/m/YyC2kTgdAQAJ regarding the Round 2 ToughTable:
 - for the table LC4VF1A9, cell (84,2) = "singing", Vincenzo has argued why Q27939 is a proper CEA annotation, and we would propose that Q17172850 (voice) is also a proper one, since it is also compatible with the CTA "musical instrument";
 - for the table Q7CDPWKD, cell (45,6) = "Unknown (elective))", it is indeed very hard to obtain Q186431 as an annotation, and we were ourselves fooled by Q76068807.
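
Accepting more than one annotation for a cell, as suggested for "singing", amounts to treating the gold standard as a set of valid entities per cell rather than a single one. A minimal sketch in Python, assuming a hypothetical `GOLD` dictionary and `is_correct` check; the official SemTab evaluator may work differently.

```python
# Hypothetical gold standard: each (table, row, col) maps to the set of
# accepted QIDs. Q27939 is the annotation defended in Marco's thread and
# Q17172850 (voice) is the alternative proposed here.
GOLD = {
    ("LC4VF1A9", 84, 2): {"Q27939", "Q17172850"},
}


def is_correct(table, row, col, predicted_qid, gold=GOLD):
    """Return True if the predicted QID is among the accepted ones for the cell."""
    return predicted_qid in gold.get((table, row, col), set())
```

With this representation, a system that returns either of the two entities for cell (84,2) is counted as correct, while cells absent from the gold standard are never counted as correct.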

We suggest that all participating teams also share their doubts and suggestions, so that the gold standards can be improved wherever mistakes are found and acknowledged by the organizers. This so-called adjudication phase is common in many benchmarking competitions in the NLP community and would benefit everyone.

Best regards.

  Raphaël Troncy ... on behalf of the DAGOBAH Team

Vincenzo Cutrona

Sep 30, 2022, 11:47:03 AM
to Sem-Tab Challenge
Dear Raphael,
we really appreciate your initiative, and I think we should strongly support it. Indeed, we need to discuss annotations whenever there is strong (dis)agreement among participants, also because we know Wikidata evolves rapidly, so some of the old links used to update the 2T-WD version may be wrong today. Likewise, as in your last examples, we may start considering alternative annotations for the same cell (e.g., singing + voice).

So, I again thank you and Marco for your valuable threads, which will surely help to increase the overall quality of the datasets.
I can tell you that we organizers will definitely discuss this adjudication phase in depth after this edition of SemTab.

Best,
Vincenzo