Some classes in the CEA


Marco Cremaschi

Aug 31, 2022, 12:30:28 PM
to sem-tab-...@googlegroups.com
Dear organizers,
Looking at the 2T validation dataset, we found that some table cells are annotated with types (classes) instead of entities.
Example:
cell LC4VF1A9 84 2 is annotated with Q27939 ("singing"), which is a "subclass of" both "activity" and "musical performance", i.e., a class rather than an entity.

I would also like to point out the presence of some potentially inconsistent annotations.
Example:
cell Q7CDPWKD 45 6, "Unknown (elective)", is annotated with Q186431 ("conclave").
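The first kind of issue could be detected automatically. As a minimal sketch (the helper and the claims layout are hypothetical, not part of any 2T tooling), an item carrying "subclass of" (P279) statements can be flagged as class-like:

```python
# Hypothetical helper: flag Wikidata items that look like classes rather
# than plain entities, based on their claims (property ID -> target QIDs).
# P279 = "subclass of"; items carrying it are typically classes.

def looks_like_class(claims: dict) -> bool:
    """Return True if the item has any 'subclass of' (P279) statement."""
    return bool(claims.get("P279"))

# Q27939 ("singing") has "subclass of" statements pointing at "activity"
# and "musical performance" (target QIDs below are illustrative only),
# so it is flagged as class-like.
singing_claims = {"P279": ["Q1914636", "Q22920017"]}
print(looks_like_class(singing_claims))  # True
```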

Best regards,

Marco Cremaschi
Assistant Professor
Insid&s 2 Lab
Department of Informatics, Systems and Communication
University of Milano - Bicocca
address Building U14, Viale Sarca 336, 20126, Milan, Italy
room 1022 (Insid&s 2 Lab)
phone (+39) 02 6448 7921
skype cremaschi.m

Google Scholar - ORCID - ResearchGate - Web Site

Ernesto Jimenez-Ruiz

Sep 1, 2022, 9:28:53 AM
to Marco Cremaschi, Cutrona Vincenzo, Sem-Tab Challenge
Ciao Marco

There is a very thin line between types and entities in Wikidata, and some items can be seen as both...

The second case may indeed be an error. @Cutrona Vincenzo?

Ernesto




--
Ernesto Jiménez-Ruiz
Lecturer in Artificial Intelligence

Department of Computer Science
School of Science & Technology
City, University of London

Vincenzo Cutrona

Sep 2, 2022, 1:05:25 PM
to Sem-Tab Challenge
Ciao Marco!

Thanks for reporting potential issues with 2T. Indeed, while the original DBpedia version has been manually revised, the Wikidata version was built by exploiting existing links between the two KGs, so there is room for inconsistencies and the dataset is far from perfect 🙂 We need this kind of report to understand whether the building process of the Wikidata version is still robust after almost three years.
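That link-based construction can be pictured as a simple lookup: each manually revised DBpedia annotation is carried over to Wikidata wherever a cross-KG link exists. A minimal sketch (function name, cell keys, and link table are all hypothetical, not the actual 2T pipeline):

```python
# Sketch of the link-based construction described above (hypothetical names):
# each DBpedia cell annotation is translated to a Wikidata QID through an
# existing cross-KG link; cells whose entity has no link are dropped, which
# is one place where inconsistencies can creep in.

def to_wikidata(dbpedia_cea: dict, links: dict) -> dict:
    """Map cell -> DBpedia URI annotations to cell -> Wikidata QID."""
    return {cell: links[uri]
            for cell, uri in dbpedia_cea.items()
            if uri in links}

dbpedia_cea = {("LC4VF1A9", 84, 2): "dbr:Singing"}
links = {"dbr:Singing": "Q27939"}  # owl:sameAs-style link (illustrative)
print(to_wikidata(dbpedia_cea, links))  # {('LC4VF1A9', 84, 2): 'Q27939'}
```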

Given that the tables are from the validation set, I believe I can reply with some details without actually helping you (also because Round #2 is already due!). So, let's get to your concerns.

In the first case, I think the annotation is right if we take a closer look at the table values:
Andi Deris | Speed metal | Singing | Helloween

Here we're not describing Andi Deris's role within Helloween (i.e., the relation singer of); instead, we're saying that within his band, Andi uses "singing" during the performance to produce musical sounds. Thus, I think Q27939 may fit this case very well (as Ernesto said, the line between types and entities is very thin, so we can't always split them into two perfectly distinct groups). If we check the type annotation in CTA, we indeed find the entity Q34379, "musical instrument".

About the second case, again, I think the table values can help us:
Vatican City State | Pope Francis | 13 March 2013 | 6 years, 266 days |  |  | Absolute theocracy | Unknown (elective)

This case is a bit harder than the previous one, because the last column contains the "heir apparent" of the realm listed in the first column. That does not apply to Vatican City (hence "Unknown"), where the monarch is instead elected (hence "elective"). Thus, the cell is annotated with "conclave", which is the means used to find the successor of the Pope, and I would say that, if not totally right, the provided annotation is at least quite reasonable. I believe this kind of mixed information is common in Wikipedia tables (you extend the topic of a column as the table grows, without creating new columns, just to keep the table as compact as possible). But consider that this is my personal assumption; I have no clear evidence.
I do agree that one may argue this case is very hard to understand without some background/context. However, this specific table was taken directly from Wikipedia, along with its existing hyperlinks, so Wikipedians themselves chose that annotation for that specific cell value. To be true to 2T principles, we must consider annotations from Wikipedia as correct, because we decided to trust Wikipedia as a reliable source of information (in the spirit of the "wisdom of the crowd").

Anyway, have you found alternatives that better fit these two cases? If so, I would really like to have a constructive discussion to evaluate them and understand your point of view.

Best,
Vincenzo