Dataset suggestion

Martin Lentschat

Aug 15, 2024, 5:31:12 AM

to Sem-Tab Challenge

Hello everyone,

During my PhD, I worked on extracting knowledge (as n-ary relations) from scientific articles in the food-packaging domain, and I built several datasets. My approach is driven by a domain ontology: it uses data from a document's tables (which are partial n-ary relations) and complements them with information from the full text.

I would like to know if my datasets could be of interest to you, maybe as a challenge in the 2025 Datasets Track.

The first one is a dataset of tables, with annotations that correspond to the Cell-Entity Annotation (CEA), Column-Type Annotation (CTA) and Row-to-Instance Annotation (RIA) tasks in the sense of Liu et al. 2023 (https://doi.org/10.1016/j.websem.2022.100761). The annotations were produced both manually and automatically (using a modification of https://hal.science/hal-01256476/document).

Data paper: https://www.data-in-brief.com/article/S2352-3409(22)00211-6/pdf

Dataset and code: https://dataverse.cirad.fr/dataset.xhtml?persistentId=doi:10.18167/DVN1/GCZBC9

I think that this dataset is within the scope of SemTab.

I also have a dataset of symbolic and quantitative entities found in the full texts. It is not within the scope of SemTab, but it works together with the third dataset.

Data paper: https://www.data-in-brief.com/action/showPdf?pii=S2352-3409%2821%2900419-4

Dataset and code: https://dataverse.cirad.fr/dataset.xhtml?persistentId=doi:10.18167/DVN1/U7HK8J

The third dataset is made of n-ary relations reconstituted from the table and text data. This could be a new challenge aimed at complementing table data with text data.

https://dataverse.cirad.fr/dataset.xhtml?persistentId=doi:10.18167/DVN1/1BBJBQ

https://www.sciencedirect.com/science/article/pii/S0957417422014567

Feel free to take a look and get back to me with your insights or questions.

Best,

Martin Lentschat
