Dataset suggestion

Martin Lentschat

Aug 15, 2024, 5:31:12 AM
to Sem-Tab Challenge

Hello everyone,

During my PhD, I worked on extracting knowledge (as n-Ary relations) from scientific articles in the food-packaging domain, and I built several datasets. My approach is driven by a domain ontology: it takes data from a document's tables (which are partial n-Ary relations) and complements them with information from the full text.

I would like to know if my datasets could be of interest to you, maybe as a challenge in the 2025 Datasets Track.

The first one is a dataset of tables, with annotations that correspond to the tasks Cell-Entity Annotation, Column-Type Annotation and Row-to-Instance Annotation (RIA) in the sense of Liu et al. 2023 (https://doi.org/10.1016/j.websem.2022.100761). The annotation was done both manually and automatically (using a modification of https://hal.science/hal-01256476/document).
I think this dataset is within the scope of SemTab.

I also have a dataset of symbolic and quantitative entities present in the full texts. This one is outside the scope of SemTab, but it works together with the third dataset.

The third dataset is made of n-Ary relations reconstituted from the table and text data. This could be a new challenge aimed at complementing table data with text data.


Feel free to take a look and get back to me with your insights or questions.

Best,
Martin Lentschat