SemTab 2021 -- Data and Tasks for Round 3


Jiaoyan Chen

Sep 23, 2021, 4:56:23 AM
to Sem-Tab Challenge
Dear all,

Oral and poster presentations will be selected based on the results of Rounds 1 and 2. The selection will be announced soon; please register the presenters for the conference.

We have decided to hold a final round (R3) for you to evaluate your systems; its results will be considered for the awards (final ranking). R3 includes three table sets with seven tasks, covering three KGs (Wikidata, DBpedia and Schema.org):

1. BioDiv tables with CTA-WD and CEA-WD;
2. GitTables with CTA-DBP and CTA-SCH (SCH denotes Schema.org);
3. HardTablesR3 with CTA-WD, CEA-WD and CPA-WD.

Please visit http://www.cs.ox.ac.uk/isg/challenges/sem-tab/2021/index.html for data download and more information. 

Regarding evaluation: because of time constraints, and because we want to test how well systems generalise (one submission per task), we will not set up AIcrowd challenge pages. Instead, please send your final annotation files directly to us (jiaoy...@gmail.com, with the other organisers in cc) before 15 Oct. We will run the evaluators locally and publish the results. Name each annotation file in the form "teamname_taskname", e.g., "teamA_BioDiv-CEA-WD", and package all of your team's annotation files into a single .tar.gz or .zip file named after your team.
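
A minimal sketch of the naming and packaging convention described above, in Python. It assumes, hypothetically, that each annotation file is saved as `<team>_<task>.csv` in one local directory; the `.csv` extension, the directory layout and the helper names `annotation_name` and `package_submission` are illustrative, not prescribed by the organisers.

```python
import tarfile
from pathlib import Path


def annotation_name(team: str, task: str) -> str:
    """Build a file name in the "teamname_taskname" form, e.g. teamA_BioDiv-CEA-WD."""
    return f"{team}_{task}"


def package_submission(team: str, annotation_dir: Path, out_dir: Path) -> Path:
    """Bundle every annotation file of a team into <team>.tar.gz.

    Assumption (hypothetical layout): annotation files are stored as
    <team>_<task>.csv directly inside annotation_dir.
    """
    archive = out_dir / f"{team}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for path in sorted(annotation_dir.glob(f"{team}_*.csv")):
            # arcname keeps only the file name, so the archive has no nested folders
            tar.add(path, arcname=path.name)
    return archive
```

A .zip archive can be built the same way with the standard `zipfile` module (`zipfile.ZipFile(archive, "w")` and `write(path, arcname=path.name)`).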

For technical issues related to R3, please contact us by replying to this post.

Thanks.

Regards,
Jiaoyan

Nora Youssef

Sep 30, 2021, 7:41:06 AM
to Jiaoyan Chen, Sem-Tab Challenge

Hi Jiaoyan, @all


Please note the following issues in the Round 3 datasets:

Dataset | File | Issue | Fix
BiodivTab | d4c54ecd718449e19a14216c55803455.csv | extra row found | delete the first row
GitTables | GitTables_CTA_DBP_Round3_Targets | "_dbpedia" suffix found in table names | delete the "_dbpedia" suffix
GitTables | GitTables_CTA_SCH_Round3_Targets | "_schema" suffix found in table names | delete the "_schema" suffix
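
The fixes in the table can be applied with a short Python sketch like the one below. It assumes, hypothetically, that the target files are plain CSV with the table name in the first field of each row; the actual layout of the target files is not stated in this thread, and the function names are illustrative.

```python
import csv
import io


def drop_first_row(csv_text: str) -> str:
    """Return the CSV content without its first (extra) row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows[1:])
    return out.getvalue()


def strip_table_suffix(targets_text: str, suffix: str) -> str:
    """Remove a suffix such as "_dbpedia" or "_schema" from each table name.

    Assumption (hypothetical): the table name is the first field of every target row.
    """
    rows = list(csv.reader(io.StringIO(targets_text)))
    for row in rows:
        if row and row[0].endswith(suffix):
            row[0] = row[0][: -len(suffix)]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```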


Regards,
Nora

--
You received this message because you are subscribed to the Google Groups "Sem-Tab Challenge" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sem-tab-challe...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/sem-tab-challenge/5c3238f8-d6bf-4e60-a46d-5ba014eb658fn%40googlegroups.com.


--
Nora Youssef
Lecturer Assistant | Computer Science | FCIS | ASU

Nora Youssef

Oct 5, 2021, 4:08:53 AM
to Jiaoyan Chen, Sem-Tab Challenge
Hi all,

Following up on the issues encountered, here are further points about GitTables:
1. The actual CSV files have an extra column. I guess the pandas index column was saved to the CSV, which shouldn't be the case?
2. Some columns are listed as DBpedia targets, which doesn't make sense in my opinion.
  • e.g., Table 2423 has DBP CTA targets for columns 8 and 9, but column 8 is actually empty (after excluding the index column). Any recommendations?
3. Some tables appear in the DBP CTA targets without a corresponding CSV file (we can simply ignore them, but I'm pointing them out):
  • GitTables_2565
  • GitTables_1829
  • GitTables_1735
  • GitTables_1923
  • GitTables_1568
  • GitTables_2265
  • GitTables_1699
  • GitTables_1902
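
A minimal Python sketch for setting such targets aside before annotating. It assumes, hypothetically, that each table is stored as `<table_name>.csv` in one directory and that the table name is the first field of each target row; `split_targets` is an illustrative helper, not part of the challenge tooling.

```python
import csv
import io
from pathlib import Path


def split_targets(targets_text: str, tables_dir: Path):
    """Split target rows into those that have a table CSV and those that don't.

    Assumptions (hypothetical): the table name is the first field of each
    target row, and each table is stored as <table_name>.csv in tables_dir.
    """
    kept, missing = [], set()
    for row in csv.reader(io.StringIO(targets_text)):
        if not row:
            continue  # skip blank lines
        if (tables_dir / f"{row[0]}.csv").is_file():
            kept.append(row)
        else:
            missing.add(row[0])
    return kept, sorted(missing)
```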
Regards,
Nora

Madelon

Oct 8, 2021, 8:44:31 AM
to Sem-Tab Challenge
Hi Nora,

Thanks for participating in this round and sharing your questions. I address them point-by-point below.

1. The first column of each CSV file does indeed correspond to row indices. Please ignore this column when matching the target column identifiers to the table content. Also note that the "col" prefix in the tables' column names can be ignored (e.g., the target with column_id "1" should be matched to "col1" in the table CSV). These issues will be resolved in future versions.
2. You may come across empty columns; these are representative of real-life scenarios.
3. You can ignore targets for which no table is provided. These will be ignored in the evaluation as well.
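
A minimal Python sketch of points 1 and 2, using only the standard `csv` module. It assumes, hypothetically, that the header row of each GitTables CSV holds the index column followed by `col<N>` column names; the helper names are illustrative. Empty columns simply come back as lists of empty strings.

```python
import csv
import io


def load_gittable(csv_text: str):
    """Parse a GitTables CSV, dropping the leading row-index column.

    Returns the header (e.g. ["col0", "col1", ...]) and the data rows.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0][1:]  # first column holds row indices, not table content
    data = [row[1:] for row in rows[1:]]
    return header, data


def target_column(header, data, column_id: str):
    """Return the cells of the column matching a target column_id.

    The "col" prefix is ignored, so column_id "1" matches header "col1".
    Missing trailing cells are treated as empty strings.
    """
    idx = header.index(f"col{column_id}")
    return [row[idx] if idx < len(row) else "" for row in data]
```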

Kind regards,
Madelon Hulsebos

On Tuesday, 5 October 2021 at 10:08:53 UTC+2, Nora Abdelmageed wrote:

Jiaoyan Chen

Oct 11, 2021, 5:05:21 AM
to Madelon, Nora Youssef, Sem-Tab Challenge
Dear Nora and Madelon,

Sorry for my late reply. Your questions and explanations are very helpful for all participants.

I will take them into account when running the evaluation for the final round. Targets with no corresponding CSV file will be ignored.

Regards,
Jiaoyan

