Announcement: Several changes to the data format and evaluation scripts

Phong Nguyen

Jul 2, 2022, 5:35:37 PM
to LM-KBC
Dear LM-KBC participants,

Thanks for your interest in the LM-KBC challenge. We look forward to your submissions.

We received very useful comments on the released data and evaluation scripts, and have made several changes to the GitHub repository as a result. Please run `git pull` (or `git clone` for a fresh copy) to get the latest version on your local machine.

The contents of the data, which matter most, are unchanged. The most important changes are:
  1. We changed the data format from CSV to JSON-lines, merged all relations into a single file, and merged rows that share the same subject-relation pair. Please read the Data format section (https://github.com/lm-kbc/dataset#data-format) carefully to see how the files are formatted and how to read and write them properly.
  2. We replaced the “NONE” object entity with an empty list [], for consistency.
  3. We updated the evaluate.py script to handle cases where ground truth is empty.
  4. We refactored the baseline.py file and the getting started notebook.
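
To illustrate why an empty ground truth needs special handling (item 3), here is a minimal sketch of set-based scoring for one subject-relation pair. The function name and the convention chosen for the empty cases (precision 1.0 when nothing is predicted, recall 1.0 when there is nothing to find) are assumptions made for illustration; check evaluate.py in the repository for the behavior actually used.

```python
def precision_recall_f1(predictions, ground_truth):
    """Set-based scores for a single subject-relation pair (illustrative only)."""
    preds = set(predictions)
    truth = set(ground_truth)
    true_positives = len(preds & truth)

    # Assumed convention: with no predictions there are no wrong ones,
    # so precision is 1.0 instead of an undefined 0/0.
    precision = true_positives / len(preds) if preds else 1.0

    # Assumed convention: with an empty ground truth there is nothing
    # to miss, so recall is 1.0 instead of an undefined 0/0.
    recall = true_positives / len(truth) if truth else 1.0

    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1
```

Under this convention, predicting an empty list for a pair whose ground truth is [] scores perfectly, while predicting any object for that pair drops precision to 0.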

We believe these changes make it easier to understand the task and to start implementing your own solution. If you already have code that works with the old data format, you will need to update your I/O functions for the new format (again, the Data format section should help).
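
For anyone adapting existing I/O code, the sketch below shows the general shape of the change: group old-style rows by subject-relation pair, turn "NONE" into an empty list, and read or write one JSON object per line. The field names `SubjectEntity`, `Relation`, and `ObjectEntities`, the helper names, and the assumption that old rows were (subject, relation, object) triples are all illustrative guesses; the Data format section of the repository is the authoritative description.

```python
import json
from collections import defaultdict


def merge_rows(rows):
    """Group (subject, relation, object) triples into one record per pair.

    Field names below are assumptions; check the Data format section.
    """
    grouped = defaultdict(list)
    for subject, relation, obj in rows:
        objects = grouped[(subject, relation)]  # creates the pair even for "NONE"
        if obj != "NONE" and obj not in objects:
            objects.append(obj)
    return [
        {"SubjectEntity": s, "Relation": r, "ObjectEntities": objs}
        for (s, r), objs in grouped.items()
    ]


def write_jsonl(records, path):
    """Write one JSON object per line (JSON-lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON-lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A pair whose only old-style object was "NONE" ends up with an empty `ObjectEntities` list, which matches change 2 above.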

We are sorry for any inconvenience and hope to see many great solutions from you!

Best regards,
The LM-KBC organizers