Thanks for your interest in the LM-KBC challenge. I hope this answers your questions sufficiently. If you have any further questions, don’t hesitate to ask us again.
- You are not limited to the Wikidata API. The entity disambiguation can be seen as part of the tasks in both tracks. As a baseline method, you could just use the Wikidata API. For a higher F1 score, you might try to develop your own disambiguation method.
- Yes, this is allowed.
- This is a more difficult question. In your case, we believe that fine-tuning on Wikidata5m might lead to data leakage; hence, using it for the fact prediction component is not allowed. The same holds for other Wikidata-based datasets. However, it would be okay to fine-tune your component on text-based datasets, and it would also be okay to use Wikidata5m or another Wikidata-based dataset for the entity disambiguation method.
- Yes, using these kinds of models is allowed! The only important point to keep in mind is that your model should not be trained on Wikidata.
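As a concrete starting point for the Wikidata API baseline mentioned above, here is a minimal sketch of entity disambiguation via the public `wbsearchentities` endpoint. The function names are ours, and simply taking the top-ranked candidate is a deliberately naive strategy; a custom disambiguation method could rerank candidates using context.

```python
import json
import urllib.parse
import urllib.request

WBSEARCH_ENDPOINT = "https://www.wikidata.org/w/api.php"

def build_search_url(label, language="en", limit=5):
    """Build a wbsearchentities query URL for a surface form."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "format": "json",
        "limit": str(limit),
    }
    return WBSEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)

def disambiguate(label, language="en"):
    """Return the QID of the top-ranked candidate for a label, or None.

    Naive baseline: trusts Wikidata's own ranking of search results.
    """
    url = build_search_url(label, language)
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    candidates = data.get("search", [])
    return candidates[0]["id"] if candidates else None
```

For example, `disambiguate("Berlin")` would return the QID of the top search hit; where the top hit is wrong (ambiguous labels like "Paris"), that is exactly where a tailored method can improve F1.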
As a general guideline, we recommend self-assessing whether the usage of a certain model or dataset is fair or not. If in doubt, just drop us another message.
Best regards,
The LM-KBC Team