Clarification Regarding Task A Dataset

Xinyi Zhao

Jun 4, 2025, 5:36:27 AM

to LLMs4OL Challenge

Dear LLMs4OL Organizing Team,

While analyzing the dataset for Task A (Text2Onto), I noticed that many terms listed in terms.txt do not appear in the provided documents within documents.jsonl.

Could you kindly confirm if my understanding is correct?

(1) Is terms.txt intended to list all possible terms that could form an ontology within a given domain, regardless of whether they appear in the provided documents.jsonl?
(2) Is terms2docs.json the direct output of term extraction from documents.jsonl, and should it be used to determine which terms are actually grounded in the current document set?

Thank you in advance for your clarification and for organizing this challenge.

Best regards,

Xinyi Zhao

Nandana Mihindukulasooriya

Jun 5, 2025, 12:34:06 PM

to LLMs4OL Challenge

Dear Xinyi Zhao,

Thanks for your interest in the challenge.

(1) Term extraction can be either extractive (the term appears in the text exactly as is) or abstractive (the term is referred to, possibly with slight linguistic variations). Could you please provide a sample of the terms you identified? I would be happy to take a look to make sure that is the case.

(2) From a task point of view, the input will be a `documents.jsonl` file containing a corpus of documents, and the expected output that will be evaluated is `terms.txt` and `types.txt`. The other files are mostly intermediate files that you can use during the training phase to improve your system. You can use them to check whether the extracted terms/types were correctly grounded in the documents, or for any other purpose that suits your system.
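For anyone who wants to put together the kind of sample requested in (1), here is a minimal sketch that splits terms into those that appear verbatim in the documents (the extractive case) and those that do not (candidates for the abstractive case). The function names are hypothetical, and the loader assumes each line of `documents.jsonl` is a JSON object with a string field called `text`; the thread does not state the actual schema, so adjust `text_field` to match the released files.

```python
import json
import re


def split_by_literal_match(terms, texts):
    """Split terms into (found, missing) by case-insensitive verbatim match."""
    corpus = [t.lower() for t in texts]
    found, missing = [], []
    for term in terms:
        # Require non-word characters (or string edges) around the term,
        # so that "ion" is not counted as present inside "nation".
        pattern = re.compile(r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)")
        if any(pattern.search(doc) for doc in corpus):
            found.append(term)
        else:
            missing.append(term)
    return found, missing


def load_texts(jsonl_path, text_field="text"):
    # ASSUMPTION: one JSON object per line, with the document body
    # stored under `text_field`. The real field name may differ.
    with open(jsonl_path, encoding="utf-8") as fh:
        return [json.loads(line)[text_field] for line in fh if line.strip()]


if __name__ == "__main__":
    sample_terms = ["ion", "cell membrane"]
    sample_texts = ["The Cell Membrane surrounds the nation."]
    found, missing = split_by_literal_match(sample_terms, sample_texts)
    print("found:", found)
    print("missing:", missing)
```

A term that lands in `missing` is not necessarily ungrounded: it may be referred to abstractively, with a different inflection or wording, which a verbatim match cannot detect.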

Does that answer your questions? Happy to follow up with further clarifications if needed.

Best Regards,
Nandana
