Expected submission deliverable/format

Samuel Larkin

Jun 2, 2025, 8:32:35 AM
to LLMs with Limited Resources for Slavic Languages 2025
Hi,
  we would like some clarification on what is expected to be delivered for a submission. Are we expected to submit translation files and answer files, or a model? If translation and answer files, what is their expected format (plain text, jsonl, ...)?

The confusion stems from the fact that lm-eval has some hardcoded test sets, and all we have to do locally is give it a model's path to evaluate. In that case we don't have to care about the output format.
On the task's website it says: "In the test phase, we will release closed test sets for all tasks." This points to a translation/answer file submission format.

Is the expected submission the output of each participant running `lm-eval` locally with their own model, using new task definitions that point at the test data?
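
For concreteness, the local workflow we mean is roughly the following, via lm-eval's Python entry point (the model path and task name are placeholders, not the shared task's actual setup):

import lm_eval

# Placeholder model path and task name; the point is that lm-eval only
# needs a model to evaluate against its built-in task definitions.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=/path/to/our/model",
    tasks=["some_task"],
)
print(results["results"])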

Thanks

Kathy Haemmerl

Jun 10, 2025, 10:24:07 AM
to slavic-ll...@googlegroups.com

Hi Samuel,

Thank you for your question. We've decided on a format for the QA task; it will be JSONL files with each line looking something like this:

{"doc_id": 0, "question_id": "A1.1.H01", "pred": 1}

This is an example from the development set for Sorbian. 

There should be one output file per test set input file. The required fields are 'doc_id' (int), 'question_id' (str, Sorbian only), and 'pred' (int). The Ukrainian QA data doesn't have a 'question_id' field, so it can be None there. For Sorbian QA, 'question_id' uniquely identifies a question, even across different files. You may add other fields to your output, e.g. the question and the possible answers for manual inspection, but we will evaluate based on the required fields.
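
For illustration, writing such a file in Python could look like the following (the file name and the predictions themselves are placeholders, not real data):

import json

# Placeholder predictions: (doc_id, question_id, pred).
# For Ukrainian QA, question_id would be None.
predictions = [(0, "A1.1.H01", 1), (0, "A1.1.H02", 3)]

with open("predictions.jsonl", "w", encoding="utf-8") as f:
    for doc_id, question_id, pred in predictions:
        record = {"doc_id": doc_id, "question_id": question_id, "pred": pred}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")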

For MT, we'll be coordinating with the General shared task organisers about the expected format.

We additionally plan to provide a helper script for converting the lm-evaluation-harness output to the expected submission format.
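
Until that script is ready, a rough sketch of such a conversion might look like this. It assumes lm-eval was run with --log_samples, that each sample record carries a 'doc_id' and one [loglikelihood, is_greedy] pair per answer option in 'filtered_resps', and that the source document exposes a question id; these field names are assumptions, not the final script:

import json

def convert(samples_path, out_path):
    with open(samples_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            sample = json.loads(line)
            # Pick the answer option with the highest log-likelihood.
            lls = [resp[0] for resp in sample["filtered_resps"]]
            pred = max(range(len(lls)), key=lambda i: lls[i])
            record = {
                "doc_id": sample["doc_id"],
                # Assumption: the source doc carries the question id.
                "question_id": sample["doc"].get("question_id"),
                "pred": pred,
            }
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")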

Best regards,
Kathy Hämmerl for the Organizers

Samuel Larkin

Jun 25, 2025, 2:51:12 PM
to LLMs with Limited Resources for Slavic Languages 2025
Just to confirm: the `lm-eval` helper script is just for convenience, correct? That is, as long as we follow all the requirements (one model per language pair for QA+MT, etc.) and our output is in the correct format, we can produce it either with `lm-eval` or with our own scripts?

Daryna Dementieva

Jun 26, 2025, 6:43:06 AM
to LLMs with Limited Resources for Slavic Languages 2025
Dear Samuel,

Yes, you are absolutely right. We provide the scripts just as baselines; you are free to use them, edit them, or design your own entirely, as long as the result is a submission in the required format.

Best,
Daryna

Wednesday, June 25, 2025 at 8:51:12 PM UTC+2, samuel...@gmail.com: