First Call for Participation: LLMs with Limited Resources WMT2025 Shared Task


Marion Di Marco

Apr 9, 2025, 8:37:14 AM
to wmt-...@googlegroups.com



LLMs with Limited Resources for Slavic Languages @ WMT2025 @ EMNLP2025


Website: https://www2.statmt.org/wmt25/limited-resources-slavic-llm.html

Join our Google Group! https://groups.google.com/g/slavic-llms-mt2025 

HuggingFace Collection: https://huggingface.co/collections/tum-nlp/llms-for-slavic-languages-67f3993bf057be6a8d6665ab

This shared task explores how LLMs perform jointly on machine translation (MT) and question answering (QA), investigating whether the two tasks benefit each other under limited data and compute resources. Ukrainian (uk) is a mid-resource language (~40M L1 speakers), while Upper Sorbian (hsb) and Lower Sorbian (dsb) are minority West Slavic languages spoken in Germany (30k and 7k L1 speakers, respectively).

Data Overview

Ukrainian

Upper Sorbian & Lower Sorbian (two separate tracks)

  • MT directions: de→hsb, de→dsb

  • QA: Multiple-choice questions drawn from actual CEFR-aligned language certification exams (levels A1–C1)

  • We will prepare the following resources:

    • Parallel & monolingual corpora from the Witaj-Sprachzentrum and the Leipzig Corpora Collection;

    • Data from previous WMT low-resource tracks (2020–2022);

    • QA data adapted from language certification exams at different levels.

Submission Guidelines

  • Models must produce both MT & QA outputs for the chosen language(s);

  • Submissions are language-specific; you may submit to one or more language tracks;

  • Participants may use only one of the following base models, each limited to a maximum of 3B parameters:

Key Dates (AoE)

  • Registration: open now! Join our Google Group: https://groups.google.com/g/slavic-llms-mt2025

  • Training/dev data release: Late April

  • Test data release: Late June

  • Submission deadline: Early July

  • System description deadline: Late July

  • Final workshop: 5–9 November @ EMNLP 2025 in Suzhou, China!
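
All dates above are AoE (Anywhere on Earth, i.e. the fixed UTC-12 offset): a deadline on a given day has passed only once that day has ended everywhere on the planet, so in most time zones it falls on the following calendar day. A minimal standard-library Python sketch of the conversion is below; the function names and the 3 July date are hypothetical illustrations, not official deadlines (those are announced on the website).

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" corresponds to the fixed offset UTC-12.
AOE = timezone(timedelta(hours=-12), "AoE")

def aoe_deadline_utc(year: int, month: int, day: int) -> datetime:
    """Return the last second of the given AoE day (23:59:59 AoE), expressed in UTC."""
    end_of_day = datetime(year, month, day, 23, 59, 59, tzinfo=AOE)
    return end_of_day.astimezone(timezone.utc)

def is_before_deadline(now: datetime, year: int, month: int, day: int) -> bool:
    """True if the timezone-aware timestamp `now` is still within the AoE deadline."""
    return now <= aoe_deadline_utc(year, month, day)

# Hypothetical date, for illustration only.
print(aoe_deadline_utc(2025, 7, 3).isoformat())  # 2025-07-04T11:59:59+00:00
```

Comparing timezone-aware datetimes works across offsets, so `now` may be given in any local time zone.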

Organisers

TUM Heilbronn:

Daryna Dementieva
Marion Di Marco
Lukas Edman
Alexander Fraser
Kathy Hämmerl
Shu Okabe

Witaj-Sprachzentrum:

Beate Brězan
Anita Hendrichowa
Marko Měškank
Tomaš Šołta

Acknowledgements

We thank the UNLP 2024 Shared Task team (Roman Kyslyi, Mariana Romanyshyn, Oleksiy Syvokon) for kindly sharing Ukrainian QA resources.

