WMT26 GenMT: blindset released


Kocmi T.

Jun 19, 2026, 6:47:02 AM
to wmt-...@googlegroups.com
Hi All,

We have just released the blindset for General MT. You can find all the details here:
https://www2.statmt.org/wmt26/translation-task.html#submission

We are still working on the OCELoT submission system and expect to publish it early next week, but all resources are already released, so you can start translating now. The deadline is 2nd July, and we will not extend it.

Please reach out if you have questions that are not answered on the webpage. If we discover any issues, we will announce them on our website and share them here.

--
Tom Kocmi (he/him)
Staff Researcher, Europe
ko...@cohere.com

Adam Dobrowolski

Jun 19, 2026, 8:19:27 AM
to wmt-...@googlegroups.com
Hi Tom,

The file wmt26_genmt_blindset.jsonl contains different directions than previously announced.
All entries specify only a target language; no source language is given.
Did the list of directions change?

Regards,
Adam



Below is the list of tgt_lang values in the JSONL file, with the count for each:
aeb 150
ar_AR 150
arz 150
arz_Arab 198
bel_Cyrl 198
ces_Latn 198
cs 1050
cs_CZ 917
de_AT 319
de_CH 322
de_DE 2149
de_IT 2707
deu_Latn 513
ekk_Latn 198
en 200
en_US 3536
es_ES 917
et_EE 917
fo 345
hin_Deva 6947
hr 1000
hye_Armn 198
ind_Latn 198
is 345
isl_Latn 198
jpn_Jpan 388
kaz_Cyrl 198
ko_KR 1880
kor_Hang 198
lij_Latn 198
lld_Latn 198
mni_Beng 5000
mni_Latn 5000
mni_Mtei 5000
pl_PL 597
ru 1000
ru_RU 917
rus_Cyrl 198
sme_Latn 198
tha_Thai 198
ukr_Cyrl 513
vie_Latn 315
zh_CN 747
zho_Hans 198
zho_Hant_TW 198
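
For anyone who wants to reproduce a listing like the one above, here is a minimal sketch. It assumes each line of the file is a JSON object with a `tgt_lang` field; the field name is inferred from the listing, so please check it against the actual file. The function names are made up for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def count_target_langs(path):
    """Count how many JSONL records carry each tgt_lang value."""
    counts = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # "tgt_lang" is an assumed field name; records without it
            # are grouped under "<missing>" instead of raising.
            counts[record.get("tgt_lang", "<missing>")] += 1
    return counts


def format_counts(counts):
    """Render counts as 'lang count' lines, sorted by language code."""
    return "\n".join(f"{lang} {n}" for lang, n in sorted(counts.items()))
```

Calling `print(format_counts(count_target_langs("wmt26_genmt_blindset.jsonl")))` should then print one `lang count` pair per line, provided the file is in the working directory.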


Kocmi T.

Jun 19, 2026, 8:31:06 AM
to WMT: Workshop on Machine Translation
Hi Adam,

Thank you for the question. The list of GenMT directions didn't change. However, as every year, the blindset also contains test suites, and this year we allowed test-suite participants to include unsupported languages as well, because multilingual systems are taking over from bilingual systems.
You can choose not to translate these languages; however, we'd encourage everyone who builds multilingual systems to translate them. Unsupported languages won't affect the system ranking, and in the model card poll you can list the languages your system supports, which, same as last year, we will highlight in the findings.

One additional comment: while it may look like the additional languages (such as mni_Beng) are much larger than the GenMT languages, that is because those are sentence-level, while the GeneralMT data are document-level. In reality, all test suites together are just a bit larger than the whole GenMT blindset in terms of word count.
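
To compare segment counts with word counts per target language, something along these lines could be used. This is only a rough sketch: the field names `tgt_lang` and `src_text` are assumptions rather than the confirmed schema, the toy records are invented for illustration, and whitespace splitting undercounts words for text written without spaces.

```python
from collections import defaultdict


def volume_by_lang(records, lang_key="tgt_lang", text_key="src_text"):
    """Return {lang: (segment_count, word_count)} for an iterable of dicts.

    lang_key and text_key are assumed field names; adjust them to the
    real schema of the blindset file.
    """
    totals = defaultdict(lambda: [0, 0])
    for record in records:
        entry = totals[record.get(lang_key, "<missing>")]
        entry[0] += 1  # one more segment (sentence or document)
        # Rough word count: split the source text on whitespace.
        entry[1] += len(str(record.get(text_key, "")).split())
    return {lang: tuple(pair) for lang, pair in totals.items()}


# Invented toy data: one long document versus two short sentences.
toy_records = [
    {"tgt_lang": "xx_doc", "src_text": "one long document " * 50},
    {"tgt_lang": "xx_sent", "src_text": "short sentence"},
    {"tgt_lang": "xx_sent", "src_text": "another short one"},
]

for lang, (segments, words) in sorted(volume_by_lang(toy_records).items()):
    print(f"{lang}: {segments} segments, {words} words")
```

In the toy data the document-level language has fewer segments but far more words, which is the same effect as described above.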

Have a lovely day,
Kocmi