GenMT: deadline extended by 6 hours

Kocmi T.

Jul 4, 2025, 8:59:51 AM

to wmt-...@googlegroups.com

Hi All,

Several teams struggled with the submission format because they had not run the verification script in advance. We have extended the deadline by 6 hours to allow everyone to participate. Two common issues:

- The number of double newlines ("\n\n") MUST be identical to the number in the source text. Each one is a paragraph break, which we will use for alignment during evaluation.

- For each participating language, your system MUST translate every single line in that language pair, including all domains and test suites.
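
The first requirement can be checked locally before uploading. The following is a minimal sketch, not the official verification script: the function names are hypothetical, and it assumes a paragraph break is counted as a non-overlapping occurrence of "\n\n" (so three consecutive newlines count as one break). The official script may count differently.

```python
def count_paragraph_breaks(text: str) -> int:
    """Count non-overlapping occurrences of a double newline.

    Assumption: this is how a paragraph break is counted; the official
    verification script may use a different rule.
    """
    return text.count("\n\n")


def breaks_match(source: str, hypothesis: str) -> bool:
    """Return True if source and hypothesis have the same number of breaks."""
    return count_paragraph_breaks(source) == count_paragraph_breaks(hypothesis)


if __name__ == "__main__":
    src = "First paragraph.\n\nSecond paragraph."
    good = "Erster Absatz.\n\nZweiter Absatz."
    bad = "Erster Absatz.\nZweiter Absatz."  # paragraph break lost
    print(breaks_match(src, good))  # True
    print(breaks_match(src, bad))   # False
```

A check like this only catches a mismatch in the count. It does not replace running the verification script itself.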

If you run into any issues past this deadline, or you went to sleep after failing to submit (for our colleagues in Asia), write to us ASAP at ko...@cohere.com to see whether we can grant an extension on an individual basis.

Thank you all for participating!

Tom Kocmi

Di Wu

Jul 4, 2025, 10:22:48 AM

to wmt-...@googlegroups.com

Hi Tom,

Thanks for your efforts.

We submitted our .jsonl file. It passed the verification script, and we double-checked that the \n\n breaks are aligned. However, we found misalignments when opening it in the OCELoT system; see the screenshot below.

Can we assume this is an issue with OCELoT? In other words, does the verification script strictly ensure that every accepted JSONL file meets the specified requirements?

Best,

Di

'Kocmi T.' via WMT: Workshop on Machine Translation <wmt-...@googlegroups.com> wrote on Fri, Jul 4, 2025 at 14:59:

--
You received this message because you are subscribed to the Google Groups "WMT: Workshop on Machine Translation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wmt-tasks+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/wmt-tasks/CACW3GxZC9ibqRPnY4PSXDBDSgt%2BD1h2c1M43Fr79k%3DF3iJ_PtA%40mail.gmail.com.

--

TEL: +86-13070100521

Home Page: moore3930.github.io

David Vilar

Jul 4, 2025, 10:35:30 AM

to wmt-...@googlegroups.com

I can confirm that we see a similar misalignment with our submission. Does OCELoT split at (single) newlines? Since the number of single newlines may differ between source and hypothesis, that could cause this misalignment.

Cheers,

David

Roman Grundkiewicz

Jul 4, 2025, 10:35:36 AM

to wmt-...@googlegroups.com

Hi,

That should be okay if you submit only for a subset of languages, as per the warning box at the top of the screen. Just make sure your submission passes the external Python verification script.

Roman

Roman Grundkiewicz

Jul 4, 2025, 10:43:21 AM

to wmt-...@googlegroups.com

Correct: for GenMT, it does split at every newline character, which is why there are empty rows as well. I didn't want to fix it after we created the competitions in OCELoT.
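
To illustrate why the rows can look misaligned in that view even when the paragraph breaks match, here is a small sketch. It is not OCELoT's code, only an illustration of the behavior described above; the helper names and the example strings are made up.

```python
def rows_by_newline(text: str) -> list[str]:
    """Split at every newline character; a "\n\n" break yields an empty row."""
    return text.split("\n")


def paragraphs(text: str) -> list[str]:
    """Split at paragraph breaks only ("\n\n")."""
    return text.split("\n\n")


source = "First paragraph.\n\nSecond paragraph."
# The hypothesis keeps the paragraph break but adds one extra single newline.
hypothesis = "Erster Absatz,\numgebrochen.\n\nZweiter Absatz."

print(len(paragraphs(source)), len(paragraphs(hypothesis)))            # 2 2
print(len(rows_by_newline(source)), len(rows_by_newline(hypothesis)))  # 3 4
print(rows_by_newline(source))  # ['First paragraph.', '', 'Second paragraph.']
```

Both texts have two paragraphs, so they align at the "\n\n" level, but a row-by-row display that splits at every newline shows 3 rows against 4.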

Roman

Di Wu

Jul 4, 2025, 10:48:58 AM

to wmt-...@googlegroups.com

Great, thanks for your clarification.

Roman Grundkiewicz <rgrund...@gmail.com> wrote on Fri, Jul 4, 2025 at 16:43:

Ada Wan

Jul 5, 2025, 3:45:02 AM

to wmt-...@googlegroups.com

Then why not just translate by document and generalize all tasks with the same guidelines?

(And evaluate by dataset length, vocabulary by perplexity (keeping in mind that 0, the best outcome, is 0 in perplexity).)

See:

Ada Wan. 2022. Fairness in representation for multilingual NLP: Insights from controlled experiments on conditional language modeling. In International Conference on Learning Representations (ICLR), 2022. https://openreview.net/forum?id=-llS6TiOew

Ada Wan. 2021. Representation and bias in multilingual NLP: Insights from controlled experiments on conditional language modeling. https://openreview.net/forum?id=dKwmCtp6YI
