Terminology task: formatting (newlines)

Svanhvít Lilja Ingólfsdóttir

Jul 4, 2025, 9:14:33 AM

to WMT: Workshop on Machine Translation

Hello! We're working on our submission to the Terminology task.

Are the newline instructions as strict for the Terminology task as for the General MT task? The method we used to solve the issue for the General task fails in some cases on the Terminology dataset, so we wonder whether the newlines need to be 100% consistent or whether this is a lower priority in this task.

All the best,

Svanhvít

Tom Kocmi

Jul 4, 2025, 9:25:43 AM

to wmt-...@googlegroups.com

Hi Svanhvít,

Each shared task has its own rules and deadlines, so it is best to refer to the task webpage or contact the organizers directly. The double-newline requirement is likely specific to General MT, where we plan to test long context.

Have a lovely day,

Kocmi

(in Europe, [kotsmi], he/him)

--
You received this message because you are subscribed to the Google Groups "WMT: Workshop on Machine Translation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wmt-tasks+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/wmt-tasks/bac0aef7-9cf0-4d2d-9ff9-4ebed260d647n%40googlegroups.com.

Ada Wan

Jul 5, 2025, 3:45:19 AM

to wmt-...@googlegroups.com

Then why not just translate by document and generalize all tasks with the same guidelines?

(And evaluate by dataset length, vocabulary by perplexity (keeping in mind that 0, the best outcome, is 0 in perplexity).)

See:

Ada Wan. 2022. Fairness in representation for multilingual NLP: Insights from controlled experiments on conditional language modeling. In International Conference on Learning Representations (ICLR), 2022. https://openreview.net/forum?id=-llS6TiOew

Ada Wan. 2021. Representation and bias in multilingual NLP: Insights from controlled experiments on conditional language modeling. https://openreview.net/forum?id=dKwmCtp6YI

semeno...@gmail.com

Jul 5, 2025, 5:46:09 PM

to WMT: Workshop on Machine Translation

Dear Svanhvít,

Thank you for your question! In general, our guidelines are independent of the General MT task. You can find them here (the input data and output format descriptions), along with the full submission guidelines here.

Regarding your question specifically: no, we have no such requirements about newlines, because outputs are submitted in JSONL format, where each segment is embedded unambiguously as the value of a source or target key.
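
To illustrate why newlines inside a segment are harmless in JSONL, here is a minimal sketch in Python. It is not the task's official tooling: the key names "source" and "target", the helper to_jsonl_line, and the example texts are all assumptions for illustration, so please check the task's output format description for the actual field names.

```python
import json


def to_jsonl_line(record: dict) -> str:
    """Serialize one record as a single JSONL line (hypothetical helper)."""
    # json.dumps escapes newline characters inside string values as "\n",
    # so the serialized record always occupies exactly one physical line.
    return json.dumps(record, ensure_ascii=False)


# Hypothetical record: key names and texts are illustrative assumptions.
record = {
    "source": "First paragraph.\n\nSecond paragraph.",
    "target": "Fyrsta málsgrein.\n\nÖnnur málsgrein.",
}

line = to_jsonl_line(record)

# Reading the line back restores the original newlines unchanged.
restored = json.loads(line)
```

Because the segment boundaries come from the JSON structure and not from blank lines in the text, a segment can contain any number of newlines without affecting how the file is split into records.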

Feel free to reach out if you have more questions!

Best,

Kirill Semenov,

UZH

Svanhvít Lilja Ingólfsdóttir

Jul 6, 2025, 6:19:50 PM

to WMT: Workshop on Machine Translation

Dear Kirill, thank you for the information, that's good to know.

Best regards,

Svanhvít
