Re: Questions about WMT26 General MT and MT Test Suites

Kocmi T.

May 14, 2026, 5:21:39 AM
to xiat0210, wmt-...@googlegroups.com
Hi Xian,

Those are great questions. To clarify publicly, I am adding the whole mailing list.

> whether the General MT task may include additional translation instructions that substantially modify the format or presentation of the source text.

The short answer is no. General MT tests machine translation capabilities only, so non-instruction systems can compete as easily as in the past. We are only adding context to the instructions which, if followed, will improve the human scores (annotators will see the same instructions and will be asked to treat a failure to follow them as one of the translation errors).
Here are a couple of areas that the additional instructions can cover (you can imagine reverse instructions as well):
  • request to change the tone ("translate the following social media post in an informal tone")
  • keep HTML tags intact, output valid JSON
  • do not translate hashtags, placeholders, JSON keys...
  • do not reproduce errors in the source text
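
As a rough illustration of the "keep intact" and "do not translate" cases above, a participant could sanity-check their own output along these lines. This is only a sketch, not the official evaluation tooling: the regular expressions, the placeholder formats, and the function name `missing_tokens` are assumptions made up for the example.

```python
import re
from collections import Counter

# Assumed patterns, for illustration only; real test data may use other formats.
HASHTAG = re.compile(r"#\w+")
PLACEHOLDER = re.compile(r"\{\w+\}|%[sd]")  # e.g. {name}, %s, %d
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def missing_tokens(source: str, translation: str) -> dict[str, int]:
    """Count protected tokens that occur in the source but not in the translation."""
    missing: Counter[str] = Counter()
    for pattern in (HASHTAG, PLACEHOLDER, HTML_TAG):
        in_source = Counter(pattern.findall(source))
        in_translation = Counter(pattern.findall(translation))
        # Counter subtraction keeps only tokens with a positive deficit.
        missing += in_source - in_translation
    return dict(missing)


source = "Loving <b>{name}</b> at #WMT26!"
kept = missing_tokens(source, "J'adore <b>{name}</b> à #WMT26 !")
lost = missing_tokens(source, "J'adore {name} à #wmt26")
```

Here `kept` is empty, while `lost` reports the lowercased hashtag and both dropped `<b>` tags. The comparison is case-sensitive on purpose, since an altered hashtag no longer matches the source.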
> segment formatting

Generally, a good translation should maintain the original style and segmentation (we won't ask you to reformat the source). While you can merge lines, annotators may not like it.
In addition, we need to split the text into segments for human evaluation. Last year we used double newlines (a single newline didn't affect anything), and we will continue to do the same. HTML tags such as <br> or <p> will also be used for segment parsing.
What is expected to be kept will again be written in the prompt instructions to help instruction-following systems.
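
To make the segmentation rule concrete, here is a minimal sketch of how a participant might check that their output splits into the same number of segments as the source. It is not the organizers' parser: the exact regular expression, the choice to drop the boundary tags from the returned segments, and the names `split_segments` and `same_segment_count` are assumptions for illustration.

```python
import re

# Assumed boundaries: a blank line (double newline), <br>, or an opening/closing <p>.
# A single newline is deliberately NOT a boundary.
_BOUNDARY = re.compile(r"\n\s*\n|<br\s*/?>|</?p(?:\s[^>]*)?>", re.IGNORECASE)


def split_segments(text: str) -> list[str]:
    """Split text at the assumed boundaries and drop empty pieces."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


def same_segment_count(source: str, translation: str) -> bool:
    """True if source and translation yield the same number of segments."""
    return len(split_segments(source)) == len(split_segments(translation))


example = "A line\nsame segment\n\nSecond<br>Third<p>Fourth</p>"
segments = split_segments(example)
```

With this input, `segments` has four entries, and the first one still contains its single newline. Merging two paragraphs in the translation would make `same_segment_count` return False, which is an easy thing to catch before submitting.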

> is there any possibility that some of its samples could be selected or adapted for the General MT test set, or are the two tracks completely separate in terms of test-set construction?

If you submit a test suite, all your samples will be part of General MT, and all GenMT participants will be required to translate them for your analysis. However, for the human evaluation of GenMT we will not be using test suite samples; instead we are building a blind set, as every year. This is the same setup we have had for years. What we will attempt this year is to increase the visibility of test suites (and take insights from them, for example if top systems struggle in some test suite areas).

Let me know if you have other questions,
Have a lovely day,
Kocmi

On Thu, May 14, 2026 at 9:00 AM xiat0210 <xiat...@qq.com> wrote:

Dear Mr. Kocmi,

I hope you are doing well. I am participating in WMT26, and I would first like to thank you for your previous help and clarification. I have made some progress in preparing my system, but I still have a few questions about the General MT task and the MT Test Suites.

First, I would like to ask whether the General MT task may include additional translation instructions that substantially modify the format or presentation of the source text. For example, could an instruction ask the system to turn a fluent social-media movie review into a ranked recommendation list? In other words, should I understand the additional translation instructions as being unrestricted in principle, as long as they are executable and still grounded in the source text—for example, even when they require rewriting or reorganizing the source content into the target language?

Second, I noticed that in previous years, paragraphs in the source text seemed to be separated by \n, while this year the source paragraphs appear to be separated by \n\n. Should the target-side output be rendered in a Markdown-like format? More specifically, when the source contains \n, should this line break generally be preserved as \n in the translation, or should it be normalized into a space unless otherwise instructed?

Third, I would like to better understand the relationship between the MT Test Suites and the General MT task. If I submit an MT Test Suite that I designed, is there any possibility that some of its samples could be selected or adapted for the General MT test set, or are the two tracks completely separate in terms of test-set construction?

Thank you very much for your time and guidance. Your clarification would be very helpful for aligning our system design and data preparation with the shared task expectations.

Best regards,
Xia Tian
