question regarding the restricted models' sizes


Damir Korenčić

Jun 16, 2025, 8:10:42 AM
to WMT: Workshop on Machine Translation
Hi,

In the description of the general task it says that the maximum allowed size for a translation model is 20B.

But what if more than one model is used for generating the final translation?

Does the restriction apply to individual models separately, to the sum of their parameters, to the number of parameters at the peak of memory allocation, or something else?


thank you, regards,

Damir

Tom Kocmi

Jun 16, 2025, 8:32:31 AM
to wmt-...@googlegroups.com
Hi Damir,

This is a great question and hard to answer easily; can you share more details? Here are the classical approaches and how we consider them (based on an earlier discussion we had). Importantly, describe it in the paper and our system description poll, so we can clearly mark it in the findings.

- Mixture-of-experts: the total number of parameters counts, not the active number (there is also an efficiency shared task at WMT for such models)
- Model ensemble: the sum of all models' parameters counts
- Smaller model trained on a larger model's outputs: the parameter count of the smaller (inference) model counts (not the teacher, if it is used solely for training)
- Best-of-N answers or post-editing/polishing with the same model: the model's size counts only once (regardless of how many times it is queried per translation)
- Use of MT metrics at inference (such as in MBR): ignore the size of the MT metric models if and only if they are used to provide quality estimation/ranking and do not provide feedback on how to improve the translation. However, if a proprietary model (like GPT) is used for quality estimation, the system is automatically unconstrained, since that breaks the requirement that your model be published and the translations reproducible
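The counting rules above can be sketched as a small helper. The roles and the 20B cap come from this thread, but the function, role names, and example system are purely illustrative assumptions, not an official tool:

```python
# Illustrative sketch of the parameter-counting rules above.
# All names and the example system are hypothetical.

LIMIT = 20e9  # 20B-parameter cap from the general task description

def counted_params(components):
    """Sum the parameters that count toward the constrained limit.

    Each component is (params, role), where role is one of:
      'moe_total' - mixture-of-experts: total (not active) parameters
      'ensemble'  - ensemble member: counts in full
      'student'   - distilled inference model: counts (teacher does not)
      'teacher'   - teacher used only for training: ignored
      'metric'    - MT metric used only for QE/ranking (e.g. in MBR): ignored
    Best-of-N reuse of the same model adds nothing: each model
    appears only once in the component list.
    """
    counted = {'moe_total', 'ensemble', 'student'}
    return sum(p for p, role in components if role in counted)

# Example: an 8B model ensembled with a 7B one, reranked by a 0.5B
# MT metric, distilled from a 70B teacher used only in training.
system = [(8e9, 'ensemble'), (7e9, 'ensemble'),
          (0.5e9, 'metric'), (70e9, 'teacher')]
total = counted_params(system)  # only the two ensemble members count
print(total <= LIMIT)
```

Here only the 8B + 7B ensemble counts (15B), so the system stays within the constrained limit even though the teacher alone exceeds it.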

I hope this helps a bit; please let us know if you are using an alternative setup that isn't easy to categorize into the above.

Have a lovely day,
Kocmi
(in Europe, [kotsmi], he/him)


--
You received this message because you are subscribed to the Google Groups "WMT: Workshop on Machine Translation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wmt-tasks+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/wmt-tasks/c213cd40-8936-44e4-b930-8efc872c5e48n%40googlegroups.com.

Damir Korenčić

Jun 16, 2025, 9:49:50 AM
to WMT: Workshop on Machine Translation
Hi Tom,

> Importantly, describe it in the paper and our system description poll, so we can clearly mark it in the findings.

Of course.

Thank you for the elaboration, all is clear to us, regarding the approaches that we consider for now.


best, Damir

Keito Kudo

Jul 3, 2025, 1:37:48 AM
to WMT: Workshop on Machine Translation
Hi Tom,

> Use of MT metrics at inference (such as in MBR): ignore the size of the MT metric models if and only if they are used to provide quality estimation/ranking and do not provide feedback on how to improve the translation. However, if a proprietary model (like GPT) is used for quality estimation, the system is automatically unconstrained, since that breaks the requirement that your model be published and the translations reproducible

Does this also apply to open models like Qwen3: would their size be ignored if they are used only within the scope of reranking/scoring, even if the model's intended use is not limited to translation evaluation?

best, Keito Kudo

On Monday, June 16, 2025 at 22:49:50 UTC+9, damir.k...@gmail.com wrote:

Tom Kocmi

Jul 3, 2025, 5:28:04 AM
to wmt-...@googlegroups.com
Hi Keito,

Since I'm unable to sync with other organizers to discuss this case at the last minute, I wanted to share my thoughts on this grey area ahead of the deadline tomorrow. What is important is that you clearly describe it in the system description.

- The rule was primarily designed with existing MT metrics in mind, especially due to MBR considerations. This means that any model used should either be a published MT metric or a general-purpose LLM used with an MT metric-style prompt (e.g., GEMBA).
- If the LLM scorer is trained by you from scratch for this purpose, then it is not considered an existing MT metric, and in that case its parameter count needs to be added.
- If the scorer's parameter count falls within the constrained track limits, I believe it can be accepted as constrained. However, if the scorer exceeds those limits, it would be better to categorize it as unconstrained.
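A minimal sketch of this decision logic, assuming hypothetical flag and function names (nothing here is an official classifier; the 20B limit is the figure discussed in this thread):

```python
# Hypothetical sketch of the scorer rules above; names are illustrative.

def scorer_track(existing_metric_or_gemba_style, trained_from_scratch,
                 scorer_params, limit=20e9):
    """Classify a reranking scorer per the rules sketched above.

    A published MT metric, or a general-purpose LLM used with an
    MT metric-style prompt (GEMBA-like), has its size ignored.
    A scorer trained from scratch for this purpose counts toward
    the limit; anything ambiguous is conservatively counted too.
    """
    if existing_metric_or_gemba_style and not trained_from_scratch:
        return 'constrained (scorer size ignored)'
    if scorer_params <= limit:
        return 'constrained (scorer size counted)'
    return 'unconstrained'

# A published metric used for reranking does not count:
print(scorer_track(True, False, 0.5e9))
# A 30B scorer trained from scratch pushes the system unconstrained:
print(scorer_track(False, True, 30e9))
```

Either way, the key point from the message above stands: the setup must be clearly described in the system description.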

Have a lovely day,
Kocmi
(in Europe, [kotsmi], he/him)
