Question Regarding the License of System Outputs


Keito Kudo

Jun 27, 2025, 4:47:55 AM
to WMT: Workshop on Machine Translation
Hi,

The importance of licenses for large language models (LLMs) and their generated content has been growing recently. In the WMT general translation task, what license will be applied at release to each system's generated output and the corresponding human evaluation results?
Additionally, is it possible for each participating system to specify its own individual license?

Thank you, and best regards,
Keito Kudo

Philipp Koehn

Jun 27, 2025, 9:45:02 AM
to wmt-...@googlegroups.com
Hi,

We do not enforce any formal license, but we expect that
the translations can be openly shared and used for follow-up research
purposes.

Regards,
Philipp


Keito Kudo

Jul 1, 2025, 10:20:18 AM
to WMT: Workshop on Machine Translation

Hi Philipp,


Thank you for your prompt reply (and apologies if I accidentally reposted this).

If it does not hinder follow-up research, would it be acceptable for each team to specify its own license for its system output (e.g., in the system-description paper)?

The open LLM our team intends to use imposes no restrictions on academic research, but it does require that any model trained on its generated content inherit part of the original model’s name.


Best regards,

Keito Kudo

Tom Kocmi

Jul 3, 2025, 5:33:16 AM
to wmt-...@googlegroups.com
Hi Keito,

This is not a direct answer to your question, but an answer to a related one that may help. For the constrained track, published model weights can carry such restrictions, and you are in control of the license wording. The only requirement we place is that your model can be used for research and replication purposes, so adding the naming acknowledgement is fine.

Have a lovely day,
Kocmi
(in Europe, [kotsmi], he/him)


Ada Wan

Jul 5, 2025, 3:47:12 AM
to wmt-...@googlegroups.com
For a direct answer, one may wish to seek legal guidance.

Philipp Koehn

Jul 11, 2025, 5:39:40 AM
to wmt-...@googlegroups.com
Hi Keito,

here is the concern we have with such licensing terms:

With the human evaluation we do on top of the system outputs, the resulting dataset
has become a useful resource for methods such as preference training. Since this
is becoming one of the core steps of LLM alignment, we would like the WMT datasets
to be as useful as possible. However, if using this year's dataset would require any
system builder to rename their resulting model, they will likely filter out your system's 
output. This reduces the value of the dataset (including wasting our costly human 
evaluation work) and makes using it more cumbersome.

Regards,
Philipp
