WMT: Blindset released (General MT, Multilingual Instruction)

Kocmi T.

Jun 20, 2025, 9:22:58 AM
to wmt-...@googlegroups.com
Hello,

I am excited to share that we have released the blindsets for the General MT and Multilingual Instruction shared tasks. We are currently having trouble updating the webpage, so we are sharing the details here temporarily so that you can start processing them. The deadline to submit outputs to OCELoT is 3rd July (more details will be shared on the webpage soon).

General MT:
You can find the blindset here:
https://drive.google.com/file/d/1-ZFcb9pcpTv10NyMgUqcusU1ILR9ked0/view?usp=sharing
The multimodal resources are here:
https://drive.google.com/file/d/1BngVipV1cszYzDjy_qjneS2lf4jTl7Zb/view?usp=sharing
  • The blindset is in JSON Lines format with several metadata fields. The core fields are `tgt_lang` and `src_text`. It also contains `prompt_instruction` for LLMs, which you may use; it is adjusted for each domain.
  • The blindset is document-level, with paragraphs separated by double newlines. You may segment it in any way; however, the translation must contain the same number of paragraphs (separated by a double newline). To check this, we include a verification script in the zip file.
  • The textual information in the speech domain was produced by automatic speech recognition; multimodal systems may benefit from the original video shared in the multimodal resources.
  • The multimodal part of the social domain contains the original screenshots from Mastodon, including pictures.
  • Although you can participate in only a subset of languages, we strongly encourage you to translate all languages, even those not supported by your system. Please mark in the OCELoT poll which languages you focused on and which are only contrastive, and we will clearly mark this in the findings.
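
A minimal sketch of the paragraph-count requirement described above, in Python. This is not the official verification script from the zip file. The `src_text` field comes from the blindset description; the `hypothesis` key and both function names are hypothetical and only stand in for wherever you store your system output.

```python
import json


def count_paragraphs(text: str) -> int:
    """Count paragraphs, treating a double newline as the separator."""
    return len(text.split("\n\n"))


def find_mismatches(jsonl_lines, hyp_key="hypothesis"):
    """Return indices of records whose translation has a different
    paragraph count than the source.

    `hyp_key` is a hypothetical field holding your system output;
    the blindset itself only provides the source side.
    """
    mismatches = []
    for i, line in enumerate(jsonl_lines):
        if not line.strip():
            continue  # skip empty lines between records
        record = json.loads(line)
        src_count = count_paragraphs(record["src_text"])
        hyp_count = count_paragraphs(record[hyp_key])
        if src_count != hyp_count:
            mismatches.append(i)
    return mismatches
```

Running something like this before uploading may catch documents where the system merged or split paragraphs, but the script shipped with the blindset remains the reference check.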

Multilingual Instruction:
Collect outputs of all prompts in the blind testsets: https://drive.google.com/file/d/1nVzFbJ3DxYCBHuefr5cLnxuXEf2vqqa1/view?usp=sharing
  • Although the blindset contains machine translation data from General MT, it is split at the paragraph level. If you are building a primarily document-level MT system, we encourage you to submit separately to the General MT shared task as well.

Have a lovely day,
on behalf of all organizers, Tom Kocmi

zhang...@gmail.com

Jun 25, 2025, 2:30:02 AM
to WMT: Workshop on Machine Translation
Hello,

Just to confirm: in the final submission, do we need to keep the "\n" between the sentences inside one paragraph?
e.g.
Format 1:
             Rink Rats\nChapter 1: First Day\nKyle looked at his ... \n\n ...
Format 2:
             Rink Rats Chapter 1: First Day Kyle looked at his ... \n\n ...

Thanks.

Tom Kocmi

Jun 25, 2025, 4:36:05 AM
to wmt-...@googlegroups.com
Hi Zhang,

This is a great question. The quick answer is that you do NOT have to; only double newlines are critical, although single newlines are sometimes needed because of the context. Here is an extended answer explaining what is going on behind the scenes and why keeping them may sometimes be useful:

  • Human annotators will assign ESA scores to each paragraph separately (while seeing the full document), so double newlines are important for correct alignment with the sources. This also means your translation may contain a different number of sentences, and it will be up to the humans to judge it correctly.
  • The preliminary automatic evaluation, which we use to select systems for human evaluation, will be run at the paragraph level for metrics/judges that cannot fit the whole document into the context window.
  • There are situations where single newlines are critical because of the context. The key case is the dialogue domain, where each line belongs to a separate user. Your example is the same situation: as a human annotator, I would penalize a system that merges all sentences in a paragraph together. In your example the merge creates a critical error; if the system produced a period instead, it would be more acceptable, but humans will make the final decision.
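
One way to keep both the double newlines and the single newlines intact can be sketched as follows, in Python. This is only an illustration under stated assumptions: `translate_line` is a hypothetical stand-in for your MT system, and translating line by line discards cross-line context, so a document-level system may prefer to translate whole paragraphs and then verify the structure.

```python
from typing import Callable


def translate_document(src_text: str, translate_line: Callable[[str], str]) -> str:
    """Translate a document while keeping its paragraph and line structure.

    `translate_line` is a hypothetical stand-in for your MT system.
    """
    out_paragraphs = []
    for paragraph in src_text.split("\n\n"):
        # Translate each line separately so single newlines survive,
        # e.g. speaker turns in the dialogue domain.
        lines = [
            translate_line(line) if line.strip() else line
            for line in paragraph.split("\n")
        ]
        out_paragraphs.append("\n".join(lines))
    # Rejoin with double newlines so the paragraph count matches the source.
    return "\n\n".join(out_paragraphs)
```

Because the output is rebuilt from the same splits as the input, the number of paragraphs always matches the source, which is the property the human evaluation relies on.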

On a separate topic: we are finishing the OCELoT submission system today. It is available at https://ocelot-wmt.azurewebsites.net/ and more information will be released on the WMT webpage later today.

Have a lovely day,
Kocmi
(in Europe, [kotsmi], he/him)


--
You received this message because you are subscribed to the Google Groups "WMT: Workshop on Machine Translation" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wmt-tasks+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/wmt-tasks/d57b4a87-f197-4c83-9239-acaef3622054n%40googlegroups.com.

Seth A.

Jun 28, 2025, 8:35:47 AM
to wmt-...@googlegroups.com
Hello,
In the General MT blindset, there are both 'general' and 'testsuites' collection_ids. Is it mandatory to also submit translations for the testsuites examples in order to participate in the General MT shared task?
Many thanks,
Seth

Tom Kocmi

Jun 28, 2025, 8:42:12 AM
to wmt-...@googlegroups.com
Hi Seth,

Thank you for the question. Yes, it is mandatory to translate everything. The testsuites are much smaller (in terms of words to translate) than they look, since they are sentence-level, in contrast to the document-level General MT data.

You are allowed to submit a subset of languages (translating everything for each of them), although we encourage participants to also translate languages their system does not support and to mark this in the OCELoT poll after submission, as those translations are valuable resources. We will clearly mark those in the findings, and you may decide not to discuss those contrastive languages in your system paper.
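
To get a rough sense of how much text each collection contains, a sketch like the following could be used, in Python. It assumes that `collection_id` is a top-level field of each JSON Lines record next to `src_text`; the thread mentions both names but does not spell out the record layout. The function name is hypothetical, and whitespace splitting is only a crude word count that works poorly for languages written without spaces.

```python
import json
from collections import Counter


def words_per_collection(jsonl_lines):
    """Sum whitespace-separated source tokens per `collection_id`.

    Assumes `collection_id` and `src_text` are top-level fields.
    """
    totals = Counter()
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip empty lines between records
        record = json.loads(line)
        totals[record["collection_id"]] += len(record["src_text"].split())
    return totals
```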

Have a lovely day,
Kocmi
(in Europe, [kotsmi], he/him)

Ada Wan

Jul 1, 2025, 2:10:36 PM
to wmt-...@googlegroups.com, kocm...@gmail.com
Dear Tom, dear all at WMT (workshop organizers as well as participants)


A general briefing:

Please be officially notified of my findings from 2019 on (see https://sites.google.com/view/adawan), esp. "Representation and Bias", "Fairness in Representation", and my posts on X (formerly Twitter, @adawan919). I'd appreciate it if you'd please read carefully and reflect more on my findings and results. There is neither CL (computational linguistics) nor NLP ("Natural Language Processing") that is possible/ethical/legal. Statistics is the driver, not textual values/meaning (esp. in the context of computing). The name "NLP" was also much of a misnomer, esp. now that "language complexity" has been resolved, "language" decomposed and generalized, processing has been, or realized to have been, automated. "Language technology" can now be just "technology" (because there is no need for "language" in tech and things work in tech not because of "language" (but everything else that I named specifically)).
If one has studied or worked in the "language space", including but not limited to linguistics, computational linguistics, and NLP, chances are that one has been miseducated. Please think about the students who were or any new practitioners who could be misinformed or become too attached to a certain "'specialisation'/'track' identity" that is redundant and incorrect, and the consequences thereof.

Ask not what there is left to do for a direction that ought to be discontinued, ask who, which students might fall victim to the teaching of such if it were to continue.
Academic disciplines or research topics/foci don't have to last forever. Things get solved and resolved in tech.

There are reasons scientific, academic, technical, ethical, and legal for the retirement of all "language" endeavours (esp. ones in the context of computing) (see also point #7 below).
I hereby ask you to please cancel this event and call immediately.
I hereby also ask you to please stop corroborating/(co-)developing a narrative in the name of "language" that is irrelevant in computing. Otherwise, you show yourself to be, inter alia, intentional in the manipulation of "language", political and/or inappropriate sentiments with your use of technology. There should also be no further hiring in the areas related to "language".

If you should have any questions regarding my requests, or if there is anything that you do not understand, including but not limited to my notification/message to you here or the conclusions/implications of my work, please do not hesitate to contact me in writing via email AND on X (x.com) at @adawan919 within 3 business days, i.e. by 2359 on Friday, 04Jul2025 (ZRH time) for this case. If you should require an extension, please contact me immediately and no later than said deadline. Your lack of reply will be understood as your having understood the conclusions and implications of my work and my requests to you. I will confirm receipt of your reply immediately (within at most 48 hours, Monday-Friday). If you do not see my reply, please try re-sending and/or re-posting until I do.

Especially applicable for this event/initiative:
  1. Machine Translation (MT) is solved. "Hardness"/Difficulty in modeling correlates with sequence length and not anything "intrinsic" to any particular style/brand of "language" (and should researchers not be convinced, their own unresolved subjective issues with "language", or their "image"/"impression" of a particular style, might play a role --- but that should not be a research project for students). 
  2. MT is solved. It comes down to data statistics wrt algorithm, in certain suboptimal settings, length (and vocab) is/are found to correlate with hardness. The task is as simple as data in and data out --- it does not have to do with "(particular) languages". One can pass data in different styles in and get "style transfer". The possibility/success of the task of "style transfer" should already be a sign that our algorithms have generalization capability (and MT solved). My work, in resolving "language complexity", in showing that there are no significant differences between "(particular) languages", is an explicit proof of the generalization capability of our algorithms nowadays and of MT being solved and resolved (that there are no "languages" and that MT has been reduced to a matter of statistics).
  3. Regarding segmentation and evaluation: please refer to my work, in particular Section 2 under "Fair information-theoretic evaluation metric" in 'Fairness in Representation', e.g. ICLR2022 version at https://openreview.net/pdf?id=-llS6TiOew:
    "we find that it is not necessary to assign a perspective that is centered on any one particular language, when we can evaluate simply by the total number of bits for a larger portion of texts/sequences. This can be a fairer, more general and flexible way of evaluating data that has not been or cannot be perfectly segmented or aligned line by line. We hence used instead unnormalized PP, i.e. the total number of bits needed to encode the dev set..."
    If/When computation memory and setup allow, one can pass in data in larger segments, hence obviating the need to align too precisely. (That having been expressed, evaluating alignment accuracy can be a philological/academic exercise (but not a job!).) Please reread my findings again with care. There are many details that have been solved/explained (including some in the rebuttal for the ICLR2022 version), the importance of which might have been overlooked.
  4. Again, MT is solved, and the proper thing to do would be to officially recognize my results, not to hide from them or ignore them. Please do not "fight" me, I am not your enemy. Many people in other sciences (those which do not exploit emotions) are able to just move on from potential collective oversight and/or technological progress, self-correct, start practicing the right way, keep advancing --- e.g. one would now train models with data with diverse statistical profiles instead (not "different" from a "linguistic"/philological point of view), stop interpreting models in manners that are not centered in computation and/or statistics....
    I'd be grateful and honored if, in addition to canceling this event, the WMT community would formally recognize and celebrate my results and achievement. That would also be the right and respectful thing to do.
  5. There is no more NLP, any "text-centric" computational modeling in connection with "words", "sentences", "meaning"/semantics/linguistics etc. that should be treated as real applications. (This, in part, follows from my results from 2019 on.) The closest to it could be (generalized) data science (as in, for explanation, clarification, evaluation, and interpretation) and statistical modeling (but such may be irrelevant to applications). But note that these are not the same as NLP. Claims that do not hold in another representation (e.g. standardized byte) should not be taken seriously, or as valid/true/universal/scientific. *Everything has been solved/resolved, clarified in my work (including the rebuttal for 'Fairness in Representation' (ICLR 2022) and 'Representation and Bias').
  6. Please clarify if "quality" would/might involve "grammar" (which is irrelevant, unnecessary, and unethical) in your call/work.
  7. "Reasons scientific, academic, technical, ethical, and legal for the retirement of all 'language' endeavours" apply also to "research" with "LLMs" that cannot be carried out in an honest and transparent manner, e.g. with available input data and data profile/statistics, and evaluation and interpretation with respect to such.
  8. It would be apt to ask about the nature of WMT:
    a. Please state explicitly whether this is an academic/scientific or commercial/industry event, to what extent it is being funded publicly and privately (as in, via commercial sponsorship), as well as any conflict of interest. The concern here is that you are / might be doing commercial research under the hood of academic research, with academic titles and affiliation, and/or promoting some research/education initiatives/directions which are in violation of principles of research integrity --- e.g. lack of honesty/transparency, respect (please do cite my work should you find it insightful, and if you should find it not insightful, please explain why).
    b. If this event were a private assembly in the name of "language" or "technology", or some philological entertainment (i.e. not on public funding and not for science/technology/engineering/education), please ensure that this does not lead to any sentiment manipulation/provocation. Please also note the unethical nature of "language". Please make sure that you state explicitly the nature/orientation of your event and refrain from using your official titles or professional/professorial affiliations when hosting/participating in such event.
    c. Please state any commercial and non-commercial "LLMs" that would be used as objects of investigation. Please clarify if these are completely based on open-sourced data with data specifications and statistics readily available.
    d. Please report everything honestly and transparently, and have non-misleading calls for participation.
  9. Even though there might be ways to reformulate your tasks such that they would adhere to research guidelines (e.g. by evaluating all models with transparent data and statistics), it would be best to cancel this event lest more students or public audience be miseducated/misled, as you (as well as I myself at one point) were.
  10. Many in linguistics, CL, NLP and/or ML/CS may not have thought through the irrelevance/redundancy of "language" enough because "language" has been an implicit assumption of their field/foci/specialty. But "language" has been generalized. Things in what used to be referred to as "language" have been solved and/or deemed indeterminate. Aspects that are more general do not require the mentioning of "language". By doing "language", one is / can be implicitly promoting "grammar", which is often an excuse for philologists/grammarians/linguists to infringe on or corrupt the tech space, and/or abuse its funding. So please don't. For those who work(ed) and publish(ed) honestly: before my findings, MT wasn't or might not have been fraud. But after my findings, it is; likewise with LLMs without clear data/statistics (esp. when one is doing science, working in education and/or public research, or using public funding).

To Administrators/Authorities:

Please note that there is currently rampant fraud (and possibly corruption) going on with "language", including but not limited to waste, fraud, and abuse in university disciplines/foci of linguistics, computational linguistics, NLP (Natural Language Processing), and some areas of ML (machine learning) and CS (computer science). Please report such to higher authorities should activities in these areas continue to be hosted. Disciplines/foci relating to "language" ought to be discontinued.


This post will also be posted on my X account (@adawan919). One should also post one's reply to me over X (x.com).
Please reply to all via email with comment/notification/post on X.

Thank you.
