Terminology task: terms with multiple translations


Damir Korenčić

Jul 10, 2025, 5:56:27 AM

to WMT: Workshop on Machine Translation

Dear Organizers,

We have some doubts about the semantics of the terms' translation lists in Track 2.

The document given with the dataset states "we allow multiple translations (mostly due to having both a full translation and abbreviation". This seems to imply that the translations in a list are interchangeable and that one just has to choose one and use it consistently. However, the dataset contains a significant number of terms whose lists hold semantically different translations. For example:

進 : ['enter', 'go', 'proceed', 'capitalising in', 'leveraging on', 'advance', 'promote', 'further']

月 : ['month', 'April', 'June', 'May', 'July', 'August', 'September', 'October', 'November', 'December']

秒: ['second(s)']

Some translations come with clarifications:

催繳 ['uncalled', 'unpaid (shares)']

會 ['will', 'Association (not present, only as a verb or auxiliary)', 'Committee', 'committees', 'meeting(s)', 'meeting', 'summit']

... etc.

Is the intended use of the dictionary to choose the right translation from the list according to both the context of the source text and the instructions in parentheses?

If not, what is the correct interpretation?

If the above is correct, how will evaluation metrics 2. "Terminology success rate" and 3. "Terminology consistency" be computed?

For 2., will the occurrence of the translated term be counted only for the correct translation in context, or will any of the terms in the list be counted?
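
To make the two readings concrete, here is a minimal sketch. `count_term_hits` is our own hypothetical helper, not part of the official evaluation, and the whole-word, case-insensitive matching is an assumption on our side.

```python
import re

def count_term_hits(hypothesis: str, variants: list[str]) -> dict[str, int]:
    """Count whole-word, case-insensitive occurrences of each listed variant."""
    hits = {}
    for variant in variants:
        pattern = r"\b" + re.escape(variant) + r"\b"
        hits[variant] = len(re.findall(pattern, hypothesis, flags=re.IGNORECASE))
    return hits

# A few of the variants listed for 月 and a made-up system output.
variants = ["month", "April", "June", "May"]
hits = count_term_hits("The report is due in May.", variants)

# Reading A: any variant from the list counts as a success.
lenient_success = any(hits.values())
# Reading B: only the variant that is correct in context counts
# (here "May", chosen by hand for illustration).
strict_success = hits["May"] > 0
```

Under reading A a system could output "month" everywhere and still score a hit, which is why the distinction matters to us. Note also that case-insensitive matching would count the modal verb "may" as the month.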

In any case, one would expect that, before matching the terms, the instructions within '(' ')' need to be removed and singulars and plurals expanded into separate words, while abbreviations within parentheses are kept. Is this correct, and if not, what transformation will be applied?
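
The transformation we have in mind looks roughly like the sketch below. `expand_variants` is a hypothetical helper of ours, and the rule for telling abbreviations from notes (a single all-uppercase token) is purely a guess, not something stated in the task description.

```python
import re

# Assumption: a trailing "(s)" marks an optional plural.
PLURAL = re.compile(r"\(s\)$")
# Any span in ASCII or full-width parentheses, with leading whitespace.
PAREN = re.compile(r"\s*[(（]([^)）]*)[)）]")

def expand_variants(entry: str) -> list[str]:
    """Turn one dictionary entry into the list of strings to match."""
    entry = entry.strip()
    if PLURAL.search(entry):
        base = PLURAL.sub("", entry)
        return [base, base + "s"]
    variants = []
    head = PAREN.sub("", entry).strip()  # entry with parenthesised spans removed
    if head:
        variants.append(head)
    for inner in PAREN.findall(entry):
        # Guess: a single all-uppercase token is an abbreviation and is kept;
        # everything else is treated as an annotator note and dropped.
        if inner.isupper() and " " not in inner:
            variants.append(inner)
    return variants
```

With this sketch, 'second(s)' becomes ['second', 'seconds'], 'unpaid (shares)' becomes ['unpaid'], and an entry that is only a note yields an empty list.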

For 3., will consistency be measured separately for each context in which the term occurs across the document, or globally for the document?
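
The difference can be illustrated with a small sketch. The `consistency` function below (share of occurrences that use the most frequent variant) is only our assumed definition, not the official metric, and the sense labels are assigned by hand.

```python
from collections import Counter

def consistency(choices: list[str]) -> float:
    """Share of occurrences that use the most frequent variant."""
    if not choices:
        return 1.0
    return Counter(choices).most_common(1)[0][1] / len(choices)

# Made-up example: variants used for 會 in one document,
# each tagged with a hand-assigned context label.
used = [("verb", "will"), ("verb", "will"), ("noun", "meeting"), ("noun", "summit")]

# Global reading: all occurrences of the source term are pooled.
global_score = consistency([variant for _, variant in used])

# Per-context reading: occurrences are grouped by context first.
per_context = {
    context: consistency([v for c, v in used if c == context])
    for context in {c for c, _ in used}
}
```

Here the global score is 0.5, while the per-context scores are 1.0 for the verb and 0.5 for the noun, so a system that correctly uses "will" and "meeting" side by side would be penalized under the global reading.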

On the technical level, the parentheses in the term lists can contain both abbreviations and elaborations. Is there any regularity in this? Specifically, is the last pair of parentheses guaranteed to contain contextual elaborations, or do we need to parse and transform the terms on a per-case basis?

thank you, Damir

Damir Korenčić

Jul 11, 2025, 10:10:22 AM

to WMT: Workshop on Machine Translation

As a follow-up to the previous questions, we also found some terms in Track 2 that have defective terminology lists:

icing ['（not translated, not applicable）']
Lot ['（not translated, maybe missing context）']
ISO ['（kept as is, for standards）']

What is the meaning of this? Are such terms meant not to be translated at all?
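
For reference, such entries can be found with a check like the following sketch. `is_placeholder` is a hypothetical helper of ours; it assumes a defective entry is one whose translations all consist solely of a note in full-width or ASCII parentheses.

```python
import re

# An entry that is nothing but one parenthesised note.
NOTE_ONLY = re.compile(r"^\s*[(（][^)）]*[)）]\s*$")

def is_placeholder(translations: list[str]) -> bool:
    """True if the list is non-empty and every item is only a parenthesised note."""
    return bool(translations) and all(NOTE_ONLY.match(t) for t in translations)

print(is_placeholder(["（not translated, not applicable）"]))
print(is_placeholder(["uncalled", "unpaid (shares)"]))
```

The first call flags the entry as a placeholder and the second does not, since 'uncalled' and 'unpaid (shares)' still carry a usable translation.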

best, Damir
