Has GPT passed the Turing test for translation?

Kevin Kirton

May 17, 2023, 4:56:47 PM
to honyaku
I first encountered this honyaku forum in 1998/99 and enjoyed an education in J-E translation that was impossible before the internet, reading the posts and interactions between many first-class translators. Some names that come to mind include Tom Gally, Bill Lise, Dan Kanagy, and Jim Lockhart, but there were many others.
Around 2005 there were a few posts and extensive replies around the issues of TMs (translation memories) and the possibility that the holy grail of machine translation between J<>E would be cracked. Many were of the view that it either would never happen or not in our lifetimes.
And now, after following Tom Gally's research into the quality of ChatGPT's translations, I feel that perhaps GPT has reached a level where it would pass a translation test equivalent to the Turing test. That is, given a handful of professional Japanese-English translators working separately in different rooms and a computer running GPT-4 in another room, all supplied with the same Japanese text (a novel, a presentation, a manual), would we always be able to tell the difference between the work of the human translators and that of GPT?
The issues here are "expert in the loop" and "hallucinations" etc, but I just wanted to put this question to the honyaku group.

Kevin Kirton
Canberra, Australia

Joe Jones

May 17, 2023, 7:29:05 PM
to Honyaku E<>J translation list
On Thursday, May 18, 2023 at 5:56:47 AM UTC+9 Kevin Kirton wrote:
I feel that perhaps GPT has reached a level where it would pass a translation test equivalent to the Turing test. That is, given a handful of professional Japanese-English translators working separately in different rooms and a computer running GPT-4 in another room, all supplied with the same Japanese text (a novel, a presentation, a manual), would we always be able to tell the difference between the work of the human translators and that of GPT?

We are already at a point where AI can produce a translation indistinguishable in quality from a professional translation -- just not consistently. Kind of like how Tesla's self-driving cars work 99% of the time but then get confused when there's road construction or a small animal crosses the street. The failure rate is gradually improving but still not really "reliable" enough for many purposes.

In particular, at least for the foreseeable future, I think human translators are going to have a huge advantage whenever contextual knowledge is required, since AI is still incapable of possessing that.

Tom Gally

May 17, 2023, 9:41:37 PM
to hon...@googlegroups.com
Interesting question, Kevin.

My guess is that it would depend on who is judging the human and GPT translations and what they have been told about the task.

Skilled and sensitive human translators, I think, would be able to tell the difference between top-level human translations and GPT translations, especially if they have spent some time looking at GPT translations and have become familiar with their characteristics. A paragraph-length hallucination, for example, would be a dead giveaway that a translation was GPT-produced. (I’ve seen such hallucinations in GPT translations only a couple of times, and they are obvious when they occur.) Grammatical errors and typos would be equally strong evidence of a human translation. Judgments based on translation accuracy, naturalness, and overall coherence would be more difficult, but, with practice, it should be possible for translators to identify the GPT translations pretty reliably.

The results would be more mixed, I suspect, if the translators were just asked to evaluate the quality of the translations without being told that some were produced by GPT. In some cases, translators might very well rate the GPT translations higher. As I mentioned in one of my videos where I discussed translation of literary texts, some of the differences between human and GPT translations come down to a matter of taste, especially when the human translator has had to make a judgment call about whether to be more accurate or more natural. I didn’t say so in the video, but I personally didn’t like some of the choices made by the human translator of the Natsume Soseki novel—vocabulary footnotes!—and preferred the GPT versions.

If we think about how translation is used in the real world, I suspect that, in many cases, the Turing test had already been passed by MT even before GPT appeared. How many people shopping on ecommerce sites like Alibaba, for example, are aware that the product descriptions have been translated by machine? It’s obvious to us, but most people probably don’t even think about it.

Kevin’s question would be an interesting problem for some academic to study systematically and write a paper about. I’ve retired from my research post, so I’ll pass.

Tom Gally

Herman

May 18, 2023, 9:28:53 AM
to hon...@googlegroups.com
The answer depends on how one interprets the intent of the Turing Test.

The Turing Test (originally called the Imitation Game) was proposed on
the assumption that, given the large number of possible input-output
pairs in human conversation, nobody will be able to predict exactly
what the conversational inputs in the test will be, nor to pre-program
a machine with a corpus of examples large enough to cover whatever is
inputted in the test. Therefore, if a machine is able to imitate
conversation, it must have a function akin to intelligence, i.e., a
function of generating a large number of appropriate outputs on the
basis of a small amount of input data.

However, subsequent developments in the field of information processing
have to a significant extent defeated the implicit assumption of the
Turing Test: a corpus of a large language model may now contain a
sufficient number of examples to pass the test solely on the basis of
statistical pattern matching, without any function analogous to
intelligence.
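[Editor's note: Herman's point here — that imitation can be achieved by statistical pattern matching over a corpus, with no function analogous to intelligence — can be sketched with a toy bigram text generator. Everything below (the corpus, the function names) is invented for illustration and is not from the thread; real large language models are vastly more sophisticated, but the principle of sampling from observed statistics is the same in kind.]

```python
import random
from collections import defaultdict

# A tiny "corpus" standing in for the training data of a language model.
corpus = "the cat sat on the mat and the cat saw the dog".split()

# Record, for each word, every word observed to follow it (bigram statistics).
successors = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev].append(nxt)

def generate(start, length, seed=0):
    """Imitate text purely by sampling observed successors -- no understanding,
    only pattern matching against the corpus."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        options = successors.get(words[-1])
        if not options:  # dead end: this word was never followed by anything
            break
        words.append(rng.choice(options))
    return " ".join(words)

print(generate("the", 6))
```

Every word pair the generator emits was seen in the corpus, so its output can look like plausible language while involving no capacity to handle inputs outside what the corpus happens to cover — which is exactly the assumption of the original Imitation Game that scale has eroded.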

Thus, if the intent of the Turing Test, as applied to translation, is
understood to be the demonstration of an ability to produce a
translation on the basis of a small amount of input from which the
translation output cannot be predicted by e.g. statistical means, then
GPT has not passed the Turing Test for translation.

If the intent of the Turing Test for translation is understood as the
ability to output a translation that sometimes or oftentimes may be
deemed a valid translation and/or taken by some judge to be a human
translation, regardless of how the translation was generated, then GPT
probably has passed the Turing Test.

Herman Kahn