MT Talk - ProZ


Jeff Slenker

Mar 1, 2023, 8:46:28 AM
to hon...@googlegroups.com
Excuse me if this has been discussed already, but I have a question about MT prompted by a talk given yesterday on a ProZ podcast featuring Jay Marciano from SDL. Jay mentioned that billions of TM units were used to create the neural MT we have today. My question is how all of those TM units came to be the source material for statistical and then neural MT, and whether we, as the sources of those TM units, receive any monetary benefit from the neural MT that will increasingly take our work. I know every agreement has a clause stating that copyright belongs to the client. I am guessing that this clause means we just have to grin and bear this use of our TM material, but I am hoping that it somehow violates some legal statute somewhere, and that all of the years we spent using Trados and other TM tools are not ultimately the reason for the end of our work. If this has been discussed, I apologize for missing it.

Christiane Feldmann-Leben

Mar 1, 2023, 9:01:38 AM
to hon...@googlegroups.com

Hi Jeff,

this is something that angers me massively as well. At a conference for translators in Germany in 2012, a guy gave a talk about the deal between Google (Translate) and the European Patent Office to use the existing translations of patents to feed into the Google translation machine. And this guy had the nerve to tell a full room of translators that their services wouldn't be needed any more in 10 years' time, and to praise this as great progress and a great benefit. I then stood up and told him that he should at least thank us for our efforts to make his machine work and to put us out of business. He just goggled. He didn't even understand my point. I was exasperated, but, well, there was nothing I or we were able to do about it. I am still in business doing patent translations, but with the arrival of DeepL and now ChatGPT, I really doubt that it will last until I retire (another ten years).

Now, our TM input might belong to the client. But my clients weren't SDL or RWS; they were others. My work was often published, though (patents, some websites and so on), and this was probably also used by those machine translation trainers. They live on our work, yet we do not see a single penny and earn less and less. I have no idea what could be done about this. Nothing, I suppose. Quite frustrating.

My ten cents of thoughts on the matter,
greetings from Germany,
Christiane

--
You received this message because you are subscribed to the Google Groups "Honyaku E<>J translation list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to honyaku+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/honyaku/CAOkdc680e6800meyt6U7ORHxSBbC4G8Ke5ucwTieY8%3DDYKQgfA%40mail.gmail.com.
--



Dr. Christiane Feldmann-Leben
Chemieübersetzerdienst
Böcklerstr. 1
D-76275 Ettlingen

Phone +49 (0)7243/217218
Internet: www.chemieuebersetzerdienst.de
e-mail: in...@chemieuebersetzerdienst.de

Privacy policy: https://chem-ued.de/datenschutzerklaerung/


This e-mail may contain confidential and / or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

jo...@johnfry.org

Mar 1, 2023, 9:34:57 AM
to Honyaku E<>J translation list
Jeff and Christiane: FYI this interesting topic is discussed at length in the book Who Owns the Future? (2013) by Jaron Lanier.

Matthew Schlecht

Mar 3, 2023, 11:19:15 AM
to hon...@googlegroups.com
On Wed, Mar 1, 2023 at 9:01 AM Christiane Feldmann-Leben <cle...@web.de> wrote:

> Hi Jeff,
>
> this is something that angers me massively as well. At a conference for translators in Germany in 2012, a guy gave a talk about the deal between Google (Translate) and the European Patent Office to use the existing translations of patents to feed into the Google translation machine. And this guy had the nerve to tell a full room of translators that their services wouldn't be needed any more in 10 years' time, and to praise this as great progress and a great benefit.


I think I remember that talk! I guess the news of our professional demise was a bit premature.
 

> I then stood up and told him that he should at least thank us for our efforts to make his machine work and to put us out of business. He just goggled. He didn't even understand my point.


Maybe he had never thought of it in that perspective.
 

> I was exasperated, but, well, there was nothing I or we were able to do about it. I am still in business doing patent translations, but with the arrival of DeepL and now ChatGPT, I really doubt that it will last until I retire (another ten years).


My own retirement will come at the end of this year, but I empathize with the uncertainties facing the translation community in the coming years. On the other hand, there are also uncertainties facing the developers and users of AI-driven translation. When you take humans out of the loop, who (or what) has the ultimate responsibility for quality? How will they fare in the transition from the user driving the engine to the engine driving the user?
 

> Now, our TM input might belong to the client. But my client wasn't SDL or RWS, but others. But my work was often published (patents, some websites and so on) and this was probably also used by those Machine translation trainers.


Ever since the development of computer-aided translation and translation memories/corpora, our glossaries and translation content have generally (and contractually) become the property of the client. Even if the translator doesn't provide the TM, all the client need do is align the target segments with the source segments, and they have captured that content for future use. They even go so far as to write into contracts the provision that no TM content developed in projects they have commissioned can be used in a job for a different client, although I don't know if any enforcement of that has taken place.
This expropriation of our work product bothered me for a long time but, alas, there isn't much that can be done about it.

One item of note is that, given the fundamental differences between East Asian languages and European languages, AI-driven translation in East Asian/European language pairs more frequently runs into problems. That's because the MT engines are based on English. When some clever AI researcher effectively develops an AI that "thinks" in Japanese, Chinese, or Korean, and gets it to hold hands with the English-based systems, the competitive edge over human translation will increase.

Matthew Schlecht, PhD
Word Alchemy Translation, Inc.
Newark, DE, USA
wordalchemytranslation.com

Jon Johanning

Mar 3, 2023, 1:50:40 PM
to Honyaku E<>J translation list
I agree that this expropriation of glossaries and such is something we can't really expect to be compensated for. In times like these, when work is hard to get, we'd like to grab every penny we can, I know. The solution, as I see it, is just to work harder at rounding up more work sources and getting work from the sources we have. (I've been surprised lately by the old agencies I haven't heard from for a long time that are turning up these days, though usually not with big hunks of raw meat.)

I have always seen problems in MT between Japanese and English. Heck, there are problems with old-fashioned non-MT translation, of course. I'm following the discussion about new AI revolutionizing translation, but I don't see it helping me much in what I'm doing right now, so I'm not spending a lot of time studying the subject. It seems to me that what it can do right now is basically the same as rewriting what you write in your first drafts, and I think I can do that better than some damn machine. But of course that's what the carriage and buggy-whip makers used to say!

Jon Johanning

Matthew Schlecht

Mar 8, 2023, 4:18:43 PM
to hon...@googlegroups.com
On Fri, Mar 3, 2023 at 1:50 PM Jon Johanning <jjoha...@igc.org> wrote:
> It seems to me that what it can do right now is basically the same as rewriting what you write in your first drafts, and I think I can do that better than some damn machine. But of course that's what the carriage and buggy-whip makers used to say!

Quite.
I made the point about the effects of technological innovation in my ATA talk last October by drawing upon the events at the Battle of Hampton Roads on March 9, 1862, during the American Civil War.
That was the first meeting in combat of ironclad warships, the USS Monitor and CSS Virginia (referred to in Northern textbooks as the Merrimack, which the Confederate Navy had salvaged, refurbished, and renamed).
Once these two vessels met in battle, every other navy in the world based on wooden vessels became obsolete overnight. It didn't much affect the merchant marine or other vessels, but the state of the art in the naval warfare sector had changed forever.

Herman

Mar 8, 2023, 4:46:04 PM
to hon...@googlegroups.com
But in what sense did the state of the art change? Is the relationship
between an ironclad warship and missiles capable of destroying an
ironclad warship (as well as a wooden one) fundamentally different from
the relationship between a wooden warship and missiles capable of
destroying a wooden warship (but not an ironclad warship)?

Herman Kahn

Matthew Schlecht

Mar 8, 2023, 5:00:54 PM
to hon...@googlegroups.com
Ironclads could repulse most cannon-fired projectiles from most angles, and could thus closely approach wooden ships or land-based fortifications and pulverize them with their own cannon fire while suffering far less damage.
So, in most battles between ironclads and wooden ships/land-based fortifications, wooden ships/land-based fortifications lost.
The same cannonballs that would punch holes in wooden ships/land-based fortifications would usually bounce off ironclads.

Furthering the analogy, it took commanders a while to realize the strengths and weaknesses of the new technology, and effectively incorporate them into tactics and strategies.

Herman

Mar 8, 2023, 5:24:32 PM
to hon...@googlegroups.com
On 3/8/23 14:00, Matthew Schlecht wrote:
I am aware of the above. I was pointing to the fact that the development
of ironclads was soon followed by the development of projectiles capable
of destroying them, thus negating the initial advantage they gave and
sort of returning the situation to what it was before (although
eventually, with the emergence of long-range hypersonic anti-ship
missiles, perhaps the whole idea of ironclads, or the utility of big
powerful ships, has been largely negated).

So the revolutionary impact of a great technical innovation remains
revolutionary and great only insofar as the problem which it addresses
remains the same, but it cannot be assumed that the problem will
necessarily remain the same indefinitely. So for example, in the case of
the emergence of an AI system to do translation or whatever, there is
the possibility if not inevitability that an anti-AI system will emerge
to defeat the AI. (Well, I suppose this depends on the state of the world
to some degree. If you (the AI developer) control the whole world, maybe
you can imagine that something like that will never happen and you and
your AI will reign supreme for eternity...)

Herman Kahn


Matthew Schlecht

Mar 8, 2023, 8:20:17 PM
to hon...@googlegroups.com
On Wed, Mar 8, 2023 at 5:24 PM 'Herman' via Honyaku E<>J translation list <hon...@googlegroups.com> wrote:

> So the revolutionary impact of a great technical innovation remains
> revolutionary and great only insofar as the problem which it addresses
> remains the same, but it cannot be assumed that the problem will
> necessarily remain the same indefinitely. So for example, in the case of
> the emergence of an AI system to do translation or whatever, there is
> the possibility if not inevitability that an anti-AI system will emerge
> to defeat the AI

While I concede your points in the broader question of whether we will control AI or whether AI will control us, I don't see translation content evolving in such a way as to thwart AI translation capability.
AI, at least in a translation context, is merely a tool. Within its constraints, it can process certain translation content faster.

In another analogy, the cotton gin was a tool that was developed to separate cotton fibers from their seeds, and enabled significant increases in the rate and volume of cotton production. Cotton did not then evolve to become more difficult to separate and defeat the cotton gin's advantage, and so the cotton gin provided a robust and lasting improvement in that sector of agricultural production. People still picked apples by hand, but from then on cotton was processed predominantly using the new tool.

Perry E. Gary

Mar 9, 2023, 3:54:30 AM
to hon...@googlegroups.com
OT, but some historians say the first ironclads were the Korean "turtle
boats," which had metal superstructures and which Admiral Yi deployed
successfully against an invading Japanese fleet. I seem to recall reading
somewhere that there have been scattered other instances, less decisive, in
naval warfare prior to 1862.

Perry Gary

Herman

Mar 10, 2023, 4:15:09 PM
to hon...@googlegroups.com
In the case of ship hulls vs anti-ship missiles, there is a 矛盾 (mujun, "spear versus shield") type
relationship, whereas the relationship between cotton gins and cotton is
a complementary one, so cotton cannot really be said to be an anti-gin
system/force/factor that would somehow evolve to defeat the advantages
of a cotton gin. If you look at the case of Eli Whitney's cotton gin,
which I assume you have in mind, prior to its invention, only Indian
(Hindustani) type cotton could be grown in a commercially viable manner
in the US, and it was processed by hand or using an Indian type cotton
gin, but this variety of cotton could only be grown on a limited acreage
near the coast, and the US remained a minor producer. Whitney developed
a gin that could process upland cotton (which could not be efficiently
cleaned by hand or with an Indian gin), thus allowing for a huge
increase in cotton acreage and making the US the leading producer of
cotton, namely, of low-grade, cheap, upland cotton. So it was not a
situation of the same cotton being produced and processed using a
different tool, but rather the cotton itself, in terms of both input and
output, became qualitatively and quantitatively different from the
cotton produced before. So by analogy, I would say, it cannot be assumed
that the introduction of AI-based MT tools into the translation field
will necessarily lead to a situation where the same things, in terms of
quality and quantity, are still being translated, but only using a
different tool. The nature of the things being translated, or even what
it means to "translate" something, is itself likely to change.

What I meant by "anti-AI system" is this: currently, an AI
system takes a corpus of a large volume of human-generated source and
target texts, and a new text in a source language, and generates a
translation for the new text based on patterns extracted from the corpus
of human-generated translations. Well, if translations come to be done
using AI, then the corpus of translations constituting the input to the
future AI system may itself come to consist mostly of AI-generated
translations, and so if the purpose of the AI translation system is to
generate "human-type" translation on the basis of analysis of
human-generated texts, that purpose may be defeated by the fact that the
input texts to the AI are now mostly AI-generated. Not to mention that,
unless there is a single AI system ruling the world under some sort of
global dictatorship, there could be multiple competing AIs out there,
which, as part of their competition, may come up with ways to undermine
the operation of other AIs.

Herman Kahn





Matthew Schlecht

Mar 10, 2023, 5:16:20 PM
to hon...@googlegroups.com
On Fri, Mar 10, 2023 at 4:15 PM 'Herman' via Honyaku E<>J translation list <hon...@googlegroups.com> wrote:

> Well, if translations come to be done
> using AI, then the corpus of translations constituting the input to the
> future AI system may itself come to consist mostly of AI-generated
> translations, and so if the purpose of the AI translation system is to
> generate "human-type" translation on the basis of analysis of
> human-generated texts, that purpose may be defeated by the fact that the
> input texts to the AI are now mostly AI-generated.

This is an important point!
An MT engine that ends up consuming substantial quantities of unrevised output from itself or other MT engines will necessarily start trending away from human-type translation content. The non-sentient leading the non-sentient.
I believe that developers are aware of this potential flaw, and endeavor to devise strategies to avoid the deleterious effects.
I have read that Google Translate (and possibly other commercial engines) places tokens in any content beyond a certain minimum word count. These tokens (which are designed to be "invisible") signal the GT MT engine to exclude such content from its training.

Herman

Mar 10, 2023, 10:05:40 PM
to hon...@googlegroups.com
Assume for argument's sake that all or most published translations at
some point come to be implemented using MT. If the MT system operates
based on pattern extraction from human translations, it will no longer
be possible to do that, since there are no new human translations
available.

Assume, as a separate parallel assumption, that human language use
continues to evolve more or less like it has done so far, i.e., that
people do not just keep saying exactly the same things and using exactly
the same words, expressions, grammar, etc., as they did a thousand or a
hundred years ago, and that translation itself is just a special case or
context of language use.

In this case, there arises a clear contradiction between the two
assumptions. I.e., if human language use continues to evolve as it has
done, then human-like translation will become impossible through MT,
unless the MT can itself predict, on the basis of previous human
language input, what the future development of human languages will be,
or, alternatively, unless the development of human language use itself
follows MT output.

Herman Kahn


Jon Johanning

Mar 11, 2023, 12:29:31 PM
to Honyaku E<>J translation list
Matthew,

That sounds fine for Google Translate, but are all the other AI outfits doing the same thing? And with who knows how many "AI experimenters" pushing their experiments out onto the internet, naturally without tokens of the kind GT is using? This whole California-Gold-Rush stampede of irresponsible people panning for gold in the AI river is already out of control, I'm afraid. I'm seeing many warnings that the internet will inevitably turn before long into a free-for-all in which we cannot tell whether anything we see on it is actually written by human beings and not "generated" by "generative" AI, which generates as much "hallucination" as sanity.

That glorious "internet" thing was great while it lasted, but anyone who is interested in old-fashioned truth and reality will need to turn back to pre-internet methods of communication.

Jon Johanning

Tom Gally

Mar 12, 2023, 9:02:57 PM
to hon...@googlegroups.com
Herman Kahn wrote:

> If the MT system operates based on pattern extraction from human translations...

That is how dedicated MT systems have worked. However, I don’t think that ChatGPT’s translations are primarily derived from human-translated bilingual corpora. While such corpora were no doubt present in its training set, it seems to produce translations based on an “understanding” of the input text’s “meaning.” 

Note the scare quotes! I’m not saying that it really understands meaning!

Tom Gally
Yokohama, Japan

Herman

Mar 12, 2023, 10:42:57 PM
to hon...@googlegroups.com
It uses a seq2seq approach, but I wouldn't say it even "understands"
meaning in quotes, in that, when given a difficult text to translate, it
may spew out complete nonsense, suggesting that the system is not
parsing the text into some sort of representation analogous to meaning.
In any case, even if it were generating an intermediate representation
analogous to meaning, it is, I think, unlikely that it would do so in a
manner so similar to human translators that, over time, the translations
produced by such a system would not diverge significantly from what
human translation would be, were it to continue to be practiced and not
be replaced by AI.

Herman Kahn

