Feedback Cycles in Machine Learning/Machine Translation

Warren Smith

Nov 27, 2017, 11:32:01 PM
to hon...@googlegroups.com
I sent this to a client today when I turned down a job. Thought I
would pass it along. (Remember, the thread starts at the bottom...)

Warren


-------------------
.

As someone with a doctorate (Harvard) in technology strategy, and as a
working translator, I am extremely interested in *observing* what
is going on with machine translation (in case some day exponential
improvements in AI suddenly produce a useful product without my
noticing), so I am glad you sent this to me. Nevertheless, at present
the English in MT in general, and in this document in particular, is
much more difficult for me to understand than the Japanese (so reading
the English would be more laborious and error prone than reading the
Japanese), and once I know what I want to say, rearranging the English
bits to get them in the right order and grammatically correct would be
much more difficult than just typing.

Frankly, as someone trained in the field of technology strategy, I am
not particularly optimistic about AI translation *ever* reaching the
point of being a useful tool for a skilled translator (although, of
course, for an unskilled translator having this English reference
would be a huge boon). The reason for this is that the AI learning
algorithm for machine translation is based on input from human
translators, and we have already reached the point that the MT tools
have allowed an influx of unskilled human translators into the
industry, reducing the quality of the product that is being used as
the basis for machine learning, in turn *reducing the quality of
future product.* That is, I see evidence that machine learning has
peaked in its quality and is currently declining. This is very clear
when I use reference tools like Weblio, where bad translations
(submitted to the Japan Patent Office) serve as references for a
usage-example dictionary, which has fed back into worse translations
in a vicious cycle, and now the Weblio dictionary is becoming
hopelessly corrupt. My conversations with a senior Google Translate
insider (an engineer who worked for Google Translate at the time)
indicate that this feedback cycle is at work even at Google, and that,
in patent translation at least, Google has already used up the
existing bilingual corpus, so he too does not anticipate any further
improvement.
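
The feedback cycle described above can be made concrete with a toy model. This is only a sketch under loud assumptions: the function name, the 0..1 quality scale, the share of MT-derived text, and the degradation factor are all hypothetical illustration values, not measurements of Weblio, Google Translate, or any real corpus.

```python
def simulate_corpus_quality(initial_quality, human_quality, mt_share,
                            degradation, rounds):
    """Toy model of a training corpus partly refilled with MT-derived text.

    All parameters are hypothetical illustration values, not measured data:
      initial_quality -- starting corpus quality on an arbitrary 0..1 scale
      human_quality   -- quality of newly added skilled human translations
      mt_share        -- fraction of each round's corpus that derives from MT
      degradation     -- factor (< 1) by which MT-derived text falls short of
                         the corpus the engine was trained on
    """
    history = [initial_quality]
    quality = initial_quality
    for _ in range(rounds):
        # MT-derived text is assumed to be a degraded copy of the current corpus.
        mt_derived = quality * degradation
        # The next corpus mixes fresh human work with MT-derived text.
        quality = (1 - mt_share) * human_quality + mt_share * mt_derived
        history.append(quality)
    return history


if __name__ == "__main__":
    for step, q in enumerate(simulate_corpus_quality(0.9, 0.9, 0.5, 0.8, 10)):
        print(step, round(q, 4))
```

In this toy setup (half of each round's corpus MT-derived, each pass losing 20%), quality falls every round and levels off near 0.75 rather than recovering. If the quality of the human contribution also drops, as the influx of unskilled translators would suggest, the plateau drops with it.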

So the bottom line is that I would very much like to monitor what sort
of machine pseudotranslated documents are floating around the industry,
but I don't anticipate it making sense to involve myself in a
post-editing workflow.

Does this make sense?

Warren Smith


________________________________
[REDACTED]
Sent: Monday, November 27, 2017 2:56 PM
To: Warren Smith
Subject: [REDACTED]t

Thanks, Warren, for taking a look.

For the future, are you interested in, or experienced with, MTPE jobs JP>EN?


Best regards,
[REDACTED]

On Mon, Nov 27, 2017 at 02:40 pm, <warren...@comcast.net> Warren Smith wrote:

Thank you, but this would take much longer to edit than to retranslate
from scratch.

In this text, it looks like many of the right elements are in the
sentences, but they are in the wrong order, so don't make a lot of
sense. It's kind of like if someone built a building with mostly the
right shape, but not quite on the foundation, and with all the doors
and windows, electrical outlets, piping, bathroom facilities, etc., in
random locations. It kind of looks like a building, and kind of
functions like a building (with walls and a roof), but to make it into
a finished, habitable domicile, the only real choice is to tear it down
and start over, which is actually more expensive than starting from
scratch.

So thank you for sending this to me, but I think I had better let it pass.

Warren


________________________________
From: [REDACTED]
Sent: Monday, November 27, 2017 1:26 PM
To: Warren Smith
Subject: [REDACTED]

Dear Warren,

I hope you are well!

Our client is looking for a human-assisted machine translation of the
captioned published Japanese patent application, which is attached,
along with the EN translation completed.

Our MTPE rate per EN word is 65% of your regular translation rate.
Does this work for you?

We are looking for an edit of the below at the earliest possible time
without incurring rush charges.

Thank you and we look forward to hearing from you!


Best regards,
[REDACTED]

Jon Johanning

Nov 28, 2017, 12:37:07 PM
to hon...@googlegroups.com
Warren,

I like your house metaphor very much. I’ll keep it in mind when I need to respond to this stuff.

About Weblio: I use it quite often, but very cautiously when it comes to the examples of English contexts they throw in. I often wonder where the heck they get them.

Also, great caution is needed when it comes to the main definitions themselves, which include generous helpings of ridiculous English. I feel very sorry for any Japanese-speaking users who are trying to get correct English from it. But native English speakers can avoid those potholes.

Jon Johanning
Japanese-to-English translation
jjoha...@igc.org
www.jcjtrans.com (this Web site is not functioning right now, but I’ll try to get it working soon)
> --
> You received this message because you are subscribed to the Google Groups "Honyaku E<>J translation list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to honyaku+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Warren Smith

Nov 28, 2017, 3:20:52 PM
to hon...@googlegroups.com
I also didn't mention another factor that is damaging the quality of
machine translation: the corpus used for machine learning includes
documents from many translators with many disparate and fundamentally
incompatible styles. Even something
as straightforward as a list of chemical names (with no grammar to
have to worry about) in Google Translate comes back with a hodgepodge
of translations that follow a variety of different chemical naming
conventions, requiring extreme editing. Think about it this way -- if
you fed an AI algorithm Beethoven, Black Sabbath, and Snoop Dogg and
had it try to compose a song randomly combining elements of the
musical phrases it had examined, could we expect a good result? What
about creating art that randomly combines elements of Rembrandt and
Picasso? No -- the styles are incompatible, so machine translation
based on the mixed styles produces a mess...

Looking at patents filed in the US Patent Office, I get the impression
that the quality of translation is falling rapidly. I think that firms
that are relying on cheap translators who overly rely on tools such as
machine translation are going to regret it, and I anticipate a new
market opening up for the more skilled translators, that of supporting
all of the lawsuits that are going to result from the bad
translations...


Warren

Herman

Nov 28, 2017, 4:20:39 PM
to hon...@googlegroups.com
On 28/11/17 09:37, Jon Johanning wrote:
> Warren,
>
> I like your house metaphor very much. I’ll keep it in mind when I need to respond to this stuff.
>

I agree with the gist of what Warren is saying, but don't think the
house metaphor is particularly apt, because in the case of a house,
being able to reuse poorly installed materials would generally be
cheaper than purchasing the materials anew.

By contrast, in the case of revising a translation, a skilled translator
obtains the materials for a new translation for free simply by looking
at the source text in the course of the revision work, and if the
revisions are extensive and the translator is a good typist, it would
generally be easier to just type a new sentence than to attempt to
recreate this sentence by reusing elements of the existing translation
text.

In addition, in situations when extensive revisions are required to
produce a text of acceptable quality, the extra difficulty of doing the
job as a revision will likely lead to a lower quality final product.

Herman Kahn

Andy

Nov 28, 2017, 4:42:33 PM
to Honyaku E<>J translation list
I love the analogy as well, but in our typical case, it is not a house but a bookcase.
We apply statistical MT to relatively simple documents, and only to sentences that have lower than a 95% match (for 96-99% we use fuzzy matching). Most of the MTed sentences look like a poorly assembled bookcase, with some of the shelves being way uneven, some not even in the frame. Still, there are patterns of dishelvement, and most can very quickly reuse the misaligned shelves, mainly because the sentences are short and the terms are all correct. SMT also works great for software strings.
On the other hand, 95% of documents going thru our statistical and neural MT engines are put there by non-translators--people who cannot read Japanese--and for them anything is better than nothing.
We don't use MT for complex documents, which I guess are closer to the house analogy...
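
The routing rule in this post (reuse close matches, send the rest to MT) can be sketched roughly as follows. This is a minimal illustration, not Andy's actual setup: `route_segment`, `best_match_score`, and the use of Python's `difflib` as the similarity measure are assumptions made for the example, and real translation-memory tools compute match percentages differently. The post also leaves the exact boundary between 95% and 96% unstated, so the cutoff below is a guess.

```python
from difflib import SequenceMatcher

# Assumption: the post says "lower than 95%" goes to MT and "96-99%" is
# fuzzy; the exact boundary is not given, so 0.95 is used here.
FUZZY_THRESHOLD = 0.95


def best_match_score(segment, memory_sources):
    """Return the highest character-level similarity (0..1) against the memory."""
    return max(
        (SequenceMatcher(None, segment, known).ratio() for known in memory_sources),
        default=0.0,  # empty memory: nothing to match against
    )


def route_segment(segment, memory_sources):
    """Decide how a source segment is handled: 'exact', 'fuzzy', or 'mt'."""
    score = best_match_score(segment, memory_sources)
    if score == 1.0:
        return "exact"   # reuse the stored translation as-is
    if score >= FUZZY_THRESHOLD:
        return "fuzzy"   # reuse the stored translation with small edits
    return "mt"          # no close match: send to the MT engine


if __name__ == "__main__":
    memory = ["Click the Save button to store your changes."]
    for text in (
        "Click the Save button to store your changes.",
        "Click the Save button to store your change.",
        "Restart the computer.",
    ):
        print(route_segment(text, memory), "<-", text)
```

Short, repetitive strings such as software UI text tend to land in the exact or fuzzy buckets under a scheme like this, which fits the remark that the approach works well for software strings.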

Andy
Belmont, CA

Tom Gally

Dec 10, 2017, 8:57:15 PM
to hon...@googlegroups.com
Warren Smith wrote:

The reason for this is that the AI learning
algorithm for machine translation is based on input from human
translators, and we have already reached the point that the MT tools
have allowed an influx of unskilled human translators into the
industry, reducing the quality of the product that is being used as
the basis for machine learning, in turn *reducing the quality of
future product.*

That's a good point, though I don't think it's wise to be sanguine about the potential for AI-driven MT to continue improving quickly. The frightening analogy is Google's AlphaGo and AlphaGo Zero. AlphaGo started with input from human go players--previously played professional games--and over the course of a couple of years became strong enough to defeat the strongest human players. AlphaGo Zero, in contrast, played only against itself, with no human input, and in a little over a month was stronger than any go player in history, human or machine. If you play go, it's staggering to watch that progress: the first games of AlphaGo Zero against itself look like they were played by idiot children; three days later, the machine was stronger than nearly all professionals.

As long as MT continues to rely on human-translated input, as Warren points out, there will be a limit to how good it can get. But as machine-learning software continues to probe deeply into the huge amounts of text, images, audio, and video that are now available online, it will start extracting patterns of correspondence between text and phenomena in the real world. As that happens, it will gradually be adapted to take into account what we think of as the meaning of the text, the author's presumed intention, and the reader's anticipated response--things that can be handled now only by human translators.

It will take a huge amount of processing resources to churn through all that data--even AlphaGo Zero, which dealt with a vastly simpler problem, was apparently very resource intensive--but the incentives for Google and others to invest those resources go far beyond better machine translation.

I started working as a freelance translator in 1986, when I was twenty-eight, and continued until 2005, when I took an academic position. It was a good career. But if a twenty-eight-year-old now asked me if he or she should consider a career as a translator, I would say no.

Tom Gally
Yokohama, Japan

Fred Uleman

Dec 11, 2017, 6:18:04 AM
to hon...@googlegroups.com
That said, Tom, I wonder if there are any careers we can safely recommend to someone just starting out. And I suspect that, rather than careers, the advice would be to learn as much as you can about as many things as you can, including how the parts connect and interact (i.e., keep feeding that intellectual curiosity) so you will have the ability to respond flexibly to your situation as it evolves.

- -- --- ---- ----- ---- --- -- -
Fred Uleman, translator emeritus

Warren Smith

Dec 11, 2017, 2:32:19 PM
to Honyaku E<>J translation list
The difference between learning translation and learning Go is that the target function of the Go algorithm is Boolean (that is, win vs. lose), enabling very rapid learning (at "machine speed"). In translation, however, evaluating whether or not a translation is good is more difficult. The best you can do is try to evaluate how "similar" the translation provided by the algorithm is to an existing translation -- but that requires some definition of "similarity" of translation -- not an easy task. 

I don't think the speed of learning of the "well structured problem" of winning a Go game maps well onto the poorly structured problem of producing a high-quality translation (whatever that may be).
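
To illustrate why "similarity" to an existing translation is a slippery target, here is a minimal sketch of one naive measure: word-set overlap between a candidate and a reference. The function name and the example sentences are hypothetical, made up for this illustration, and this is not the metric any particular MT system uses.

```python
def unigram_overlap(candidate, reference):
    """Jaccard overlap of the lowercase word sets of two sentences (0..1)."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    if not cand and not ref:
        return 1.0  # two empty sentences are trivially identical
    return len(cand & ref) / len(cand | ref)


# Hypothetical example sentences, invented for this sketch.
reference = "the valve is closed when the pressure exceeds the limit"
# Same words, wrong order: the meaning is garbled.
scrambled = "the limit exceeds the valve when the pressure is closed"
# Different words, same meaning: an acceptable translation.
paraphrase = "once pressure rises above the threshold the valve shuts"

if __name__ == "__main__":
    print("scrambled :", unigram_overlap(scrambled, reference))
    print("paraphrase:", unigram_overlap(paraphrase, reference))
```

Under this measure the scrambled sentence scores as a perfect match while the faithful paraphrase scores poorly, which is the opposite of what a human reviewer would say. A Go result has no such ambiguity: the game is either won or lost.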


Kevin Kirton

Dec 11, 2017, 4:46:28 PM
to honyaku
Tom Gally wrote:

 > But if a twenty-eight-year-old now asked me if he or she should consider a career as a translator, I would say no.

I would definitely advise young people to start learning a second language. It will be fascinating either way.

On the one hand, even if AGI (artificial general intelligence) advances enough to be capable of translation comparable to human translators, it will still be only very competent bilinguals who will be able to judge (and trust) the results. Unless we ask Hal or Siri to evaluate translations for us. ("Siri, who is Murakami's best translator?")
On the other hand, if AI simply does not advance enough to be capable of accurate, dynamic translation (which is its current state, and who knows how long it will take considering it's been "just around the corner" since the 1980s), then the work done by human translators will have to be given some serious respect, surely.

Computers aren't really that good yet at monolingual tasks such as being an adaptive, helpful telephonist or being able to pass the simple Turing test, so to do well at these tasks in a bilingual environment would be all the more difficult.

An exciting time to be alive.

Kevin Kirton
Canberra, Australia

R Freeman

Dec 11, 2017, 4:55:37 PM
to hon...@googlegroups.com


Dan Lucas

Dec 11, 2017, 7:20:57 PM
to hon...@googlegroups.com
Tom Gally wrote:

 > As long as MT continues to rely on human-translated input, as Warren points out, there will be a limit to how good it can get. But as machine-learning software continues to probe deeply into the huge amounts of text, images, audio, and video that are now available online, it will start extracting patterns of correspondence between text and phenomena in the real world. As that happens, it will gradually be adapted to take into account what we think of as the meaning of the text, the author's presumed intention, and the reader's anticipated response--things that can be handled now only by human translators.

Tom, take a look at the comments in this article by Andreas Zollmann, who was working on Google Translate at the time and who suggested that "the idea that more and more data can be introduced to make the system better and better is probably a false premise". They have already sucked up those textual resources. ("We are now at this limit where there isn't that much more data in the world that we can use," he admits - and this was back in 2010.)

I don't think anybody is exactly sanguine about the future of MT. Indeed, the discussion surrounding the application of AI in any field seems to be a combination of sky-high expectations on the part of its proponents and hysteria on the part of those who believe themselves about to be "replaced". The conversation could do with some sanguine input. It isn't getting much.

What strikes me is that it is taken as read by many people in the translation industry that the promises made for AI will be fulfilled without fuss. Have you not noticed, for example, that your statements above are promissory in nature (consider your use of "will") rather than phrased as part of a hypothesis? We are talking about the future. Making accurate predictions is extremely difficult, otherwise we would all be making heaps of money in the stock markets (full disclosure: I did that for a couple of decades - I didn't make heaps of money) and in currencies. Our language when discussing the future should reflect that lack of certainty, otherwise we simply contribute to the hype.

I have made the point, here and elsewhere, that there was a similar AI boom at the end of the 1970s and in the early 1980s that ended in a profound recession for that sector when the reality failed to live up to the promises that had been made for artificial intelligence. Note that I am not making a statement about the absolute level of capability of modern AI, which is in some ways greatly improved over its ancestor. I am making a statement about the tendency of people to get unreasonably excited about the latest thing in technology.

This is a function not of the nature of the technology itself, but a function of the way human greed drives the response of the crowd, at the expense of rationality, when large numbers of people are faced with something that looks like a sure thing. It is the root of the kind of bubbles that have been appearing, swelling and bursting for hundreds of years.

There is nothing new here. The fact that the target of this bubble mentality is an apparent inevitability (artificial intelligence) doesn't necessarily make it any less of a bubble, or indeed inevitable. I am fairly sure that we will look back in five years and see some areas in which AI has done well (closed systems with clearly defined success/failure criteria), and others in which current iterations of AI have failed dismally. Maybe translation will be one of those dismal failures. More likely it will be a success, but success of a limited kind.

Having said all that, I would not advise a young person with linguistic capabilities to become a translator straight out of university. But then I would not have advised that even if the MT issue did not exist. I would have told them to go and make themselves a career in industry or in one of the professions first, and consider translation as a second career in a couple of decades, if they are so inclined. One of the other people in this thread pointed out that many other professions have their own problems. It's not as if translation is uniquely challenged. And being more successful than the average in any competitive profession is just hard.

As for those of us who are already in the business of translation, I tend to agree with those who argue that the key issue for successful translators going forward will be expert knowledge of the subject matter. Ideally that would be allied to a very high level of linguistic skill in the target language, but it would probably be deployed in roles that increasingly tilt towards editing and proofreading.

That combination of subject matter knowledge, source language chops and highly refined target language ability is an unusual skill set. If AI actually enables a surge in (imperfect) translation of documents that hitherto could not be justified because it would have been too costly, we could see tremendous demand for people with those capabilities.

Sure, translators may no longer have near-exclusive ownership of document translation, or spend as many hours on each document, but the number of documents requiring oversight could increase by orders of magnitude. I think there are reasons to be warily optimistic. Let's hope I'm right.

Regards
Dan Lucas

Doreen Simmons

Dec 11, 2017, 7:50:21 PM
to hon...@googlegroups.com
My former main employer, originally set up to assist foreign journalists in Japan, recently had a reunion. All very enjoyable -- until the former chief translator brought us up to date. In my time he had been the head of a full-time staff of four translators and with several other members of staff able to work in English -- after which I did the final editing. So what's he doing now in his early sixties? He's a cleaner in the building of a big company. He starts at 3:00 a.m. pushing a large implement along the floors and he finishes the whole task as the office staff are about to arrive for the day's work. He said he doesn't like the hours but the job pays better than any translation work he can get these days.

Doreen Simmons, still with three part-time editing jobs in Nagatacho plus the sumo commentary.

John Fry

Dec 12, 2017, 12:33:07 AM
to Honyaku E<>J translation list

To raise everyone's morale after Doreen's story of the translation->janitorial career path, let me list a few points that make me optimistic about our future vs. MT/AI.

(1) By closely reading and understanding the source text, we offer a lot of value beyond translation. For example, we can propose alternative wordings to the client, and we can spot henkan errors and other mistakes in the source and correct them.

(2) Translating J<->E is difficult. MT rarely does a good job of translating (for example) , 対応, and the construction XYとする. This often takes a lot of experience, creativity, and world knowledge to do well.

(3) Virtually all the work I do is paid for by multinational corporations, and they have lots of money (they are literally hoarding trillions of dollars), and they are willing to pay for quality.

(4) The translation volume continues to grow substantially every year.

(5) Every year we get better and faster.

John Fry
Boise, ID USA


Geoffrey Trousselot

Dec 12, 2017, 3:32:05 PM
to hon...@googlegroups.com
I recently went to a presentation on MT at the JTF Translation Festival. I was left with the impression that the discrepancy between hype and reality is actually holding back the pace of MT. I think that with such high expectations, the limitations are treated as temporary and the end goal as some perfect system. If developers concentrated on current technological abilities, I think much more efficient and useful systems could be developed.
Another impression was that there is not a reliable timeline that one can use to guess the advancement of MT. Advancements that may have taken 5 years could occur in a matter of months.
Yet another impression was that there was not much discussion about integrating the technology within single languages. I am sure Google is learning the relationships between phrases and learning to better interpret the intent of web searches. Surely the translation results could be analyzed with such technology to find more probable results.
I also think controlled languages will have a role to play.

I think I was first attracted to translation as a way of working anywhere and for only a few hours a day. Ironically, I find it a very time intensive job. I feel pretty confident I will be able to continue working in the industry for another 20 years or so, but only because of the writing skills that I have already developed over the last 20 years. My perspective toward translation has changed from a means to an end to a dedication to the means. I think choosing a translation career nowadays requires a dedication to the process and craft rather than simply seeing it as another way to earn money.

I gave a presentation to checkers in my company about dangling constructions. While preparing the presentation, I had the opportunity to consider the elliptical phrases in both Japanese and English and how each language omits different parts. I think this is ultimately the type of expertise that a translator will offer. The knowledge of all the gaps in the logical processes.

There is the concept of a computer beating a go player. But what if the go player used the computer as a tool, and could change the rules whenever? I think that is more the analogy of a translator. Translators can use all the tools at their disposal, including MT.

Tom Gally

Dec 12, 2017, 7:45:52 PM
to hon...@googlegroups.com
For the sake of furthering this interesting discussion, let's imagine a distinction between two career focuses: "translation proper" and "bespoke translation."

Translation proper is when a translator is given a text in one language to translate into another, with little consideration given to how the translation is to be used. 

Bespoke translation is translation that is done specifically to meet a particular client's needs, usually in negotiation with the client. Bespoke translation often blurs into copywriting, technical writing, business consulting, and other activities very different from translation proper. For the sake of the argument, let's include the translation of novels, poetry, film dialogue, and the like with bespoke translation.

When I was a translator, I occasionally did translation proper, usually for agencies and especially when I was starting out, but it didn't account for more than a tenth or so of my workload over my two-decade career. From discussions on Honyaku years ago, however, I got the impression that some translators made their living primarily from translation proper. A few even took pride in that fact and would turn down bespoke jobs. They seemed to prefer working at home by themselves and didn't like the bother of interacting with other people.

My advice to a twenty-eight-year-old today would be to not pursue a career in translation proper, as that is already being threatened by MT and the threat will grow only stronger. Bespoke translation, especially where it blurs into other fields that require a lot of person-to-person interaction, should be less vulnerable (I hope!).

My analogy to the game of go was indeed imperfect. As Warren pointed out, unlike translation, go has an unambiguous goal and the relative level of a go player can be assessed clearly and objectively. On the other hand, while AlphaGo Zero quickly became stronger than any human who has ever lived, the cost and speed advantages of MT mean that it does not need to be as good as the best human "proper" translators in order to take work away from them or to drive down prices so much that the career is no longer attractive.

Dan Lucas quoted a pessimist about the future of MT from 2010. I was a pessimist then, too, and I also regarded rosy predictions about the future usability of MT as hype. My attitude changed a year ago, when I saw the sudden improvement in the output of Google Translate. That improvement was the result of a major shift in approach to AI, one that has shown similar quick and often unexpected improvements in other fields, not only board games.

Tom Gally



David J. Littleboy

Dec 12, 2017, 8:26:46 PM
to hon...@googlegroups.com

>From: Geoffrey Trousselot
>
>There is the concept of a computer beating a go player. But what if the go
>player used the computer as a tool, and could change the rules whenever? I
>think that is more the analogy of a translator. Translators can use all the
>tools at their disposal, including MT.

Changing the rules isn't of much interest: computers will deal with new
rules better and faster than people.

But use as tools is largely what's happened with chess. Nowadays, when even
a smartphone is stronger than a grandmaster, us humans still find pleasure
and value in playing each other, and find the computer a lovely tool for
showing us where our weaknesses lie.

The current best commercial run-on-a-PC Go programs are actually quite good.
I'm rusty from 30 years of not playing (translation'll do that to you,
oops), and can only win with a 2-stone handicap if I'm really careful, but
the damn thing is really good at seeing moves that control space effectively
and is truly vicious about jumping on one's strategic mistakes. Now that I'm
semi-retired I'm hoping that in a year or so I can catch up with the current
programs and that at that point new versions of the software and the next
generation of PCs (according to Intel themselves, the per-processor
improvement in computational capabilities of their chips has only gone up
about 30% in the last _TEN_ years, but the i9 will make a 16- or
18-processor PC merely a tad pricey and 4 times faster than current PCs)
will make them challenging again.

Unlike the chess programs, though, not much thought has been given to
integrating learning with the things, and the current generation is limited
stylistically (i.e. it's got a rather boring style and doesn't challenge you
with the sorts of things the stronger players at a Go club use to beat up on
us weaker players.)

My impression is that Google really wanted their Go program to win, so they
gave the group essentially unlimited computational resources to use. But the
commercial cost of such resources means that it won't be accessible to
strong amateurs and pros for training purposes. I could blather on and on
about Go and games programming (comp. sci. and AI were my main things from
1970 to 1990), but the bottom line is that it's extremely special-purpose
programming that bears no relation to anything else in the real world, and
even its relevancy to computer science and mathematics is limited, and the
current crop of existential angst over AI is completely misplaced.

(Oh, yes. Don't get me started talking about neural networks and machine
learning. These are ideas that were, quite correctly, discarded in the early
days of AI*. Most neural network models can't recognize, in mathematical
principle, if a figure is open (e.g. a bunch of wavy lines) or closed (e.g.
a starfish). You can fix this problem, so since you can, the neural networks
fans claim it's not a problem. And then go blithely on working with models
with the problem. And, of course, neurons don't work like that. For
starters, the average neuron in the mammalian brain has 8,000 to 10,000 or
so connections, not the 6 or 10 in neural networks.)

* Minsky and Papert covered this extensively in a seminar I sat in on
spring term, 1973.

--
David J. Littleboy
Tokyo, Japan

