Use of CAT for Japanese>English

Jon Johanning

Jun 22, 2010, 12:52:00 PM
to hon...@googlegroups.com
As a neophyte in the world of CAT, I have what I know is a stupid
question about using CAT apps for Japanese>English.

Specifically, I am using OmegaT+ at the moment. It is quite helpful in
handling the enormous repetition you find in a lot of patents, but these
documents also have very long sentences, often broken up by chemical or
mathematical formulas, etc. When the CAT app segments them, small parts
of the sentences turn up in separate segments, and since Japanese word
order has no relationship to English word order, of course, I find that
I have to rearrange everything in the word-processor file that is the
ultimate product.

Is there a way of reducing the amount of labor this requires, or is it a
problem one just has to live with?

Jon Johanning // jjoha...@igc.org

Matthew Schlecht

Jun 22, 2010, 1:09:27 PM
to hon...@googlegroups.com
     In my experience, this is a fact of life when doing JA>EN patent translations.  It is particularly troublesome in the claims.  Usually, the first word/phrase in the English version will correspond to the last moji string in the Japanese version.
     I use Trados, and get by with a few tricks.  One is to remove many/most of the hard returns within a single claim, which improves the logical flow a bit.  Another is to put in some easily recognized symbol (I use red double asterisks, for no special reason) as a bookmark at positions where I know that I will later need to come back and reformulate the text to make sense in English.  I do this re-stitching together during my final proof, and then reintroduce the hard returns where they make sense to mimic the structure of the source claim.
     The same goes for structures or figures that appear within a claim.  These can be moved to the end leaving a marker like 化1 (allowing the claim sentence to flow freely), and reintroduced in their proper location in the final English version.
     If you don't do any reshaping of the source text, then the GIGO effect creeps in and you will lose much of the benefit of the TM since the linguistic elements within the source and target segments will not really be equivalent.
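The first of these tricks, removing the hard returns within a single claim, is easy to script before the file goes into the CAT tool. Below is a minimal Python sketch. It assumes, hypothetically, that each claim in the plain-text source begins with a 【請求項N】 heading, so adjust the pattern to your own files. `join_claim_lines` is an illustrative name, and putting the returns back afterward is still done by hand, as described above:

```python
import re

# Assumed claim heading, e.g. 【請求項1】 or 【請求項１】 (ASCII or full-width digits).
CLAIM_HEADING = re.compile(r"【請求項[0-9０-９]+】")


def join_claim_lines(text):
    """Remove the hard returns inside each claim so it becomes a single line.

    The claim heading is kept on its own line; all non-empty lines after it,
    up to the next heading, are joined without spaces (Japanese needs none).
    Lines before the first heading are passed through unchanged.
    """
    out = []
    body = []          # lines of the claim currently being collected
    in_claim = False

    def flush():
        if body:
            out.append("".join(body))
            body.clear()

    for line in text.splitlines():
        stripped = line.strip()
        heading = CLAIM_HEADING.match(stripped)
        if heading:
            flush()
            out.append(heading.group())
            rest = stripped[heading.end():]
            if rest:               # claim text on the same line as the heading
                body.append(rest)
            in_claim = True
        elif in_claim:
            if stripped:
                body.append(stripped)
        else:
            out.append(line)
    flush()
    return "\n".join(out)


source = "【請求項1】\n下記式で表される\n化合物。\n【請求項2】\n請求項1に記載の\n化合物。"
print(join_claim_lines(source))
# 【請求項1】
# 下記式で表される化合物。
# 【請求項2】
# 請求項1に記載の化合物。
```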

Matthew Schlecht

Jon Johanning

Jun 22, 2010, 4:05:25 PM
to hon...@googlegroups.com
On 6/22/10 1:09 PM, Matthew Schlecht wrote:
> In my experience, this is a fact of life when doing JA>EN patent
> translations. It is particularly troublesome in the claims. Usually,
> the first word/phrase in the English version will correspond to the
> last moji string in the Japanese version.
> I use Trados, and get by with a few tricks. One is to remove
> many/most of the hard returns within a single claim, which improves
> the logical flow a bit. Another is to put in some easily recognized
> symbol (I use red double asterisks, for no special reason) as a
> bookmark at positions where I know that I will later need to come back
> and reformulate the text to make sense in English. I do this
> re-stitching together during my final proof, and then reintroduce the
> hard returns where they make sense to mimic the structure of the
> source claim.
> The same goes for structures or figures that appear within a
> claim. These can be moved to the end leaving a marker like 化1
> (allowing the claim sentence to flow freely), and reintroduced in
> their proper location in the final English version.
> If you don't do any reshaping of the source text, then the GIGO
> effect creeps in and you will lose much of the benefit of the TM since
> the linguistic elements within the source and target segments will not
> really be equivalent.
Thanks for the info.

Looks very similar to the procedure I have arrived at (I use // as a
"bookmark"). What I do with the OCR'ed pdfs I often deal with is to fix
up the text file from the OCR to some extent before I feed it to the
CAT. Obviously one has to use some judgment about what jobs are worth
the trouble.
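Whichever symbol is used, a quick scan of the finished file can list any bookmark that survived the final proof. Below is a minimal Python sketch. `find_bookmarks` is a hypothetical helper, and the default marker is the // mentioned above. The lookbehind that skips the // inside http:// URLs reflects an assumption about what else might appear in the text:

```python
import re


def find_bookmarks(text, marker="//"):
    """Return (line number, line) for every line that still contains the marker.

    The (?<!:) lookbehind skips the "//" right after "http:" or "https:",
    so URLs in the translation are not reported as leftover bookmarks.
    """
    pattern = re.compile(r"(?<!:)" + re.escape(marker))
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


draft = "The compound // of formula (1)\nSee http://www.example.com\nis used as a catalyst."
for number, line in find_bookmarks(draft):
    print(number, line)   # prints: 1 The compound // of formula (1)

# Double asterisks work the same way (any colour is lost in plain text):
print(find_bookmarks("A ** B\nC", marker="**"))   # prints: [(1, 'A ** B')]
```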

So far I haven't done any CAT jobs sent from agencies (usually they
require Trados and I can get out of these jobs that way), but there is
an agency I work for who will start giving me jobs using another CAT app
at some point, and I will see what happens with their stuff when it arrives.

Jon Johanning // jjoha...@igc.org

Charles Aschmann

Jun 22, 2010, 4:24:46 PM
to hon...@googlegroups.com
The best solution to this problem is to use a CAT tool that allows you
to join and split segments (DVX, MemoQ, and others). Felix allows you to
highlight and open a segment of any length or expand it to what you want.
Older versions of Trados allowed you to expand segments to a certain
extent, but there were limitations. (I am not sure about the 2009
version on this score.) There has been a loud demand for joining and
splitting segments in OmegaT, but I do not think it has been
implemented. At one time at least, it was difficult to do. I am not sure
now.
I use DVX and can freely combine segments across line breaks and then
rearrange the line breaks to fit the English pattern I want. This,
combined with on-the-fly addition of phrases to your glossary so that
you can use chunks of text from the glossary, also helps shorten the
time. I cannot remember if OmegaT has implemented on-the-fly glossary
entry, but I do know of a work-around for it using AutoIt scripts. If
you contact me privately, I will be glad to show it to you. (I did a
presentation on it a few years back at an IJET, and I think I still have
all the information.)

If you do not have a CAT tool with the above features, many of Matthew
Schlecht's suggestions will work. In fact, it is a very good idea to
pre-process all files a little before feeding them to a CAT tool.
Another thing to do with chemical patent claims is to rearrange them
into the English order (generally, just bringing the ending phrase in
Japanese forward does the trick, even though the Japanese does not make
sense that way). If you then have glossary phrases, you can just pop them
in from your glossary in most CAT tools. (I am not sure if transfer from
glossary to target segment is available in OmegaT, but I think you can
copy and paste at least.)

At the very least, you can remove the unnecessary line breaks and
graphics to put them back later. Remove the breaks so the entire claim
is segmented together.
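Removing graphics and putting them back later can also be scripted as a round trip. Below is a minimal Python sketch. It assumes, hypothetically, that each formula or table block in the text starts with a heading line such as 【化1】, 【数1】 or 【表1】 (the form behind Matthew's 化1 marker) and runs until the next blank line. `stash_blocks` and `restore_blocks` are illustrative names. Instead of physically moving the blocks to the end of the file, this sketch keeps them in a dictionary. The heading stays in the text as the marker, and the full block is put back wherever that marker ends up in the translation:

```python
import re

# Assumed block heading: 【化N】 (chemical formula), 【数N】 (math formula)
# or 【表N】 (table), alone on its line; the block ends at the next blank line.
BLOCK_HEADING = re.compile(r"【[化数表][0-9０-９]+】")


def stash_blocks(text):
    """Replace each formula/table block with its heading alone.

    Returns the slimmed text and a dict mapping each heading (the marker
    left in the text) to the full original block, heading included.
    """
    out, stash = [], {}
    current, block = None, []     # heading and lines of the block being stashed
    for line in text.splitlines():
        if current is not None:
            if line.strip():
                block.append(line)
                continue
            stash[current] = "\n".join(block)   # blank line closes the block
            current = None
        stripped = line.strip()
        if BLOCK_HEADING.fullmatch(stripped):
            current, block = stripped, [stripped]
            out.append(stripped)                # the heading stays as the marker
        else:
            out.append(line)
    if current is not None:                     # block ran to the end of the text
        stash[current] = "\n".join(block)
    return "\n".join(out), stash


def restore_blocks(translation, stash):
    """Put each stashed block back where its marker ended up in the translation."""
    for marker, block in stash.items():
        translation = translation.replace(marker, block, 1)
    return translation


source = "下記式で表される化合物。\n【化1】\nCH3-CH2-OH\n\nを含む組成物。"
slim, stash = stash_blocks(source)
print(slim)    # the formula line is gone; 【化1】 remains as the marker
print(restore_blocks("A composition comprising the compound 【化1】", stash))
```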

When there are long lists of chemicals, you can rearrange and then
separate the chemicals into single units to make your TM more effective
later, but I have found that having a good glossary is even better.
(Some CAT tools have assembly features that will bring the lists
together if you have a really good glossary.)
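Separating a list of chemicals into single units can be done mechanically before import. Below is a minimal Python sketch. It assumes the list is joined only by the ideographic comma 、 and the conjunctions 及び ("and") and 又は ("or"). Real claims use other connectors too (for example 並びに), so treat the separator set as an assumption to extend. `split_compound_list` is an illustrative name:

```python
import re

# Assumed separators: ideographic comma 、 and the conjunctions 及び / 又は.
# ASCII commas are deliberately NOT separators, so locants such as the
# "1,2" in 1,2-ジクロロエタン stay inside the compound name.
LIST_SEPARATORS = re.compile(r"、|及び|又は")


def split_compound_list(text):
    """Split an enumerated list of compounds into one item per entry."""
    return [item.strip() for item in LIST_SEPARATORS.split(text) if item.strip()]


for item in split_compound_list("メタノール、1,2-ジクロロエタン及びトルエン"):
    print(item)
```

Pass in only the list itself. Any surrounding wording (such as a trailing からなる群から選ばれる) would otherwise stay attached to the last item.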

Charles Aschmann

Charles Aschmann

Jun 22, 2010, 4:25:02 PM
to hon...@googlegroups.com
--
You received this message because you are subscribed to the Honyaku Mailing list.
To unsubscribe from this group, send email to honyaku+u...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/honyaku?hl=en?hl=en

Uwe Hirayama

Jun 22, 2010, 5:06:24 PM
to hon...@googlegroups.com
Dear Jon,

It feels good to read that others use similar software and
share the same problems.

I have been trying to migrate from OmegaT to Across, which is
another cheap way of using a TM, and it may allow you to
change segment borders after segmentation, which would
solve the problem.

However, from my point of view, Across is as overloaded with functions as
Trados and Transit (STAR AG), and OmegaT is much easier to
handle than Across and other "bolides".

So I have changed my attitude and no longer consider
pre- and post-editing a problem, but rather part of a defined
translation process.

My steps are:

1st: Copying the text from the source into the Windows text editor (Notepad)
2nd: Making sure that all headers appear on separate lines
3rd: Deleting unneeded information that does not appear in
the corresponding PDF file
4th: Copying the text from the text editor to OpenOffice
5th: Pre-segmentation (adding a new line after each "maru", the
Japanese full stop 。), replacing 2-byte numbers with one-byte
numbers, assigning formats (templates), and other pre-editing
work when needed
6th: Translating using OmegaT
7th: Post-editing (restoring paragraphs that contained more than
one sentence ending with a maru; other tasks like adding tabs or
new lines where needed)
8th: Spell check and proofreading
For some of this work, writing macros may be a solution,
but I am not that familiar with writing macros.
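Steps 5 and 7 are the kind of work that a macro, or a short script, could take over. Below is a minimal Python sketch, not Uwe's actual setup. `presegment` splits each paragraph after every maru (。) and converts full-width ("2-byte") digits to ASCII. It also records how many sentences each paragraph had, so that `restore_paragraphs` can rejoin the translated sentences. Both names are illustrative. The sketch assumes exactly one translated sentence per source sentence, in the same order, and leaves formats and templates to the word processor:

```python
import re

# Full-width ("2-byte") digits ０-９ mapped to ASCII ("one-byte") digits.
FULLWIDTH_DIGITS = str.maketrans("０１２３４５６７８９", "0123456789")


def presegment(text):
    """Put each 。-terminated sentence on its own line and fix the digits.

    Returns the list of sentences and, for every original paragraph,
    how many sentences it contained (0 for a blank line).
    """
    sentences, counts = [], []
    for paragraph in text.translate(FULLWIDTH_DIGITS).split("\n"):
        # Zero-width split right after each 。 keeps the maru on its sentence.
        parts = [p for p in re.split(r"(?<=。)", paragraph) if p.strip()]
        sentences.extend(parts)
        counts.append(len(parts))
    return sentences, counts


def restore_paragraphs(translated, counts):
    """Rejoin translated sentences into the original paragraph structure."""
    paragraphs, i = [], 0
    for n in counts:
        paragraphs.append(" ".join(s.strip() for s in translated[i:i + n]))
        i += n
    return "\n".join(paragraphs)


lines, counts = presegment("本発明は１０個の例を含む。第２の例を示す。\n\n結論。")
print(lines)    # ['本発明は10個の例を含む。', '第2の例を示す。', '結論。']
print(counts)   # [2, 0, 1]
english = ["The invention includes 10 examples.", "A second example is shown.", "Conclusion."]
print(restore_paragraphs(english, counts))
```

Only the digits are mapped, rather than applying Unicode NFKC normalization. NFKC would also convert other full-width characters, such as the parentheses （）, which you may want to keep in the source.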

Best regards,

Uwe Hirayama

Jean-Christophe Helary

Jun 22, 2010, 8:44:28 PM
to hon...@googlegroups.com

On 23 Jun 10, at 01:52, Jon Johanning wrote:

> It is quite helpful in handling the enormous repetition you find in a lot of patents, but these documents also have very long sentences, often broken up by chemical or mathematical formulas, etc. When the CAT app segments them, small parts of the sentences turn up in separate segments, and since Japanese word order has no relationship to English word order, of course, I find that I have to rearrange everything in the word-processor file that is the ultimate product.
>
> Is there a way of reducing the amount of labor this requires, or is it a problem one just has to live with?

Change the segmentation rules to fit the patterns in the documents.
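Segmentation rules of the kind meant here are pairs of regular expressions. Below is a minimal Python sketch of how such rules are evaluated. It is modelled loosely on the SRX-style rules OmegaT uses: a pattern before the break, a pattern after it, and a flag marking the rule as a break or an exception. It is not OmegaT's code, and the two rules are hypothetical examples:

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    before: str      # regex for the text just before a candidate break
    after: str       # regex for the text just after it
    is_break: bool   # True = break rule, False = exception (no break)


def segment(text, rules):
    """Split text into segments; at each position the first matching rule decides."""
    compiled = [
        (re.compile("(?:" + r.before + r")\Z"), re.compile(r.after), r.is_break)
        for r in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        head, tail = text[:pos], text[pos:]
        for before, after, is_break in compiled:
            if before.search(head) and after.match(tail):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule wins, break or exception
    segments.append(text[start:])
    return segments


# Hypothetical patent-style rules: do not break after 。 when a closing
# parenthesis or quote follows; otherwise break after every 。.
rules = [
    Rule(before="。", after="[）」]", is_break=False),
    Rule(before="。", after="", is_break=True),
]
text = "化合物（Rは水素である。）を含む。次の請求項。"
print(segment(text, rules))
# ['化合物（Rは水素である。）を含む。', '次の請求項。']
```

The same rules apply to the whole document. An exception written for the claims therefore also changes how every other section is segmented, which is the limitation Charles points out in his reply.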

> Specifically, I am using OmegaT+ at the moment.

Is there a specific reason why you use OmegaT+ and not OmegaT?


Jean-Christophe Helary
----------------------------------------
fun: http://mac4translators.blogspot.com
work: http://www.doublet.jp (ja/en > fr)
tweets: http://twitter.com/brandelune

Charles Aschmann

Jun 22, 2010, 11:18:43 PM
to hon...@googlegroups.com
On 2010/06/22 20:44, Jean-Christophe Helary wrote:
> Change the segmentation rules to fit the patterns in the documents.
>
In the instance mentioned, this will not work. The claims and other
sections contain breaks of a kind that must remain breaks elsewhere in
the same document. Changing the segmentation rules will just mess up
another part of the document. Pre-processing, or a CAT tool that can
combine and split segments, are the usable options. If OmegaT is the
tool of choice, the former is necessary for these patent instances.

Charles Aschmann

Jean-Christophe Helary

Jun 23, 2010, 12:41:37 AM
to hon...@googlegroups.com

Since I almost never encounter segments that need merging or splitting and that I can't handle with a quick workaround, I can't really say, plus I don't do patents... But I'd like to see one of those documents where formulas break the structure.

Also, Anaphraseus (a Wordfast equivalent for OpenOffice.org/NeoOffice) works well in such contexts. I don't use it very much, so I don't know if it does splitting and merging, but it would surely be worth a try.

Minoru Mochizuki

Jun 22, 2010, 9:44:52 PM
to hon...@googlegroups.com
The problem presented here is not specific to the use of CAT tools in J2E
translation. It naturally occurs in E2J translation as well.
Some clients who have only limited knowledge about the differences between
languages assume that all words, phrases and clauses in the original
document language should be and could be matched with those in the target
language. This happens more frequently in so-called "localization" jobs, in
which computer-savvy desktop publishing (DTP) operators just assume that they
can force translators to obey their orders so that they can mechanically
reassemble words, phrases and clauses to reconstruct a sentence in a target
language.

Of course, the problem is not caused by any specific CAT tool.

Minoru Mochizuki

Akiko Sato

Jun 23, 2010, 10:57:43 PM
to hon...@googlegroups.com
Dear All,

The West Japan Committee of the Japan Translation Federation would like to let you
know about the following event.

---------------------
The 2nd JTF West Japan Seminar on July 2, 2010 (Fri.)

Theme: Translation from Japanese into English of Business Documents and
Advanced International Business
Seminar Instructor: Shintaro Tominaga, Cross-Cultural Business Consultant
http://jp.linkedin.com/in/feilong
MC: Naomi Kaminaga, President, Fan Works Co., Ltd.
http://abac.asia/business/

Date: July 2, 2010 (Fri.)  Time: from 2:00 p.m. to 5:00 p.m.
Venue: The Consortium of Universities in Osaka
Campus Port Osaka, Room E, 4th Floor, Osaka Ekimae Daini Building,
1-2-2-400, Umeda, Kita-ku, Osaka City
TEL:06-6344-9560 FAX:06-6344-956 
http://www.consortium-osaka.gr.jp/about/access.html

Operated by: JTF West Japan Committee
Sponsored by: Kansai Bureau of Economy, Trade and Industry, Ministry of
Economy, Trade and Industry

Summary:
Part 1: To work on the Japanese-English translation of an international
quarterly telephone meeting of a Japanese company.
Part 2: To concentrate on the Japanese-English translation of a brochure from
ASAHI INTECC, a manufacturer of medical appliances.
Part 3: To address developing translation work into international business.
Part 4: To make use of The Nikkei Weekly for Japanese-English translation.

Admission fees:
JTF members: 2,500 yen
Non-members: 3,500 yen

Application Deadline
June 29, 2010 (Tue.)
Applications will close when the fixed number of participants is reached.

Please make your application on the following site:
http://www.jtf.jp/west_seminar/index_w.do?fn=search
------------

Akiko Sato,
JTF Director

Jon Johanning

Jun 24, 2010, 1:18:00 PM
to hon...@googlegroups.com
On 6/22/10 5:06 PM, Uwe Hirayama wrote:
> I have been trying to migrate from OmegaT to Across, which is another
> cheap way of using a TM, and it may allow you to change segment
> borders after segmentation, which would solve the problem.
>
> However, from my point of view, Across is as overloaded with functions as
> Trados and Transit (STAR AG), and OmegaT is much easier to handle than
> Across and other "bolides".
That's the problem I am discovering with CAT apps; they're like word
processors in that, to get some feature or other (like combining
segments) that you want, you have to take a lot of functions you'll
probably never need. Also, you will have to wade through a manual or
"help" that seems to have been written by and for advanced sentient
beings on some other planet, and pay through the nose (Swordfish, I'm
looking at you).

> So I have changed my attitude and no longer consider pre- and
> post-editing a problem, but rather part of a defined translation process.
>
> My steps are:
>
> 1st: Copying the text from the source into the Windows text editor (Notepad)
> 2nd: Making sure that all headers appear on separate lines
> 3rd: Deleting unneeded information that does not appear in the
> corresponding PDF file
> 4th: Copying the text from the text editor to OpenOffice
> 5th: Pre-segmentation (adding a new line after each "maru", the
> Japanese full stop 。), replacing 2-byte numbers with one-byte numbers,
> assigning formats (templates), and other pre-editing work when
> needed
> 6th: Translating using OmegaT
> 7th: Post-editing (restoring paragraphs that contained more than
> one sentence ending with a maru; other tasks like adding tabs or
> new lines where needed)
> 8th: Spell check and proofreading

I'm using much the same process, translated into Mac terms.

Jon Johanning // jjoha...@igc.org

Jon Johanning

Jun 24, 2010, 1:20:48 PM
to hon...@googlegroups.com
On 6/22/10 8:44 PM, Jean-Christophe Helary wrote:
> Change the segmentation rules to fit the patterns in the documents.
That would be a good idea if Japanese patent writers followed
"patterns." Patterns? It would be a big help if they would even write
correct Japanese.

>> Specifically, I am using OmegaT+ at the moment.
>>
> Is there a specific reason why you use OmegaT+ and not OmegaT?
>

I just find it more comfortable to use, personally, in various respects.
But it's not much different.

Jon Johanning // jjoha...@igc.org

Mark Spahn

Jun 24, 2010, 3:41:23 PM
to hon...@googlegroups.com

>> However, from my point of view, Across is as overloaded with functions as
>> Trados and Transit (STAR AG), and OmegaT is much easier to handle than
>> Across and other "bolides".

Is "bolide" a term of art in software jargon meaning
"a bloated monstrosity overloaded with seldom-used functions"?
A Google search on "define:bolide" yields only entries like
"an exploding or fragmenting meteor or fireball".
Curious,
Mark Spahn (West Seneca, NY)

Jean-Christophe Helary

Jun 24, 2010, 8:03:28 PM
to hon...@googlegroups.com

On 25 Jun 10, at 02:20, Jon Johanning wrote:

> On 6/22/10 8:44 PM, Jean-Christophe Helary wrote:
>> Change the segmentation rules to fit the patterns in the documents.
> That would be a good idea if Japanese patent writers followed "patterns." Patterns? It would be a big help if they would even write correct Japanese.

:)

>> Is there a specific reason why you use OmegaT+ and not OmegaT?
>
> I just find it more comfortable to use, personally, in various respects. But it's not much different.

Would you mind giving details?

Wolfgang Bechstein

Jun 24, 2010, 9:00:08 PM
to hon...@googlegroups.com
Mark Spahn wrote:

> Is "bolide" a term of art in software jargon meaning
> "a bloated monstrosity overloaded with seldom-used functions"?
> A Google search on "define:bolide" yields only entries like
> "an exploding or fragmenting meteor or fireball".

No, it's just another one of Uwe's "Germanisms" (his messages are
peppered with them). In German, "Bolide" (pronounced "bo-leed-ay")
refers to a high-powered sports car and is also sometimes used for other
powerful, robust, and/or slightly menacing pieces of machinery or
systems (high-end amplifiers weighing a ton and costing a fortune are
another example). It derives from the same root as the English word, namely a
fireball or meteor, and it still has that meaning in German, too, but
the other meaning could be rendered in English as "over-engineered".

Wolfgang Bechstein

Uwe Hirayama

Jun 24, 2010, 10:50:38 PM
to hon...@googlegroups.com
Dear Mark and Wolfgang,

Thanks for making me aware of another "Germanism".

@Wolfgang

> No, it's just another one of Uwe's "Germanisms" (his messages are
> peppered with them).

You can be glad that you did not have the chance to hear my
accent. To put it in other words: (NES, please close your eyes)
Ze mor my Japanees improofs the badder my English becoms :-)

Well, I am glad that this is a mailing list for translators who work
with Japanese but not necessarily with English. (BTW this is
the reason why I usually write JP2GER TRSL in the last line.)

>> Is "bolide" a term of art in software jargon meaning
>> "a bloated monstrosity overloaded with seldom-used functions"?
>> A Google search on "define:bolide" yields only entries like
>> "an exploding or fragmenting meteor or fireball".

This, however, proves that the image (metaphor?) works
somehow.

OmegaT may look like the Tata "Nano" of CAT tools, but it
does what a translation memory should do.

I once happened to use a "satellite" version of the
product from STAR AG (was it called Transit?), and
it did its job very well; in particular, the dictionary
functions were excellent. But for me, as a fan of
shortcuts, operating the software resulted in a kind
of "apparatus gymnastics" for the fingers (as far as
I remember).

Across may be an alternative; however, registering
vocabulary took too many steps, and I always felt
a kind of uncertainty about its cooperation with the
underlying database (Microsoft SQL Server):

When I started the program, it sometimes prompted
me to enter a password that I had never
registered before. So I always felt uncertain whether
I would be able to start it at the next session.

Kind regards,

Uwe Hirayama
hira...@t-online.de
JP2GER TRSL

Jon Johanning

Jun 25, 2010, 1:13:08 PM
to hon...@googlegroups.com
On 6/24/10 8:03 PM, Jean-Christophe Helary wrote:
> Would you mind giving details?
>
>
To be honest, Jean-Christophe, I can't remember now exactly why I
switched, but I found OmegaT+ easier to use for my purposes for some
reason.

One thing I recall was that, since I was mainly using OmegaT on jobs
that I received as pdfs and had to OCR to get a text file, the OCR
software did lots of strange things to the Japanese, which produced huge
numbers of completely unnecessary tags when I got it into OmegaT. I
eventually figured out how to eliminate a lot of the tags by
manipulating the text files before I fed them to OmegaT, but there were
still quite a few, which were quite a nuisance.

Of course, this was not OmegaT's fault, but I had to just ignore the
tags and bypass the tag reconciliation step at the end of the job, and
if I recall correctly the only way I found of getting an actual text
file which I could dump into a word processor and finish up with was to
run my cursor over the whole translation in OmegaT, select it, and copy
and paste into the word processor.

At any rate, when I use OmegaT+ now, I find that I don't have the
useless tag problem, and when I get through the job, I can just
"generate translation," convert that file from .odt to .rtf, and clean
up the .rtf file.

Jon Johanning // jjoha...@igc.org
