Difficulty using CAT tools in J>E

igc

Oct 20, 2017, 11:08:45 PM
to Honyaku E<>J translation list
I have been using memoQ for some time with jobs sent to me in pdf’s or Word docs, which are relatively easy for me to set up and put into memoQ when that would be advantageous (not all jobs would be, of course).

Lately, however, I have tried a couple of jobs that were sent to me by agencies who set them up in Trados or whatever, and they threw me for a loop, because of my great naivete and inexperience with this kind of job. Apparently, the source texts were thrown into a CAT tool by someone who knew nothing about Japanese, but was handling them as though Japanese were a European language. (I suspect that these tools were originally developed by people who only knew European languages.)

Thus, many of the segments were all out of order and full of tags, such that I could not get a proper English translation. I ended up with a completed job that I could not export and send back, apparently because it was full of “errors” (as memoQ understands “errors”) that I could have avoided or ignored if I had set the job up myself as I am used to doing, and that took me a very long time to clean up, after which the thing still couldn’t be exported. As far as I could tell, there was no way I could alter the files sent to me; the segmenting and everything else was set in stone. 

When I go to the ATA conference next week, I plan to confer with the memoQ people there to see if there is a solution to this. But I would like to pick the brains of this learned group also. Has anyone else had this kind of experience, and am I missing some sort of trick for overcoming this problem? The only way I can see to handle such jobs is just to copy the source of each segment into the target slot, stick in as many English words as I can in between the tags, and go on to the next segment. Probably this results in something that the people back in the agency can make no sense of, but I don’t know any way to produce a proper translation in this kind of situation.

TIA,

Jon Johanning
jjoha...@igc.org

igc

Oct 20, 2017, 11:13:42 PM
to Honyaku E<>J translation list

Herman

Oct 21, 2017, 3:12:43 AM
to hon...@googlegroups.com
As far as exporting goes, you could test the matter by copying the
source to target for all the segments and trying to export that. Even if
this works, there could still be some bug in the software that would
prevent exporting the file with the translation.

More generally though, because CAT software of this sort presupposes
that the basic structure/format remains the same between source and
target texts, if the final translated text would need to be structured
differently due to underlying differences between the languages, there is
in principle no solution - beyond making the appropriate changes to some
ultimate non-bilingual file format.

For example, if you had patent claims formatted as a table with five
rows as follows, the English translation would need to have only four
rows, and there would be no way to represent such a situation in a
bilingual file.


Aをする装置において、 | A device which does A, comprising:
X、 | X;
Y、 | Y; and
Z | Z.
を備える装置。 |


Herman Kahn

Andy

Oct 21, 2017, 11:08:32 AM
to Honyaku E<>J translation list
Hi,
We encounter this situation using the CAT Wordbee. We found many causes, including poor formatting, conversion from one file type to another, double-byte characters that are not properly recognized, and so on.
In the case of Wordbee--and I'm pretty sure in MemoQ, too--most of these issues are resolved by pre-processing the file to fix formatting, changing the text extraction rules (e.g. to minimize tags), changing segmentation rules, and other tweaks.
Sometimes we forget and send out a document such as you mentioned. It is easy to re-analyze, so I suggest sending the job back and asking them to set it up again.
Andy

Andy Jones
San Francisco, CA 



Jon Johanning

Oct 21, 2017, 12:24:16 PM
to hon...@googlegroups.com
Andy,

Thanks for that suggestion. This confirms my suspicion that the agencies involved just don’t know anything about the Japanese language and are just routinely following a process they have established for European-language jobs. (The main company I have had problems with is a European one.) The pre-processing I do is based on my knowing a fair amount about the language and how it differs from English, of course, and if no one at the company knows about anything but European languages it spells trouble.

I am going to routinely ask every new company that contacts me about a possible relationship whether they plan to send CAT files, and if so, whether anyone on their staff knows the problems involved in simply dumping Japanese projects into CAT tools. I expect that will sharply cut down on the jobs I get from these places, and if so, that’s fine with me, as long as I get enough work in ordinary pdf’s, Word files, etc.

Jon Johanning
jjoha...@igc.org

Herman

Oct 21, 2017, 3:26:53 PM
to hon...@googlegroups.com
On 21/10/17 09:24, Jon Johanning wrote:
>
> Thanks for that suggestion. This confirms my suspicion that the agencies involved just don’t know anything about the Japanese language and are just routinely following a process they have established for European-language jobs. (The main company I have had problems with is a European one.) The pre-processing I do is based on my knowing a fair amount about the language and how it differs from English, of course, and if no one at the company knows about anything but European languages it spells trouble.

Well, they may not know anything about the Japanese language, but the
only difference between Japanese and European languages from the
standpoint of a CAT tool or other software is the codeset.

If you have a situation where you have to export the file in order to
deliver it and you cannot export it because of e.g. missing tags, that
is an inherently problematic situation. For instance, if you had a long
document with a whole bunch of missing tags or whatever, and had to
export it for delivery immediately in an unfinished state, you wouldn't
be able to do that.

That sort of issue is basically one of software compatibility, and can
probably only be resolved by using the same software as your customer,
so as to avoid the need to convert or export files, unless the customer
is aware of whatever compatibility issues there may be between the
customer's software and your software and ensures that the files sent
to you will be fully compatible with your software.

Note that "supporting xlf" does not ensure compatibility, because xlf is
an extensible format and can include proprietary formatting tags. The
existence of proprietary tags, different segmentation rules/limitations
and the like between different CAT software packages makes it highly
likely that some sort of problem will at times arise for that reason alone.

Herman Kahn

Dan Lucas

Oct 21, 2017, 3:30:42 PM
to hon...@googlegroups.com
Jon, most translators who use CAT tools have had to deal with the
problem of excessive or inconveniently positioned tags at some point. As
one of that number, I sympathise with your plight.

Nevertheless, I really don't think this is likely to be caused primarily
by a lack of appreciation of Japanese on the part of the client. The
issues that Andy raises ("poor formatting, conversion from one file type
to another, double-byte characters that are not properly recognized")
can arise even when the people generating the document understand
Japanese.

I do a lot of work with certain Japanese agencies and they also
occasionally supply a file with lots of tags. In such cases it usually
means that the client has failed to correctly preprocess the source
material, just as Andy says. A document generated by an OCR process will
often have a large number of unnecessary formatting tags, for example.
If such files are not cleaned up, they can be hell to work with.

I think you would make a better impression if, rather than assuming that
the problem lies in a poor understanding of Japanese, you were to
explain to new clients that you only accept CAT tool files once you have
inspected them and that you will reject files with unusually large
numbers of tags. Such an approach would also, in my opinion, be more
likely to result in you receiving an acceptable file in the first place.
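
As a rough illustration of the kind of inspection I mean, here is a minimal Python sketch that counts inline tags per segment before you accept a job. It assumes the agency's file is plain XLIFF 1.2; many CAT packages use their own variants or proprietary extensions, which this would not cover as-is. The sample document, the function name `inline_tag_counts`, and the threshold of 3 tags are all made up for the example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Inline elements defined by XLIFF 1.2 (proprietary extensions are not counted).
INLINE_TAGS = {"g", "x", "bx", "ex", "bpt", "ept", "ph", "it"}
MAX_TAGS = 3  # hypothetical cut-off; pick whatever you are willing to tolerate

# Hypothetical two-segment file, invented for this example.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="slides.pptx" source-language="ja" target-language="en" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>臨床試験の結果</source>
      </trans-unit>
      <trans-unit id="2">
        <source><g id="1">有</g><g id="2">効</g><x id="3"/><g id="4">性</g></source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def inline_tag_counts(xliff_text):
    """Return {trans-unit id: number of inline tags inside its <source>}."""
    root = ET.fromstring(xliff_text)
    counts = {}
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find(f"{{{XLIFF_NS}}}source")
        if source is None:
            continue
        # Strip the namespace prefix and keep only known inline elements.
        counts[unit.get("id")] = sum(
            1 for el in source.iter() if el.tag.split("}")[-1] in INLINE_TAGS
        )
    return counts


counts = inline_tag_counts(SAMPLE)
heavy = [uid for uid, n in counts.items() if n > MAX_TAGS]
print("Segments over the tag limit:", heavy)
```

If a large share of the segments end up in `heavy`, that is a reasonable basis for asking the client to clean up and re-import the source file.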

Regards,
Dan Lucas

Rene

Oct 21, 2017, 5:24:14 PM
to hon...@googlegroups.com
On Sat, Oct 21, 2017 at 12:13 PM, igc <jjoha...@igc.org> wrote:
Thus, many of the segments were all out of order and full of tags, such that I could not get a proper English translation. I ended up with a completed job that I could not export and send back, apparently because it was full of “errors” (as memoQ understands “errors”) that I could have avoided or ignored if I had set the job up myself as I am used to doing, and that took me a very long time to clean up, after which the thing still couldn’t be exported. As far as I could tell, there was no way I could alter the files sent to me; the segmenting and everything else was set in stone. 

You are probably talking about tag errors. You might have gotten around it by exporting the file, removing all tags, and then working on a clean file.

That said, I don't see how any CAT program can deal with Japanese (or Chinese, or Thai, etc.) the same way as with European languages, as long as there are no spaces between words to break down sentences.

Maybe a clever programmer can write a program to insert spaces into Japanese texts; then we would have a different situation.
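
To make the idea concrete, here is a toy Python sketch of such a program, using greedy longest-match against a word list. The vocabulary and the function name `insert_spaces` are invented for the example; this is only a sketch of the principle, and real Japanese word segmentation is normally done with a morphological analyzer (MeCab is one well-known example) rather than anything this simple.

```python
# Toy vocabulary, invented for this example.
VOCAB = {"臨床", "試験", "臨床試験", "の", "結果"}


def insert_spaces(text, vocabulary):
    """Split text by greedy longest-match and join the pieces with spaces.

    Characters that start no known word are emitted as single-character tokens.
    """
    max_len = max(len(word) for word in vocabulary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += length
                break
    return " ".join(tokens)


print(insert_spaces("臨床試験の結果", VOCAB))
```

Greedy matching picks 臨床試験 as one token rather than 臨床 plus 試験, which shows why the result depends entirely on the word list you feed it.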

Rene​


Michael J.W. Beijer

Oct 22, 2017, 1:55:00 PM
to Honyaku E<>J translation list
Not sure how it does it, but I am pretty sure CafeTran handles Japanese just fine.

Felix is also much loved among Japanese translators, as far as I remember.

Maybe worth looking into.

Michael
--
*<This email was dictated using Dragon Professional Individual 15. Please
excuse any typos!>*

Rene

Oct 22, 2017, 2:14:47 PM
to hon...@googlegroups.com
On Sun, Oct 22, 2017 at 11:32 PM, Michael J.W. Beijer <mic...@beijer.uk> wrote:
Not sure how it does it, but I am pretty sure CafeTran handles Japanese just fine.
Felix is also much loved among Japanese translators, as far as I remember.
Maybe worth looking into.

​Depends what you mean by "handles". No CAT program can handle non-segmented languages in the same way as segmented languages.

Just take a European language text, remove all the spaces, feed it into Trados or MemoQ or Cafetran, and see how much mileage you get.

Ironically, the concept of spaces between words is not unknown in Japan... just go to a Japanese karaoke box, and voila, there they actually use them in the jimaku (subtitles).

Rene

Jon Johanning

Oct 22, 2017, 4:10:18 PM
to hon...@googlegroups.com
Dan,

Good point about communicating with companies. I’ll keep that in mind. I still think they probably don’t have anyone who knows about 日本語. Perhaps there is an assumption that if a computer system can handle Unicode (which all of them can at this point, I think) that’s all one needs to worry about. But just being able to display the text is only the start.

Jon Johanning
Japanese-to-English translation
jjoha...@igc.org
www.jcjtrans.com

Jon Johanning

Oct 22, 2017, 4:18:56 PM
to hon...@googlegroups.com
> That said, I don't see how any CAT program can deal with Japanese (or Chinese, or Thai, etc.) the same way as with European languages, as long as there are no spaces between words to break down sentences.

I don’t think that’s a problem. When I use memoQ on my own source texts, of course there are no spaces between the Japanese words, but that’s fine. The problem I see is that the segmenting often gets screwy because of the very different syntaxes of the languages.

For example, in a recent job which was a Powerpoint presentation about the advantages of a certain drug, most of the slides contained graphs and charts showing the results of clinical trials, and these slides had titles over the graphs and charts. The titles were in a large size of type, such that the longer ones were broken into two lines. Of course, the breaks came whenever there wasn’t enough space left on that line—in the middle of words or wherever. No problem at all for a human being who could read Japanese, but the program that did the segmenting put each line into a separate segment, and the human being overseeing that process apparently couldn’t read Japanese and didn’t see the problem.

When I got the job, I had no way to change the segmenting so that the titles were in single segments; if I were setting the whole thing up myself, I would just join those segments.

Because of the huge syntactical differences between the languages, translating those segments as they were given to me would result in nonsense English. In such a case, I guess, one would just have to hope that the people back at the agency could figure out what was going on.

Jon Johanning

Oct 22, 2017, 4:20:43 PM
to hon...@googlegroups.com
Michael,

I don’t think that what I am worried about would be helped by switching programs.

Any program that can use Unicode can display the Japanese. The problem is with the syntactical differences in the languages messing up the segmenting.

Jon Johanning
Japanese-to-English translation
jjoha...@igc.org
www.jcjtrans.com


Rene

Oct 22, 2017, 4:35:08 PM
to hon...@googlegroups.com
On Mon, Oct 23, 2017 at 5:18 AM, Jon Johanning <jjoha...@igc.org> wrote:
I don’t think that’s a problem. When I use memoQ on my own source texts, of course there are no spaces between the Japanese words, but that’s fine. The problem I see is that the segmenting often gets screwy because of the very different syntaxes of the languages.

It is not a "problem", but it is not fine, either. To see the full power of a CAT program, you'd have to try using it between two European languages.
 
For example, in a recent job which was a Powerpoint presentation about the advantages of a certain drug, most of the slides contained graphs and charts showing the results of clinical trials, and these slides had titles over the graphs and charts. The titles were in a large size of type, such that the longer ones were broken into two lines. Of course, the breaks came whenever there wasn’t enough space left on that line—in the middle of words or wherever.

That is a completely different issue. Of course you have wrong segmentation if a formatted file (not even to mention an OCRed file) is dumped into a CAT program. You have the same problem in all languages; that has nothing to do with Japanese.
​Rene ​

Michael Joseph Wdowiak Beijer

Oct 22, 2017, 4:35:56 PM
to hon...@googlegroups.com
Hmm, OK.

However, keep in mind that in Felix, you segment your source text on-the-fly. That is, it does have built in segmentation rules, but you can just as easily select any expanse of characters and make that your segment. This is possible because it works directly inside MS Word, PowerPoint and Excel. Please have a look: http://felix-cat.com

As far as I know, the developer is a Japanese>English translator himself.




Michael

********************************
MICHAEL JOSEPH WDOWIAK BEIJER
Dutch-English translator
& e-terminologist
Hastings, United Kingdom.
Mob. +44 (0)747 5771720
​Tel. +44 (0)1424 430250
Email: mic...@beijer.uk
Beijer.uk
********************************


Herman

Oct 22, 2017, 7:10:23 PM
to hon...@googlegroups.com
On 22/10/17 13:18, Jon Johanning wrote:

> For example, in a recent job which was a Powerpoint presentation about the advantages of a certain drug, most of the slides contained graphs and charts showing the results of clinical trials, and these slides had titles over the graphs and charts. The titles were in a large size of type, such that the longer ones were broken into two lines. Of course, the breaks came whenever there wasn’t enough space left on that line—in the middle of words or wherever. No problem at all for a human being who could read Japanese, but the program that did the segmenting put each line into a separate segment, and the human being overseeing that process apparently couldn’t read Japanese and didn’t see the problem.
>
> When I got the job, I had no way to change the segmenting so that the titles were in single segments; if I were setting the whole thing up myself, I would just join those segments.
>
> Because of the huge syntactical differences between the languages, translating those segments as they were given to me would result in nonsense English. In such a case, I guess, one would just have to hope that the people back at the agency could figure out what was going on.

Some CAT tools don't allow joining two paragraphs (i.e. lines separated
by a line feed/carriage return character) into a single segment.

In a case where a single sentence is separated into multiple paragraphs
in this manner, one would just translate the two sequential source
segments such that the corresponding sequential target segments yield a
valid sentence. The target text in such a case will often not correspond
in its content to the source text, which goes against the point of using
CAT, but there shouldn't be any problem in the final translated document.
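
A small Python sketch of what this looks like in practice. The slide title and its translation are invented for the example, and `assemble` is a hypothetical helper standing in for whatever the CAT tool does on export, assuming it simply concatenates the target segments in order.

```python
# A title that the import process split across two segments (invented example).
source_segments = ["第III相臨床試験における", "主要評価項目の結果"]

# English word order is roughly the reverse of the Japanese, so each target
# segment does not translate the source segment beside it; only the
# concatenation is a correct translation of the whole title.
target_segments = ["Primary endpoint results", "in the Phase III clinical trial"]


def assemble(segments, separator=" "):
    """Join segments in order, as an exported file would present them."""
    return separator.join(s.strip() for s in segments)


print(assemble(source_segments, separator=""))
print(assemble(target_segments))
```

Segment 1 of the target actually corresponds to segment 2 of the source, which is exactly why the resulting translation memory entries are of little use afterwards, even though the exported document reads correctly.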

The inability to freely join segments is clearly disadvantageous in some
cases, but can have some advantages as well. For example, in a situation
of multi-party collaboration where multiple translators each work on a
section of a document or the like, if each translator were able to freely
alter the segmentation, it might become impossible to integrate their
work into a single document.

I too at first had a problem with improperly segmented documents and
would complain vociferously about it, but eventually I got used to it,
so I would say this is a problem that can be easily dealt with through
minor adaptation of one's translation work process.

Herman Kahn

Rene

Oct 22, 2017, 10:16:54 PM
to hon...@googlegroups.com
On Mon, Oct 23, 2017 at 5:32 AM, Michael Joseph Wdowiak Beijer <mic...@beijer.uk> wrote:
Hmm, OK.

However, keep in mind that in Felix, you segment your source text on-the-fly. That is, it does have built in segmentation rules, but you can just as easily select any expanse of characters and make that your segment. This is possible because it works directly inside MS Word, PowerPoint and Excel.

​... which is completely irrelevant, if your customer sends you a pre-segmented xliff file...

​Rene​

Jon Johanning

Oct 23, 2017, 11:45:29 AM
to Honyaku E<>J translation list
Again, my problem is with jobs in which the agency sets it all up and sends it to me as a CAT file that I can’t modify at all.

I can segment the source text in memoQ any way I want as long as I am starting with the source text. What I am having trouble with is dealing with CAT files from agencies which I can’t modify.

Jon Johanning
Japanese-to-English translation
jjoha...@igc.org
www.jcjtrans.com


Jon Johanning

Oct 23, 2017, 11:48:23 AM
to hon...@googlegroups.com
> On Mon, Oct 23, 2017 at 5:18 AM, Jon Johanning <jjoha...@igc.org> wrote:
> I don’t think that’s a problem. When I use memoQ on my own source texts, of course there are no spaces between the Japanese words, but that’s fine. The problem I see is that the segmenting often gets screwy because of the very different syntaxes of the languages.

Rene:

> It is not a "problem", but it is not fine, either. To see the full power of a CAT program, you'd have to try using it between two European languages.

I’m not working between two European languages. That’s irrelevant to my situation.

My comment:

> ​For example, in a recent job which was a Powerpoint presentation about the advantages of a certain drug, most of the slides contained graphs and charts showing the results of clinical trials, and these slides had titles over the graphs and charts. The titles were in a large size of type, such that the longer ones were broken into two lines. Of course, the breaks came whenever there wasn’t enough space left on that line—in the middle of words or wherever.

Rene:

> That is a completely different issue. Of course you have wrong segmentation if a formatted file (not even to mention an OCRed file) is dumped into a CAT program. You have the same problem in all languages; that has nothing to do with Japanese.

In other words, there is no solution to my problem other than trying to get the agencies to do things right.

Jon Johanning

Oct 23, 2017, 11:55:07 AM
to hon...@googlegroups.com

> On Oct 22, 2017, at 7:10 PM, Herman <sl...@lmi.net> wrote:
>
> I too at first had a problem with improperly segmented documents and would complain vociferously about it, but eventually I got used to it, so I would say this is a problem that can be easily dealt with through minor adaptation of one's translation work process.

Sure, I realize that the agency is probably working with a project in which I am only one translator involved, and in quite a few cases, probably, the end client wants the same text translated into several languages, in fact, so that the Japanese translation is only a part of it.

I don’t know whether I can get used to this sort of situation. The only adaptation to my problems that I can see is to throw in English words wherever I can, regardless of what nonsense it makes, and hope that the agency can figure out what I did.

The ultimate solution may be just to avoid CAT-formatted jobs altogether, unless I can work out a solution with the agencies. And try to get as much work as possible from agencies that don’t give me CAT-formatted jobs (which is mostly what I am doing now).

Rene

Oct 23, 2017, 11:38:01 PM
to hon...@googlegroups.com
On Tue, Oct 24, 2017 at 12:48 AM, Jon Johanning <jjoha...@igc.org> wrote:

In other words, there is no solution to my problem other than trying to get the agencies to do things right.

​Yep. If you get an xliff or equivalent file full of wrong segmentation and tags from hell, you are in a bad place. This is not a problem that is limited to Japanese.

Rene von Rentzell

Rene

Oct 23, 2017, 11:38:09 PM
to hon...@googlegroups.com
On Tue, Oct 24, 2017 at 12:55 AM, Jon Johanning <jjoha...@igc.org> wrote:


The ultimate solution may be just to avoid CAT-formatted jobs altogether, unless I can work out a solution with the agencies. And try to get as much work as possible from agencies that don’t give me CAT-formatted jobs (which is mostly what I am doing now).


There is no such thing as "CAT formatted jobs". There are CAT files that include formatting information which are properly prepared and easy to work with. And then there are CAT files from hell, probably prepared by lazy nincompoops, which is apparently what you are looking at now.

Rene von Rentzell​
