Translate whole TMX or XLIFF file with Google Translate in one pass

Hans list

Dec 24, 2014, 3:41:53 AM

to cafetra...@googlegroups.com

For people with limited internet access: is it possible to translate a whole TMX or XLIFF file with Google Translate in one pass, on those moments where they do have access to the internet?

Rene

Dec 24, 2014, 3:43:44 AM

to cafetra...@googlegroups.com

Don't know about Google Translate, but the "MyMemory" site (which I suppose links to Google) offers the option of uploading a file and getting a machine-translated TMX back.

Rene

On Wed, Dec 24, 2014 at 5:41 PM, Hans list <hans...@gmail.com> wrote:

For people with limited internet access: is it possible to translate a whole TMX or XLIFF file with Google Translate in one pass, on those moments where they do have access to the internet?

--
You received this message because you are subscribed to the Google Groups "CafeTranslators" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cafetranslato...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hans van den Broek

Dec 24, 2014, 3:52:25 AM

to Cafetran support

On 24 Dec 2014, at 15:43, Rene <Yoi...@gmail.com> wrote:

Don't know about Google Translate, but the "MyMemory" site (which I suppose links to Google) offers the option of uploading a file and getting a machine-translated TMX back.

Wouldn’t that require the original source file? TMX and XLF are bilingual, and I’d call them “Projects”, rather than “Documents”, and I don’t think you can upload them as such to an MT.

Cheers,

Hans

--

Hans van den Broek

Schrijf-, vertaal- en redigeerwerk

Yogyakarta

Indonesia

Hans van den Broek

Dec 24, 2014, 3:57:39 AM

to Cafetran support

On 24 Dec 2014, at 15:51, Hans van den Broek <ir...@indo.net.id> wrote:

...and I don’t think you can upload them as such to an MT.

You can of course convert the TMX/XLF, and upload the SL part.
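Pulling out "the SL part" can be sketched in a few lines of Python. This is only an illustration under assumptions: the file is plain XLIFF 1.2, the function name `source_segments` and the sample document are made up for the example, and inline placeholders such as `<x/>` are simply dropped.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace, as used in the CafeTran file quoted later in this thread.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

def source_segments(xliff_text):
    """Return the plain text of every <source> element, in document order."""
    root = ET.fromstring(xliff_text)
    segments = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        if source is not None:
            # itertext() skips inline placeholder tags but keeps the text around them.
            text = "".join(source.itertext())
            segments.append(" ".join(text.split()))  # normalise whitespace
    return segments

# Hypothetical sample, not a real CafeTran export.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example.docx" source-language="en-GB" target-language="nl-NL" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>This is an example.</source>
        <target state="new"></target>
      </trans-unit>
      <trans-unit id="2">
        <source><x id="2"/>Second sentence.</source>
        <target state="new"></target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

if __name__ == "__main__":
    print("\n".join(source_segments(SAMPLE)))
```

Writing one segment per line gives a monolingual text file that can be uploaded to an MT service.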

Hans list

Dec 24, 2014, 4:05:24 AM

to cafetra...@googlegroups.com, yoi...@gmail.com

That would be via "Download a TMX", and it would require registration?

Hans van den Broek

Dec 24, 2014, 4:07:33 AM

to Cafetran support

On 24 Dec 2014, at 15:56, Hans van den Broek <ir...@indo.net.id> wrote:

You can of course convert the TMX/XLF, and upload the SL part.

And there’s an additional problem with that workflow, I think: You can only use it for manual search. Anything else will trigger only perfect matches. Like the TMX as a result of Total Recall. You could of course write a simple macro so the MT translated TMX progresses with the project.

Rene

Dec 24, 2014, 4:18:03 AM

to cafetra...@googlegroups.com

I don't understand the logic here. If your TMX and SDLXLIFF are "bilingual", why do you need to translate them in the first place?

From the question, I assumed that they are untranslated, so only contain the source language, aka monolingual and not bilingual.

So just save them as "source" in the CAT of your choice, and you have the "original source file". Which you upload to MyMemory and then get a translated, aka bilingual file back.

What am I missing here?

Hans van den Broek

Dec 24, 2014, 4:25:52 AM

to Cafetran support

On 24 Dec 2014, at 16:17, Rene <Yoi...@gmail.com> wrote:

I don´t understand the logic here.

Few people seem to agree with me, including SDL’s PaulF.

If your TMX and SDLXLIFF are "bilingual", why do you need to translate them in the first place?

The structure is bilingual. When you have finished the project, you end up with a bilingual TMX and XLF file. When you start a project, chances are there’s nothing on the TL side of the files.

From the question, I assumed that they are untranslated, so only contain the source language, aka monolingual and not bilingual.

But you’ll have to convert them to a monolingual (source) file, unless you have the source document. I don’t think there’s an MT that can handle TMX/XLF as such.

Rene

Dec 24, 2014, 4:34:36 AM

to cafetra...@googlegroups.com

Does it matter? It is just one additional step. Sausage-factory-type CATs require a whole bunch of additional steps in the translation process anyway, so since when is one single step an issue?

Merry Xmas
Rene

Hans van den Broek

Dec 24, 2014, 4:39:24 AM

to Cafetran support

On 24 Dec 2014, at 16:17, Rene <Yoi...@gmail.com> wrote:

...so only contain the source language, aka monolingual and not bilingual.

This is an example of a CT XLF file that only contains SL segments. I’d still call it a “bilingual” file because of the structure.
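The "bilingual structure, monolingual content" point can be illustrated with a small check. This is a sketch only: `count_targets` and the `FRESH` sample are hypothetical names for the example, and the sample is a simplified XLIFF 1.2 document rather than a real CafeTran project file.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

def count_targets(xliff_text):
    """Return (units, translated): all trans-units, and those whose <target> has text."""
    root = ET.fromstring(xliff_text)
    units = translated = 0
    for unit in root.iterfind(".//x:trans-unit", NS):
        units += 1
        target = unit.find("x:target", NS)
        # A <target> element may exist (bilingual structure) and still be empty.
        if target is not None and "".join(target.itertext()).strip():
            translated += 1
    return units, translated

# Hypothetical freshly created project: every unit has a <target>, none has text.
FRESH = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example.docx" source-language="en-GB" target-language="nl-NL" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>First sentence.</source><target state="new"></target></trans-unit>
      <trans-unit id="2"><source>Second sentence.</source><target state="new"><x id="2"/></target></trans-unit>
    </body>
  </file>
</xliff>"""

if __name__ == "__main__":
    print(count_targets(FRESH))
```

Both readings in this thread fit the result: the file has a target slot for every unit, yet no unit has been translated.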

Rene

Dec 24, 2014, 4:55:02 AM

to cafetra...@googlegroups.com

Well, just remove all the gobbledygook. What is the problem?

Fwiw, I just checked, and it is possible to upload an SDLXLIFF file to Wordfast Anywhere. It gets automatically converted to a bilingual Wordfast/Trados type file.

That, you can download, and save the source from Wordfast, Trados, or simply from Word by writing a small macro. Et voila, a text source file without additional gunk.

Rene

Hans van den Broek

Dec 24, 2014, 5:19:10 AM

to Cafetran support

On 24 Dec 2014, at 16:54, Rene <Yoi...@gmail.com> wrote:

Well, just remove all the gobbledygook. What is the problem?

No problem. CT can do it in a flash. But the answer to the original question would still be No.

Fwiw, I just checked, and it is possible to upload an SDLXLIFF file to Wordfast Anywhere. It gets automatically converted to a bilingual Wordfast/Trados type file.

But does Wf Anywhere provide MT (MpreT)?

That, you can download, and save the source from Wordfast, Trados, or simply from Word by writing a small macro. Et voila, a text source file without additional gunk.

Too many steps, Rene, too many. As compared to a CT procedure. But the macro part may sound attractive to some.

Rene

Dec 24, 2014, 5:27:43 AM

to cafetra...@googlegroups.com

->>No problem. CT can do it in a flash. But the answer to the original question would still be No.

CT does this in a flash? Nice. I am glad I just renewed my license in order to get the latest version. One of these days I will have to learn how to use the beast.

->>But does Wf Anywhere provide MT (MpreT)?

It does. And the layout is ergonomic, like Wf Classic. I am not a fan of this whole Klout translation thing, but if you are stuck somewhere without your laptop but with internet access (e.g. a hotel lobby), it is a feasible option to get some work done, using both your TMs and MT.

Rene

Hans van den Broek

Dec 24, 2014, 5:48:13 AM

to Cafetran support

On 24 Dec 2014, at 17:27, Rene <Yoi...@gmail.com> wrote:

->>No problem. CT can do it in a flash. But the answer to the original question would still be No.

CT does this in a flash? Nice. I am glad I just renewed my license in order to get the latest version. One of these days I will have to learn how to use the beast.

It does.

Good to know. You can use MT in CT, of course, but as far as I know, you can’t “pretranslate” an XLF or TMX file against an MT.

Hans list

Dec 24, 2014, 6:24:51 AM

to cafetra...@googlegroups.com

On Wednesday, December 24, 2014 11:48:13 AM UTC+1, Hans van den Broek wrote:

Good to know. You can use MT in CT, of course, but as far as I know, you can’t “pretranslate” an XLF or TMX file against an MT.

You wouldn't be asking for a feature here, would you?

Rene

Dec 24, 2014, 6:35:39 AM

to cafetra...@googlegroups.com

I don´t think he was, but alas he has given an idea to the usual suspect(s).

Rene

Hans list

Dec 24, 2014, 6:46:28 AM

to cafetra...@googlegroups.com, yoi...@gmail.com

On Wednesday, December 24, 2014 12:35:39 PM UTC+1, Rene wrote:

I don´t think he was, but alas he has given an idea to the usual suspect(s).

:) I already had the idea to post a request before I asked for a solution here. However, I didn't send such a request to dear Igor. He's probably busy with http://www.zuhause.de/weihnachtsbaum-schmuecken-die-besten-tipps/id_60911336/index

As normal people do, instead of working alone in an empty office ;).

Hans van den Broek

Dec 24, 2014, 6:46:33 AM

to Cafetran support

On 24 Dec 2014, at 18:24, Hans list <hans...@gmail.com> wrote:

You wouldn't be asking for a feature here, would you?

Nope.

Michael Beijer

Dec 24, 2014, 7:17:24 AM

to cafetra...@googlegroups.com

Not sure about what formats it can import and export, but the Google Translator Toolkit can translate entire documents ... for people with limited internet access, to use when they do have access to the internet.

Another idea might be to set up CT to insert only Google Translate results and pretranslate the entire document when you still have an internet connection.

Michael

On Wed, Dec 24, 2014 at 8:41 AM, Hans list <hans...@gmail.com> wrote:

For people with limited internet access: is it possible to translate a whole TMX or XLIFF file with Google Translate in one pass, on those moments where they do have access to the internet?

Michael Beijer

Dec 24, 2014, 7:23:17 AM

to cafetra...@googlegroups.com

PS: Three generations of my wife's family (my wife, her mother, and my wife's grandmother) are downstairs in the kitchen cooking Christmas goodies! Since my wife is pregnant, there is actually a 4th generation present too: our baby girl Juniper!

Hope you all are having a good holiday!

Michael

Hans van den Broek

Dec 24, 2014, 7:49:45 AM

to Cafetran support

On 24 Dec 2014, at 19:17, Michael Beijer <mic...@beijer.uk> wrote:

Not sure about what formats it can import and export

I just tried TMX and XLF, the latter consisting of two SL sentences. No results for the (project) TMX, which is not surprising, because it was still empty. The results for the XLF in GT:

<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:text="urn:cafetran:text" xmlns:bill="urn:cafetran:bill" xmlns:property="urn:cafetran:property" xmlns:filter="urn:cafetran:filter">
  <file original="/Users/hans/Desktop/Example for RvR/XLF example for RvR_en-GB.docx" source-language="en-GB" target-language="nl-NL" datatype="x-docx/xml" date="2014-12-24T16:29:10Z" tool-id="CafeTran">
    <header>
      <skl>
        <external-file href="/Users/hans/Desktop/Example for RvR/XLF example for RvR_en-GB.docx">
        </external-file>
      </skl>
      <phase-group>
        <phase phase-name="Default" process-name="Review">
        </phase>
      </phase-group>
      <tool tool-id="CafeTran" tool-name="CafeTran">
      </tool>
      <tool tool-id="CafeTran-OpenOffice" tool-name="CafeTran and OpenOffice">
      </tool>
    </header>
    <body><iframe src="https://translate.google.com/translate_un?hl=en&prev=_t&sl=en&tl=nl&lang=en&usg=ALkJrhjFz2J15t8MdljxMwEsmtdGNOKTJA" width=0 height=0 frameborder=0 style="width:0px;height:0px;border:0px;display:none;"></iframe>
      <trans-unit id="1">
        <source xml:lang="en-GB" text:position="0" text:length="38">
          <x id="1">
          </x>
          This is an example for RvR.Dit is een voorbeeld RvR.
        </source>
        <target state="new" xml:lang="nl-NL">
          <x id="1">
          </x>
        </target>
      </trans-unit>
      <trans-unit id="2">
        <source xml:lang="en-GB" text:position="38" text:length="100">
          <x id="2" ctype="x-break" equiv-text=" ">
          </x>
          The XLF version should make the bilingual character clear.De XLF-versie moet het tweetalige karakter duidelijk maken.
        </source>
        <target state="new" xml:lang="nl-NL">
          <x id="2" ctype="x-break" equiv-text=" ">
          </x>
        </target>
      </trans-unit>
    </body>
  </file>
</xliff>

GT doesn’t understand the format, but for reasons I cannot possibly grasp, it does show both the SL and the TL.

Hans van den Broek

Dec 24, 2014, 7:57:13 AM

to Cafetran support

The export of the project as a “bilingual” document and HTML file:

Cheers,

Hans

--

Hans van den Broek

Schrijf-, vertaal- en redigeerwerk

Yogyakarta

Indonesia

Selcuk Akyuz

Dec 24, 2014, 8:20:01 AM

to cafetra...@googlegroups.com

Grats Michael,

What will be the planned release date of 4th G, oops Juniper?

Selcuk

Rene

Dec 24, 2014, 8:32:40 AM

to cafetra...@googlegroups.com

No, not clear conceptually to me. OK, there is an empty column, i.e. instead of a line return, there is a tab and a line return. What's the big deal about it? It is still a monolingual source file.

Rene

Dec 24, 2014, 8:35:13 AM

to cafetra...@googlegroups.com

<goodies! Since my wife is pregnant, there is actually a 4th generation present too: our baby girl Juniper!>

Nice! Please eat a lot of Xmas pudding and make lots of babies.

That will keep your mind from thinking up new features for a certain software.

I.e., win-win for everyone!

Merry Xmas

Rene

Hans van den Broek

Dec 24, 2014, 8:38:22 AM

to Cafetran support

On 24 Dec 2014, at 20:32, Rene <Yoi...@gmail.com> wrote:

No, not clear conceptually to me. OK, there is an empty column, i.e. instead of a line return, there is a tab and a line return. What's the big deal about it? It is still a monolingual source file.

So you think you can get away with it quantum-mechanically? When you look at it, it’s a bilingual file, when you don’t it’s monolingual. Or the other way around, of course.

Will Helton

Dec 24, 2014, 9:22:13 AM

to cafetra...@googlegroups.com

Big congrats, Michael!

A Merry Christmas to all!

Will

Igor Kmitowski

Dec 24, 2014, 9:53:34 AM

to cafetra...@googlegroups.com

Congrats Michael and Merry Xmas Everyone,
Igor

--
Igor Kmitowski
Translator and Java developer
CafeTran website: http://www.cafetran.com
CafeTran support: cafetran...@gmail.com

Michael Beijer

Dec 24, 2014, 10:31:39 AM

to cafetra...@googlegroups.com

Hi Selcuk,

Thanks! Juniper’s planned release date is April!

Michael

Hans van den Broek

Dec 24, 2014, 6:27:35 PM

to Cafetran support

On 24 Dec 2014, at 18:46, Hans list <hans...@gmail.com> wrote:

:) I already had the idea to post a request before I asked for a solution here.

So this is yet another case of you asking a question to which you already think you know the answer and/or even requested a “feature” for?

I have very unnatal thoughts about this.

Hans list

Dec 25, 2014, 2:38:51 AM

to cafetra...@googlegroups.com

No and no. Like I wrote: I don't need this solution. I was asking for someone else. I won't ask Igor to add this to ct.

Hans list

Dec 27, 2014, 11:51:01 AM

to cafetra...@googlegroups.com

Here is a solution from Proz that I've just tested:

No need to use complex conversions when you use the GT engine that you pay for.

Just download and install the free Okapi Rainbow utility for translators (http://okapi.opentag.com). Drag the TMX or XLIFF file onto the first tab; on one of the last tabs, define the languages and encodings (very important); then choose Edit/Execute pipeline in the Utilities menu. In the pipeline, first add the step Raw document to filter event, then the Leveraging step. Here you define the TM or MT engine: it can be Google Translate, but there are other options too. Towards the end of the Leveraging dialog there is an option to generate a TMX file.

When you have defined and added the leveraging step, add the final step Filter events to raw document, save the pipeline (that's what they call the sequence of steps in Okapi Rainbow) for future use, and you are ready to execute it.
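Conceptually, the pipeline reads segments, asks an engine for a translation of each, and writes the pairs to a TMX. The Python sketch below mimics only that idea; it does not use or reproduce Okapi's API. The names `leverage` and `build_tmx` are invented for the example, and `translate` is a hypothetical stand-in for whatever MT call you have access to.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def leverage(segments, translate):
    """Pair every source segment with the result of translate(segment)."""
    return [(segment, translate(segment)) for segment in segments]

def build_tmx(pairs, src_lang, tgt_lang):
    """Serialise (source, target) pairs as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0",
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

if __name__ == "__main__":
    # A dictionary lookup plays the role of the MT engine here.
    fake_mt = {"flower": "bloem"}.get
    print(build_tmx(leverage(["flower"], fake_mt), "en-gb", "nl-nl"))
```

In a real run, `translate` would wrap a paid MT service, which is why the post mentions needing your own Google Translate key.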

In your CAT tool set a penalty for your MT translation memory, this will remind you to check the machine translation thoroughly and correct it when you get a match.

Regards,

Piotr Bienkowski

PS The downside is that on a Mac CafeTran requires Java 6 and Okapi requires Java 7. But hey, the solution is beautiful. Impressive stuff this Okapi animal breed kind of things.

Rene

Dec 27, 2014, 12:13:47 PM

to cafetra...@googlegroups.com

How do you install the thing? It comes as a zip file and when I unzip and try to run the executables, I get an error message.

Hans list

Dec 27, 2014, 12:14:45 PM

to cafetra...@googlegroups.com

Create a CafeTran project.
Drag the CafeTran XLIFF file onto the first tab of Rainbow.
Adjust the settings (note that you have to enter your Google Translate key):

Hans LIST

Dec 27, 2014, 12:17:48 PM

to CafeTran Google Group

> On 27 12 2014, at 18:13, Rene <Yoi...@gmail.com> wrote:
>
> How do you install the thing? It comes as a zip file and when I unzip and try to run the executables, I get an error message.

I didn't even get the error messages on Mac. Do you have Java 1.7 or higher? Seems to be crucial.

Hans van den Broek

Dec 27, 2014, 6:37:10 PM

to Cafetran support

> On 28 dec. 2014, at 00:17, Hans LIST <hans...@gmail.com> wrote:
>
>
> I even didn't get the error messages on Mac. Do you have Java 1.7 or higher? Seems to be crucial.

I have 0.23, which still works under Java 1.6. So if you don't want to install 1.7, download 0.23, or tell me to send it. I'm not sure that version works under Java 1.6 on Windows.

Hans list

Dec 28, 2014, 4:33:51 AM

to cafetra...@googlegroups.com

I've written an article:

http://cafetran.wikidot.com/creating-a-machine-translation-of-an-xliff-file

About the last step: if you want to use other TMs from which Exact Matches should be inserted automatically, what would be the best way to prevent the pseudo-EMs from the Google Translate TM from being inserted automatically too? I can only think of adding some characters to the source segments in the Google Translate TM (to make the matches less than exact).

Is it possible to set a penalty for a TM?
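What a penalty would do can be shown with a toy calculation. This is a generic sketch of how CAT tools commonly treat penalties, not a description of CafeTran's behaviour; the function names and the threshold of 100 are assumptions made for the example.

```python
def penalised_score(raw_score, penalty):
    """Lower a match score (0-100) by a fixed penalty, never going below 0."""
    return max(0, raw_score - penalty)

def auto_insertable(raw_score, penalty=0, threshold=100):
    """A match is auto-inserted only if its penalised score reaches the threshold."""
    return penalised_score(raw_score, penalty) >= threshold

if __name__ == "__main__":
    # An exact match from a trusted TM is still inserted automatically...
    print(auto_insertable(100))             # human TM, no penalty
    # ...but the same "exact" match from an MT-derived TM is not.
    print(auto_insertable(100, penalty=1))  # MT TM, 1-point penalty
```

Even a one-point penalty is enough to turn a pseudo-exact match into a 99% match that has to be confirmed by hand.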

Hans van den Broek

Dec 28, 2014, 4:50:51 AM

to Cafetran support

On 28 dec. 2014, at 16:33, Hans list <hans...@gmail.com> wrote:

Is it possible to set a penalty for a TM?

I don’t know, but I have a few related questions:

How likely is it an agency will send you an XLF or TMX file that doesn’t contain any target segments without sending you the original document as well?
If they indeed do send you an XLF or TMX file that does contain target segments without sending you the original document as well, what will GT do?
What on earth is wrong with converting the TMX or XLF file to text or HTML in CT, and uploading that file to https://translate.google.com/?hl=en? Too simple?

Hans van den Broek

Dec 28, 2014, 4:54:51 AM

to Cafetran support

On 28 dec. 2014, at 16:33, Hans list <hans...@gmail.com> wrote:

what would the best way to prevent the pseudo-EMs from the Google Translate TM to be inserted automatically too?

Set them to “manual", which is what I do for Total Recall TMX files. If not, you’ll get only 100% resource matches. That means you can only search GT generated TMs.

Hans LIST

Dec 28, 2014, 5:14:25 AM

to CafeTran Google Group

On 28 12 2014, at 10:49, Hans van den Broek <ir...@indo.net.id> wrote:

I don’t know, but I have a few related questions:

Interesting questions ... Note however, that this article (and the whole wiki) is just my personal notepad. It is by no means a Help that has been approved by CafeTran's developer or claims to be generic or valid to all users. Take from it what you can use. Write your own notes for your own tasks. Publish them (like I did) or keep them for yourself, just as you prefer.

Hans van den Broek

Dec 28, 2014, 5:23:36 AM

to Cafetran support

On 28 dec. 2014, at 17:14, Hans LIST <hans...@gmail.com> wrote:

Interesting questions …

Interesting answers, any?

Note however, that this article (and the whole wiki) is just my personal notepad. It is by no means a Help that has been approved by CafeTran's developer

True, but the CT Menu does refer to your personal wiki, and it’s the only resource we have. And I bet there will be poor sods who are going to try this. Holy Moses, it should be forbidden!

Hans List

Dec 28, 2014, 5:45:41 AM

to cafetra...@googlegroups.com

>And I bet there will be poor sods who are going to try this. Holy Moses, it should be forbidden!

By whom? The Hans vd B Wiki Police?

You are a funny bloke :)

Hans van den Broek

Dec 28, 2014, 5:51:33 AM

to Cafetran support

On 28 dec. 2014, at 17:45, Hans List <hans...@gmail.com> wrote:

By whom? The Hans vd B Wiki Police?

Sorry for that, I had to include Moses in the message somehow. Did you watch the screencast?

Hans LIST

Dec 28, 2014, 6:10:30 AM

to CafeTran Google Group

On 28 12 2014, at 11:50, Hans van den Broek <ir...@indo.net.id> wrote:

On 28 dec. 2014, at 17:45, Hans List <hans...@gmail.com> wrote:

By whom? The Hans vd B Wiki Police?

Sorry for that, I had to include Moses in the message somehow. Did you watch the screencast?

Nope. Must write another article first. There is also another MT engine that you can configure yourself (Apertium?). I saw some results a couple of months ago. Frighteningly good. Luckily this hasn't been set up as a web service.

Hans van den Broek

Dec 28, 2014, 7:35:29 AM

to Cafetran support

On 28 dec. 2014, at 18:10, Hans LIST <hans...@gmail.com> wrote:

Nope. Must write another article first.

I have this strange feeling it’ll be related to the previous one. Very much so.

Michael Beijer

Dec 28, 2014, 7:58:08 AM

to cafetra...@googlegroups.com

As far as I know, Apertium is great, but not ready in "our" languages (i.e. DE/NL/EN).

Michael

Rene

Dec 28, 2014, 8:01:53 AM

to cafetra...@googlegroups.com

Writing new articles is better than dreaming up new features, I say.

Hans van den Broek

Dec 28, 2014, 8:05:00 AM

to Cafetran support

On 28 dec. 2014, at 19:57, Michael Beijer <mic...@beijer.uk> wrote:

As far as I know, Apertium is great, but not ready in "our" languages (i.e. DE/NL/EN).

As it is well-known, many different SMT and RBMT systems emerged in the past ten years. Probably the most developed ones are: Moses (Koehn, 2007) and Apertium (Armentano, 2006). At the beginning, Apertium was devoted to the languages of Spain. Since Moses is well-documented and is better-known in the localization industry, it was chosen for our effort.

I do hope Moses provides Dutch. I’ll have to look this up first.

Michael Beijer

Dec 28, 2014, 8:24:34 AM

to cafetra...@googlegroups.com

I am completely puzzled by this entire thread. The original question was:

"For people with limited internet access: is it possible to translate a whole TMX or XLIFF file with Google Translate in one pass, on those moments where they do have access to the internet?"

Wouldn't it be much easier if we asked Igor to make it possible to pretranslate a project using one of the MT providers (rather than start messing around with Okapi stuff)? It seems exceedingly simple to me: when the translator with patchy internet has internet for a few minutes, he simply clicks a button and CT quickly pretranslates all the segments in a project using, e.g., Google Translate (or Mymemory, Apertium, or Microsoft Translator). Problem solved. Or am I missing something important?

If you didn't want to run the selected MT provider on your actual documents in this new system, you could of course also just set up a dummy project, run it on that, export all segments to a TMX, and use that as your MT-derived TMX, setting its priority to whatever you want.
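What such a one-click pretranslation would amount to can be sketched as follows. This is not a CafeTran feature or API; `pretranslate` is an invented name, `translate` is a hypothetical stand-in for an MT call, and inline tags inside segments are ignored for simplicity.

```python
import xml.etree.ElementTree as ET

XLIFF_URI = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_URI}

def pretranslate(xliff_text, translate):
    """Fill every empty <target> with translate(source text).

    Returns the updated XLIFF as a string plus the number of segments filled.
    Existing translations are left untouched.
    """
    ET.register_namespace("", XLIFF_URI)  # keep XLIFF as the default namespace on output
    root = ET.fromstring(xliff_text)
    filled = 0
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        if source is None:
            continue
        target = unit.find("x:target", NS)
        if target is None:
            target = ET.SubElement(unit, f"{{{XLIFF_URI}}}target")
        if "".join(target.itertext()).strip():
            continue  # already translated: do not overwrite
        target.text = translate("".join(source.itertext()).strip())
        filled += 1
    return ET.tostring(root, encoding="unicode"), filled

# Hypothetical two-segment project: one untranslated, one already translated.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example.txt" source-language="en-GB" target-language="nl-NL" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>flower</source><target state="new"></target></trans-unit>
      <trans-unit id="2"><source>mill</source><target>molen</target></trans-unit>
    </body>
  </file>
</xliff>"""

if __name__ == "__main__":
    # str.upper stands in for the MT engine so the sketch runs offline.
    updated, count = pretranslate(SAMPLE, str.upper)
    print(count)
```

All MT calls happen in one loop, so the whole file can be processed during a short window of connectivity.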

Michael

Michael Beijer

Dec 28, 2014, 8:31:50 AM

to cafetra...@googlegroups.com

Such a creative soul, our Rene. If it were up to him, we’d all be standing by the side of the road, moaning (because we'd be getting nowhere) next to our square-wheeled cars.

Michael

Hans van den Broek

Dec 28, 2014, 8:35:40 AM

to Cafetran support

On 28 dec. 2014, at 20:23, Michael Beijer <mic...@beijer.uk> wrote:

Wouldn't it be much easier if we asked Igor

No need for Igor. It’s dead easy.

Hans LIST

Dec 28, 2014, 9:02:47 AM

to CafeTran Google Group

On 28 12 2014, at 13:34, Hans van den Broek <ir...@indo.net.id> wrote:

On 28 dec. 2014, at 18:10, Hans LIST <hans...@gmail.com> wrote:

Nope. Must write another article first.

I have this strange feeling it’ll be related to the previous one. Very much so.

Yes. Still 'researching'. I'd like to create a starter's glossary, say DE>FR. I already have a list of the 40,000 most frequent German words from a German research institute (publicly available).

When I create a plain text file with 4 English words, run it through Rainbow, I get:

<?xml version="1.0" encoding="UTF-8"?>

<tu>

</tu>

<tu>

</tu>

<tu>

<tuv xml:lang="nl-nl"><seg>molen</seg></tuv>

</tu>

<tu>

<tuv xml:lang="en-gb"><seg>flower</seg></tuv>

<tuv xml:lang="nl-nl"><seg>bloem</seg></tuv>

</tu>

</body>

</tmx>

With four words this can be done in the web interface too. But with 40,000 or 100,000 words?

I could stop here (and use CafeTran to convert TMX to tab-del) but I'd like to make the procedure as simple as:

- One column file with source language in

- Two column file with source and target language out

Have to bother some other people first, to achieve this.
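The TMX-to-two-column step could look roughly like this in Python. It is a sketch under assumptions: `tmx_to_tab_delimited` is an invented name, the sample TMX is a cleaned-up stand-in for the output above, and units lacking either language are skipped.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_to_tab_delimited(tmx_text, src_lang, tgt_lang):
    """Return 'source<TAB>target' lines for every <tu> that has both languages."""
    root = ET.fromstring(tmx_text)
    lines = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            text = "".join(seg.itertext()) if seg is not None else ""
            segs[tuv.get(XML_LANG, "").lower()] = text
        source = segs.get(src_lang.lower())
        target = segs.get(tgt_lang.lower())
        if source and target:  # skip empty or one-sided units
            lines.append(f"{source}\t{target}")
    return "\n".join(lines)

# Hypothetical TMX with an empty unit, a one-sided unit and a complete unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="sketch" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en-gb" datatype="plaintext"/>
  <body>
    <tu></tu>
    <tu><tuv xml:lang="nl-nl"><seg>molen</seg></tuv></tu>
    <tu>
      <tuv xml:lang="en-gb"><seg>flower</seg></tuv>
      <tuv xml:lang="nl-nl"><seg>bloem</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    print(tmx_to_tab_delimited(SAMPLE_TMX, "en-GB", "nl-NL"))
```

Language codes are compared case-insensitively because tools differ in how they write them (en-GB versus en-gb).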

Michael Beijer

Dec 28, 2014, 9:07:40 AM

to cafetra...@googlegroups.com

Do you mean doing it the Rainbow way (is easy)?

Much simpler would be: Translation > Pretranslate all segments … Select MT provider from list + Click OK

Michael

Hans van den Broek

Dec 28, 2014, 9:20:36 AM

to Cafetran support

On 28 dec. 2014, at 21:06, Michael Beijer <mic...@beijer.uk> wrote:

Do you mean doing it the Rainbow way (is easy)?

No. Dammit. It’s crazy.

THIS is simple.

The wiki entry starts with "In CafeTran create a new translation project and import your source file(s)." If you have the source doc, upload it to GT - https://translate.google.com/?hl=en - and wait a minute or so.
If you don’t have the source file, in CT, go Menu | Project | Convert Project. Click HTML or Text (I take it GT doesn’t accept a TMX, but I’m not sure of that). Continue as above.

Copy/paste SL and TL in say, Excel. If you don’t use AA, you can import the XLSX in a TMX.

Done.

Cheers,

Hans

--

Hans van den Broek

Schrijf-, vertaal- en redigeerwerk

Yogyakarta

Indonesia

Hans van den Broek

Dec 28, 2014, 9:23:33 AM

to Cafetran support

On 28 dec. 2014, at 21:02, Hans LIST <hans...@gmail.com> wrote:

With four words this can be done in the web interface too. But with 40,000 or 100,000 words?

As I said above, use the upload functionality in GT. Then you don’t have to convert the resulting TXT file to TMX either, because that would make sense.

Hans van den Broek

Dec 28, 2014, 9:26:10 AM

to Cafetran support

On 28 dec. 2014, at 21:19, Hans van den Broek <ir...@indo.net.id> wrote:

Copy/paste SL and TL in say, Excel.

That reminds me, and I can’t check it yet, do the new Excel features also work for OOo/LibreOffice? That would be nice for all the Linux users.

Hans van den Broek

Dec 28, 2014, 9:30:18 AM

to Cafetran support

On 28 dec. 2014, at 21:24, Hans van den Broek <ir...@indo.net.id> wrote:

That reminds me, and I can’t check it yet, do the new Excel features also work for OOo/LibreOffice?

Stupid question. In OOo/LibreOffice you can save the Calc as XLSX.

Michael Beijer

Dec 28, 2014, 10:04:05 AM

to cafetra...@googlegroups.com

Interesting.

One issue, however, might be segmentation (unless you generate an HTML or bilingual DOCX from CT first). Otherwise you'll end up with Google Translate's segmentation.
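One way to keep your own segmentation is to send one segment per line and check that the same number of lines comes back. The sketch below assumes the MT service preserves line breaks, which is not guaranteed; `realign` is a name made up for the example.

```python
def realign(source_segments, translated_text):
    """Pair each source segment with the translated line at the same position.

    Raises ValueError if the line counts differ, which signals that the MT
    service merged or split segments and the pairs cannot be trusted.
    """
    target_lines = translated_text.splitlines()
    if len(target_lines) != len(source_segments):
        raise ValueError(
            f"expected {len(source_segments)} lines, got {len(target_lines)}"
        )
    return list(zip(source_segments, target_lines))

if __name__ == "__main__":
    sources = ["This is an example.", "Second sentence."]
    translated = "Dit is een voorbeeld.\nTweede zin."
    for pair in realign(sources, translated):
        print(pair)
```

If the counts do not match, it is safer to stop than to import misaligned pairs into a TMX.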

Michael

Hans van den Broek

Dec 28, 2014, 10:12:28 AM

to Cafetran support

On 28 dec. 2014, at 22:04, Michael Beijer <mic...@beijer.uk> wrote:

One issue, however, might be segmentation (unless you generate an HTML or bilingual DOCX from CT first). Otherwise you'll end up with Google Translate's segmentation.

This IS the bilingual version, but there’s no target language, so there’s no problem. If there are translated fragments (much more likely, because why would an agency send you an XLF or TMX file that contains no TL segments, without the original source document?), the TL should be removed, of course. Which is one of my earlier questions about the Rainbow approach (unanswered, of course): what do the Thieves of Mountain View do with partly translated TMX/XLF files?

Hans van den Broek

Dec 28, 2014, 10:17:51 AM

to Cafetran support

On 28 dec. 2014, at 22:11, Hans van den Broek <ir...@indo.net.id> wrote:
the TL should be removed, of course. Which is one of my earlier questions

SL and TL. Can be done easily in an Excel file (sort on TL, delete populated rows).
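The same "delete populated rows" filter can be done outside Excel on a tab-delimited export. A minimal sketch, assuming one segment pair per line with the source in the first column and the target in the second; `untranslated_sources` is a hypothetical helper name.

```python
def untranslated_sources(tab_delimited_text):
    """Return the source segments whose target column is empty or missing."""
    sources = []
    for line in tab_delimited_text.splitlines():
        # partition() copes with rows that have no tab at all.
        source, _, target = line.partition("\t")
        if source.strip() and not target.strip():
            sources.append(source.strip())
    return sources

if __name__ == "__main__":
    export = "flower\tbloem\nmill\t\nhouse\n"
    print("\n".join(untranslated_sources(export)))
```

Only the rows that still need a translation are sent to the MT service; the translated ones stay as they are.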

Hans van den Broek

Dec 28, 2014, 10:26:08 AM

to Cafetran support

On 28 dec. 2014, at 22:04, Michael Beijer <mic...@beijer.uk> wrote:

One issue, however, might be segmentation

You’re right, if you use the source document (which wasn’t the case in the original question). If you’re worried about that, use the convert procedure in all cases. That makes it even easier, since then there’s only one solution.

Hans van den Broek

Dec 28, 2014, 10:42:32 AM

to Cafetran support

On 28 dec. 2014, at 22:24, Hans van den Broek <ir...@indo.net.id> wrote:

That makes it even easier, since then there’s only one solution.

Which would be:

1. Open or create your project
2. Convert it to HTML
3. Copy the SL segments (and the TL segments, if any) into an Excel file
4. If there are TL segments, sort on TL and delete the populated rows (to avoid any GT problems)
5. Upload to GT*
6. Paste the TL into the Excel file
7. Import the Excel file into a TMX in CT

*There’s a file size limit, but it’ll not exceed a day’s work (probably several days)

No Rainbow, no API, no money. Simple.

Hans LIST

Dec 28, 2014, 11:04:46 AM

to CafeTran Google Group

Yes. It is simple. Didn't know that you can upload a file. However, only a few words of the 40 K file with frequent German terms were translated. Hmm.

On 28 12 2014, at 15:19, Hans van den Broek <ir...@indo.net.id> wrote:

THIS is simple

Hans LIST

Dec 28, 2014, 11:06:59 AM

to CafeTran Google Group

From the test file with 40 K words, only a few (throughout the file) were translated.

Hans LIST

Dec 28, 2014, 11:16:31 AM

to CafeTran Google Group

On 28 12 2014, at 16:11, Hans van den Broek <ir...@indo.net.id> wrote:

what do the Thieves of Mountain view do with partly translated TMX/XLF files?

You can instruct Rainbow to only upload SL segments for which no TL segment exists:

Do not query if there is already a candidate with a score equal to or above: Select the no-query threshold. If an entry already has a candidate with a score equal to or above the given value, no query is done for it. This allows you, for example, to not use a for-fee translation resource if the entry already has a candidate good enough to use. Use 101 to always allow the query to be done.
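The rule in that description boils down to a single comparison. The sketch below restates it in Python for illustration; it is not Okapi code, and `should_query` is an invented name. Scores are assumed to run from 0 to 100, which is why a threshold of 101 can never be reached.

```python
def should_query(best_existing_score, no_query_threshold):
    """Decide whether to send a segment to the (possibly paid) MT or TM resource.

    best_existing_score is the score of the best candidate the entry already
    has, or None if it has no candidate at all.
    """
    if best_existing_score is None:
        return True  # nothing there yet, so always query
    return best_existing_score < no_query_threshold

if __name__ == "__main__":
    print(should_query(None, 95))  # untranslated segment
    print(should_query(100, 95))   # already has a good candidate
    print(should_query(100, 101))  # threshold 101: query regardless
```

For a partly translated file, a threshold of 100 or lower means segments that already carry a full-score candidate are not sent again.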

BTW: Google Translator Toolkit handles TMX.

And: https://support.google.com/translate/toolkit/answer/147829?hl=en DOCX etc.

Hmm

Hans LIST

Dec 28, 2014, 11:18:08 AM

to CafeTran Google Group

zh-Hans!

https://support.google.com/translate/toolkit/answer/147854

Hans list

Dec 28, 2014, 11:42:53 AM

to cafetra...@googlegroups.com

On Sunday, December 28, 2014 3:02:47 PM UTC+1, Hans list wrote:

Yes. Still 'researching'. I'd like to create a starter's glossary, say DE>FR.

Just forget it. The translations of isolated words by Google Translate and Bing Translator are useless.

Hans list

Dec 28, 2014, 3:32:42 PM

to cafetra...@googlegroups.com

On Sunday, December 28, 2014 5:42:53 PM UTC+1, Hans list wrote:

Just forget it. The translations of isolated words by Google Translate and Bing Translator are useless.

Except for very specific categories / collections: http://cafetran.wikidot.com/automatically-creating-glossaries-with-rainbow

Quiz question: Describe the categories for which MT of isolated terms will probably work.

Michael Beijer

Dec 28, 2014, 4:06:48 PM

to cafetra...@googlegroups.com

Simple my ass.

My solution (blue) is much simpler than yours (red):

Translation > Pretranslate all segments … Select MT provider from list + Click OK.

On Sun, Dec 28, 2014 at 3:41 PM, Hans van den Broek <ir...@indo.net.id> wrote:

On 28 dec. 2014, at 22:24, Hans van den Broek <ir...@indo.net.id> wrote:

That makes it even easier, since then there’s only one solution.

Which would be:

Open or create your project
Convert it to HTML
Copy SL (and TL if any TL) segments into an Excel file
(If there are TL segments, sort on TL, and delete rows (to avoid any GT problems))
Upload to GT*
Paste TL in Excel file
Import in CT TMX

*There’s a file size limit, but a day’s work (probably even several days’ work) won’t exceed it.

No Rainbow, no API, no money. Simple.

Cheers,

Hans

Double Cheers,

Michael

Michael Beijer

Dec 28, 2014, 4:14:05 PM

to cafetra...@googlegroups.com

"My" solution would allow this to be done with ease. One click and CT could use Google Translate to translate a massive list of German terms. Pretty cool if you ask me.

Incidentally, have you tried running your list of 40,000 terms through Google Translator Toolkit?

Or just do it in memoQ. Just import your long list of terms and select "Use machine translation" in the pretranslate settings.

Preparation (tab) > Pre-Translate > Scope and lookup > "Use machine translation"

Michael

Michael Beijer

Dec 28, 2014, 4:17:28 PM

to cafetra...@googlegroups.com

EU crap?

Legal?

Medical?

Michael

Hans van den Broek

Dec 28, 2014, 4:55:38 PM

to Cafetran support

On 28 dec. 2014, at 23:04, Hans LIST <hans...@gmail.com> wrote:

Yes. It is simple. Didn't know that you can upload a file.

I mentioned it only a couple of times in this thread, dozens of times before this thread, on this and other forums, and in private conversations. JLearnit, remember?

Cheers,

Hans

Hans van den Broek

Dec 28, 2014, 4:57:18 PM

to Cafetran support

On 29 dec. 2014, at 04:06, Michael Beijer <mic...@beijer.uk> wrote:

My solution (blue)

That’s not your solution. It’s Igor’s solution, which I hope he won’t implement. A waste of time and investors’ money.

Hans van den Broek

Dec 28, 2014, 6:02:19 PM

to Cafetran support

On 28 dec. 2014, at 21:02, Hans LIST <hans...@gmail.com> wrote:

Yes. Still 'researching'. I'd like to create a starter's glossary, say DE>FR.

I uploaded JLearnit to my public dropbox in 2013. Public folder. You should be able to create a decent DE-FR termbase out of it. I can’t open and check it, because it’s a Windows* file.

https://dl.dropboxusercontent.com/u/2184204/JLearnIt.exe

http://www.jlearnit.com

"JLearnIt is a multilingual dictionary sorted by categories that helps you learn the vocabulary of another language progressively (each word has a level of use). The languages available are English, French, Spanish, Dutch, German, Italian, Hebrew, Portuguese, Swedish, Danish, Norwegian, Hungarian, Russian, Latin and Czech.”

Just use Excel or another spreadsheet app to create a bilingual termbase. Or use Rainbow, combined with sed, awk, a simple regex, Yves’ latest, Python and Perl to do the same in less than five years.

*Now it says it’s cross-platform: http://www.jlearnit.com/download.html

Will Helton

Dec 28, 2014, 6:19:00 PM

to cafetra...@googlegroups.com

I've never been able to get this to work on any doc of more than one page (roughly 200-250 words).

If anyone knows how to make this work with larger docs, I'm all ears.

Will

Hans van den Broek

Dec 28, 2014, 6:37:44 PM

to Cafetran support

On 29 dec. 2014, at 06:18, Will Helton <will....@gmail.com> wrote:

I've never been able to get this to work on any doc of more than one page (roughly 200-250 words).

You continue to amaze me. I just uploaded an EN document of 7,422 words, and got it back translated into Dutch in a split second.

Need proof? I’ll send you both the original file and the GT version.

The last sentences:

Cheers,

Hans

Hans van den Broek

Dec 28, 2014, 7:34:51 PM

to Cafetran support

On 28 dec. 2014, at 23:16, Hans LIST <hans...@gmail.com> wrote:

BTW: Google Translator Toolkit handles TMX.

And: https://support.google.com/translate/toolkit/answer/147829?hl=en DOCX etc.

But I think that goes for the paid version. I was referring to the free, anonymous site: https://translate.google.com/?hl=e

No pay, no login, just upload a document and off you go. https://support.google.com/translate/answer/2534559?hl=en&ref_topic=2534563

No TMX, "Simply click the translate a document link and submit your file as a PDF, TXT, DOC, PPT, XLS or RTF. Alternatively, you can simply drag your file into the browser window."

Hans List

Dec 29, 2014, 12:40:17 AM

to cafetra...@googlegroups.com

On 28 Dec 2014, at 22:56, Hans van den Broek wrote:

Waste of time and investors money.

Care to elaborate?

Hans List

Dec 29, 2014, 12:47:12 AM

to cafetra...@googlegroups.com

Incidentally, have you tried running your list of 40,000 terms through Google Translator Toolkit?

Yep. And the result is bogus. When you send segments with isolated words, no textual context is taken into account. Say you want to translate:

Rhein

Mosel

Oder

Neisse

Guess how GTr translates 'Oder'?

Yep, as 'Or'. Perhaps some quality can be added by making longer segments?

Hans List

Dec 29, 2014, 12:49:23 AM

to cafetra...@googlegroups.com

Nope:

image1.jpeg

Rene

Dec 29, 2014, 12:49:54 AM

to cafetra...@googlegroups.com

Well, would the result be any different if you send context-less isolated words to a human translator?

Or look up isolated words in a general dictionary?
In fairness, that is hardly an MT issue.

Rene

Hans van den Broek

Dec 29, 2014, 12:52:49 AM

to Cafetran support

On 29 dec. 2014, at 12:40, Hans List <hans...@gmail.com> wrote:
On 28 Dec 2014, at 22:56, Hans van den Broek wrote:

Waste of time and investors money.

Care to elaborate?

Of course not. It’s BS.

In the meantime, I extracted a DE-FR glossary from JLearnit: https://dl.dropboxusercontent.com/u/2184204/CT/JLearnit%20DE-FR.xlsx.zip (100 kB, with a few mistakes, no doubt)

It’s “only” 4k words, but 40k is a lot. See the Wikipedia:

Native-language vocabulary size

Native speakers' vocabularies vary widely within a language, and are especially dependent on the level of the speaker's education. A 1995 study shows that junior-high students would be able to recognize the meanings of about 10,000–12,000 words, whereas for college students this number grows up to about 12,000–17,000 and for elderly adults up to about 17,000 or more.^[10]

Foreign-language vocabulary

The effects of vocabulary size on language comprehension

The knowledge of the words deriving from the 3000 most frequent English word families and the 5000 most frequent words provides a comprehension of 95% of word use,^[11] and knowledge of 5000 word families is necessary for 99.9% word coverage

Hans List

Dec 29, 2014, 12:53:33 AM

to cafetra...@googlegroups.com

> On 29 Dec 2014, at 06:49, Rene <Yoi...@gmail.com> wrote:
>
> Well, would the result be any different if you send context-less isolated words to a human translator?
> Or look up isolated words in a general dictionary?
> In fairness, that is hardly an MT issue.
>

You are probably right. Remember: I just started to use MT so I don't have an intuition here.

My naive assumption was:

Translate the 100 German words which are most frequent and get the corresponding 100 English words. Not.

Hans List

Dec 29, 2014, 12:54:43 AM

to cafetra...@googlegroups.com

Thanks. I'll probably make some guys happy. Much appreciated

Hans van den Broek

Dec 29, 2014, 12:58:02 AM

to Cafetran support

On 29 dec. 2014, at 12:49, Rene <Yoi...@gmail.com> wrote:

Well, would the result be any different if you send context-less isolated words to a human translator?
Or look up isolated words in a general dictionary?
In fairness, that is hardly an MT issue.

You’re not going to tell me that in the HansL sequence:

Rhein

Mosel

Oder

Neisse

you would translate Oder with “Or”, would you?

Actually, one of the few times I uploaded a doc to GT was to see if it would yield better results than the separate segments sent by CT to GT (in those days) because of the availability of context. No difference, no difference at all.

Cheers,

Hans

Rene

Dec 29, 2014, 1:03:20 AM

to cafetra...@googlegroups.com

Err.... aren't the "100 German words which are most frequent" a tad different, depending on what population group and what topic you look at?

Just sayin....

Hans van den Broek

Dec 29, 2014, 1:21:57 AM

to Cafetran support

On 29 dec. 2014, at 13:02, Rene <Yoi...@gmail.com> wrote:

Err.... aren't the "100 German words which are most frequent" a tad different, depending on what population group and what topic you look at?

Miscommunication. Or maybe you cannot follow HansL’s formidable speed of thinking. After machine-translating the n most frequent words failed miserably, he tried to send related words to GT, hoping for a better result. Not a stupid idea, it’s just that GT is stupid (like BT/MT, MyMemory, and SDL. I just tried).

Hans list

Dec 29, 2014, 5:29:01 AM

to cafetra...@googlegroups.com

This is content, Google Groups moderator!

Playing around some more:

Hans list

Dec 29, 2014, 5:31:12 AM

to cafetra...@googlegroups.com

Try once more :(

Isn't that plural, photos? So it should be possible to drag more than one photo, shouldn't it?

Will Helton

Dec 29, 2014, 8:48:25 AM

to cafetra...@googlegroups.com

You used Rainbow to get GT to translate a 7k+ file and it did so without bombing out?

If that's so, I'd love to have the steps for how you set that up. I stopped using it about 2+ years ago because every time I tried to process a Word file of more than 300-500 words it would hang and never complete.

Asking in both the Okapi and OmegaT forums yielded (basically) "your mileage may vary" replies.

I would rarely need to do this, but it would be a good option to have.

Haven't tried the Google Toolkit since it was introduced, for accuracy reasons. Maybe I should have another look when I'm back in the UK.

Will

Will Helton

Dec 29, 2014, 8:51:03 AM

to cafetra...@googlegroups.com

Wait - are you talking about using the Google Translate web page? I just noticed the icons on your screenshot.

If so, what good is that for importing anything into CT (unless you copy and paste each individual sentence)?

Hans van den Broek

Dec 29, 2014, 9:21:16 AM

to Cafetran support

On 29 dec. 2014, at 20:50, Will Helton <will....@gmail.com> wrote:

Wait - are you talking about using the Google Translate web page?

Yes.

If so, what good is that for importing anything into CT (unless you copy and paste each individual sentence)?

Read the procedure I wrote down a day or two ago.

Hans list

Dec 29, 2014, 10:44:36 AM

to cafetra...@googlegroups.com

Strange: Even the context in the same sentence isn't considered, hence the wrong translation of 'Ventil'.

I'd say: Kinderschuhe!

They now have the iOS interface for the web version too. :?

Will Helton

Dec 29, 2014, 10:50:29 AM

to cafetra...@googlegroups.com

Will try to find it. On iPhone, so pretty useless for that sort of thing.

But unless you're talking about some sort of batch import (of 4 steps or less), it wouldn't be that useful for someone like me (no offence or criticism intended).

Will

Hans van den Broek

Dec 29, 2014, 5:27:36 PM

to Cafetran support

On 29 dec. 2014, at 22:50, Will Helton <will....@gmail.com> wrote:

it wouldn't be that useful for someone like me

It won’t be. It is a solution to the problem as stated in the subject line, and a hell of an easier solution than the originally proposed one. But I wouldn’t use it. Over my dead body.

Will Helton

Dec 31, 2014, 12:54:54 PM

to cafetra...@googlegroups.com

If anyone has tried running a Rainbow pipeline with MyMemory as the MT resource as described here:

http://cafetran.wikidot.com/creating-a-machine-translation-of-an-xliff-file

be aware that this does not currently work for MyMemory.

Here is what Yves on the Okapi list has to say:

============================

There are possibly two issues:

a) I've tested the MyMemory connector and it seems that in some cases one of the return values from the Web service now varies between being a Double or a Long and that causes an error in the current connector (which expects a Double). I've fixed that problem and a new snapshot is available here: http://okapi.opentag.com/snapshots/ (Note that only the one for Windows 64-bit has been updated at this time. I'll build and post the other platforms later today).

b) The Leveraging step has a threshold below which the matches are not copied into the TMX output. That threshold is 95 by default, but different connectors come with different match values. The MyMemory match is often lower than 95. Try setting it to something like 80 or 85 to see if you get something.

I hope this helps,
-yves

============================
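
Both of Yves' points can be illustrated in a few lines of Python. This is a sketch of the idea only, not the Okapi connector: the JSON layout, the `score` field and the function name `usable_matches` are hypothetical, and scores are assumed to be on a 0-100 scale.

```python
import json


def usable_matches(response_text, threshold=95):
    """Return (translation, score) pairs whose score reaches the threshold.

    The response layout used here is a made-up example (assumption):
    {"matches": [{"translation": "...", "score": 100}, ...]}
    """
    data = json.loads(response_text)
    kept = []
    for match in data.get("matches", []):
        # Point a): the same field may arrive as a whole number (100) or a
        # decimal (82.5), so coerce it instead of expecting one numeric type.
        score = float(match["score"])
        # Point b): matches below the threshold never reach the output.
        if score >= threshold:
            kept.append((match["translation"], score))
    return kept


if __name__ == "__main__":
    payload = (
        '{"matches": ['
        '{"translation": "first candidate", "score": 100}, '
        '{"translation": "second candidate", "score": 82.5}]}'
    )
    print(usable_matches(payload))                # default threshold of 95
    print(usable_matches(payload, threshold=80))  # lowered threshold
```

With the default of 95, the second candidate is dropped. Lowering the threshold to 80 keeps it, which is the effect Yves suggests trying.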

Hopefully this will be working later today.

Cheers,

Will

Michael Beijer

Dec 31, 2014, 1:12:59 PM

to cafetra...@googlegroups.com

Maybe I'm missing something, but why would you want to do this? That is, "run a Rainbow pipeline with MyMemory as the MT resource". It all looks very interesting, but what's the point? Is this to be able to use MT output at a later time when you might no longer have internet access?

When I have no internet access, I don't translate. Period. After all, you might have your resources, and possibly your MT output ("pre-gathered", using e.g. this method), and of course your brain – but you can't Google stuff (!!!), which is pretty much a show-stopper for me.

Michael

******************************************************
MICHAEL BEIJER
NL>EN translator & terminologist
24 Oakfield Rd., Hastings,
TN35 5AX, East Sussex, UK.
Tel. +44 (0)1424 435830
Mob. +44 (0)747 57717 20
Email: mic...@beijer.uk
Skype/Twitter: michaelbeijer
Beijer.uk (translation/terminology work)
Proz.com profile: www.proz.com/translator/652138
Acronymbook.com (open source acronyms/abbreviations)
nederbrackets.com (Adventures in Dutch bracket (ab)use)
Wordbook.nl (terminology resources, focusing on NL/EN)
Beijerdeas.com (CAT tools + AHK scripts)
Cafetranhelp.com/changelog
CafeTran mailing list: CafeTranslators
******************************************************

Will Helton

Dec 31, 2014, 3:08:26 PM

to cafetra...@googlegroups.com

Maybe I'm missing something, but why would you want to do this? That is, "run a Rainbow pipeline with MyMemory as the MT resource". It all looks very interesting, but what's the point? Is this to be able to use MT output at a later time when you might no longer have internet access?

Yep, that's it exactly. I sometimes have to take the odd project just before I'm going out of town and won't have internet access.

When I have no internet access, I don't translate. Period. After all, you might have your resources, and possibly your MT output ("pre-gathered", using e.g. this method), and of course your brain – but you can't Google stuff (!!!), which is pretty much a show-stopper for me.

I can dig that. Much of what I do, though, is snappy marketing speech, so mainly relies on jargon and being "clever" with phrasing, so some stuff doesn't rely on me being able to google stuff.

I wouldn't need to do this often, just every once in a while.

Will

Hans list

Jan 1, 2015, 3:02:37 AM

to cafetra...@googlegroups.com

Yves reported that he fixed it. But the bad news is that the fix requires Java 7, whereas CT still requires Java 6 on Mac. This lack of support for the latest Java version is getting in our way now. For safety reasons the program that I use 95 % of my working hours should always support the most recent version and patches of Java.

Hans van den Broek

Jan 1, 2015, 3:28:54 AM

to Cafetran support

> On 1 jan. 2015, at 15:02, Hans list <hans...@gmail.com> wrote:
>
> Yves reported that he fixed it. But the bad news is that the fix requires Java 7, whereas CT still requires Java 6 on Mac. This lack of support for the latest Java version is getting in our way now.

Just use another, simpler, better solution.

> For safety reasons

???