Google AutoML Translation models - integration with CAT tools


maria c laguardia

Feb 15, 2019, 11:40:41 AM2/15/19
to Google Cloud Translation API
Hi

I have trained an EN-ES translation model in Google AutoML, and I would like to test the output directly in our translation tools, which already have built-in integration for the standard Google Translate API. Is it just a question of replacing the standard API key in those tools with the one corresponding to the AutoML project in GCP?

Also, are there any other charges involved apart from the training time and character usage? Engine hosting charges, for example?

Thanks
M C Laguardia

Kan Qiao

Feb 15, 2019, 12:12:06 PM2/15/19
to Google Cloud Translation API
No, it is not that simple. The translation tool needs to be integrated with the AutoML Translation client.
Which tool are you using? Memsource has AutoML integration.

Ilia Shifrin

Feb 15, 2019, 12:41:05 PM2/15/19
to Google Cloud Translation API
Hi Maria,

Your CAT tool's original Google MT plug-in most likely has requests to the basic, untrained NMT model hardcoded. Such a plug-in explicitly requests the untrained NMT model even if you plug in the API access token from the custom-trained model.

You will likely not be able to change this within the CAT tool unless you know the developer/product owner and they are willing to make custom changes to the Google MT plug-in embedded in the CAT tool and offer you a custom-built application. Changing or reverse-engineering the original plug-in is also quite a hassle unless we are talking about an open-source product: most plug-ins released with a tool must be signed by the CAT tool developer, and without that signature a modified plug-in will not be recognized by the CAT tool. Convincing major CAT product managers that there is a business case for customizable AutoML support, and having them prioritize it over other long-pending feature requests, will likely take some time even if you know them in person.
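
Concretely, the two services live at different endpoints with different request shapes, which is why swapping API keys alone is not enough. A rough sketch (endpoint paths as in Google's public API docs at the time; PROJECT_ID and MODEL_ID are placeholders for your own values):

```
# Basic Translation API (what most CAT plug-ins call):
POST https://translation.googleapis.com/language/translate/v2
{"q": "Hello", "source": "en", "target": "es"}

# AutoML custom-trained model (what your trained engine requires):
POST https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/us-central1/models/MODEL_ID:predict
{"payload": {"textSnippet": {"content": "Hello", "mimeType": "text/plain"}}}
```

A plug-in built for the first endpoint has no way to address the second one, regardless of which token you give it.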

To expedite the testing, you can create a simple client (a standalone app) or a script that reads your data from a pre-configured file, sends it one request at a time, and writes the responses to a local output file. The format of the file is really up to you and will depend on how well you know C++/Java/Python, etc., and how much time you have to prepare the data.
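
A minimal Python sketch of such a script, assuming the google-cloud-automl v1beta1 client library; the project ID, model ID, and file names are placeholders you would replace with your own:

```python
# Sketch: translate one segment per line through a custom AutoML model
# and write the responses to a local UTF-8 output file.

def translate_file(in_path, out_path, translate_fn):
    """Read one source segment per line, translate each, write the results."""
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(translate_fn(line.rstrip("\n")) + "\n")

def make_automl_translate(project_id, model_id):
    """Build a translate callable backed by a custom-trained AutoML model.
    (Call shape per the google-cloud-automl v1beta1 Python client.)"""
    from google.cloud import automl_v1beta1 as automl
    client = automl.PredictionServiceClient()
    model = client.model_path(project_id, "us-central1", model_id)

    def translate(text):
        payload = {"text_snippet": {"content": text, "mime_type": "text/plain"}}
        response = client.predict(model, payload)
        return response.payload[0].translation.translated_content.content

    return translate
```

Usage would be along the lines of `translate_file("source_en.txt", "output_es.txt", make_automl_translate("my-project", "my-model-id"))`.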

Hope this helps.

Thanks,
Ilia Shifrin

maria c laguardia

Feb 15, 2019, 1:02:48 PM2/15/19
to Google Cloud Translation API
Hi Kan, thank you very much for your response. We mainly use Crowdin and XTM. I will check the MT settings to see if they have an AutoML integration.


Ilia Shifrin

Feb 15, 2019, 1:05:12 PM2/15/19
to Google Cloud Translation API
Hi Maria,

With XTM, contact Shamus or your designated Solutions Architect directly; they should help you out. There may be a surcharge for this service, though.

Thanks,
Ilia Shifrin

maria c laguardia

Feb 15, 2019, 1:12:14 PM2/15/19
to Google Cloud Translation API
Hi Ilia, thank you very much for your response. If our CAT tools do not have built-in AutoML support, I will test via a Python script. I have seen some preliminary EN-ES results and the output is very good for the domain. I am quite keen to test it with a bigger volume of segments and extend it to other language pairs.
Kind regards
M C Laguardia 

Ilia Shifrin

Feb 15, 2019, 1:21:39 PM2/15/19
to Google Cloud Translation API
Hi Maria,

Way to go! The results are indeed impressive across multiple domains, provided you have good enough training corpora. Bigger-volume testing across multiple languages and accounts is still required to collect additional stats on what exactly gets improved and how to organize the training corpora better, so as to avoid extra MTPE/LQA costs as well as, naturally, NMT re-training costs. With standalone tools that can calculate edit distances between versions, run automatic/customizable LQA checks, and export all available stats, the results are easier to visualize and tangible enough for C-level executives to pull the trigger on a new solution much more readily.

Good luck!

Ilia Shifrin

maria c laguardia

Feb 16, 2019, 6:12:49 AM2/16/19
to Google Cloud Translation API
Thank you Ilia, do you know of any tools we can use to measure edit distances? I created a very basic Python script for my own use, with the nltk edit_distance function and pandas to interact with a spreadsheet of strings, but it is not exactly user-friendly tbh, plus I had some problems with CSV encodings for Asian languages. So any readily available, slightly more user-friendly tool/script would be great to have!
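
For context, the core of such a script is essentially a Levenshtein distance. A minimal, dependency-free sketch (no nltk or pandas) that operates on whole Unicode strings, so Asian-language text is handled the same as Latin, might look like:

```python
# Classic dynamic-programming Levenshtein edit distance, one row at a time.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalized_distance(a, b):
    """Edit distance scaled to 0..1 by the longer string's length."""
    longest = max(len(a), len(b)) or 1
    return levenshtein(a, b) / longest
```

Reading and writing the string files with an explicit encoding (e.g. `open(path, encoding="utf-8")`) is usually what resolves the kind of CSV encoding trouble described above.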

Ilia Shifrin

Feb 18, 2019, 11:27:43 AM2/18/19
to Google Cloud Translation API
Hi Maria,

Well, I would use memoQ, which I believe offers two types of edit distance calculation and is rather user-friendly in this regard. You would need to import your strings file (creating a rather simple regex text filter if needed), configure the Google MT plug-in with any available API access token for this project (n.b. it will use the basic untrained NMT model even if you plug in the API token for the trained one), and pre-translate the content with the MT option on. Then create a snapshot (a translation version "screenshot").

Separately, have the script you created call the GCP API to translate the same set of data using the trained model. When this is done, export a bilingual RTF from memoQ, swap the data in the right-hand column with the translations from the custom-trained model's output file, and re-import the bilingual RTF, which will update the segments that differ from the original basic NMT model output. Create a snapshot of this version. Right-click the strings file in the project, select History/Reports, select the two snapshots while holding CTRL, and click the Calculate edit distance link. Select the measurement you are interested in (Levenshtein/fuzzy) and click Calculate.

You can also export a change-tracked Word document if you are interested in a more granular analysis.

Hope this helps.

Thanks,
Ilia Shifrin

maria c laguardia

Feb 19, 2019, 8:07:52 AM2/19/19
to Google Cloud Translation API
Thank you very much for sharing a strategy, Ilia.
Kind regards
Maria C. Laguardia

m...@cpsl.com

Feb 27, 2019, 11:10:08 AM2/27/19
to Google Cloud Translation API
Hi Kan,

Maria's question is solved, but I've just jumped into the conversation because we are trying to use our trained Google AutoML engines in Memsource and, once the engine has been created, we get the attached error message. Could you please help?

Thank you!
Lucía

(Attachment: error.png)

