AutoML translation

Samuel Lacombe

Sep 6, 2018, 5:34:29 PM

to Google Cloud Translation API

Hello,

I'm really interested in training my own custom translation model, but the pricing is holding me back because I don't know how much time Google will take to process all my TMX files.

I have about 200 TMX files, each containing between 300 and 1,000 sentences (about 400 on average).

1) Any idea what I should expect?

2) If the service is modified and the models are no longer supported, do I need to pay again to train the model?

Thank you!

Yingjie He

Sep 6, 2018, 5:52:24 PM

to Google Cloud Translation API

Hi,

1. Is it possible to combine these TMX files?

2. Don't worry about compatibility of existing models: even if we update our service, you can still use your trained models. But if you train a new model, you will still be charged.
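On point 1, combining the TMX files could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function name `merge_tmx` is made up for this example, the header of the first file is reused as-is, and no check is made that all files share the same language pair.

```python
import xml.etree.ElementTree as ET


def merge_tmx(tmx_documents):
    """Merge several TMX documents (str or bytes) into one TMX string.

    Hypothetical helper: keeps the header of the first usable document
    and appends every <tu> from the other documents to its <body>.
    """
    merged_root = None
    merged_body = None
    for document in tmx_documents:
        root = ET.fromstring(document)
        body = root.find("body")
        if body is None:
            continue  # no <body>, so nothing to merge from this document
        if merged_root is None:
            # The first usable document becomes the base of the merged file.
            merged_root, merged_body = root, body
            continue
        for tu in body.findall("tu"):
            merged_body.append(tu)
    if merged_root is None:
        raise ValueError("no TMX document with a <body> was provided")
    return ET.tostring(merged_root, encoding="unicode")
```

In practice each document could be read with `pathlib.Path(path).read_bytes()` and the result written back to a single `.tmx` file before upload.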

Samuel Lacombe

Sep 6, 2018, 6:13:56 PM

to Google Cloud Translation API

It's possible to combine the files, but that doesn't help me figure out how much it would cost. I'm a single individual, and $300 versus $500 (or more) makes a big difference to me.

Thank you for your fast response
Best regards,
Samuel

Steeve

Sep 7, 2018, 1:20:04 PM

to Google Cloud Translation API

I was able to find this public reference [1], which states: "the cost for training a model is $76.00 per hour. The time required to train your model depends on the size and complexity of your training data." It also says that "usage of AutoML Translation is calculated in terms of how many characters you send for translation". The first 0.5 million characters are free, and from 0.5 to 5 million characters the price is $80 per 1,000,000 characters. You can find more details at the link below.

[1]: https://cloud.google.com/translate/automl/pricing
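To make the quoted figures concrete, here is a rough Python sketch. It uses only the numbers quoted above ($76.00 per training hour, 0.5 million characters free, $80 per million characters up to 5 million). The function names are illustrative, the free allowance is assumed to be deducted before billing, and pricing above 5 million characters is not modeled because it is not quoted in this thread.

```python
TRAINING_RATE_PER_HOUR = 76.00       # USD, as quoted from the pricing page
FREE_CHARACTERS = 500_000            # characters translated at no charge
RATE_PER_MILLION_CHARACTERS = 80.00  # USD, for 0.5 to 5 million characters
QUOTED_TIER_LIMIT = 5_000_000        # upper bound of the tier quoted above


def training_cost(hours):
    """Training cost in USD for a given number of training hours."""
    return hours * TRAINING_RATE_PER_HOUR


def translation_cost(characters):
    """Translation cost in USD, assuming the free allowance is deducted first."""
    if characters > QUOTED_TIER_LIMIT:
        raise ValueError("pricing above 5 million characters is not quoted here")
    billable = max(0, characters - FREE_CHARACTERS)
    return billable / 1_000_000 * RATE_PER_MILLION_CHARACTERS


print(f"3 hours of training: ${training_cost(3):.2f}")
print(f"1.5 million characters translated: ${translation_cost(1_500_000):.2f}")
```

The pricing page remains the authority on how the tiers and the billing period actually work.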

Samuel Lacombe

Sep 7, 2018, 1:41:10 PM

to Google Cloud Translation API

Hello Steeve,

I've seen that documentation, but it doesn't give a good idea of the training time on a large data set. Should I expect 10 hours or 3 hours? The price difference is huge.

Thank you,
Sam

Yingjie He

Sep 7, 2018, 2:59:09 PM

to Google Cloud Translation API

The training time depends on the size of your data set. On average, the total training time is about 3 to 4 hours if your data set has 10,000 to 100,000 sentences.
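Applied to the corpus described at the top of the thread (about 200 TMX files averaging 400 sentences each), the arithmetic can be sketched as below. This is a back-of-the-envelope estimate only: it assumes the $76.00 hourly rate quoted from the pricing page and treats the 3 to 4 hour figure as a range, not a guarantee. The helper names are made up for the example.

```python
TRAINING_RATE_PER_HOUR = 76.00  # USD per hour, from the pricing page quoted earlier


def estimate_corpus_size(file_count, avg_sentences_per_file):
    """Approximate total sentence count across all TMX files."""
    return file_count * avg_sentences_per_file


def cost_range(min_hours, max_hours, rate=TRAINING_RATE_PER_HOUR):
    """Lowest and highest training cost in USD for a range of hours."""
    return (min_hours * rate, max_hours * rate)


sentences = estimate_corpus_size(200, 400)
in_quoted_range = 10_000 <= sentences <= 100_000
low, high = cost_range(3, 4)

print(f"{sentences:,} sentences; within the quoted range: {in_quoted_range}")
print(f"Estimated training cost: ${low:.2f} to ${high:.2f}")
```

With these inputs the corpus comes to 80,000 sentences, inside the quoted 10,000 to 100,000 range, so the estimate is $228.00 to $304.00.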

NI Localization

Sep 27, 2018, 6:48:03 PM

to Google Cloud Translation API

Hi, I have the exact same concern as Samuel.

I want to try the AutoML Translation API and I'm about to train a dataset, but I'm worried about cost.

First, this is a beta program. Is there really a cost associated with training (the documentation says $76/hr)? How do you justify that for a BETA program?

Second, it says training is TYPICALLY 3 hours, but it doesn't define what typical conditions are. I'm about to train a dataset of nearly 500 000 sentences (English to French), with a validation and test set of 10 000 sentences each. Am I really looking at a 3hr training for about $250, or could it be a lot more than this?

I find it very risky to start the training without having a clue about the quality of the results and a clue about the cost. I really think this should be FREE until you are satisfied and deploy the system to production (aka not BETA).

At this point, I'm waiting for a better answer before proceeding.

Thank you

Michel Farhi-Chevillard

Senior Localization Manager at National Instruments

michael zhang

Sep 28, 2018, 2:29:14 AM

to Google Cloud Translation API

Hi Michel,

First off, I think your concern is justified.

I spent over US$300 training my own model (the domain is investment banking and the language pair is English into Simplified Chinese).

Only afterwards did I find that I couldn't use the REST API or the Python client as I had expected without solid programming knowledge. The results are far from good, because the Google team didn't warn non-developers like me that the TM file should be as simple as their sample (a bilingual TMX file made from an SRT subtitle file). Otherwise the trained model produces a lot of tag soup, which makes it useless in practice.

A better approach may be to use the free US$300 credit granted by GCP to train and test the AutoML Translation service; otherwise it could be a total waste of real money.

Good luck and happy training.

Best,

Michael Zhang

Kan Qiao

Sep 29, 2018, 12:08:35 AM

to Google Cloud Translation API

Hi, Michel

To help you estimate the training budget, we plan to provide a training cost estimate (based on language pair, sentence count, etc) soon. For model quality, it is impossible to guarantee the model performance, since it depends on the training data you provide. We are working to provide tools that detect data quality issues. Hopefully this will reduce the possibility of producing a bad model.

Michael:

We understand the difficulty for people without programming experience to use the prediction API. We are working to further reduce this difficulty with more integrations.

One clarifying question: Can you elaborate on: “otherwise there are many tag soups in your trained model”? What is the input format you want us to support (if you think tsv/tmx are not good enough)?

michael zhang

Sep 29, 2018, 12:42:42 AM

to Google Cloud Translation API

Hi Kan Qiao,

Thanks for your prompt reply. I'm glad to know that you will lower the bar for non-developers like me to use the Prediction API. I look forward to more concrete and detailed information.

To answer your question: I uploaded my own TMX file to the AutoML Translation UI to train models, but the attempts often failed. Eventually I found that the structure of the sample TMX file Google provides is much simpler than mine.

Anyway, after I edited the structure of my TMX file it worked, but there are many tags in the translated text when I use the trained model to predict (i.e., translate) new text. I know this is because there are many unnecessary tags in my own TMX file, but maybe Google could flag that issue before training the model?

Thanks.

Michael
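Until such a check exists in the service, a local pre-flight check along these lines could flag tagged segments before any training is paid for. This is a minimal sketch, not an AutoML feature: `find_tagged_segments` is a made-up name, and it only detects inline XML elements inside `<seg>` (such as the TMX elements `bpt`, `ept`, `ph`, `it`, `hi`). Markup stored as escaped plain text inside a segment would not be caught.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def find_tagged_segments(tmx_text):
    """Return (tu_number, language, tag_names) for each <seg> containing elements."""
    root = ET.fromstring(tmx_text)
    findings = []
    for number, tu in enumerate(root.iter("tu"), start=1):
        for tuv in tu.iter("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            for seg in tuv.iter("seg"):
                # Any descendant element means the segment is not plain text.
                tags = [el.tag for el in seg.iter() if el is not seg]
                if tags:
                    findings.append((number, lang, tags))
    return findings
```

An empty result suggests the segments are free of inline elements; any entry points to a translation unit worth cleaning first.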

Steeve

Oct 1, 2018, 11:55:15 AM

to Google Cloud Translation API

Hello,

I'm glad to hear that it worked after you changed your TMX file structure. I would encourage you to use the following link [1] to report an issue.

[1]: https://cloud.google.com/support/docs/issue-trackers#search_for_or_create_bugs_and_feature_requests_by_product

Thanks.

Kan Qiao

Oct 2, 2018, 1:42:57 PM

to Google Cloud Translation API

Hi, Michael:

Yes, we will consider this feature request in our planning. Thanks!

NI Localization

Oct 5, 2018, 12:18:48 PM

to Google Cloud Translation API

Ok Google (pun intended), so if I have removed all tags from my TMX training data (by doing an import of the original TMX in Trados as plain text and then re-exporting), am I good to go and should I start the training now and expect about a 3 hr charge with a corpus of about 500K TUs from English to French? Is there any other type of cleanup in the TMX that you recommend?

Kan Qiao

Oct 5, 2018, 12:24:55 PM

to Google Cloud Translation API

Every <tu> unit in the TMX file should contain only plain text.

For 500k TUs, you should expect 2.5-5 hours.
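For anyone without a CAT tool at hand, reducing each segment to plain text might look roughly like the Python sketch below. It rests on assumptions: the function names are invented for this example, the inline code elements `bpt`, `ept`, `ph`, `it` and `ut` are dropped together with their content, text wrapped in other inline elements such as `hi` is kept, and whitespace is collapsed afterwards. It is not an official cleanup procedure.

```python
import xml.etree.ElementTree as ET

# Inline TMX elements that carry formatting codes rather than translatable text.
CODE_ELEMENTS = {"bpt", "ept", "ph", "it", "ut"}


def plain_text(element):
    """Collect the translatable text under `element`, skipping inline codes."""
    parts = [element.text or ""]
    for child in element:
        if child.tag not in CODE_ELEMENTS:
            parts.append(plain_text(child))  # e.g. <hi>: keep the wrapped text
        parts.append(child.tail or "")       # text following the child element
    return "".join(parts)


def strip_inline_tags(tmx_text):
    """Return the TMX document with every <seg> reduced to plain text."""
    root = ET.fromstring(tmx_text)
    for seg in list(root.iter("seg")):
        text = plain_text(seg)
        for child in list(seg):
            seg.remove(child)
        # Collapse the whitespace runs left behind by removed codes.
        seg.text = " ".join(text.split())
    return ET.tostring(root, encoding="unicode")
```

Spot-checking a few cleaned segments against the originals before uploading is a sensible precaution, since dropped codes can occasionally carry meaning.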

NI Localization

Oct 7, 2018, 10:37:34 AM

to Google Cloud Translation API

Indeed, I confirm a timing of around 3.5 hours to train. Started at around 10:15AM and finished at 1:44PM. I guess I'll see on Monday in the billing console what the actual charges are.

French training.png
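As a rough cross-check before the bill arrives, the reported start and end times can be turned into an estimate. The sketch below assumes that billing follows wall-clock training time at the $76.00 hourly rate quoted earlier in the thread and that both times fall on the same day. How the service actually meters and rounds training hours is not stated here, so the real charge may differ.

```python
from datetime import datetime

TRAINING_RATE_PER_HOUR = 76.00  # USD per hour, as quoted earlier in the thread


def elapsed_hours(start, end, fmt="%I:%M%p"):
    """Hours between two same-day clock times such as '10:15AM' and '1:44PM'."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600


hours = elapsed_hours("10:15AM", "1:44PM")
print(f"{hours:.2f} hours, about ${hours * TRAINING_RATE_PER_HOUR:.2f}")
```

For the times reported above this comes to 3 hours 29 minutes, or roughly $264.73 under those assumptions.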
