Numbers & dates incorrectly translated


Erik Chan

May 28, 2019, 3:28:55 PM
to Google Cloud Translation API

Around 5% of numbers, digits, and dates are being incorrectly translated by my custom model. 

Are there any ways to fix this? I would greatly appreciate it if anyone could point me towards possible solutions. Based on our research, users tend to fixate on incorrectly translated numbers and associate them with low-quality translation.

Kan Qiao

May 28, 2019, 3:40:31 PM
to Google Cloud Translation API
What language pairs are you looking at?
We are working on improving number translation quality.

Erik Chan

May 28, 2019, 3:43:00 PM
to Google Cloud Translation API
EN <-> ZH-CN

If I can help in any way do let me know!

Kan Qiao

May 28, 2019, 3:46:48 PM
to Google Cloud Translation API
We are working on improving number translation quality of En->Zh.
Can you send us some data you want to evaluate on? We could produce some output for you using our current solution.

Erik Chan

May 28, 2019, 4:17:12 PM
to Google Cloud Translation API
Attached are some EN examples where we've encountered incorrect ZH-CN translations.
EN_SRC_test_numbers.txt

Kan Qiao

May 29, 2019, 9:55:06 AM
to Google Cloud Translation API
And what is the model ID you are using?

Erik Chan

May 29, 2019, 11:55:09 AM
to Google Cloud Translation API
Kan, 

I've attached a spreadsheet with problems from one of our models. I will provide examples from our other models soon!

thanks
Erik
Book1.xlsx

Erik Chan

May 29, 2019, 12:26:17 PM
to Google Cloud Translation API
FYI, we previously also tried splitting long numbers in our training data into separate tokens (e.g. 7950.10 -> 7+ 9+ 5+ 0+ .+ 1+ 0) and retrained a custom model to reduce variation. But the results were all over the place, possibly because the base Google NMT model wasn't trained that way?
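
For readers who want to try the digit-splitting preprocessing described above, here is a minimal Python sketch. It is not the code used in this thread: the function names, the number regex, and the way the pieces are rejoined are all assumptions made for illustration. It only reproduces the token format from the example (7950.10 -> 7+ 9+ 5+ 0+ .+ 1+ 0).

```python
import re

# Assumed number format: digit runs optionally joined by "." or "," (e.g. 7950.10, 1,000).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def _split_match(match):
    """Turn one matched number into space-separated characters.

    Every character except the last gets a trailing "+", which marks
    "the number continues" and lets the pieces be rejoined later.
    """
    chars = list(match.group(0))
    return " ".join([c + "+" for c in chars[:-1]] + [chars[-1]])


def split_numbers(text):
    """Apply the split to every number found in the text."""
    return NUMBER_RE.sub(_split_match, text)


def join_numbers(text):
    """Undo split_numbers by deleting each "+ " that follows a digit or separator.

    Caveat: this would also alter literal text such as "1+ 2", so it is only
    safe on text that went through split_numbers.
    """
    return re.sub(r"(?<=[\d.,])\+ ", "", text)
```

`split_numbers` would be applied to both sides of the training data and to each request, and `join_numbers` to the translated output. As noted above, this approach gave erratic results in practice, so treat it as an experiment rather than a recommendation.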

Erik Chan

May 30, 2019, 12:07:50 AM
to Google Cloud Translation API
Kan, attached are a couple of simpler and more obvious examples where the numbers are incorrect.

Book2.xlsx

Kan Qiao

May 30, 2019, 1:45:39 PM
to Google Cloud Translation API
Erik, we also need to know the model ID. What is the custom model you used when you saw these errors?

Erik Chan

May 30, 2019, 1:50:18 PM
to Google Cloud Translation API
The Model IDs are in the third column of the two spreadsheets, is that what you're looking for?

Kan Qiao

May 30, 2019, 1:54:33 PM
to Google Cloud Translation API
Yeah, but what about the model ID you used for the first file you sent?
EN_SRC_test_numbers.txt

Erik Chan

May 30, 2019, 2:33:03 PM
to Google Cloud Translation API
Kan

The texts in that file are a mixed bag (mainly because we were trying to split numbers into separate digits on some models). Best to ignore that file for the time being. We can provide more examples of inaccurate numbers if needed.

Erik

Erik Chan

Jun 18, 2019, 5:12:24 PM
to Google Cloud Translation API
Kan, 

FYI we were originally hoping to log all number errors we encountered and see if any recurring patterns were fixable via pre/post-translation editing. 

So far it seems like the errors are erratic and differ between models. An input that is mistranslated by one model may work fine on another, and vice versa. Any suggestions?

Erik
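
One way to log number errors systematically, as described in the message above, is to compare the numbers found in the source with those found in the translation. The Python sketch below is an illustration only, not something used in this thread: the function names and regex are hypothetical, and it assumes the ZH-CN output keeps Arabic digits. Translations that legitimately rewrite a number (for example with 万 or Chinese numerals) would be flagged as mismatches and need manual review.

```python
import re
from collections import Counter

# Assumed number format: digit runs optionally joined by "." or ",".
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def extract_numbers(text):
    """Count each number in the text, ignoring "," thousands separators.

    Assumes "," is never a decimal mark, which holds for typical EN and ZH-CN text.
    """
    return Counter(m.replace(",", "") for m in NUMBER_RE.findall(text))


def number_mismatches(source, translation):
    """Return numbers missing from, or unexpectedly present in, the translation."""
    src = extract_numbers(source)
    tgt = extract_numbers(translation)
    return {
        "missing": sorted((src - tgt).elements()),
        "unexpected": sorted((tgt - src).elements()),
    }


if __name__ == "__main__":
    report = number_mismatches(
        "The price is 7950.10 for 3 items",
        "3件商品的价格为7590.10",
    )
    print(report)
```

Running this over each model's output and storing the report next to the model ID would make it easier to see whether a given error pattern recurs or is specific to one model.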


