to Google Cloud Translation API
Around 5% of numbers, digits, and dates are being incorrectly translated by my custom model.
Are there any solutions to this issue? I would greatly appreciate it if anyone could point me toward possible solutions. Based on our research, users tend to fixate on incorrectly translated numbers and associate them with low-quality translation.
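One way to measure an error rate like the "around 5%" above is to extract the numbers from each source segment and its translation and check that they match. The sketch below assumes numbers are written with ASCII digits on both sides. It will report false positives whenever a ZH-CN translation legitimately rewrites a number (for example 10,000 as 1万) or reformats a date. The helper names are hypothetical.

```python
import re
from collections import Counter

# Integers and decimals, optionally with ASCII thousands separators (e.g. 7,950.10).
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def numeric_tokens(text):
    """Return a multiset of the numbers in `text`, with thousands separators removed."""
    return Counter(m.group().replace(",", "") for m in NUMBER_RE.finditer(text))


def numbers_match(source, translation):
    """True if source and translation contain exactly the same numbers, in any order."""
    return numeric_tokens(source) == numeric_tokens(translation)


def mismatch_rate(pairs):
    """Fraction of (source, translation) pairs whose numbers differ."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    bad = sum(1 for src, tgt in pairs if not numbers_match(src, tgt))
    return bad / len(pairs)


pairs = [
    ("The price is $7,950.10.", "价格是7950.10美元。"),
    ("Call 555 today.", "今天拨打556。"),
]
print(mismatch_rate(pairs))  # 0.5
```

Comparing multisets rather than ordered lists allows for the word-order changes that are common between EN and ZH-CN.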
Kan Qiao
May 28, 2019, 3:40:31 PM
Which language pairs are you looking at?
We are working on improving number translation quality.
Erik Chan
May 28, 2019, 3:43:00 PM
EN <-> ZH-CN
If I can help in any way, do let me know!
Kan Qiao
May 28, 2019, 3:46:48 PM
We are working on improving number translation quality for EN->ZH.
Can you send us some data you want to evaluate on? We could produce some output for you using our current solution.
Erik Chan
May 28, 2019, 4:17:12 PM
Attached are some EN examples for which we've received incorrect ZH-CN translations.
FYI, we also previously tried splitting long numbers in our training data into separate tokens (e.g., 7950.10 -> 7+ 9+ 5+ 0+ .+ 1+ 0) and retraining a custom model to reduce variation. But the results were all over the place, possibly because the base Google NMT model wasn't trained that way?
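The digit-splitting preprocessing described above can be sketched as a pair of functions: one that rewrites each number as per-character tokens using the same `+` notation as the example, and one that rejoins them after translation. This is a minimal sketch, assuming `+` means "joins to the next token" and that the sequence `+ ` never otherwise follows a digit in ordinary text (so `1+ 1` would be wrongly merged). The function names are hypothetical, and the sketch says nothing about how the base model tokenizes internally.

```python
import re

# Hypothetical pattern: a run of digits with optional internal '.' or ',' (e.g. 7950.10).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
# A '+ ' joiner sitting between two number characters.
JOINER_RE = re.compile(r"(?<=[0-9.,])\+ (?=[0-9.,])")


def split_digits(text):
    """Rewrite each number as single-character tokens, e.g. 7950.10 -> 7+ 9+ 5+ 0+ .+ 1+ 0."""
    return NUMBER_RE.sub(lambda m: "+ ".join(m.group()), text)


def join_digits(text):
    """Undo split_digits by deleting the '+ ' joiners between number characters."""
    return JOINER_RE.sub("", text)


split = split_digits("Total: 7950.10 USD")
print(split)               # Total: 7+ 9+ 5+ 0+ .+ 1+ 0 USD
print(join_digits(split))  # Total: 7950.10 USD
```

The rejoin step only works if the model reproduces the joiners exactly. A dropped or reordered token in the output silently yields a wrong number, which may be one source of the inconsistent results described above.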
Erik Chan
May 30, 2019, 12:07:50 AM
Kan, attached are a couple of simpler, more obvious examples where the numbers are incorrect.
Erik, we also need to know the model ID. Which custom model were you using when you saw these errors?
Erik Chan
May 30, 2019, 1:50:18 PM
The model IDs are in the third column of the two spreadsheets. Is that what you're looking for?
Kan Qiao
May 30, 2019, 1:54:33 PM
Yes, but what about the model ID you used for the first file you sent?
EN_SRC_test_numbers.txt
Erik Chan
May 30, 2019, 2:33:03 PM
Kan,
The texts in that file are a mixed bag (mainly because we were trying to split numbers into separate digits on some models), so it's best to ignore that file for the time being. We can provide more examples of other number-translation issues if needed.
Erik
Erik Chan
Jun 18, 2019, 5:12:24 PM
Kan,
FYI, we were originally hoping to log all the number errors we encountered and see whether any recurring patterns could be fixed via pre-/post-translation editing.
So far, the errors seem erratic and differ between models: an input that fails on one model may translate fine on another, and vice versa. Any suggestions?
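One common pre-/post-translation technique, not one proposed in this thread, is to mask numbers with placeholder tokens before translation and restore the originals afterward, so the model never gets a chance to alter them. The sketch below assumes the placeholder format `__NUM<i>__` (hypothetical) passes through the model unchanged. That is not guaranteed, so the restore step raises an error when a placeholder is lost, duplicated, or mangled. Masking also prevents legitimate localization (for example 10,000 to 1万), so it may suit IDs, prices, and codes better than quantities in running prose.

```python
from __future__ import annotations

import re

# Numbers to protect: digit runs with optional internal '.' or ',' (e.g. 7950.10, 1,234).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
# Hypothetical placeholder format; it only works if the model copies it through unchanged.
PLACEHOLDER_RE = re.compile(r"__NUM(\d+)__")


def mask_numbers(text: str) -> tuple[str, list[str]]:
    """Replace each number with __NUM<i>__; return the masked text and the originals."""
    originals: list[str] = []

    def _to_placeholder(m: re.Match) -> str:
        originals.append(m.group())
        return f"__NUM{len(originals) - 1}__"

    return NUMBER_RE.sub(_to_placeholder, text), originals


def unmask_numbers(translated: str, originals: list[str]) -> str:
    """Restore the original numbers; raise if any placeholder was dropped, duplicated, or mangled."""
    found = sorted(int(i) for i in PLACEHOLDER_RE.findall(translated))
    if found != list(range(len(originals))):
        raise ValueError(f"placeholders altered in translation: {found}")
    return PLACEHOLDER_RE.sub(lambda m: originals[int(m.group(1))], translated)


masked, nums = mask_numbers("Revenue rose to 7950.10 in 2018.")
print(masked)  # Revenue rose to __NUM0__ in __NUM1__.
# Stand-in for a model's output; a real call to the translation service would go here.
translated = "收入在__NUM1__年增至__NUM0__。"
print(unmask_numbers(translated, nums))  # 收入在2018年增至7950.10。
```

Because placeholders are restored by index rather than position, the target language is free to reorder them. Segments that fail the check could be logged, which fits the error-logging approach described above.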