Google translate BatchTranslation does not preserve quotation marks


Rex

Mar 18, 2020, 8:09:25 PM
to Google Cloud Translation API
Hi everyone,

I'm using BatchTranslation (from v3) to translate an English text file into another language (e.g. Italian). Some words in my sentences are enclosed in quotation marks, and I would like to preserve those marks in the translated text. However, the output file omits them in both the original text and the translated text.

For example, Google Translate gives:
I have an " apple " in my hand .  ------>  Ho una "mela" in mano.  

but BatchTranslation gives:      
Ho una mela in mano.


Changing the mime_type from "text/plain" to "text/html" did not help either.

I'd appreciate any ideas on how to fix this.

Thanks!

Ismail (Cloud Platform Support)

Mar 20, 2020, 12:08:44 PM

Hi Rex, 


I tried reproducing your issue, and the quotation marks were preserved on my end. Here is what I did:


  1. Created a txt file with: I have an “apple” in my hand. (See attached file)

  2. Sent the following request (the request body is read from request.json):

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1:batchTranslateText


Note: The request above must be sent to the ‘batchTranslateText’ endpoint.


Try the above and let me know if you still need anything further.
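
The contents of request.json are not shown in this thread. As a rough sketch only, the Python snippet below builds a body using the field names of the v3 batchTranslateText REST request (sourceLanguageCode, targetLanguageCodes, inputConfigs, outputConfig). The helper name build_batch_request and the gs:// paths are hypothetical placeholders, not values from the thread.

```python
import json


def build_batch_request(input_uri, output_prefix, source="en",
                        targets=("it",), mime_type="text/plain"):
    """Return a batchTranslateText request body as a plain dict."""
    return {
        "sourceLanguageCode": source,
        "targetLanguageCodes": list(targets),
        "inputConfigs": [
            {
                "gcsSource": {"inputUri": input_uri},
                "mimeType": mime_type,
            }
        ],
        "outputConfig": {
            "gcsDestination": {"outputUriPrefix": output_prefix}
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket and object names; replace with your own.
    body = build_batch_request(
        "gs://my-bucket/translation.txt",
        "gs://my-bucket/output/",
    )
    # Redirect this output to request.json for use with `curl -d @request.json`.
    print(json.dumps(body, indent=2))
```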


translation.txt

Rex

Mar 20, 2020, 8:13:38 PM
Thank you Ismail for your response.

What you did does work, but if you change the file's extension from '.txt' to '.tsv', the quotation marks are gone from the final translation.

The reason I want to use '.tsv' is to provide indices for the sentences (in another column) so when they get shuffled during translation, I can re-order them again. 

I look forward to your response.

Thanks!
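
The re-ordering step described above could look roughly like the sketch below. It assumes each output row is tab-separated and carries the numeric index in its first column; that layout is an assumption and should be checked against the actual output file. The function name reorder_by_id is hypothetical.

```python
def reorder_by_id(tsv_lines):
    """Split tab-separated rows and sort them by the integer ID in column 1."""
    rows = [line.rstrip("\n").split("\t") for line in tsv_lines if line.strip()]
    # Sort numerically so that "10" comes after "2".
    rows.sort(key=lambda row: int(row[0]))
    return rows


if __name__ == "__main__":
    shuffled = [
        "2\tthird sentence\tterza frase\n",
        "0\tfirst sentence\tprima frase\n",
        "1\tsecond sentence\tseconda frase\n",
    ]
    for row in reorder_by_id(shuffled):
        print("\t".join(row))
```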

Ali T (Cloud Platform Support)

Mar 21, 2020, 11:48:08 AM
Hey Rex,

Thanks for clarifying the file type being used. For TSV files, you need to escape the quotation marks so that they are not ignored. As the TSV input is parsed according to the quoting rules of RFC 4180 (the CSV specification), your input sentence should be as follows:

I have an """ apple """ in my hand .

RFC 4180 states that for a double quote to appear inside an enclosed field, it must be preceded by another double quote. In your case, “apple” is treated as an enclosed field. If you want the quotes to appear, add two more on each side: one marking where the quote should appear and one acting as the escape character.
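
As a minimal sketch of the workaround described above, the Python helpers below replace every double quote with three and write each sentence next to an index column. The function names are hypothetical, and the two-column layout (index, then text) is an assumption based on the use case described earlier in the thread.

```python
def escape_quotes_for_tsv(sentence):
    """Replace each double quote with three, as suggested in this thread."""
    return sentence.replace('"', '"""')


def build_tsv(sentences):
    """Return TSV text with a 0-based index column and an escaped text column."""
    lines = []
    for index, sentence in enumerate(sentences):
        # A tab or newline inside the text would break the row layout.
        if "\t" in sentence or "\n" in sentence:
            raise ValueError(f"sentence {index} contains a tab or newline")
        lines.append(f"{index}\t{escape_quotes_for_tsv(sentence)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_tsv(['I have an " apple " in my hand .']), end="")
```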

Rex

Apr 6, 2020, 7:09:51 PM
Hi Ali,

Thanks a lot for your response. It solved the problem!

Rex

Apr 20, 2020, 6:23:49 PM
Hi, I have a follow-up question.
I was able to translate my sentences correctly and keep the quotations: I have an """ apple """ in my hand .  ------>  Ho una "mela" in mano.  

But now I want to tell Google Translate to leave untranslated ONLY the tokens enclosed in quotation marks, so that I get: I have an """ apple """ in my hand .  ------>  Ho una "apple" in mano.  

I am using a term set glossary for this purpose. I am also using triple quotes when creating the .csv file locally. I'm attaching my glossary files here: "default.csv" is the one I created locally, and "default_cloud.csv" is downloaded from cloud (it is created after I sent the request to create glossary).

However, the values inside quotation marks are still getting translated. Is there a fix for this? Should I generate my glossary differently?

Thanks for your help.
default.csv
default_cloud.csv

Rex

Apr 20, 2020, 7:56:52 PM
To be more specific about the problem I just mentioned:

I can force Google to leave the values enclosed in quotation marks untranslated, but the problem is that when the same token appears without quotation marks, it is left untranslated as well, which is not what I want.

As an example if my sentence is this:  My friend whose name is apple have an """ apple """ in his hand .

the translation I get from google using the glossary I provided is this: Mi amigo cuyo nombre es apple tiene una "apple" en la mano.

But what I want it to be is: Mi amigo cuyo nombre es manzana tiene una "apple" en la mano.

So I need everything to be translated except for values within quotation marks. How can I achieve this? 

Thanks!

Konstantin Savenkov

Apr 22, 2020, 10:00:35 PM
Hi Rex,

Are no-translate tags an option for you?

I.e., if you send something like this:

My friend whose name is apple have an <span translate=no>"apple"</span> in his hand.

you should get

Mi amigo cuyo nombre es manzana tiene una <span translate="no">"apple"</span> en la mano.

However, I didn't check this in batch mode. In the normal sync mode, though, I would use that, not glossaries.

cheers,
Konstantin.
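
The tagging suggested above could be automated along these lines. This is only a sketch: it assumes the text is sent as HTML, that quoted segments use straight double quotes, and that the tags come back in the form shown above. The names protect_quoted and strip_protection are hypothetical.

```python
import html
import re

# A straight-double-quoted segment, quotes included.
QUOTED = re.compile(r'"[^"]*"')
# The wrapper as it appears in the translated output shown in this thread.
NO_TRANSLATE = re.compile(r'<span translate="no">(.*?)</span>')


def protect_quoted(text):
    """Wrap each quoted segment in a no-translate span."""
    # Escape &, < and > so the text is valid HTML; keep the quotes literal.
    escaped = html.escape(text, quote=False)
    return QUOTED.sub(
        lambda m: f'<span translate="no">{m.group(0)}</span>', escaped
    )


def strip_protection(translated):
    """Remove the no-translate spans and undo the HTML escaping."""
    return html.unescape(NO_TRANSLATE.sub(r"\1", translated))


if __name__ == "__main__":
    source = 'My friend whose name is apple have an "apple" in his hand.'
    print(protect_quoted(source))
```

Unquoted occurrences of the same word are left outside the span, so only the quoted token is marked as no-translate.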

