Re: Google Translate API - detect language + translate document (xlsx, csv)


Carlos Recoder Moreno

Jul 30, 2021, 4:52:55 AM
to Google Cloud Translation API

Hi there,

I would put the ‘detect_language’ function in the same script as the ‘batch_translate_text’ function and call the first function from inside the second one.

It would look something like this:   


from google.cloud import translate

def detect_language(content, project_id="YOUR_PROJECT_ID"):
    """Detect the language of a text string."""
    client = translate.TranslationServiceClient()
    location = "global"
    parent = f"projects/{project_id}/locations/{location}"

    # Details on supported types can be found here:
    # https://cloud.google.com/translate/docs/supported-formats
    response = client.detect_language(
        # Pass the text whose language you want to detect.
        content=content,
        parent=parent,
        mime_type="text/plain",  # mime types: text/plain, text/html
    )

    # The detected languages are sorted by detection confidence,
    # so the most probable language is first.
    for language in response.languages:
        # Return the first (most probable) language detected.
        return language.language_code
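
Returning the first entry trusts the detection no matter how weak it is. As a rough sketch only, the helper below picks the entry with the highest confidence and falls back to a default when that confidence is low. The name `pick_language`, the 0.5 threshold and the `fallback` parameter are my own assumptions, not part of the API; it only relies on each entry exposing `language_code` and `confidence`, as the entries in `response.languages` do.

```python
from types import SimpleNamespace


def pick_language(languages, min_confidence=0.5, fallback=None):
    """Return the code of the most confident detection, or `fallback`.

    `languages` is any iterable of objects exposing `language_code` and
    `confidence`, such as `response.languages` from detect_language.
    The 0.5 default threshold is an arbitrary assumption; tune it.
    """
    best = max(languages, key=lambda lang: lang.confidence, default=None)
    if best is None or best.confidence < min_confidence:
        return fallback
    return best.language_code


# Stand-in objects shaped like the API's detected-language entries.
detections = [
    SimpleNamespace(language_code="es", confidence=0.92),
    SimpleNamespace(language_code="pt", confidence=0.05),
]
print(pick_language(detections))
```

Inside ‘detect_language’ you could then end with something like `return pick_language(response.languages)` instead of the loop, if a fallback makes sense for your data.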

        



And in the translate function you would need to change the source language:


def batch_translate_text(
...
    # Call the ‘detect_language’ function and assign its result to a variable.
    # Note: ‘detect_language’ expects a text string, so pass it a sample of the
    # file's text (‘sample_text’ is a placeholder you need to supply yourself)
    # rather than the ‘input_configs_element’ dict.
    source = detect_language(content=sample_text, project_id="YOUR_PROJECT_ID")

    # Supported language codes: https://cloud.google.com/translate/docs/language
    operation = client.batch_translate_text(
        request={
            "parent": parent,
            # The source is now the variable ‘source’ that you just created.
            "source_language_code": source,
            "target_language_codes": ["ja"],  # Up to 10 language codes here.
            "input_configs": [input_configs_element],
            "output_config": output_config,
        }
    )



This would be the main idea, but you would need to adapt the code to your use case.
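
Since each batch request takes a single source language, a file that mixes languages probably needs its rows split by language before translating. Here is a minimal sketch of that grouping step, assuming you have already read the rows into a list of strings. `group_rows_by_language` and `fake_detect` are hypothetical names of mine, and `fake_detect` is only a toy stand-in so the example runs without calling the API.

```python
from collections import defaultdict


def group_rows_by_language(rows, detect):
    """Group text rows by the language code that `detect` returns for each."""
    groups = defaultdict(list)
    for row in rows:
        text = row.strip()
        if not text:
            continue  # Skip empty cells; there is nothing to detect.
        groups[detect(text)].append(text)
    return dict(groups)


def fake_detect(text):
    """Toy stand-in for a real detector; NOT a language-detection method."""
    return "es" if text.startswith("hola") else "en"


rows = ["hola mundo", "hello world", "", "hola de nuevo"]
print(group_rows_by_language(rows, fake_detect))
```

In a real script you would pass a callable that wraps ‘detect_language’ in place of `fake_detect`, then write each group to its own input file and run one batch request per detected language. Keep in mind that this makes one detection call per row, which counts toward your quota.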



On Tuesday, July 27, 2021 at 10:21:37 PM UTC+2 testu...@gmail.com wrote:
I'm trying to use the Google Cloud Translation API to translate an Excel (or CSV) document that includes text in multiple languages, and my target language is English.

I would like to use the "Translate text in batches (Advanced edition only)" code sample (link here: https://cloud.google.com/translate/docs/samples/translate-v3-batch-translate-text), but the code sample has a line that defines the source language, so there can only be one source language.

But I need to detect the language in the document first and then translate the text to English. There is a code sample for detecting the language of a simple text string, "Detecting languages (Advanced)" (link: https://cloud.google.com/translate/docs/advanced/detecting-language-v3), but I need to combine the first code sample, which translates documents (but only has one source language defined), with the ability to detect the language instead of having one source language defined.

Is there a code sample like this in the resources? How could this be solved?

Here is the sample code in question:

from google.cloud import translate


def batch_translate_text(
    input_uri="gs://YOUR_BUCKET_ID/path/to/your/file.txt",
    output_uri="gs://YOUR_BUCKET_ID/path/to/save/results/",
    project_id="YOUR_PROJECT_ID",
    timeout=180,
):
    """Translates a batch of texts on GCS and stores the result in a GCS location."""

    client = translate.TranslationServiceClient()

    location = "us-central1"
    # Supported file types: https://cloud.google.com/translate/docs/supported-formats
    gcs_source = {"input_uri": input_uri}

    input_configs_element = {
        "gcs_source": gcs_source,
        "mime_type": "text/plain",  # Can be "text/plain" or "text/html".
    }
    gcs_destination = {"output_uri_prefix": output_uri}
    output_config = {"gcs_destination": gcs_destination}
    parent = f"projects/{project_id}/locations/{location}"

    # Supported language codes: https://cloud.google.com/translate/docs/language
    operation = client.batch_translate_text(
        request={
            "parent": parent,
            "source_language_code": "en",
            "target_language_codes": ["ja"],  # Up to 10 language codes here.
            "input_configs": [input_configs_element],
            "output_config": output_config,
        }
    )

    print("Waiting for operation to complete...")
    response = operation.result(timeout)

    print("Total Characters: {}".format(response.total_characters))
    print("Translated Characters: {}".format(response.translated_characters))