Is there a way to translate documents in a batch without passing the source language code in Google Cloud?


Bindu Sri

Jul 29, 2021, 5:02:12 AM
to Google Cloud Developers

I want to build a solution that processes batches of documents written in multiple languages. According to the Google documentation, the batch method accepts a list of documents, but the source language code is mandatory and only one language code can be passed per request.

I want the application to work globally and translate the batched documents by auto-detecting each document's language. Please suggest whether there is any alternative way to do this with the Cloud Translation API.

Below is the code from the Google documentation:

from google.cloud import translate_v3beta1 as translate


def batch_translate_document(
    input_uri: str, output_uri: str, project_id: str, timeout=180
):
    client = translate.TranslationServiceClient()

    # The ``global`` location is not supported for batch translation
    location = "us-central1"

    # Google Cloud Storage location for the source input. This can be a single file
    # (for example, ``gs://translation-test/input.docx``) or a wildcard
    # (for example, ``gs://translation-test/*``).
    gcs_source = {"input_uri": input_uri}

    batch_document_input_configs = {
        "gcs_source": gcs_source,
    }
    gcs_destination = {"output_uri_prefix": output_uri}
    batch_document_output_config = {"gcs_destination": gcs_destination}
    parent = f"projects/{project_id}/locations/{location}"

    # Supported language codes: https://cloud.google.com/translate/docs/language
    operation = client.batch_translate_document(
        request={
            "parent": parent,
            "source_language_code": "en-US",
            "target_language_codes": ["fr-FR"],
            "input_configs": [batch_document_input_configs],
            "output_config": batch_document_output_config,
        }
    )

    print("Waiting for operation to complete...")
    response = operation.result(timeout)

    print("Total Pages: {}".format(response.total_pages))
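One workaround I am considering, since each batch request takes a single source language: detect the language of every document first, group the files by detected language, and send one batch request per group. Below is a minimal sketch of only the grouping and request-building part. The helper names (group_uris_by_language, build_batch_requests), the detect callable, and the per-language output folder layout are my own assumptions and not part of the Google sample; how detect actually determines a document's language is left open here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_uris_by_language(
    uris: Iterable[str], detect: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group document URIs by the language code returned by ``detect``.

    ``detect`` is a hypothetical callable (URI -> language code); how it
    finds the language is outside the scope of this sketch.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for uri in uris:
        groups[detect(uri)].append(uri)
    return dict(groups)


def build_batch_requests(
    parent: str,
    groups: Dict[str, List[str]],
    target_language_code: str,
    output_uri_prefix: str,
) -> List[dict]:
    """Build one batch request dict per source language.

    The dict keys mirror the request used in the documentation sample above.
    Writing each group to its own ``<prefix><lang>/`` folder is an assumption
    of this sketch, chosen so the groups do not share an output location.
    """
    requests = []
    for source_lang, uris in sorted(groups.items()):
        if source_lang == target_language_code:
            # Documents already in the target language need no translation.
            continue
        requests.append(
            {
                "parent": parent,
                "source_language_code": source_lang,
                "target_language_codes": [target_language_code],
                "input_configs": [
                    {"gcs_source": {"input_uri": uri}} for uri in uris
                ],
                "output_config": {
                    "gcs_destination": {
                        "output_uri_prefix": f"{output_uri_prefix}{source_lang}/"
                    }
                },
            }
        )
    return requests
```

Each dict returned by build_batch_requests could then be passed as request= to client.batch_translate_document(...) in the same way as in the sample, one call per language group. Whether this is the recommended approach is exactly what I would like to confirm.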