Google Speech API for speech files of length more than one minute


Innovation1 ILP

Aug 3, 2017, 9:03:46 AM
to Google Cloud Developers
I have used the Google Cloud Speech-to-Text API to convert a FLAC file (stored in a Google Cloud Storage bucket) to text, but I am not able to get the text that comes after the first minute of the speech.


I'm using the following Python code for the conversion:


       
    import time


    def transcribe_gcs(gcs_uri):
        """Asynchronously transcribes the audio file specified by the gcs_uri."""
        from google.cloud import speech
        from google.cloud.speech import enums
        from google.cloud.speech import types
        client = speech.SpeechClient()

        audio = types.RecognitionAudio(uri=gcs_uri)
        config = types.RecognitionConfig(
            encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
            sample_rate_hertz=16000,
            language_code='en-IN',
            enable_word_time_offsets=True)

        operation = client.long_running_recognize(config, audio)

        # Poll every 2 seconds, up to 100 times, until the operation is done.
        retry_count = 100
        while retry_count > 0 and not operation.done():
            retry_count -= 1
            time.sleep(2)

        if not operation.done():
            print('Operation not complete and retry limit reached.')
            return

        alternatives = operation.result().results[0].alternatives
        for alternative in alternatives:
            print('Transcript: {}'.format(alternative.transcript))
            print('Confidence: {}'.format(alternative.confidence))

            # Print each word with its start and end offsets in seconds.
            for word_info in alternative.words:
                word = word_info.word
                start_time = word_info.start_time
                end_time = word_info.end_time
                print('Word: {}, start_time: {}, end_time: {}'.format(
                    word,
                    start_time.seconds + start_time.nanos * 1e-9,
                    end_time.seconds + end_time.nanos * 1e-9))



According to Google, long_running_recognize should recognize speech files longer than one minute. What could be the reason for not getting the text that comes after the first minute of the speech? Thanks in advance.

George (Cloud Platform Support)

Aug 3, 2017, 9:52:30 AM
to Google Cloud Developers
A copy of the FLAC file in question would be needed so that we can try to reproduce the error here.

Meanwhile, you may compare your Python code with the samples provided in the GoogleCloudPlatform repository.
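
One thing that may be worth checking, offered as an assumption and not a confirmed diagnosis: the code in the question reads only results[0] of the response, while a long-running response can contain several results, each covering a consecutive portion of the audio. The sketch below joins the top alternative of every result. The helper name collect_transcripts and the stand-in response object are hypothetical and exist only so the snippet runs without credentials; with the real client, operation.result() would be passed instead.

```python
from types import SimpleNamespace


def collect_transcripts(response):
    """Join the top alternative of every result into one transcript string."""
    pieces = []
    for result in response.results:
        # Skip results that carry no alternatives at all.
        if not result.alternatives:
            continue
        # The first alternative is the most likely one for this portion.
        pieces.append(result.alternatives[0].transcript.strip())
    return ' '.join(pieces)


# Hypothetical stand-in that mimics the shape of a recognition response.
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript='first minute of speech')]),
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript=' second minute of speech')]),
])

print(collect_transcripts(fake_response))
```

If the transcript is still cut short after iterating over every result, the cause lies elsewhere and the audio file itself would need to be examined.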

Dariush Azimi

Jan 20, 2018, 5:30:17 PM
to Google Cloud Developers
I am having the exact same issue.

I have uploaded my file to Google Cloud in WAV format.
Using the sample transcribe_async.py file, I pass the location of the WAV file as a parameter, but all I get is "Waiting for operation to complete...".

Can you at least provide an example of how one is supposed to pass the URI location?

python transcribe_async_gcsUri.py gs://mybucketdaz12/audio.raw
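
For reference, a Cloud Storage URI has the form gs://BUCKET/OBJECT and has to reach the script as a single argument with no line break or space inside it. The following is a minimal sketch of a check that could be run on the argument before calling the API; split_gcs_uri is a hypothetical helper, not part of the sample or of the client library.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    uri = uri.strip()
    if not uri.startswith('gs://'):
        raise ValueError('expected a URI starting with gs://, got %r' % uri)
    # Everything up to the first slash is the bucket, the rest is the object.
    bucket, _, object_name = uri[len('gs://'):].partition('/')
    if not bucket or not object_name:
        raise ValueError(
            'URI must name both a bucket and an object: %r' % uri)
    return bucket, object_name


print(split_gcs_uri('gs://cloud-samples-tests/speech/vr.flac'))
```

A URI that names only the bucket, such as gs://mybucketdaz12 on its own, raises ValueError in this sketch, which makes a command line broken across two lines easy to spot.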



Here is the transcribe_async.py file I am using:


"""Example usage:
    python transcribe_async.py resources/audio.raw
    python transcribe_async.py gs://cloud-samples-tests/speech/vr.flac
"""

import argparse
import io

# [START def_transcribe_gcs]
def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='en-US')

    operation = client.long_running_recognize(config, audio)

    print('Waiting for operation to complete...')
    response = operation.result(timeout=90)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print('Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))
# [END def_transcribe_gcs]


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'path', help='File or GCS path for audio file to be recognized')
    args = parser.parse_args()
    if args.path.startswith('gs://'):
        transcribe_gcs(args.path)
    else:
        transcribe_file(args.path)
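
A further point to check, stated as an assumption: the post mentions a WAV upload and the command passes audio.raw, yet the config above declares FLAC encoding. The sketch below picks an encoding name from the file extension. The mapping is a guess for illustration (it assumes WAV and raw files hold 16-bit linear PCM, which is not guaranteed), and guess_encoding is a hypothetical helper.

```python
import os

# Assumed mapping from file extension to an AudioEncoding name.
# '.wav' and '.raw' are assumed to contain 16-bit linear PCM samples.
ENCODING_BY_EXTENSION = {
    '.flac': 'FLAC',
    '.wav': 'LINEAR16',
    '.raw': 'LINEAR16',
}


def guess_encoding(path):
    """Return the assumed encoding name for a local path or gs:// URI."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in ENCODING_BY_EXTENSION:
        raise ValueError('no assumed encoding for %r' % extension)
    return ENCODING_BY_EXTENSION[extension]


print(guess_encoding('gs://mybucketdaz12/audio.raw'))
```

The returned name could then be looked up on enums.RecognitionConfig.AudioEncoding with getattr and passed as the encoding argument, and sample_rate_hertz would also have to match the actual file.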