start and end point of each word in TTS audio?

David Epstein

May 17, 2021, 9:17:15 AM

to Google Cloud Developers

Hi folks. Can the Cloud Text-to-Speech API provide additional information about the audio files it produces? For example, I'd appreciate an additional CSV file with the start and end point of each word in sample frames or seconds. Or alternatively, the start and end point of the spaces between words.

George (Cloud Platform Support)

May 19, 2021, 4:27:12 PM

to Google Cloud Developers

Hello David,

A more detailed description of your use case may help in determining your actual needs. A CSV file does not appear to be necessary, since you already have the original text that the API used to generate the audio file. You might consider running the audio through the Speech-to-Text API to get tags and metadata for the file in question. If this is part of a programming project on your side, you might also ask the same question on a forum such as Stack Overflow, where programmers are ready to help.
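
For anyone wanting to try the Speech-to-Text route, here is a minimal sketch, not an official recipe. It assumes the synthesized audio is short, mono LINEAR16 PCM, that the google-cloud-speech Python package and credentials are set up, and that the sample rate you pass matches the file. The function names (recognize_word_times, word_times_to_csv, gaps_between_words) are made up for this example. The recognized words may not match the input text exactly, and the reported offsets may be coarser than a single sample frame.

```python
import csv
import io


def recognize_word_times(pcm_bytes, sample_rate_hz, language_code="en-US"):
    """Return [(word, start_s, end_s), ...] using Speech-to-Text word offsets.

    Assumes short LINEAR16 audio; the client library is imported lazily so
    the helpers below work without it installed.
    """
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hz,
        language_code=language_code,
        enable_word_time_offsets=True,  # ask for per-word start/end times
    )
    audio = speech.RecognitionAudio(content=pcm_bytes)
    response = client.recognize(config=config, audio=audio)

    words = []
    for result in response.results:
        for info in result.alternatives[0].words:
            words.append(
                (
                    info.word,
                    info.start_time.total_seconds(),
                    info.end_time.total_seconds(),
                )
            )
    return words


def word_times_to_csv(words, sample_rate_hz):
    """Format (word, start_s, end_s) tuples as CSV in seconds and frames."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "start_s", "end_s", "start_frame", "end_frame"])
    for word, start_s, end_s in words:
        writer.writerow(
            [
                word,
                f"{start_s:.3f}",
                f"{end_s:.3f}",
                round(start_s * sample_rate_hz),
                round(end_s * sample_rate_hz),
            ]
        )
    return buf.getvalue()


def gaps_between_words(words):
    """Return (gap_start_s, gap_end_s) for each silence between adjacent words."""
    return [
        (prev[2], nxt[1])
        for prev, nxt in zip(words, words[1:])
        if nxt[1] > prev[2]
    ]
```

The two helpers take plain tuples, so they can be checked without calling the API, for example word_times_to_csv([("hello", 0.0, 0.5), ("world", 0.75, 1.25)], 16000). The same tuples fed to gaps_between_words give the spaces between words that the original question asked about.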