Vocalizer TTS Voice V3.4.3 [Unlocked] [Latest]

Alfonzo Liebenstein

May 5, 2024, 3:01:00 PM

to clinoniner

Vocalizer is an integrated TTS engine that provides expressive and natural voices in more than 50 languages. Vocalizer enriches the user experience for a variety of applications on your device, such as GPS navigation, reading eBooks and assistive software.
Characteristics
* Support for over 120 voices in more than 50 different languages.
* Emoji support
* Easy customization of pronunciation through a user dictionary.
* Customization of the reading speed and pitch.
* Reading preferences for numbers and punctuation.
* And much more!

The Speaker Recognition service is now generally available (GA). Speech SDK APIs are available for C++, C#, Java, and JavaScript. With Speaker Recognition, you can verify and identify speakers by their unique voice characteristics. For more information about this topic, see the documentation.

Download — https://t.co/1GTpl3kvqk

Personal voice is available in preview in the following regions: West Europe, East US, and South East Asia. With personal voice (preview), you can get an AI-generated replica of your voice (or of the voices of your application's users) in a few seconds. You provide a one-minute speech sample as the audio prompt, and then use it to generate speech in any of the more than 90 languages supported across more than 100 locales.

Text to speech avatar converts text into a digital video of a photorealistic human (either a prebuilt avatar or a custom text to speech avatar) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time. Developers can build applications integrated with text to speech avatar through an API, or use a content creation tool on Speech Studio to create video content without coding.

The multilingual voices en-US-JennyMultilingualV2Neural and en-US-RyanMultilingualNeural auto-detect the language of the input text. However, you can still use the <lang xml:lang> element to adjust the speaking language for these voices.

To speak in a language other than English, the current implementation of the en-US-JennyMultilingualNeural voice requires that you set the <lang xml:lang> element. We anticipate that during Q4 of calendar year 2023, the en-US-JennyMultilingualNeural voice will be updated to speak in the language of the input text without the <lang xml:lang> element. This will bring it to parity with the en-US-JennyMultilingualV2Neural voice.
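
As a rough illustration of the language element discussed above, the following Python sketch builds an SSML document that wraps the text in <lang xml:lang="..."> inside a multilingual voice. The helper name build_lang_ssml and the sample sentence are made up for this example, and the sketch only builds and parses the XML locally; it does not call any speech service.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Standard SSML namespace (W3C Speech Synthesis Markup Language).
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_lang_ssml(text, voice, lang):
    """Return an SSML string asking `voice` to speak `text` in `lang`.

    Hypothetical helper for illustration; not part of any SDK.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        f"<lang xml:lang={quoteattr(lang)}>{escape(text)}</lang>"
        f"</voice></speak>"
    )


# Sample input (made up): German text spoken by the multilingual voice.
ssml = build_lang_ssml(
    "Guten Tag & willkommen!", "en-US-JennyMultilingualNeural", "de-DE"
)
ET.fromstring(ssml)  # raises ParseError if the markup is malformed
print(ssml)
```

Escaping the text and quoting the attributes keeps the markup well-formed even when the input contains characters such as & or <.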

Ten new languages introduced - 20 new voices in 10 new locales are added into the neural TTS language list: Yan in en-HK English (Hong Kong), Sam in en-HK English (Hong Kong), Molly in en-NZ English (New Zealand), Mitchell in en-NZ English (New Zealand), Luna in en-SG English (Singapore), Wayne in en-SG English (Singapore), Leah in en-ZA English (South Africa), Luke in en-ZA English (South Africa), Dhwani in gu-IN Gujarati (India), Niranjan in gu-IN Gujarati (India), Aarohi in mr-IN Marathi (India), Manohar in mr-IN Marathi (India), Elena in es-AR Spanish (Argentina), Tomas in es-AR Spanish (Argentina), Salome in es-CO Spanish (Colombia), Gonzalo in es-CO Spanish (Colombia), Paloma in es-US Spanish (US), Alonso in es-US Spanish (US), Zuri in sw-KE Swahili (Kenya), Rafiki in sw-KE Swahili (Kenya).

Eleven new en-US voices in preview - 11 new en-US voices in preview are added to American English: Ashley, Amber, Ana, Brandon, Christopher, Cora, Elizabeth, Eric, Michelle, Monica, and Jacob.

Five zh-CN Chinese (Mandarin, Simplified) voices are generally available - 5 Chinese (Mandarin, Simplified) voices have moved from preview to general availability: Yunxi, Xiaomo, Xiaoman, Xiaoxuan, and Xiaorui. These voices are now available in all regions. Yunxi gains a new 'assistant' style, which is suitable for chatbots and voice agents. Xiaomo's voice styles have been refined to sound more natural and distinctive.

Six new languages introduced - 12 new voices in 6 new locales are added into the neural TTS language list: Nia in cy-GB Welsh (United Kingdom), Aled in cy-GB Welsh (United Kingdom), Rosa in en-PH English (Philippines), James in en-PH English (Philippines), Charline in fr-BE French (Belgium), Gerard in fr-BE French (Belgium), Dena in nl-BE Dutch (Belgium), Arnaud in nl-BE Dutch (Belgium), Polina in uk-UA Ukrainian (Ukraine), Ostap in uk-UA Ukrainian (Ukraine), Uzma in ur-PK Urdu (Pakistan), Asad in ur-PK Urdu (Pakistan).

Five languages from preview to GA - 10 voices in 5 locales introduced in November are now GA: Kert in et-EE Estonian (Estonia), Colm in ga-IE Irish (Ireland), Nils in lv-LV Latvian (Latvia), Leonas in lt-LT Lithuanian (Lithuania), Joseph in mt-MT Maltese (Malta).

With this release, we now support a total of 142 neural voices across 60 languages/locales. In addition, over 70 standard voices are available in 49 languages/locales. Visit Language support for the full list.

Neural Text to speech now includes the viseme event. Viseme events allow users to get a sequence of facial poses along with synthesized speech. Visemes can be used to control the movement of 2D and 3D avatar models, matching mouth movements to synthesized speech. Viseme events are only available for en-US-AriaNeural voice at this time.
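
To make the viseme idea above more concrete, here is a small Python sketch that turns a stream of viseme events into animation keyframes. It assumes each event carries an audio offset in 100-nanosecond ticks and an integer viseme ID; the VisemeEvent type, the visemes_to_keyframes helper, and the sample events are all hypothetical and exist only for this example. No speech SDK is called.

```python
from typing import NamedTuple

TICKS_PER_SECOND = 10_000_000  # assumption: one tick = 100 nanoseconds


class VisemeEvent(NamedTuple):
    """Illustrative stand-in for a viseme event (not an SDK type)."""
    audio_offset_ticks: int
    viseme_id: int


def visemes_to_keyframes(events, fps=30):
    """Map viseme events to sorted (frame_index, viseme_id) keyframes.

    If several events land on the same frame, the latest one wins.
    """
    keyframes = {}
    for ev in sorted(events, key=lambda e: e.audio_offset_ticks):
        frame = ev.audio_offset_ticks * fps // TICKS_PER_SECOND
        keyframes[frame] = ev.viseme_id
    return sorted(keyframes.items())


# Made-up sample events at 50 ms, 150 ms, and 300 ms into the audio.
events = [
    VisemeEvent(500_000, 0),
    VisemeEvent(1_500_000, 19),
    VisemeEvent(3_000_000, 6),
]
print(visemes_to_keyframes(events))  # [(1, 0), (4, 19), (9, 6)]
```

An avatar renderer could then switch or blend mouth shapes at each keyframe while the synthesized audio plays.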

46 new voices in GA locales: Shakir in ar-EG Arabic (Egypt), Hamed in ar-SA Arabic (Saudi Arabia), Borislav in bg-BG Bulgarian (Bulgaria), Joana in ca-ES Catalan, Antonin in cs-CZ Czech (Czech Republic), Jeppe in da-DK Danish (Denmark), Jonas in de-AT German (Austria), Jan in de-CH German (Switzerland), Nestoras in el-GR Greek (Greece), Liam in en-CA English (Canada), Connor in en-IE English (Ireland), Madhur in hi-IN Hindi (India), Mohan in te-IN Telugu (India), Prabhat in en-IN English (India), Valluvar in ta-IN Tamil (India), Enric in ca-ES Catalan, Kert in et-EE Estonian (Estonia), Harri in fi-FI Finnish (Finland), Selma in fi-FI Finnish (Finland), Fabrice in fr-CH French (Switzerland), Colm in ga-IE Irish (Ireland), Avri in he-IL Hebrew (Israel), Srecko in hr-HR Croatian (Croatia), Tamas in hu-HU Hungarian (Hungary), Gadis in id-ID Indonesian (Indonesia), Leonas in lt-LT Lithuanian (Lithuania), Nils in lv-LV Latvian (Latvia), Osman in ms-MY Malay (Malaysia), Joseph in mt-MT Maltese (Malta), Finn in nb-NO Norwegian, Bokmål (Norway), Pernille in nb-NO Norwegian, Bokmål (Norway), Fenna in nl-NL Dutch (Netherlands), Maarten in nl-NL Dutch (Netherlands), Agnieszka in pl-PL Polish (Poland), Marek in pl-PL Polish (Poland), Duarte in pt-BR Portuguese (Brazil), Raquel in pt-PT Portuguese (Portugal), Emil in ro-RO Romanian (Romania), Dmitry in ru-RU Russian (Russia), Svetlana in ru-RU Russian (Russia), Lukas in sk-SK Slovak (Slovakia), Rok in sl-SI Slovenian (Slovenia), Mattias in sv-SE Swedish (Sweden), Sofie in sv-SE Swedish (Sweden), Niwat in th-TH Thai (Thailand), Ahmet in tr-TR Turkish (Türkiye), NamMinh in vi-VN Vietnamese (Vietnam), HsiaoChen in zh-TW Taiwanese Mandarin (Taiwan), YunJhe in zh-TW Taiwanese Mandarin (Taiwan), HiuMaan in zh-HK Chinese Cantonese (Hong Kong SAR), WanLung in zh-HK Chinese Cantonese (Hong Kong SAR).

5 new voices in preview locales: Kert in et-EE Estonian (Estonia), Colm in ga-IE Irish (Ireland), Nils in lv-LV Latvian (Latvia), Leonas in lt-LT Lithuanian (Lithuania), Joseph in mt-MT Maltese (Malta).

With this release, we now support a total of 129 neural voices across 54 languages/locales. In addition, over 70 standard voices are available in 49 languages/locales. Visit Language support for the full list.

Containers: the Neural text to speech container is released in public preview with 16 voices available in 14 languages. Learn more about how to deploy Speech containers for Neural text to speech.

Neural text to speech: new speaking styles for the en-US Aria voice. AriaNeural can sound like a newscaster when reading news. The 'newscast-formal' style sounds more serious, while the 'newscast-casual' style is more relaxed and informal. See how to use the speaking styles in SSML.
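
For readers who want to see what a speaking style looks like in SSML, the Python sketch below builds a document that wraps the text in an mstts:express-as element for the en-US-AriaNeural voice. The helper name build_newscast_ssml and the sample sentence are invented for this example, and the mstts namespace URI is an assumption based on the public SSML documentation. The sketch only builds and parses the XML locally.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"  # assumed vendor namespace URI

# The two newscast styles named in the release note above.
NEWS_STYLES = {"newscast-formal", "newscast-casual"}


def build_newscast_ssml(text, style="newscast-formal"):
    """Return SSML asking en-US-AriaNeural to read `text` in a news style.

    Hypothetical helper for illustration; not part of any SDK.
    """
    if style not in NEWS_STYLES:
        raise ValueError(f"unsupported style: {style!r}")
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang="en-US">'
        f'<voice name="en-US-AriaNeural">'
        f'<mstts:express-as style="{style}">{escape(text)}</mstts:express-as>'
        f"</voice></speak>"
    )


# Made-up sample headline.
ssml = build_newscast_ssml("Markets closed higher today.", "newscast-casual")
ET.fromstring(ssml)  # raises ParseError if the markup is malformed
print(ssml)
```

Restricting the style to a known set catches typos before the markup is sent anywhere.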

Custom Voice: a new feature is released to automatically check training data quality. When you upload your data, the system will examine various aspects of your audio and transcript data, and automatically fix or filter issues to improve the quality of the voice model. This covers the volume of your audio, the noise level, the pronunciation accuracy of speech, the alignment of speech with the normalized text, silence in the audio, in addition to the audio and script format.

From the options menu, you can select a language, a voice type (female or male), and a tone. Once you've set up a narrator that suits you, all your texts will be read with that voice. Of course, you can always go back to the options menu and adjust the voice again.

The Advanced Controls differentiate this plugin from its younger sibling, the Clarity Vx. The main added functions include a Reflections knob that restores a certain amount of the natural voice reflections without adding extra reverb. The Analysis button sets whether the audio is processed as mono or stereo, with stereo giving better results. Stereo processing puts a heavier load on the CPU than the Single mode, so bear that in mind.