AI translation - first to phonetics?

Sergei Grichine

Jul 11, 2024, 11:37:50 AM

to hbrob...@googlegroups.com

I stumbled upon an interesting effect today while using Google Translate to translate a short Russian excerpt into English. Here is a screenshot:

<image.png>

Notice that Google first follows the original Russian text with a "phonetic" conversion, spelled in the English (Latin) alphabet. It looks like the actual translation is done only after that. To me, the result is of very good quality; no problem there.

So, has anybody here read or heard anything about this? It looks like a smart trick to me, to help deal with other alphabets and hieroglyphic notations.
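For anyone who wants to experiment, the romanized line in the screenshot can be roughly imitated with a plain character table. Below is a minimal Python sketch, assuming a simplified letter-by-letter mapping chosen for illustration. The `_TABLE` dictionary and the `transliterate` function are hypothetical names; this is not Google's method and not an official romanization standard. Whether the translation model uses such a step internally is not something the screenshot can show.

```python
# Simplified, hypothetical Cyrillic-to-Latin table (illustration only).
_TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Spell Russian text with Latin letters, one character at a time."""
    out = []
    for ch in text:
        latin = _TABLE.get(ch.lower())
        if latin is None:
            out.append(ch)                  # not Cyrillic: keep as-is
        elif ch.isupper():
            out.append(latin.capitalize())  # "Щ" becomes "Shch"
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Привет, мир"))  # prints "Privet, mir"
```

A table like this only changes the spelling; it carries no meaning, so any real translation still has to happen in a separate step.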

--

Best Regards,

-- Sergei

Gmail

Jul 11, 2024, 1:26:02 PM

to hbrob...@googlegroups.com

That’s interesting.

Thomas

-

Need something prototyped, built or coded? I’ve been building prototypes for companies for 15 years. I am now incorporating generative AI into products.

-

Need a great hardworking engineer? I am currently looking for a new job opportunity in robotics and/ or AI.

Contact me directly or through LinkedIn:

https://www.linkedin.com/in/ai-robotics/

--
You received this message because you are subscribed to the Google Groups "HomeBrew Robotics Club" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hbrobotics+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hbrobotics/CA%2BKVXVPrqX%2B17Ypfo8jXszUTqUYCxEcVPo87v816HXN1ov%2BcPw%40mail.gmail.com.

Stephen Williams

Jul 11, 2024, 2:30:27 PM

to hbrob...@googlegroups.com

Google Translate generally also has a button to hear the text spoken. Perhaps it is there for that?

There is an international phonetic 'alphabet' (the IPA) that tries to capture every sound used in human speech. It would make sense to use it as the lingua franca of a verbalization system.

Looks like there are several alternatives: https://www.internationalphoneticalphabet.org/ipa-sounds/ipa-chart-with-sounds/

sdw

--

Stephen D. Williams
Founder: VolksDroid, Blue Scholar Foundation

Chris Albertson

Jul 11, 2024, 3:06:43 PM

to hbrob...@googlegroups.com

I have been playing with the Whisper model on my Mac. It runs locally and does not require much RAM. It accepts digitized audio of speech in any of many languages and can output English text.

So there is at least one other model around that goes from the sounds of multiple languages to English text.

Because it is open source and runs locally, I can look at it and learn a little about how it works. It seems the neural model itself is trained to do this; there is no wrapper software that first decides what the input language is. It is all done in the model weights.
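As a rough illustration of the behavior described above, here is a minimal Python sketch using the `openai-whisper` package. The function name `translate_to_english`, the default model size, and the file name in the usage line are assumptions made for this example. It relies on `transcribe(..., task="translate")`, which asks the model for English text, and on the `"language"` entry of the result, which the model fills in itself when no language is given.

```python
from pathlib import Path


def translate_to_english(audio_path: str, model_name: str = "base") -> tuple[str, str]:
    """Return (detected_language_code, english_text) for one audio file."""
    # Fail fast on a bad path before loading a large model.
    if not Path(audio_path).is_file():
        raise FileNotFoundError(audio_path)

    # Imported lazily: requires `pip install openai-whisper` plus ffmpeg.
    import whisper

    # Use a multilingual model; the English-only ".en" models cannot translate.
    model = whisper.load_model(model_name)

    # No language is passed, so the model detects it from the audio itself.
    result = model.transcribe(audio_path, task="translate")
    return result["language"], result["text"].strip()
```

Usage would look like `lang, text = translate_to_english("speech_ru.wav")`, where `speech_ru.wav` is a hypothetical recording.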

Gmail

Jul 11, 2024, 5:00:00 PM

to hbrob...@googlegroups.com

What kind of response does it give you?

Thomas

Chris Albertson

Jul 11, 2024, 5:35:15 PM

to hbrob...@googlegroups.com

On Jul 11, 2024, at 1:59 PM, Gmail <thomas...@gmail.com> wrote:

What kind of response does it give you?

If you are asking about Whisper, it depends on which model you run. They range from “tiny” to “large”, with multilingual and English-only variants to choose from. And then, of course, it depends on how fast your hardware is.
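The naming scheme for those variants can be sketched in a few lines of Python. This is a simplified illustration: the helper `whisper_model_name` is hypothetical, and the size list covers only the basic names (the package also ships versioned names, which `whisper.available_models()` will list). The English-only variants use a `.en` suffix and exist for every basic size except “large”.

```python
# Basic Whisper sizes, smallest to largest (versioned names omitted).
SIZES = ("tiny", "base", "small", "medium", "large")


def whisper_model_name(size: str, english_only: bool = False) -> str:
    """Build the name to pass to whisper.load_model()."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    if english_only:
        if size == "large":
            # The large model is published as multilingual only.
            raise ValueError("there is no English-only large model")
        return f"{size}.en"
    return size


print(whisper_model_name("small", english_only=True))  # prints "small.en"
```

The returned string is what you would hand to `whisper.load_model(...)`; smaller models load faster and run faster at some cost in accuracy.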

I’m on an Apple Silicon (AKA “ARM”) Mac, an M2 Pro with 16GB RAM, so it is a mid-range computer. I would compare it to a newer Intel PC with a mid-range Nvidia GPU.

As for speed, the model takes some time to load because it can be big: up to 10GB that needs to be read off the disk. After loading, though, it is nearly real-time for the small to mid-size models, with a delay of a second or so for the large model.
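To see the one-time load cost separately from the per-clip delay, each step can be timed on its own. The sketch below is a generic helper, not part of Whisper; the name `timed` and the file name in the commented usage are assumptions for illustration.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical usage with Whisper (requires `pip install openai-whisper`):
#   import whisper
#   model, load_s = timed(whisper.load_model, "base")     # one-time load
#   result, run_s = timed(model.transcribe, "clip.wav")   # per-clip cost
#   print(f"load {load_s:.1f}s, transcribe {run_s:.1f}s")
```

Comparing the transcription time with the length of the clip gives a rough idea of how close to real time a given model size runs on your hardware.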

I spent zero effort on making it run faster. It would be easy to gain quite a lot of performance by using a different version of PyTorch that would make better use of the hardware in the Mac. But I found it works OK and will leave optimization for later.

This is free, open-source software from OpenAI and seems to be the best speech-to-text system currently available. It is documented well enough on the OpenAI GitHub page:

openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
https://github.com/openai/whisper
