Real-time speech recognition

David Cunningham

unread,

May 7, 2024, 8:02:25 PMMay 7

to UniMRCP

Hello,

We've been looking at the UniMRCP for Asterisk manual, trying to figure out how to do real-time speech recognition. We don't want to run an Asterisk dialplan application and have it block until the speech recognition is finished. We want to start the speech recognition, do lots of other stuff (eg Dial a destination), and then end the speech recognition when the call hangs up.

I found a message in this group from 2019 where Arsen said:

"This subject has been brought up several times but no progress has been made so far. The problem is the MRCP framework is primarily made for recognition of short utterances.

In order to support continuous speech transcription from Asterisk, several components in the path would need to be extended."

Does anyone know if a solution has been created since then? Or is real-time speech recognition still not possible?

Thanks in advance for any advice.

David Cunningham

unread,

May 8, 2024, 7:39:35 PMMay 8

to UniMRCP

Could anyone give me a quick pointer in the right direction?

I have referred to the UniMRCP for Asterisk manual, but can't see a dialplan application which will start speech recognition and then allow the dialplan to continue with recognition still running.

https://www.unimrcp.org/manuals/html/AsteriskManual.html

Thanks again.

Michael Levy

unread,

May 8, 2024, 9:23:59 PMMay 8

to uni...@googlegroups.com

I am not an Asterisk expert.

However, I think you are looking in the wrong place. When MRCP is used to support speech recognition, it is all about doing recognition as part of a conversation. A prompt is played, an utterance gets recognized. This repeats until the call is complete. MRCP recognition is also single channel. It is the caller's voice, but not the agent/IVR. Do you want both parts of the call in your transcript?

I think what you want to do is full call recording, or use the technology of call recording to fork the SIP call to two endpoints. One goes to your agent or IVR, one goes to your recorder or real-time recognition engine.

Take a look at SIPREC - https://www.rfc-editor.org/rfc/rfc7866.html or https://recordia.net/en/what-is-sip-recording-siprec/

Or perhaps treat it as a 3-way call in asterisk (or bridge transfer?) where one leg goes to your recognizer.

Your recognizer doesn't need MRCP, it just needs SIP/RTP.

--
You received this message because you are subscribed to the Google Groups "UniMRCP" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unimrcp+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unimrcp/d7a96870-f2b8-4eb1-ae5e-fda1c3537520n%40googlegroups.com.

David Cunningham

unread,

May 8, 2024, 10:38:33 PMMay 8

to UniMRCP

Thank you very much for the reply Michael.

We would probably fine to have the two channels with separate transcriptions, especially if we could use timestamps to show how they align.

Our Asterisk based system already as call recording, but we had thought that UniMRCP's integration with AWS or Google would allow real-time transcription of those recordings (ie while a person is still talking).

The unispeech website says "You can process audio in batch or in near real-time. Using a secure connection, you can send a live audio stream to the service, and receive a stream of text in response.", as per:

https://unispeech.io/product/transcribe-speech-recognition/

So what is this near real-time functionality? If we have to start listening, stop listening, send that for transcription, then I wouldn't call that near real-time. The mention of "receive a stream of text" makes it sound like you can receive text while the person is still speaking.

Have I misunderstood what the website says?

Thanks again for your help.

Mickael Hubert

unread,

May 9, 2024, 3:39:50 AMMay 9

to uni...@googlegroups.com

Hi David,

We use this system to transcribe sip conversation with our own transcription system.

But we use our specific daemon, developed on premise, so it's private.

Before our daemon, I searched an open source solution, but notre realy find a good solution.

Maybe you can watch this prototype :

https://github.com/pc-m/transcript-demo

Have a good day

--
You received this message because you are subscribed to the Google Groups "UniMRCP" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unimrcp+u...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/unimrcp/e0b14ed6-45f7-4e3f-8b8f-1118ff721028n%40googlegroups.com.

Michael Levy

unread,

May 9, 2024, 8:12:11 AMMay 9

to uni...@googlegroups.com

Nice. https://github.com/pc-m/transcript-demo says:

How it works

When a call enters the stasis application it will be added to the bridge
When the server starts listening on the configured port and create the external media channel
When RTP is received the payload is sent to Google Speech to text API and an HTML file is generated

This makes sense. They are not using MRCP.

To view this discussion on the web visit https://groups.google.com/d/msgid/unimrcp/CA%2BfRVSVP-vPzHFVwL7n62T7EMJZjrY3MKr6A6saovBAYR2WQYg%40mail.gmail.com.

Ken Walker

unread,

May 9, 2024, 12:34:47 PMMay 9

to UniMRCP

The asterisk team has a similar demo as well using ARI and JS.

https://github.com/asterisk/asterisk-external-media

ewrj...@gmail.com

unread,

May 9, 2024, 12:46:41 PMMay 9

to UniMRCP

Hi David,

For real time transcription during a call you could try Speechmatics ( https://www.speechmatics.com/product/real-time ). Basically you do some session set up then fork the call RTP and send it to them and they'll send back transcription snippets throughout the life of the audio stream. At the end you've then got a ready made transcription of the entire audio.

I found that uniMRCP could only effectively transcribe the entire audio stream at the end of the audio stream but not during it (which as you say required a Start/Stop loop and isn't real time).

David Cunningham

unread,

May 10, 2024, 6:45:26 PMMay 10

to UniMRCP

Thank you everyone for the replies. We'll give the https://github.com/asterisk/asterisk-external-media project a try since it seems that whatever solution is used, we need to use the Asterisk ARI to get the call audio.

Either I've missed something or it seems the UniSpeech website is a bit misleading when it says it can do "near real-time" transcription.

Vahagn Kocharyan

unread,

Jul 5, 2024, 11:52:15 AM (11 days ago) Jul 5

to UniMRCP

HI.You can get call audio from unimrcp side.change unimrcp timeouts to use Continuous Recognition Mode. You can use our sample application for sample configuration.

Thanks

David Cunningham

unread,

Jul 5, 2024, 4:56:21 PM (10 days ago) Jul 5

to uni...@googlegroups.com

Hello,

We have already moved on to use snoopChannel from the Asterisk ARI instead, but thank you.

You received this message because you are subscribed to a topic in the Google Groups "UniMRCP" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/unimrcp/Xw-FCvMloII/unsubscribe.
To unsubscribe from this group and all its topics, send an email to unimrcp+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unimrcp/45ce5ea1-8975-4f93-a123-bdaf775d2f9en%40googlegroups.com.

--

David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782

Reply all

Reply to author

Forward