MRCP ASR Project: Need Help


Ömer Bitikçioğlu

Jul 21, 2022, 1:27:11 AM
to UniMRCP
Hi,

I am an undergraduate researcher working on an MRCP project. My task is to build a system with ASR capability for a telephony application: when someone calls us on the phone, we want to see the conversation as text.

I discovered your UniMRCP project and thought there was no need to implement MRCP from scratch; we could use it instead. But I am struggling to test the system. I managed to set up the UniMRCP server, but what should I do next? How can I send MRCP messages and media to the server and get responses back? And how do I use those responses on the client side?

There are still a lot of gaps in my understanding, so thanks in advance for filling them in.

Have a good one!

Jones Luk

Jul 21, 2022, 5:18:07 AM
to UniMRCP
Your question is too broad to answer.

Perhaps you can get started by playing around with the test client programs located at {unimrcp_source_dir}/platforms/unimrcp-client and {unimrcp_source_dir}/platforms/asr-client. Study the source code and see what happens between the client and the server. Later on, you may want to install a free PBX system for testing with real voice. Thanks to VoIP, you don't really need a landline to play around with a PBX system.

Good luck.

Michael Levy

Jul 21, 2022, 9:52:28 AM
to UniMRCP
I'll give you some suggestions on how to think about this. This is how I approach these systems, but others may have different views.
The question I often ask is, "What answers the telephone call?" UniMRCP does not answer phone calls; it lets the systems that do answer calls communicate with media services like speech recognition and speech synthesis.

Phone calls can arrive using different technologies or protocols, but for your purposes a Voice-over-IP call using SIP and RTP is most likely. 

In larger enterprises, we often use systems dedicated to answering phone calls and running IVR applications. One class of these systems is VoiceXML (VXML) voice browsers. They are made by companies like Avaya, Genesys, Cisco, and others. As an example, here are some docs from Genesys - https://docs.genesys.com/Documentation/GVP

There are open source solutions that can do this as well. Here is a great introduction to FreeSWITCH, one of the most popular solutions for handling telephony - https://freeswitch.org/confluence/display/FREESWITCH/Introduction
There is also a FreeSWITCH module, mod_unimrcp, that lets you integrate with UniMRCP - https://freeswitch.org/confluence/display/FREESWITCH/mod_unimrcp

Another popular open source solution is Asterisk, and you can start at https://unimrcp.org/asterisk. See https://www.asterisk.org/

Here is an example of how these can work together:

Telecom/VoIP -----> FreeSWITCH -----> UniMRCP ------> ASR server
            SIP/RTP             MRCP           ASR protocol

A phone call arrives using the SIP/RTP protocols and is answered by FreeSWITCH.
FreeSWITCH uses mod_unimrcp to communicate with the UniMRCP server (FreeSWITCH is the MRCP client).
UniMRCP loads a plugin that supports your speech recognizer of choice. That plugin uses the ASR server to perform recognition.
Go to the Solutions menu on https://unimrcp.org/ to see the available plugins.


There are many ways to build a solution like the one you've described. I hope this helps.

- Michael

Michael Levy

Jul 21, 2022, 9:56:51 AM
to UniMRCP
One other thought. Is it a requirement that you build your own solution?
You could meet the requirements without deploying your own services or writing much code.
Take a look at https://aws.amazon.com/pm/connect/ or similar cloud contact centers from Twilio, Genesys, Vonage, or others.


Ömer Bitikçioğlu

Aug 9, 2022, 8:40:43 AM
to uni...@googlegroups.com
Hi,

First of all, I really appreciate your detailed answers; they helped me a lot in understanding the subject.

I managed to set up the UniMRCP server with the Vosk plugin (https://github.com/alphacep/unimrcp-vosk-plugin/tree/vosk-plugin)
and communicate with it via ./asrclient. It recognizes audio files and transcribes them.

Now I want to stream microphone input to the server. My tutors said that the client side is not our responsibility;
we just need a client for demonstration purposes. For that reason, do you think I should install and configure FreeSWITCH,
or modify the UniMRCP client so it can use the microphone stream? (I don't know how to achieve the latter, though.)

Based on your experience, which path would you recommend?

Thanks a lot again!




Arsen Chaloyan

Aug 9, 2022, 8:33:48 PM
to uni...@googlegroups.com
Hi Omer,

Yes, you may implement a sample client application utilizing the UniMRCP client library and capturing audio from the microphone.

You may also install FreeSWITCH or Asterisk, place calls to it, and run speech transcription over MRCP.
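
For illustration, a rough sketch of what such a sample client could look like is below. It is not taken from the UniMRCP sources: it assumes the libasrclient API declared in asr_engine.h under {unimrcp_source_dir}/platforms/libasrclient (the same API the asr-client program uses) and the PortAudio library for capture, so check the exact signatures, in particular asr_engine_create, against your checkout. Error handling is omitted.

/*
 * Sketch of a microphone-streaming ASR client (illustrative, untested).
 * Assumptions: asr_engine.h from {unimrcp_source_dir}/platforms/libasrclient,
 * the PortAudio library, and placeholder paths/profile/grammar names.
 */
#include <stdio.h>
#include <portaudio.h>
#include "asr_engine.h"

#define SAMPLE_RATE      8000   /* LPCM rate; must match what the profile negotiates */
#define FRAMES_PER_CHUNK 160    /* 20 ms of 8 kHz audio */

/* PortAudio delivers captured audio here; forward it to the ASR session. */
static int capture_cb(const void *input, void *output,
                      unsigned long frame_count,
                      const PaStreamCallbackTimeInfo *time_info,
                      PaStreamCallbackFlags status_flags,
                      void *user_data)
{
    asr_session_t *session = user_data;
    /* 16-bit mono samples: 2 bytes per frame */
    asr_session_stream_write(session, (char*)input, (int)(frame_count * 2));
    return paContinue;
}

int main(int argc, const char *argv[])
{
    /* 1. Bring up the client stack; the root dir holds conf/ and data/. */
    asr_engine_t *engine = asr_engine_create("/usr/local/unimrcp",
                                             APT_PRIO_INFO, APT_LOG_OUTPUT_CONSOLE);
    /* 2. Open an MRCP session using a profile from unimrcpclient.xml. */
    asr_session_t *session = asr_session_create(engine, "uni2");

    /* 3. Start capturing from the default microphone. */
    PaStream *stream;
    Pa_Initialize();
    Pa_OpenDefaultStream(&stream, 1, 0, paInt16, SAMPLE_RATE,
                         FRAMES_PER_CHUNK, capture_cb, session);
    Pa_StartStream(stream);

    /* 4. Issue RECOGNIZE; the audio written by the callback feeds the recognizer. */
    const char *result = asr_session_stream_recognize(session, "grammar.xml");
    printf("Result: %s\n", result ? result : "<no result>");

    /* 5. Stop feeding audio before tearing the session down. */
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
    asr_session_destroy(session);
    asr_engine_destroy(engine);
    return 0;
}

The ordering in step 5 matters: nothing should call asr_session_stream_write once the session has been destroyed.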





--
Arsen Chaloyan
Author of UniMRCP
http://www.unimrcp.org

Ömer Bitikçioğlu

Sep 2, 2022, 2:18:40 AM
to uni...@googlegroups.com
Hi,

I have another question about UniMRCP.
I can use the setup below to do continuous speech recognition:

Zoiper VoIP -----> Asterisk ------> UniMRCP (Vosk plugin)
            SIP/RTP          MRCP          

The Vosk engine runs on the same machine as UniMRCP, not on a separate WebSocket server.
(I will not use Vosk; I have my own ASR engine.)
I see that UniMRCP offers many plugins for different speech engine servers, and the communication is done over WebSocket or gRPC. Why is that?
Should I also implement a separate ASR server for my engine that uses WebSockets, plus a UniMRCP plugin, to do real-time speech recognition?
Or is implementing only a UniMRCP plugin enough?

Thanks in advance.
Have a nice day.

Arsen Chaloyan

Sep 2, 2022, 7:42:21 PM
to uni...@googlegroups.com
Hi Omer,

It is actually up to you to decide based on your use cases. The system requirements of an ASR engine and the UniMRCP server may differ quite considerably, and you may or may not want to run them combined in a single process.
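
Schematically, the two layouts look like this (an illustrative sketch; the transport between a thin plugin and a standalone engine server is whatever you decide to implement, e.g. WebSocket or gRPC):

Option A (combined): UniMRCP server --loads--> plugin embedding your ASR engine   (single process)
Option B (split):    UniMRCP server --loads--> thin plugin --WS/gRPC--> standalone ASR server

Option B keeps the heavy engine out of the UniMRCP server process at the cost of an extra network hop.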

Ömer Bitikçioğlu

Sep 20, 2022, 4:27:56 AM
to uni...@googlegroups.com
Hi again,

Thank you for the response Arsen!

Now I want to do sentiment analysis based on the voice itself (not the text). Should I add a new resource, like speechrecog, speechsynth, etc.?
Do you have any ideas about, or experience with, such a scenario?

Arsen Chaloyan

Sep 24, 2022, 9:49:14 PM
to uni...@googlegroups.com
Hi Omer,

> Now I want to do sentiment analysis based on the voice (not text)

You may introduce a new vendor-specific parameter for your engine, which would allow the client application to enable or disable sentiment analysis on RECOGNIZE. There is no need to add a new resource.
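
For example, the client could send something like this on RECOGNIZE (an illustrative sketch; the parameter name com.example.sentiment-analysis is made up, and the length values are placeholders):

MRCP/2.0 ... RECOGNIZE 10001
Channel-Identifier: 43b9ae17@speechrecog
Content-Type: application/srgs+xml
Vendor-Specific-Parameters: com.example.sentiment-analysis=true
Content-Length: ...

(grammar body omitted)

Your plugin can then read this parameter from the generic header of the RECOGNIZE message and enable or disable sentiment analysis for that request.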


Ömer Bitikçioğlu

Nov 10, 2022, 6:51:08 AM
to UniMRCP
Hi Arsen,

I am continuing to develop on top of your asr-client example.
I get tons of linking errors when I try to compile and debug asr-client.

For example:

How can I compile and debug without such errors?
Thanks in advance,
Ömer

Arsen Chaloyan

Nov 12, 2022, 1:26:37 PM
to uni...@googlegroups.com
Hi Omer,

You can build the UniMRCP project using either GNU make or CMake. I have not tried building it with Ninja.

Ömer Bitikçioğlu

Dec 16, 2022, 6:26:10 AM
to UniMRCP
Hi,

I'm using the asr-client example in unimrcp/platforms. My goal is to stream audio to the UniMRCP server and analyze it continuously until the client is terminated.

I am using a separate thread to open a stream and record audio with the PortAudio library, and in the callbacks I'm using the "asr_session_stream_write" function to write the recorded buffers to session->media_buffer.
In the meantime, I am calling the "asr_session_stream_recognize" function in an infinite loop.
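
Roughly, the recognition part looks like this (a simplified fragment, not the full code):

/* Each iteration issues one RECOGNIZE and blocks until its result, while the
 * PortAudio callback keeps calling asr_session_stream_write() from the capture
 * thread; 'running' is a flag cleared at shutdown. */
while(running) {
    const char *result = asr_session_stream_recognize(session, "grammar.xml");
    if(result) {
        printf("Utterance: %s\n", result);
    }
}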

I am having an error like below:

Thread 7 "MPF Scheduler" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdffff640 (LWP 47454)]
mpf_context_process (context=context@entry=0x7fffd8005a10) at src/mpf_context.c:438
438                object->process(object);
(gdb) bt
#0  mpf_context_process (context=context@entry=0x7fffd8005a10) at src/mpf_context.c:438
#1  0x00007ffff7f89360 in mpf_context_factory_process (factory=0x5555555879e0) at src/mpf_context.c:105
#2  0x00007ffff7f8c30c in timer_thread_proc (thread=0x5555555a9d18, data=0x555555587a40) at src/mpf_scheduler.c:212
#3  0x00007ffff7d30b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#4  0x00007ffff7dc2a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I cannot understand what the problem is; there is not much detail, and the MPF framework's code is hard for me to follow. What is the object it is trying to process?

Thanks in advance.
Ömer

Arsen Chaloyan

Dec 22, 2022, 8:33:00 PM
to uni...@googlegroups.com
Hi Omer,

It is almost impossible to determine the root cause of your problem based on the brief description and the provided backtrace. I can only tell that the object used by the MPF callback is likely destroyed by your application at the time the MPF callback attempts to process it.
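
For instance, with the PortAudio-based setup described above, one way to make sure nothing destroys the session while the media framework may still be using it is roughly the following (a hedged sketch only, not a confirmed diagnosis of this particular crash):

/* Stop everything that can still feed or use the session before destroying it. */
running = FALSE;              /* let the recognize loop finish its last call */
Pa_StopStream(stream);        /* no more capture callbacks -> no more stream_write() */
Pa_CloseStream(stream);
asr_session_destroy(session); /* only now tear the MRCP session down */
asr_engine_destroy(engine);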


Ömer Bitikçioğlu

Jan 24, 2023, 5:58:42 AM
to uni...@googlegroups.com
Hi Arsen,

I managed to stream audio from the mic and recognize it using MRCP.

But right now it gives the "START-OF-INPUT" message only once and records all the streamed audio into one utterance file.
Is that normal, or should it give "START-OF-INPUT" for every detected utterance and record each one separately?
If it is not normal, what could be causing this?

Apologies if I have asked too many questions.

Thank you. Regards
Ömer

Arsen Chaloyan

Jan 26, 2023, 6:50:28 PM
to uni...@googlegroups.com
Hi Omer,

For each RECOGNIZE request, there must be one START-OF-INPUT event sent back to the client as soon as the start of input is detected and one RECOGNITION-COMPLETE event whenever the input is finally complete and recognition is concluded.
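
Schematically, a client doing continuous transcription re-issues RECOGNIZE after each result, so every utterance goes through the same cycle (illustrative; request IDs and most headers omitted):

client -> server : RECOGNIZE               (response: 200 IN-PROGRESS)
server -> client : START-OF-INPUT          (start of speech detected)
server -> client : RECOGNITION-COMPLETE    (Completion-Cause: 000 success, result in the body)
client -> server : RECOGNIZE               (next utterance, and so on)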
