Using MRCP TTS and ASR resources together in a single application


Iman Saleh

Nov 11, 2010, 6:02:41 AM11/11/10
to uni...@googlegroups.com
Hi,

I integrated an ASR engine and a TTS engine into the UniMRCP project. I want to use both resources in a single VXML application. Is it possible to use them together? When I try, the ASR seems to interrupt the TTS. Is there anything specific I should take care of when using both of them in a single session?

Thanks,

Iman

Randy Olson

Nov 12, 2010, 10:54:54 AM11/12/10
to UniMRCP

That is fairly typical behavior for an IVR. One typically wants the
prompt interrupted once the person starts talking; otherwise, it is
very uncomfortable for the speaker to have to talk over the IVR. If
you watch two people talking to each other, that is the typical
behavior: one person talks at a time. If the second person starts to
answer before the first person is done, the first person stops talking
and listens.

If you don't want that behavior, the solution will differ depending on
your voice platform and your speech and TTS server(s).

Iman Saleh

Dec 1, 2010, 4:13:41 AM12/1/10
to uni...@googlegroups.com
Hi,

Let me explain further. When I start running a VXML application, both the recognizer and the synthesizer start. The recognizer sends a DEFINE-GRAMMAR request and a RECOGNIZE request, and the synthesizer plays speech. However, the speech played by the TTS is sometimes detected by the recognizer as speech activity, so a START-OF-SPEECH event is sent to the client. The client responds with a STOP request that stops the synthesizer. How can I prevent the recognizer from detecting the synthesizer's speech as speech activity?

--
Iman Saleh
R&D Software Developer
RDI, www.rdi-eg.com

Christopher Rienzo

Dec 1, 2010, 10:03:16 AM12/1/10
to uni...@googlegroups.com
That sounds more like an echo problem than a UniMRCP problem.  How else would the recognizer hear synthesizer speech?

Iman Saleh

Dec 2, 2010, 4:07:09 AM12/2/10
to uni...@googlegroups.com
Actually, there is an if condition in the demo_recog_stream_write method in the demorecog plugin that is a little confusing:

if(recog_channel->recog_request)
{
    mpf_detector_event_e det_event = mpf_activity_detector_process(recog_channel->detector, frame);
    ...
}

This means that speech detection starts as soon as the RECOGNIZE request is received. I think speech detection should start only once the input timers have been started (via RECOGNITION-START-TIMERS), so would it be valid to change the condition as follows?

if(recog_channel->recog_request && recog_channel->timers_started == TRUE)
{
    mpf_detector_event_e det_event = mpf_activity_detector_process(recog_channel->detector, frame);
    ...
}

I understand from the MRCP reference document that when the synthesizer and recognizer work together, recognition should start only after the input timers are started. Does this make sense?

Arsen Chaloyan

Dec 16, 2010, 8:36:14 PM12/16/10
to uni...@googlegroups.com
Briefly following up on the discussion: I think all the statements are
true. Iman was probably using a speakerphone, which is why the
synthesized speech was echoed back to the recognizer. And yes, actual
recognition should start when the input timers are started.

--
Arsen Chaloyan
The author of UniMRCP
http://www.unimrcp.org
