Unimrcp Lex/Polly barge-in - delays in getting the TTS output to stop on barge-in

Daniel Ng

Jun 23, 2022, 10:32:07 AM

to UniMRCP

Hi Arsen,

Regarding the use of barge-in with the Unimrcp Lex and Polly plugins, there seems to be a long delay before the TTS stops speaking, which, from a caller's perspective, feels like barge-in didn't work.

My setup uses Cisco VVB as the VXML gateway/browser, configured with two Unimrcp servers (one for TTS and one for ASR) communicating via a proxy.

I make a call into the system, which is deliberately configured with a long TTS phrase such as "Welcome to the yyy service for zzz. How can I help you? You can choose from ....", and upon hearing "Welcome to ..." I interrupt the system with some speech.

The first few times I did it, I repeated myself several times because the TTS continued to "speak" the rest of the prompt (through to the options). It was only after I checked the log files and audio files that I decided to be more patient and just say my utterance once.

As in the earlier calls, the TTS continued but did stop at some point before the end of the full TTS phrase. So barge-in does in effect work, but with further testing I realised that the point at which the TTS stops "speaking" is variable. My guess is that it depends on when the Unimrcp Lex server sends the MRCP Start-of-Speech notification back to the VVB, so that the VVB can then send the MRCP STOP request to the TTS server.

Is this due to the VAD or plugin implementations, or are there any settings or timers that we should double-check too?

I'm struggling to piece together the logs from the separate systems, but I will see if I can get a network trace at the VVB end that shows, within a single capture, the messaging and RTP flow between the VXML gateway and the two media resources. So far, Wireshark/pcap traces taken on the ASR or TTS server only show the communication between the VXML gateway and that one server.
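In the meantime, a minimal sketch of how events pulled from the two servers could be lined up on one timeline to measure the barge-in delay. Everything here is an assumption for illustration: the timestamps are made up, the tuple layout is not a real UniMRCP log format, and `merge_events` and `barge_in_delay` are hypothetical helper names. It also assumes the clocks on both servers are synchronised (for example via NTP), otherwise the computed delay is meaningless.

```python
from datetime import datetime

# Hypothetical, hand-extracted events: (timestamp, source, MRCP message name).
# The values are illustrative only, not taken from real UniMRCP logs.
asr_events = [
    ("2022-06-23 10:00:01.200", "asr", "RECOGNIZE"),
    ("2022-06-23 10:00:03.900", "asr", "START-OF-INPUT"),
]
tts_events = [
    ("2022-06-23 10:00:01.000", "tts", "SPEAK"),
    ("2022-06-23 10:00:04.150", "tts", "STOP"),
]

FMT = "%Y-%m-%d %H:%M:%S.%f"


def merge_events(*streams):
    """Parse timestamps and return all events sorted into one timeline."""
    parsed = [
        (datetime.strptime(ts, FMT), src, msg)
        for stream in streams
        for ts, src, msg in stream
    ]
    return sorted(parsed, key=lambda event: event[0])


def barge_in_delay(timeline):
    """Seconds between the first START-OF-INPUT and the next STOP, or None."""
    start = None
    for ts, _src, msg in timeline:
        if msg == "START-OF-INPUT" and start is None:
            start = ts
        elif msg == "STOP" and start is not None:
            return (ts - start).total_seconds()
    return None


timeline = merge_events(asr_events, tts_events)
delay = barge_in_delay(timeline)
print(f"barge-in delay: {delay} s")
```

With real data, the same idea would show whether the time is lost before the start-of-input event reaches the VVB or between that event and the STOP arriving at the TTS server.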

Thank you.

Kind regards,

Daniel

Arsen Chaloyan

Jul 16, 2022, 1:48:16 PM

to UniMRCP

Hi Daniel,

The Lex plugin sends a START-OF-INPUT event back to the MRCP client, VVB in your case, as soon as a corresponding event is received from the Lex V2 API. We had an extended conversation on this or a related subject in the past. Do you start the conversation by eliciting an intent, as described here?

https://docs.unispeech.io/en/ums/aws/lex/usage#h-45-eliciting-intent

Otherwise, if you play the initial prompt without making the Lex V2 API aware of it, the Lex V2 API may not deliver the event that serves as the barge-in trigger. This is where the huge delay could come from.

Complete logs would be required to identify the source of the problem.
