Hi Arsen,
Regarding the use of barge-in with the UniMRCP Lex and Polly plugins, there seems to be a long delay before the TTS stops speaking, which, from a caller's perspective, feels as though barge-in didn't work.
My setup uses Cisco VVB as the VXML gateway/browser, configured with two UniMRCP servers (one for TTS and one for ASR) that communicate via a proxy.
I call into the system, which is deliberately configured with a long TTS phrase such as "Welcome to the yyy service for zzz. How can I help you? You can choose from ....", and upon hearing "Welcome to ..." I interrupt the system with some speech.
The first few times I did it, I repeated myself several times because the TTS continued to "speak" the rest of the prompt (through to the options). It was only after I checked the log files and audio files that I decided to be more patient and just say my utterance once.
As in the earlier calls, the TTS continued but did stop at some point before the end of the full TTS phrase. So barge-in does in effect work, but with further testing I realised that the point at which the TTS stops "speaking" is variable. My guess is that it depends on when the UniMRCP Lex server sends the MRCP start-of-speech event (START-OF-INPUT in MRCPv2) back to the VVB, so that the VVB can then send the MRCP STOP request to the TTS server?
Is this due to the VAD or the plugin implementations, or are there any settings or timers that we should double-check as well?
I'm struggling to piece together the logs from the separate systems, but I will see if I can get a network trace at the VVB end that shows, within a single capture, the messaging and RTP flow between the VXML gateway and the two media resources. So far, Wireshark/pcap traces on the ASR or TTS servers only show the communication between the VXML gateway and that one server.
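In the meantime, this is a rough sketch of how I plan to line up the events from the two servers on a single timeline and measure the delay between the start-of-speech event and the STOP. It is only an illustration: the tuple layout, the timestamp format and the sample values are made up by me and are not taken from the UniMRCP logs, and it assumes the clocks on both servers are synchronised (e.g. via NTP). The event name is a parameter because it would be START-OF-INPUT for MRCPv2 and START-OF-SPEECH for MRCPv1.

```python
from datetime import datetime

# Hypothetical timestamp format; adjust to whatever the real logs use.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(text):
    """Parse a timestamp string into a datetime."""
    return datetime.strptime(text, TS_FORMAT)


def merge_events(*sources):
    """Merge several lists of (timestamp, source, message) into one timeline."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: event[0])


def barge_in_delay(events, start_marker="START-OF-INPUT", stop_marker="STOP"):
    """Seconds between the first start-of-speech event and the next STOP.

    Returns None if either message is missing from the timeline.
    """
    start = next((ts for ts, _, msg in events if start_marker in msg), None)
    if start is None:
        return None
    stop = next(
        (ts for ts, _, msg in events if stop_marker in msg and ts >= start),
        None,
    )
    if stop is None:
        return None
    return (stop - start).total_seconds()


# Made-up sample entries, extracted by hand from each server's log.
asr_events = [
    (parse_ts("2024-01-01 10:00:03.200000"), "asr", "START-OF-INPUT"),
]
tts_events = [
    (parse_ts("2024-01-01 10:00:01.000000"), "tts", "SPEAK"),
    (parse_ts("2024-01-01 10:00:05.700000"), "tts", "STOP"),
]

timeline = merge_events(asr_events, tts_events)
for ts, source, msg in timeline:
    print(ts.isoformat(), source, msg)
print("barge-in delay (s):", barge_in_delay(timeline))
```

If the delay measured this way is small but the caller still hears the prompt for several seconds, that would point at the time taken to detect the speech in the first place rather than at the STOP handling.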
Thank you.
Kind regards,
Daniel