Speech timeout not being respected when using GSR plugin

68 views
Skip to first unread message

mayamatakeshi

unread,
May 27, 2022, 3:43:01 AM5/27/22
to uni...@googlegroups.com
I am facing a situation where unimrcp sends
START-OF-INPUT and RECOGNITION-COMPLETE 
one immediately after the other and the speech timeout is not respected.
Here are the MRCP messages:

2022-05-26 11:14:04:689938 [INFO]   Receive MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [447 bytes]
MRCP/2.0 447 RECOGNIZE 1
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Recognition-Timeout: 5000
Start-Input-Timers: false
Speech-Language: ja-JP
No-Input-Timeout: 30000
Speech-Incomplete-Timeout: 5000
Speech-Complete-Timeout: 5000
Vendor-Specific-Parameters: reference-uuid=6112fa77-baa9-4697-9a6e-6ea35d484d26;profanity-filter=false
Content-Length: 25

builtin:speech/transcribe


2022-05-26 11:14:04:691039 [INFO]   Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [83 bytes]
MRCP/2.0 83 1 200 IN-PROGRESS
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog


2022-05-26 11:14:09:970517 [INFO]   Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [115 bytes]
MRCP/2.0 115 START-OF-INPUT 1 IN-PROGRESS
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Input-Type: speech


2022-05-26 11:14:09:970606 [DEBUG]  Process Message [MRCPv2-Agent-1] [0x7f6c08112390;1;0]
2022-05-26 11:14:09:970623 [INFO]   Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [130 bytes]
MRCP/2.0 130 RECOGNITION-COMPLETE 1 COMPLETE
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Completion-Cause: 001 no-match


This is a production server and I got some reports of if happening occasionally from our customers.
I am trying to reproduce this in dev env but no success so far after dozens of attempts.

I am using old
unimrcp-gsr 1:1.7.6-bionic.
I checked the release notes for subsequent versions for possible bug fixes related to this but there were none so I assume this issue exists with latest GSR plugin version too (but I will upgrade anyway due to other fixes).

Details:
  - the caller isn't speaking anything (so START-OF-INPUT is actually incorrect)
  - it always happens about 5 seconds after the MRCP RECOGNIZE message. So there might be some race condition with the 5000ms timers I am using (but I am using 'Start-Input-Timers: false' and START-INPUT-TIMERS so this is unexpected as no timers should have been actually started).
  - the prompt being played during RECOGNIZE lasts 7 seconds so we don't see START-INPUT-TIMERS because the problem always happens before it ends (after about 5 seconds)


The full unimrcp logs are below:

2022-05-26 11:14:04:689938 [INFO]   Receive MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [447 bytes]
MRCP/2.0 447 RECOGNIZE 1
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Recognition-Timeout: 5000
Start-Input-Timers: false
Speech-Language: ja-JP
No-Input-Timeout: 30000
Speech-Incomplete-Timeout: 5000
Speech-Complete-Timeout: 5000
Vendor-Specific-Parameters: reference-uuid=6112fa77-baa9-4697-9a6e-6ea35d484d26;profanity-filter=false
Content-Length: 25

builtin:speech/transcribe
2022-05-26 11:14:04:690005 [INFO]   Assign Control Channel <5e8ed690a36e4d1f@speechrecog> to Connection 192.168.88.221:1544 <-> 192.168.88.38:48003 [0] -> [1]
2022-05-26 11:14:04:690021 [DEBUG]  Signal Message to [MRCP Server] [0x7f6c20000f70;2;3]
2022-05-26 11:14:04:690040 [DEBUG]  Wait for Messages [MRCPv2-Agent-1] timeout [600000]
2022-05-26 11:14:04:690062 [DEBUG]  Process Message [MRCP Server] [0x7f6c20000f70;2;3]
2022-05-26 11:14:04:690076 [DEBUG]  Dispatch Signaling Message [1]
2022-05-26 11:14:04:690086 [INFO]   Process RECOGNIZE Request <5e8ed690a36e4d1f@speechrecog> [1]
2022-05-26 11:14:04:690124 [DEBUG]  Wait for Messages [MRCP Server]
2022-05-26 11:14:04:690182 [WARN]   Unknown Parameter [reference-uuid] <5e8ed690a36e4d1f@gsr>
2022-05-26 11:14:04:690217 [INFO]   Init Speech Detector: frame-size=160, max-frame-count=550, output-frame-count=20, vad-mode=2, noinput-timeout=30000 ms, input-timeout=5000 ms, start-timeout=50 ms, complete-timeout=5000 ms, incomplete-timeout=5000 ms, leading-silence=300 ms, trailing-silence=300 ms, interim-results=1, start-of-input=external <5e8ed690a36e4d1f>
2022-05-26 11:14:04:690269 [INFO]   Create gRPC Channel [speech.googleapis.com:443] <5e8ed690a36e4d1f@gsr>
2022-05-26 11:14:04:690489 [INFO]   gRPC Streaming Recognize <5e8ed690a36e4d1f@gsr>
2022-05-26 11:14:04:690864 [DEBUG]  Signal Message to [MRCP Server] [0x7f6bfc000b60;3;4]
2022-05-26 11:14:04:690921 [DEBUG]  Process Message [MRCP Server] [0x7f6bfc000b60;3;4]
2022-05-26 11:14:04:690944 [INFO]   Process RECOGNIZE Response <5e8ed690a36e4d1f@speechrecog> [1]
2022-05-26 11:14:04:690956 [INFO]   State Transition IDLE -> RECOGNIZING <5e8ed690a36e4d1f@speechrecog>
2022-05-26 11:14:04:690969 [DEBUG]  Signal Message to [MRCPv2-Agent-1] [0x7f6c2c030c80;1;0]
2022-05-26 11:14:04:690988 [DEBUG]  Wait for Messages [MRCP Server]
2022-05-26 11:14:04:691010 [DEBUG]  Process Poller Wakeup [MRCPv2-Agent-1]
2022-05-26 11:14:04:691022 [DEBUG]  Process Message [MRCPv2-Agent-1] [0x7f6c2c030c80;1;0]
2022-05-26 11:14:04:691039 [INFO]   Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [83 bytes]
MRCP/2.0 83 1 200 IN-PROGRESS
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog


2022-05-26 11:14:04:691084 [DEBUG]  Wait for Messages [MRCPv2-Agent-1] timeout [599999]
2022-05-26 11:14:04:825500 [DEBUG]  Start Detector State Probation NO-INPUT -> IN-PROGRESS [140 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:04:825547 [DEBUG]  Detector: read-marker=0, input-start-marker=6, write-marker=7 <5e8ed690a36e4d1f>
2022-05-26 11:14:04:875521 [INFO]   Speech Detector State Transition NO-INPUT -> IN-PROGRESS [190 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:04:875558 [INFO]   Start Input Timer [5000 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:04:875570 [DEBUG]  Set Speech Complete Timeout [5000 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:04:875582 [DEBUG]  Detector Silence Output: read-marker=0, count=24, write-marker=12 <5e8ed690a36e4d1f>
2022-05-26 11:14:04:895498 [DEBUG]  Start Detector State Probation IN-PROGRESS -> COMPLETE [20 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:04:895533 [DEBUG]  Detector: read-marker=0, input-end-marker=13, write-marker=14 <5e8ed690a36e4d1f>
2022-05-26 11:14:04:955525 [DEBUG]  Detector Speech Output: read-marker=0, count=20, write-marker=20 <5e8ed690a36e4d1f>
2022-05-26 11:14:05:155531 [DEBUG]  Detector Speech Output: read-marker=20, count=20, write-marker=40 <5e8ed690a36e4d1f>
2022-05-26 11:14:05:355509 [DEBUG]  Detector Speech Output: read-marker=40, count=20, write-marker=60 <5e8ed690a36e4d1f>
2022-05-26 11:14:05:555495 [DEBUG]  Detector Speech Output: read-marker=60, count=20, write-marker=80 <5e8ed690a36e4d1f>
2022-05-26 11:14:05:755480 [DEBUG]  Detector Speech Output: read-marker=80, count=20, write-marker=100 <5e8ed690a36e4d1f>
2022-05-26 11:14:05:955503 [DEBUG]  Detector Speech Output: read-marker=100, count=20, write-marker=120 <5e8ed690a36e4d1f>
2022-05-26 11:14:06:155503 [DEBUG]  Detector Speech Output: read-marker=120, count=20, write-marker=140 <5e8ed690a36e4d1f>
2022-05-26 11:14:06:355508 [DEBUG]  Detector Speech Output: read-marker=140, count=20, write-marker=160 <5e8ed690a36e4d1f>
2022-05-26 11:14:06:555497 [DEBUG]  Detector Speech Output: read-marker=160, count=20, write-marker=180 <5e8ed690a36e4d1f>
2022-05-26 11:14:06:755483 [DEBUG]  Detector Speech Output: read-marker=180, count=20, write-marker=200 <5e8ed690a36e4d1f>
2022-05-26 11:14:06:955528 [DEBUG]  Detector Speech Output: read-marker=200, count=20, write-marker=220 <5e8ed690a36e4d1f>
2022-05-26 11:14:07:155530 [DEBUG]  Detector Speech Output: read-marker=220, count=20, write-marker=240 <5e8ed690a36e4d1f>
2022-05-26 11:14:07:355513 [DEBUG]  Detector Speech Output: read-marker=240, count=20, write-marker=260 <5e8ed690a36e4d1f>
2022-05-26 11:14:07:555497 [DEBUG]  Detector Speech Output: read-marker=260, count=20, write-marker=280 <5e8ed690a36e4d1f>
2022-05-26 11:14:07:755480 [DEBUG]  Detector Speech Output: read-marker=280, count=20, write-marker=300 <5e8ed690a36e4d1f>
2022-05-26 11:14:07:955528 [DEBUG]  Detector Speech Output: read-marker=300, count=20, write-marker=320 <5e8ed690a36e4d1f>
2022-05-26 11:14:08:155505 [DEBUG]  Detector Speech Output: read-marker=320, count=20, write-marker=340 <5e8ed690a36e4d1f>
2022-05-26 11:14:08:355507 [DEBUG]  Detector Speech Output: read-marker=340, count=20, write-marker=360 <5e8ed690a36e4d1f>
2022-05-26 11:14:08:555498 [DEBUG]  Detector Speech Output: read-marker=360, count=20, write-marker=380 <5e8ed690a36e4d1f>
2022-05-26 11:14:08:755480 [DEBUG]  Detector Speech Output: read-marker=380, count=20, write-marker=400 <5e8ed690a36e4d1f>
2022-05-26 11:14:08:955499 [DEBUG]  Detector Speech Output: read-marker=400, count=20, write-marker=420 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:155522 [DEBUG]  Detector Speech Output: read-marker=420, count=20, write-marker=440 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:355497 [DEBUG]  Detector Speech Output: read-marker=440, count=20, write-marker=460 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:555496 [DEBUG]  Detector Speech Output: read-marker=460, count=20, write-marker=480 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:755484 [DEBUG]  Detector Speech Output: read-marker=480, count=20, write-marker=500 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:895508 [INFO]   Speech Detector State Transition IN-PROGRESS -> COMPLETE [5020 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:09:895550 [INFO]   Detector Stats: leading-silence=60 ms, input=70 ms, trailing-silence=5000 ms <5e8ed690a36e4d1f>
2022-05-26 11:14:09:895565 [DEBUG]  Detector Speech Output: read-marker=500, count=13, write-marker=514 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:895792 [INFO]   Input Complete [success] size=85920 bytes, dur=5370 ms <5e8ed690a36e4d1f@gsr>
2022-05-26 11:14:09:970063 [INFO]   Received Response: status [0] type [unspecified] result-count [0] <5e8ed690a36e4d1f@gsr>
2022-05-26 11:14:09:970143 [DEBUG]  Signal Message to [MRCP Server] [0x7f6bfc002ce0;3;4]
2022-05-26 11:14:09:970194 [DEBUG]  Signal Message to [MRCP Server] [0x7f6bfc002ca0;3;4]
2022-05-26 11:14:09:970244 [DEBUG]  Process Message [MRCP Server] [0x7f6bfc002ce0;3;4]
2022-05-26 11:14:09:970288 [INFO]   Process START-OF-INPUT Event <5e8ed690a36e4d1f@speechrecog> [1]
2022-05-26 11:14:09:970306 [DEBUG]  Signal Message to [MRCPv2-Agent-1] [0x7f6c08110b90;1;0]
2022-05-26 11:14:09:970334 [DEBUG]  Wait for Messages [MRCP Server]
2022-05-26 11:14:09:970390 [DEBUG]  Process Message [MRCP Server] [0x7f6bfc002ca0;3;4]
2022-05-26 11:14:09:970407 [INFO]   Process RECOGNITION-COMPLETE Event <5e8ed690a36e4d1f@speechrecog> [1]
2022-05-26 11:14:09:970421 [INFO]   State Transition RECOGNIZING -> RECOGNIZED <5e8ed690a36e4d1f@speechrecog>
2022-05-26 11:14:09:970434 [DEBUG]  Signal Message to [MRCPv2-Agent-1] [0x7f6c08112390;1;0]
2022-05-26 11:14:09:970451 [DEBUG]  Wait for Messages [MRCP Server]
2022-05-26 11:14:09:970481 [DEBUG]  Process Poller Wakeup [MRCPv2-Agent-1]
2022-05-26 11:14:09:970495 [DEBUG]  Process Message [MRCPv2-Agent-1] [0x7f6c08110b90;1;0]
2022-05-26 11:14:09:970517 [INFO]   Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [115 bytes]
MRCP/2.0 115 START-OF-INPUT 1 IN-PROGRESS
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Input-Type: speech


2022-05-26 11:14:09:970606 [DEBUG]  Process Message [MRCPv2-Agent-1] [0x7f6c08112390;1;0]
2022-05-26 11:14:09:970623 [INFO]   Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [130 bytes]
MRCP/2.0 130 RECOGNITION-COMPLETE 1 COMPLETE
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Completion-Cause: 001 no-match

Arsen Chaloyan

unread,
May 28, 2022, 1:52:30 PM5/28/22
to UniMRCP
Hi Mayama,

- the caller isn't speaking anything (so START-OF-INPUT is actually incorrect)

This is indeed a fake START-OF-INPUT event sent before responding with RECOGNITION-COMPLETE to address an interop with a certain platform. So, this event should be harmless and is not relevant to the problem you encountered.

- it always happens about 5 seconds after the MRCP RECOGNIZE message. So there might be some race condition with the 5000ms timers I am using (but I am using 'Start-Input-Timers: false' and START-INPUT-TIMERS so this is unexpected as no timers should have been actually started).

You should consider using a shorter speech-complete timeout with a longer speech-incomplete timeout. The default values of the two are 1 and 15 seconds correspondingly. The latter is causing the problem and should be basically avoided.

- the prompt being played during RECOGNIZE lasts 7 seconds so we don't see START-INPUT-TIMERS because the problem always happens before it ends (after about 5 seconds)

That is right. Please note that the input timers apply to noinput and recognition timeouts and do not affect the above behavior.


--
You received this message because you are subscribed to the Google Groups "UniMRCP" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unimrcp+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unimrcp/CABaNFCbg7tvibkUcbBcxiPmhsnQRb_wHDs%3DAW4%2BXzbwQNHwcKw%40mail.gmail.com.


--
Arsen Chaloyan
Author of UniMRCP
http://www.unimrcp.org

mayamatakeshi

unread,
May 29, 2022, 12:52:14 AM5/29/22
to uni...@googlegroups.com
Hi Arsen,
Understood.
I'll your suggestions.
Thanks.

Reply all
Reply to author
Forward
0 new messages