one immediately after the other and the speech timeout is not respected.
2022-05-26 11:14:04:689938 [INFO] Receive MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [447 bytes]
MRCP/2.0 447 RECOGNIZE 1
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Recognition-Timeout: 5000
Start-Input-Timers: false
Speech-Language: ja-JP
No-Input-Timeout: 30000
Speech-Incomplete-Timeout: 5000
Speech-Complete-Timeout: 5000
Vendor-Specific-Parameters: reference-uuid=6112fa77-baa9-4697-9a6e-6ea35d484d26;profanity-filter=false
Content-Length: 25
builtin:speech/transcribe
2022-05-26 11:14:04:691039 [INFO] Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [83 bytes]
MRCP/2.0 83 1 200 IN-PROGRESS
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
2022-05-26 11:14:09:970517 [INFO] Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [115 bytes]
MRCP/2.0 115 START-OF-INPUT 1 IN-PROGRESS
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Input-Type: speech
2022-05-26 11:14:09:970606 [DEBUG] Process Message [MRCPv2-Agent-1] [0x7f6c08112390;1;0]
2022-05-26 11:14:09:970623 [INFO] Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [130 bytes]
MRCP/2.0 130 RECOGNITION-COMPLETE 1 COMPLETE
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Completion-Cause: 001 no-match
This is a production server and I got some reports of if happening occasionally from our customers.
I am trying to reproduce this in dev env but no success so far after dozens of attempts.
I am using old
unimrcp-gsr 1:1.7.6-bionic.
I checked the release notes for subsequent versions for possible bug fixes related to this but there were none so I assume this issue exists with latest GSR plugin version too (but I will upgrade anyway due to other fixes).
Details:
- the caller isn't speaking anything (so START-OF-INPUT is actually incorrect)
- it always happens about 5 seconds after the MRCP RECOGNIZE message. So there might be some race condition with the 5000ms timers I am using (but I am using 'Start-Input-Timers: false' and START-INPUT-TIMERS so this is unexpected as no timers should have been actually started).
- the prompt being played during RECOGNIZE lasts 7 seconds so we don't see START-INPUT-TIMERS because the problem always happens before it ends (after about 5 seconds)
The full unimrcp logs are below:
2022-05-26 11:14:04:689938 [INFO] Receive MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [447 bytes]
MRCP/2.0 447 RECOGNIZE 1
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Recognition-Timeout: 5000
Start-Input-Timers: false
Speech-Language: ja-JP
No-Input-Timeout: 30000
Speech-Incomplete-Timeout: 5000
Speech-Complete-Timeout: 5000
Vendor-Specific-Parameters: reference-uuid=6112fa77-baa9-4697-9a6e-6ea35d484d26;profanity-filter=false
Content-Length: 25
builtin:speech/transcribe
2022-05-26 11:14:04:690005 [INFO] Assign Control Channel <5e8ed690a36e4d1f@speechrecog> to Connection 192.168.88.221:1544 <-> 192.168.88.38:48003 [0] -> [1]
2022-05-26 11:14:04:690021 [DEBUG] Signal Message to [MRCP Server] [0x7f6c20000f70;2;3]
2022-05-26 11:14:04:690040 [DEBUG] Wait for Messages [MRCPv2-Agent-1] timeout [600000]
2022-05-26 11:14:04:690062 [DEBUG] Process Message [MRCP Server] [0x7f6c20000f70;2;3]
2022-05-26 11:14:04:690076 [DEBUG] Dispatch Signaling Message [1]
2022-05-26 11:14:04:690086 [INFO] Process RECOGNIZE Request <5e8ed690a36e4d1f@speechrecog> [1]
2022-05-26 11:14:04:690124 [DEBUG] Wait for Messages [MRCP Server]
2022-05-26 11:14:04:690182 [WARN] Unknown Parameter [reference-uuid] <5e8ed690a36e4d1f@gsr>
2022-05-26 11:14:04:690217 [INFO] Init Speech Detector: frame-size=160, max-frame-count=550, output-frame-count=20, vad-mode=2, noinput-timeout=30000 ms, input-timeout=5000 ms, start-timeout=50 ms, complete-timeout=5000 ms, incomplete-timeout=5000 ms, leading-silence=300 ms, trailing-silence=300 ms, interim-results=1, start-of-input=external <5e8ed690a36e4d1f>
2022-05-26 11:14:04:690269 [INFO] Create gRPC Channel [speech.googleapis.com:443] <5e8ed690a36e4d1f@gsr>
2022-05-26 11:14:04:690489 [INFO] gRPC Streaming Recognize <5e8ed690a36e4d1f@gsr>
2022-05-26 11:14:04:690864 [DEBUG] Signal Message to [MRCP Server] [0x7f6bfc000b60;3;4]
2022-05-26 11:14:04:690921 [DEBUG] Process Message [MRCP Server] [0x7f6bfc000b60;3;4]
2022-05-26 11:14:04:690944 [INFO] Process RECOGNIZE Response <5e8ed690a36e4d1f@speechrecog> [1]
2022-05-26 11:14:04:690956 [INFO] State Transition IDLE -> RECOGNIZING <5e8ed690a36e4d1f@speechrecog>
2022-05-26 11:14:04:690969 [DEBUG] Signal Message to [MRCPv2-Agent-1] [0x7f6c2c030c80;1;0]
2022-05-26 11:14:04:690988 [DEBUG] Wait for Messages [MRCP Server]
2022-05-26 11:14:04:691010 [DEBUG] Process Poller Wakeup [MRCPv2-Agent-1]
2022-05-26 11:14:04:691022 [DEBUG] Process Message [MRCPv2-Agent-1] [0x7f6c2c030c80;1;0]
2022-05-26 11:14:04:691039 [INFO] Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [83 bytes]
MRCP/2.0 83 1 200 IN-PROGRESS
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
2022-05-26 11:14:04:691084 [DEBUG] Wait for Messages [MRCPv2-Agent-1] timeout [599999]
2022-05-26 11:14:04:825500 [DEBUG] Start Detector State Probation NO-INPUT -> IN-PROGRESS [140 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:04:825547 [DEBUG] Detector: read-marker=0, input-start-marker=6, write-marker=7 <5e8ed690a36e4d1f>
2022-05-26 11:14:04:875521 [INFO] Speech Detector State Transition NO-INPUT -> IN-PROGRESS [190 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:04:875558 [INFO] Start Input Timer [5000 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:04:875570 [DEBUG] Set Speech Complete Timeout [5000 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:04:875582 [DEBUG] Detector Silence Output: read-marker=0, count=24, write-marker=12 <5e8ed690a36e4d1f>
2022-05-26 11:14:04:895498 [DEBUG] Start Detector State Probation IN-PROGRESS -> COMPLETE [20 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:04:895533 [DEBUG] Detector: read-marker=0, input-end-marker=13, write-marker=14 <5e8ed690a36e4d1f>
2022-05-26 11:14:04:955525 [DEBUG] Detector Speech Output: read-marker=0, count=20, write-marker=20 <5e8ed690a36e4d1f>
2022-05-26 11:14:05:155531 [DEBUG] Detector Speech Output: read-marker=20, count=20, write-marker=40 <5e8ed690a36e4d1f>
2022-05-26 11:14:05:355509 [DEBUG] Detector Speech Output: read-marker=40, count=20, write-marker=60 <5e8ed690a36e4d1f>
2022-05-26 11:14:05:555495 [DEBUG] Detector Speech Output: read-marker=60, count=20, write-marker=80 <5e8ed690a36e4d1f>
2022-05-26 11:14:05:755480 [DEBUG] Detector Speech Output: read-marker=80, count=20, write-marker=100 <5e8ed690a36e4d1f>
2022-05-26 11:14:05:955503 [DEBUG] Detector Speech Output: read-marker=100, count=20, write-marker=120 <5e8ed690a36e4d1f>
2022-05-26 11:14:06:155503 [DEBUG] Detector Speech Output: read-marker=120, count=20, write-marker=140 <5e8ed690a36e4d1f>
2022-05-26 11:14:06:355508 [DEBUG] Detector Speech Output: read-marker=140, count=20, write-marker=160 <5e8ed690a36e4d1f>
2022-05-26 11:14:06:555497 [DEBUG] Detector Speech Output: read-marker=160, count=20, write-marker=180 <5e8ed690a36e4d1f>
2022-05-26 11:14:06:755483 [DEBUG] Detector Speech Output: read-marker=180, count=20, write-marker=200 <5e8ed690a36e4d1f>
2022-05-26 11:14:06:955528 [DEBUG] Detector Speech Output: read-marker=200, count=20, write-marker=220 <5e8ed690a36e4d1f>
2022-05-26 11:14:07:155530 [DEBUG] Detector Speech Output: read-marker=220, count=20, write-marker=240 <5e8ed690a36e4d1f>
2022-05-26 11:14:07:355513 [DEBUG] Detector Speech Output: read-marker=240, count=20, write-marker=260 <5e8ed690a36e4d1f>
2022-05-26 11:14:07:555497 [DEBUG] Detector Speech Output: read-marker=260, count=20, write-marker=280 <5e8ed690a36e4d1f>
2022-05-26 11:14:07:755480 [DEBUG] Detector Speech Output: read-marker=280, count=20, write-marker=300 <5e8ed690a36e4d1f>
2022-05-26 11:14:07:955528 [DEBUG] Detector Speech Output: read-marker=300, count=20, write-marker=320 <5e8ed690a36e4d1f>
2022-05-26 11:14:08:155505 [DEBUG] Detector Speech Output: read-marker=320, count=20, write-marker=340 <5e8ed690a36e4d1f>
2022-05-26 11:14:08:355507 [DEBUG] Detector Speech Output: read-marker=340, count=20, write-marker=360 <5e8ed690a36e4d1f>
2022-05-26 11:14:08:555498 [DEBUG] Detector Speech Output: read-marker=360, count=20, write-marker=380 <5e8ed690a36e4d1f>
2022-05-26 11:14:08:755480 [DEBUG] Detector Speech Output: read-marker=380, count=20, write-marker=400 <5e8ed690a36e4d1f>
2022-05-26 11:14:08:955499 [DEBUG] Detector Speech Output: read-marker=400, count=20, write-marker=420 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:155522 [DEBUG] Detector Speech Output: read-marker=420, count=20, write-marker=440 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:355497 [DEBUG] Detector Speech Output: read-marker=440, count=20, write-marker=460 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:555496 [DEBUG] Detector Speech Output: read-marker=460, count=20, write-marker=480 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:755484 [DEBUG] Detector Speech Output: read-marker=480, count=20, write-marker=500 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:895508 [INFO] Speech Detector State Transition IN-PROGRESS -> COMPLETE [5020 ms] <5e8ed690a36e4d1f>
2022-05-26 11:14:09:895550 [INFO] Detector Stats: leading-silence=60 ms, input=70 ms, trailing-silence=5000 ms <5e8ed690a36e4d1f>
2022-05-26 11:14:09:895565 [DEBUG] Detector Speech Output: read-marker=500, count=13, write-marker=514 <5e8ed690a36e4d1f>
2022-05-26 11:14:09:895792 [INFO] Input Complete [success] size=85920 bytes, dur=5370 ms <5e8ed690a36e4d1f@gsr>
2022-05-26 11:14:09:970063 [INFO] Received Response: status [0] type [unspecified] result-count [0] <5e8ed690a36e4d1f@gsr>
2022-05-26 11:14:09:970143 [DEBUG] Signal Message to [MRCP Server] [0x7f6bfc002ce0;3;4]
2022-05-26 11:14:09:970194 [DEBUG] Signal Message to [MRCP Server] [0x7f6bfc002ca0;3;4]
2022-05-26 11:14:09:970244 [DEBUG] Process Message [MRCP Server] [0x7f6bfc002ce0;3;4]
2022-05-26 11:14:09:970288 [INFO] Process START-OF-INPUT Event <5e8ed690a36e4d1f@speechrecog> [1]
2022-05-26 11:14:09:970306 [DEBUG] Signal Message to [MRCPv2-Agent-1] [0x7f6c08110b90;1;0]
2022-05-26 11:14:09:970334 [DEBUG] Wait for Messages [MRCP Server]
2022-05-26 11:14:09:970390 [DEBUG] Process Message [MRCP Server] [0x7f6bfc002ca0;3;4]
2022-05-26 11:14:09:970407 [INFO] Process RECOGNITION-COMPLETE Event <5e8ed690a36e4d1f@speechrecog> [1]
2022-05-26 11:14:09:970421 [INFO] State Transition RECOGNIZING -> RECOGNIZED <5e8ed690a36e4d1f@speechrecog>
2022-05-26 11:14:09:970434 [DEBUG] Signal Message to [MRCPv2-Agent-1] [0x7f6c08112390;1;0]
2022-05-26 11:14:09:970451 [DEBUG] Wait for Messages [MRCP Server]
2022-05-26 11:14:09:970481 [DEBUG] Process Poller Wakeup [MRCPv2-Agent-1]
2022-05-26 11:14:09:970495 [DEBUG] Process Message [MRCPv2-Agent-1] [0x7f6c08110b90;1;0]
2022-05-26 11:14:09:970517 [INFO] Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [115 bytes]
MRCP/2.0 115 START-OF-INPUT 1 IN-PROGRESS
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Input-Type: speech
2022-05-26 11:14:09:970606 [DEBUG] Process Message [MRCPv2-Agent-1] [0x7f6c08112390;1;0]
2022-05-26 11:14:09:970623 [INFO] Send MRCPv2 Data 192.168.88.221:1544 <-> 192.168.88.38:48003 [130 bytes]
MRCP/2.0 130 RECOGNITION-COMPLETE 1 COMPLETE
Channel-Identifier: 5e8ed690a36e4d1f@speechrecog
Completion-Cause: 001 no-match