(GDF Plugin) Experiencing long delay in getting detection result from Google Dialogflow

Frank Wang

Mar 31, 2022, 3:13:13 PM
to UniMRCP
Hi, Arsen

We are running into another issue more frequently with our FreeSWITCH / UniMRCP server / GDF plugin solution. Google Dialogflow successfully recognizes the user input, but it takes a long time to return the intent detection result. See the UniMRCP server log trace below: the recognition result was marked final at 16:18:38, but the input was not completed until 16:18:59, with the detection result arriving right after, and there was no other activity during this period. Do you know of any possible cause of the delay? Does the GDF plugin have anything to do with it, or do we have to seek support from Google?

...
2022-03-31 16:18:38:614458 [INFO]   Received Response: status [1] id [] recog result [1] query result [0] webhook status [0] output audio [0 bytes] <550efaebb6c243b9@gdf>
2022-03-31 16:18:38:614486 [INFO]   Recognition Result: transcript [<text recognized>] confidence [0.93] final [1] end-offset [5:480] <550efaebb6c243b9@gdf>
2022-03-31 16:18:59:852986 [INFO]   Speech Detector State Transition IN-PROGRESS -> COMPLETE [26810 ms] <550efaebb6c243b9>
2022-03-31 16:18:59:853008 [INFO]   Detector Stats: leading-silence=220 ms, input=25860 ms, trailing-silence=1000 ms <550efaebb6c243b9>
2022-03-31 16:18:59:853448 [INFO]   Input Complete [success] size=434560 bytes, dur=27160 ms <550efaebb6c243b9@gdf>
2022-03-31 16:19:00:065760 [INFO]   Received Response: status [1] id [b9f50382-0e8f-4cdd-a25f-6b38a85a2627-96b8a746] recog result [0] query result [1] webhook status [1] output audio [0 bytes] <550efaebb6c243b9@gdf>
2022-03-31 16:19:00:066212 [INFO]   Query Result: {
 "queryText": "<text recognized>",
 "action": "switch",
 "parameters": {
...

Thanks,
Frank
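
The gap in the trace above can be measured directly from the log timestamps. The following is a minimal sketch, assuming the timestamp layout shown in the excerpt (date, time, then six microsecond digits after a colon). The helper names `parse_ts` and `gap_seconds` are made up for illustration and are not part of UniMRCP, and the two sample lines are shortened copies of lines from the trace.

```python
from datetime import datetime

# Timestamp layout assumed from the log excerpt above: the fractional part
# is separated by ':' rather than '.', and holds six digits (microseconds).
TS_FORMAT = "%Y-%m-%d %H:%M:%S:%f"
TS_LENGTH = len("2022-03-31 16:18:38:614458")


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp of a UniMRCP server log line."""
    return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)


def gap_seconds(earlier_line: str, later_line: str) -> float:
    """Seconds elapsed between two log lines."""
    return (parse_ts(later_line) - parse_ts(earlier_line)).total_seconds()


# Shortened copies of two lines from the trace (the tails are elided here).
final_result = "2022-03-31 16:18:38:614486 [INFO]   Recognition Result: ..."
input_complete = "2022-03-31 16:18:59:853448 [INFO]   Input Complete [success] ..."

print(f"{gap_seconds(final_result, input_complete):.3f} s")  # prints "21.239 s"
```

Running the same helper over a full log would make it easy to see how often, and by how much, input completion lags behind the last final recognition result.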

Arsen Chaloyan

Apr 12, 2022, 9:35:36 PM
to UniMRCP
Hi Frank,

The first question is whether the recognition is performed in the single-utterance or the continuous mode. I guess it was the continuous mode; otherwise, Google would have indicated the input completion earlier. The use of the inter-result timeout, which is disabled by default, is advisable when recognition is performed in the continuous mode, as it allows the plugin (client) to signal the input completion more reliably, independently of the VAD.

   <streaming-recognition
      inter-result-timeout="2000"


--
Arsen Chaloyan
Author of UniMRCP
http://www.unimrcp.org

Frank Wang

Apr 12, 2022, 11:57:21 PM
to UniMRCP
Hi, Arsen,

Yes, the recognition is performed in the continuous mode. Based on my previous understanding, Google would always decide when the input is completed. So you are saying the client can force the completion too (e.g. using the inter-result-timeout)? I will give it a try, but is there a side effect of doing so?

As for the delay issue, Google sometimes takes as long as 50 secs to complete recognition after the caller finishes speaking. We listened to the recording and it was completely silent during the delay period. Have you experienced this issue before? We created a ticket with Google, but the support staff we are talking to don't have a clue what's going on, since Google Cloud Logging only shows the Dialogflow intent detection process, not the voice recognition process. Right now they are escalating to another level of support. If you have any insight into the issue, please let us know.

Thanks,
Frank

Arsen Chaloyan

Apr 15, 2022, 2:21:31 PM
to UniMRCP
Hi Frank,

I just had a chance to read your message. The discussion group is not monitored on a daily basis. If there is any urgency in your requests, there is an option to obtain support complementary to the licenses, with different plans available.

> Yes, the recognition is performed in the continuous mode.

Well. If the caller is supposed to input a long sequence of digits or a sentence consisting of multiple words, then, yes, the continuous mode shall be used; otherwise, in the single-utterance mode, Google may signal end-of-utterance prematurely. For short utterances, though, the single-utterance mode is more efficient.

> Based on my previous understanding, Google would always decide when the input is completed.

No, this is not true. In the single-utterance mode, either Google or the client may indicate end of input, whichever happens first. In the continuous mode, it is solely up to the client.

> So you are saying the client can force the completion too (e.g. using the inter-result-timeout)?

Yes, the client may signal EOS based on its internal timeouts. There are many different timeouts enforced, including the inter-result timeout, when set.

> I will give it a try, but is there a side effect of doing so?

The inter-result timeout was added relatively recently, and it is disabled by default. There are no side effects, but you should be careful not to cause premature completion, if the timeout is too short.
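
To illustrate the trade-off, here is a toy model of an inter-result timer. It is a sketch only, not the plugin's implementation: the class name, the millisecond clock supplied by the caller, the polling loop, and the convention that `0` means disabled are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InterResultTimer:
    """Toy client-side timer: fires when no new result arrives in time."""

    timeout_ms: int                       # assumption: 0 disables the timer
    last_result_ms: Optional[int] = None  # time of the most recent result

    def on_result(self, now_ms: int) -> None:
        # Every recognition result restarts the countdown.
        self.last_result_ms = now_ms

    def expired(self, now_ms: int) -> bool:
        # The timer stays idle until the first result has been received.
        if self.timeout_ms <= 0 or self.last_result_ms is None:
            return False
        return now_ms - self.last_result_ms >= self.timeout_ms


def end_of_input_ms(result_times_ms, timeout_ms, tick_ms=100, horizon_ms=60000):
    """Return the time at which the client would signal end of input."""
    timer = InterResultTimer(timeout_ms)
    pending = sorted(result_times_ms)
    for now in range(0, horizon_ms + 1, tick_ms):
        while pending and pending[0] <= now:
            timer.on_result(pending.pop(0))
        if timer.expired(now):
            return now
    return None  # timer disabled, or it never fired within the horizon


# Results arrive at 1.0 s and 1.8 s; the caller then pauses, and a last
# result arrives at 4.5 s.
results = [1000, 1800, 4500]
print(end_of_input_ms(results, timeout_ms=2000))  # 3800
print(end_of_input_ms(results, timeout_ms=3000))  # 7500
print(end_of_input_ms(results, timeout_ms=0))     # None
```

With the 2000 ms setting, the toy client ends the input at 3.8 s and never sees the result at 4.5 s, which is the premature completion mentioned above. With 3000 ms, the pause is tolerated and the input ends 3 s after the last result.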

> As for the delay issue, Google sometimes takes as long as 50 secs to complete recognition after the caller finishes speaking.

Google is not going to complete the request performed in the continuous mode, unless the EOS is indicated by the client.

> We listened to the recording and it was completely silent during the delay period. Have you experienced this issue before?

It depends, I'd need detailed logs and the utterance to comment further. But, the inter-result timeout was introduced to fight the exact same problem.

> We created a ticket with Google, but the support staff we are talking to don't have a clue what's going on, since Google Cloud Logging only shows the Dialogflow intent detection process, not the voice recognition process. Right now they are escalating to another level of support. If you have any insight into the issue, please let us know.

We tackle this sort of problem nearly on a daily basis with many different customers. It is basically a matter of proper configuration.

Frank Wang

Apr 16, 2022, 1:53:08 PM
to UniMRCP
Hi, Arsen,

Thanks for the detailed explanation. I think I need to understand the protocol better now.

My first question is: who triggers the intent detection if "... Google is not going to complete the request performed in the continuous mode ..."? I thought Google decides when a complete phrase is recognized and then sends the recognized text to Dialogflow, while the UniMRCP client basically just waits for the intent detection result until a timeout occurs. Is my understanding correct, or do things work differently?

Also, while analyzing the logs, I see that during the recognition process the plugin receives quite a few intermediate results that are marked as final [1], but intent detection is not triggered until a certain condition is met. Is this condition that Google thinks a complete sentence has been received?

Another question: are all timeouts (no-input timeout, input timeout, and inter-result timeout) triggered by the UniMRCP client? I thought they were handled by Google and the plugin just passed along how long these timeouts should be, but I guess I was wrong. So, in the event of an input timeout or inter-result timeout, the plugin will tell Google to stop the recognition process and go ahead with the intent detection using whatever has been recognized so far?

Thanks,
Frank

Arsen Chaloyan

Apr 23, 2022, 1:49:33 PM
to UniMRCP
Hi Frank,

Please see my comments inline.

On Sat, Apr 16, 2022 at 10:53 AM 'Frank Wang' via UniMRCP <uni...@googlegroups.com> wrote:
> Hi, Arsen,
>
> Thanks for the detailed explanation. I think I need to understand the protocol better now.
>
> My first question is: who triggers the intent detection if "... Google is not going to complete the request performed in the continuous mode ..."? I thought Google decides when a complete phrase is recognized and then sends the recognized text to Dialogflow, while the UniMRCP client basically just waits for the intent detection result until a timeout occurs. Is my understanding correct, or do things work differently?

It is a bit tricky. Your understanding is partially correct but there are key points that you miss here. It is true that a single gRPC call for the StreamingDetectIntent method is placed to Google in this case. It is also true that this method consists of two integral parts handled by Google underneath, such as speech transcription and intent detection. What is not aligned with your understanding is: in the continuous mode, Google does not indicate completion of the input to wrap up with the speech transcription and move to the intent detection part. It is solely up to the client to signal the end of input. Afterwards, Google will proceed to the intent detection and the gRPC call will eventually complete.
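
The sequence can be mimicked with plain Python generators. This is a simulation under stated assumptions, not the Google client library and not the plugin: `fake_streaming_detect_intent` is a made-up stand-in for the service in the continuous mode, and the client's end-of-input rule (a count of silent frames) merely stands in for whatever timeout the client applies.

```python
from typing import Iterable, Iterator, List, Tuple


def client_audio(frames: Iterable[str], max_silent_frames: int) -> Iterator[str]:
    """Client side: stream frames until the client decides input is over."""
    silent = 0
    for frame in frames:
        silent = silent + 1 if frame == "" else 0
        if silent > max_silent_frames:
            return  # ending the request stream signals the end of input
        yield frame


def fake_streaming_detect_intent(requests: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Hypothetical stand-in for the service in the continuous mode."""
    words: List[str] = []
    for frame in requests:  # part 1: speech transcription
        if frame:
            words.append(frame)
            yield ("recognition", " ".join(words))
    # Part 2: intent detection starts only once the client has closed
    # its side of the stream.
    yield ("query_result", " ".join(words))


# "" models a silent frame; a word models a frame that contains speech.
frames = ["switch", "to", "billing", "", "", "", "", ""]
events = list(fake_streaming_detect_intent(client_audio(frames, max_silent_frames=2)))
for kind, text in events:
    print(kind, text)
```

In this toy run the query result appears only after `client_audio` returns. In a live call the frame source does not end on its own, so without a client-side rule the transcription loop would simply keep running, which would be consistent with the delay seen in the trace.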
 
> Also, while analyzing the logs, I see that during the recognition process the plugin receives quite a few intermediate results that are marked as final [1], but intent detection is not triggered until a certain condition is met. Is this condition that Google thinks a complete sentence has been received?

See above, the input completion must be signalled by the client.


> Another question: are all timeouts (no-input timeout, input timeout, and inter-result timeout) triggered by the UniMRCP client? I thought they were handled by Google and the plugin just passed along how long these timeouts should be, but I guess I was wrong. So, in the event of an input timeout or inter-result timeout, the plugin will tell Google to stop the recognition process and go ahead with the intent detection using whatever has been recognized so far?

All the timeouts are implemented in the plugin (client) and are not propagated to Google, partly because Google does not provide such an API, and partly because some of the timeouts do not apply to Google.

Frank Wang

Apr 23, 2022, 10:46:37 PM
to UniMRCP
Hi, Arsen,

I am clear now that it is the client (the GDF plugin) that signals the end of input and triggers intent detection in the continuous mode. Now the question is: what are the criteria for signalling the end of input? Is it the length of the silence after the last word the user speaks, like the trailing silence (1000 ms)? If that's the case, then the whole delay issue is on the plugin side. If you read the first post in this thread, there was a 21-second delay after the user spoke the last words before intent detection was triggered. You proposed adding the inter-result timeout to alleviate the issue, but what was the plugin doing during this silent period? Was it detecting some audio that we couldn't hear in the recording? I think we are getting close to the bottom of the issue now ...

Thanks,
Frank

Arsen Chaloyan

May 3, 2022, 6:24:15 PM
to UniMRCP
Hi Frank,

If you could provide the recorded utterance along with the logs, I can tell exactly what was causing the delay. Otherwise, my initial response still stands. Even insignificant background noise may affect detection of the end of input in the internal VAD. The inter-result timeout has been implemented to provide a redundant way of determining the end of input.
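
As an illustration of how background noise can postpone the end of input, here is a toy energy-threshold detector. It is not the UniMRCP VAD: the threshold, the frame duration, and the energy values are invented for the example, and only the 1000 ms trailing-silence figure comes from the detector stats in the first post.

```python
from typing import Optional, Sequence

FRAME_MS = 10  # assumed frame duration


def end_of_input_frame(
    energies: Sequence[int], threshold: int, trailing_silence_ms: int
) -> Optional[int]:
    """Index of the frame at which enough trailing silence has accumulated."""
    needed = trailing_silence_ms // FRAME_MS
    in_speech = False
    quiet = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            in_speech = True
            quiet = 0  # any frame above the threshold restarts the count
        elif in_speech:
            quiet += 1
            if quiet >= needed:
                return i
    return None  # the end of input was never detected


speech = [900] * 50                   # 0.5 s of speech
clean_tail = [20] * 300               # 3 s of true silence
noisy_tail = ([20] * 60 + [450]) * 5  # a faint noise burst every ~0.6 s

print(end_of_input_frame(speech + clean_tail, threshold=400, trailing_silence_ms=1000))  # 149
print(end_of_input_frame(speech + noisy_tail, threshold=400, trailing_silence_ms=1000))  # None
```

In the second case each short burst resets the silence counter before 1000 ms have accumulated, so the toy detector never completes. A timeout that does not depend on the audio level is not affected by such bursts.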

Frank Wang

May 4, 2022, 12:11:11 AM
to UniMRCP
Hi, Arsen,

We have had our system in production for a week now, using a 2-second inter-result timeout. So far we haven't heard any complaints about delays. I think this option really helped resolve the issue, and we don't need to troubleshoot further for now.

Also, from this entire thread, I learned how the GDF plugin works under different configurations. The knowledge is very valuable. Thanks for all your to-the-point answers and detailed explanations. Much appreciated. I hope we will be doing more business with you.

Thanks,
Frank.
