Tuning GSR configuration parameters for better accuracy with short utterances


bharath kumar

Jan 30, 2019, 6:23:31 PM
to UniMRCP
Hi Arsen,

I'm trying to understand the impact of the tuning parameters in the GSR plugin. In particular:
  1. What would be the consequence of setting speech-start-timeout to "0", without providing any transition time?
  2. If I set speech-complete-timeout to a lower value, will that cut off input that is being spoken at a slow pace?
  3. If I set vad-mode to 3, will the input be affected by surrounding noise?

<speech-dtmf-input-detector
      vad-mode="2"
      speech-start-timeout="300"
      speech-complete-timeout="1000"
      speech-incomplete-timeout="3000"
      noinput-timeout="5000"
      input-timeout="10000"
      dtmf-interdigit-timeout="5000"
      dtmf-term-timeout="10000"
      dtmf-term-char=""
      speech-leading-silence="300"
      speech-trailing-silence="300"
      speech-output-period="200"
   />



Cheers,
Bharath

Arsen Chaloyan

Jan 30, 2019, 11:41:04 PM
to UniMRCP
Hi Bharath,

Setting these parameters correctly is quite important and can have a noticeable impact. While the defaults are good for general use, you may need to adjust them for better performance in particular cases.

Almost all the parameters can be set per recognition request, except for speech-start-timeout, which has no equivalent MRCP header field to use.

1. what would be the consequence if I set the speech-start-timeout to "0" without providing any transition time?

The minimum value is 10 ms. The default value of this parameter will likely be changed to 50 ms in future releases, as 300 ms is sometimes quite restrictive for very short utterances like "no", which may result in no speech activity being detected. In fact, 50 ms has been used successfully in various installations.
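As a rough sketch of that change, the detector element from the question could be edited as shown below. Only speech-start-timeout differs from the configuration posted above; 50 is the value mentioned in this reply, and whether it suits a particular deployment is an assumption to confirm by testing.

```xml
<!-- Sketch: the configuration from the question with a shorter speech-start-timeout (in ms) -->
<speech-dtmf-input-detector
      vad-mode="2"
      speech-start-timeout="50"
      speech-complete-timeout="1000"
      speech-incomplete-timeout="3000"
      noinput-timeout="5000"
      input-timeout="10000"
      dtmf-interdigit-timeout="5000"
      dtmf-term-timeout="10000"
      dtmf-term-char=""
      speech-leading-silence="300"
      speech-trailing-silence="300"
      speech-output-period="200"
   />
```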

2. If I set speech-complete-timeout to a lesser value, will that cut off the input which is being spoken at a slow pace?

Yes, be careful with that. Callers tend to breathe while speaking a longer utterance, such as a full address or a multi-digit number. What you can safely do, however, is use a shorter speech-complete-timeout for short utterances like "yes/no".

3. If I set the vad-mode to 3, will the input get impacted by surrounding noise?

This could be too restrictive for short utterances. While setting vad-mode to 3 will result in fewer false positives caused by background noise, the capabilities of the internal GMM-based voice activity detector are still naturally limited. You may consider using a longer speech-incomplete-timeout in this case, so as not to cut off the input prematurely because of background noise.
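For illustration only, a noisy-environment variant of the configuration from the question might look like the sketch below. The vad-mode value of 3 comes from the question; the speech-incomplete-timeout of 5000 is an assumed example of "longer" than the original 3000, not a recommended value, and all other attributes are left unchanged.

```xml
<!-- Sketch: stricter VAD paired with a longer speech-incomplete-timeout.
     The value 5000 is an illustrative assumption, not a tested recommendation. -->
<speech-dtmf-input-detector
      vad-mode="3"
      speech-start-timeout="300"
      speech-complete-timeout="1000"
      speech-incomplete-timeout="5000"
      noinput-timeout="5000"
      input-timeout="10000"
      dtmf-interdigit-timeout="5000"
      dtmf-term-timeout="10000"
      dtmf-term-char=""
      speech-leading-silence="300"
      speech-trailing-silence="300"
      speech-output-period="200"
   />
```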

Please let me know if you have any questions.

Cheers.


--
Arsen Chaloyan
Author of UniMRCP
http://www.unimrcp.org

bharath kumar

Jan 31, 2019, 7:23:18 PM
to UniMRCP
Thanks, Arsen, for the detailed explanation of the GSR tuning parameters. As you mentioned, short utterances like "no" were not always transcribed or recognized. After changing speech-start-timeout to 50 ms, it works better now.