Google ASR - Voice and DTMF Mixed Recognition


erj

Apr 24, 2023, 6:51:58 AM
to UniMRCP
Hi All,
We are currently talking to a client and gathering requirements, and they want an IVR that gives the caller the ability to enter digits either by saying them or by pressing DTMF keys. Given that grammars sent to Google have to have a header with either mode="voice" or mode="dtmf", how can we send grammar(s) to Google if we don't know which option the caller will choose?
Is either of these possible/recommended with UniMRCP and Google:
1. Send multiple grammars (i.e. call DEFINE-GRAMMAR twice), once for a voice grammar and once for a DTMF grammar. Would Google accept these as cumulative (so the second wouldn't supersede the first, etc.)?
2. Send a single multipart grammar to Google (using the apt_multipart_content_create function) with a voice section and a DTMF section?
Presumably either way we'd also have to reference both grammars in the subsequent RECOGNIZE request, is that correct?
Any thoughts, ideas, etc. would be gratefully received.
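For context, the DTMF counterpart of a voice SRGS grammar differs only in the mode attribute; a minimal sketch (the rule name and digit set here are illustrative, not taken from the actual deployment):

```xml
<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         mode="dtmf" root="digit">
  <rule id="digit">
    <one-of>
      <item>1</item>
      <item>2</item>
    </one-of>
  </rule>
</grammar>
```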

Also, the customer is saying that our recognition should be CISL interface compatible, which isn't something I've come across before; I've searched the docs and there's no mention of it. Have you ever heard of this interface?

Thanks for your help, much appreciated.
Regards
Ed James 

Girish Gopinath

Apr 26, 2023, 8:31:47 AM
to uni...@googlegroups.com
Hello:

I have a system that detects digits either by DTMF or by speech. We use the GSR plugin, with an Asterisk server communicating with the UniMRCP server. The instruction in Asterisk is something like this:

exten => 1234,1,MRCPRecog(/etc/asterisk/grammars/onetwo.xml,i=none&p=mrcp2&t=3000&b=1&ct=0.5&spl="en-US"&f=<prompt-file>)
where <prompt-file> plays a message like "Press 1 or 2 or say 1 or 2".

The grammar file onetwo.xml looks like this:

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" mode="voice" root="digit">
  <rule id="digit">
    <one-of>
      <item>one</item>
      <item>two</item>
    </one-of>
  </rule>
</grammar>

Hope this helps.

Thanks,
/Girish



erj

Apr 26, 2023, 10:11:30 AM
to UniMRCP
Hi Girish,
Thanks, it's very interesting that it works with DTMF even though you send in a mode="voice" dynamic grammar.

I was looking at section 7.5 in the docs (UMS GSR - Usage Manual | UniMRCP Documentation, unispeech.io), which sends two builtin grammars in the RECOGNIZE request. So I wonder if the Asterisk server is sending a RECOGNIZE request that references the dynamic voice grammar in onetwo.xml but also references a builtin DTMF grammar which exists on the UniMRCP server.

Have you got any logs from the Asterisk server showing the contents of the RECOGNIZE request that's going to the UniMRCP server, or a log snippet from the UniMRCP server showing the RECOGNIZE it received?

If not (I'm sure you're very busy :-), thanks for your help anyway. I'll try putting a builtin DTMF grammar on the server and then adding a reference to it in all the RECOGNIZE requests. I can't see why it would do any harm even when it's known for certain that DTMF won't be sent, and it would always catch it in the either-or case.

Thanks
Ed

erj

Apr 28, 2023, 6:47:46 AM
to UniMRCP

Hi All,

That seems to work even if I make up a random name for the builtin DTMF grammar ("hsdsdfjhd" in this case). Here's what I did, with extracts from the client MRCP log:

 

1. Send a DEFINE-GRAMMAR request with a mode="voice" grammar and a 'Content-Id: request1@form-level' header.
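For reference, that DEFINE-GRAMMAR would look roughly like this (message length and request ID elided, grammar body abridged; only the Content-Id and channel identifier are taken from the actual session):

```
MRCP/2.0 ... DEFINE-GRAMMAR 1
Channel-Identifier: 82eaef9ff8ca42aa@speechrecog
Content-Type: application/srgs+xml
Content-Id: request1@form-level
Content-Length: ...

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" mode="voice" root="digit">
  ...
</grammar>
```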


2. Send a RECOGNIZE request containing a reference to that grammar and also to a builtin DTMF grammar using the made-up name:

Content-Length: 50

session:request1@form-level
builtin:dtmf/hsdsdfjhd

 

3. This is the result:

MRCP/2.0 388 RECOGNITION-COMPLETE 2 COMPLETE
Channel-Identifier: 82eaef9ff8ca42aa@speechrecog
Completion-Cause: 000 success
Waveform-Uri:
Content-Type: application/x-nlsml
Content-Length: 187

<?xml version="1.0"?>
<result>
  <interpretation grammar="builtin:dtmf/hsdsdfjhd" confidence="1">
    <instance>1</instance>
    <input mode="dtmf">1</input>
  </interpretation>
</result>

 

Does anyone understand why this builtin DTMF grammar with a made-up name actually works? Is the name irrelevant? The server logs are as follows. What in 'builtin:dtmf/hsdsdfjhd' triggers the server to run Init DTMF Detector? Also, where is this builtin DTMF grammar defined? I looked in the umsgsr.xml file (as per the docs) and couldn't find anything.

 

2023-04-27 17:02:50:783064 [INFO]   Receive MRCPv2 Data 172.18.1.178:1544 <-> 10.1.1.117:52719 [444 bytes]
MRCP/2.0 444 RECOGNIZE 2
Channel-Identifier: 82eaef9ff8ca42aa@speechrecog
Content-Type: text/uri-list
Vendor-Specific-Parameters: single-utterance=false;use-enhanced=true;model=latest_long
Cancel-If-Queue: false
Recognition-Timeout: 60006
No-Input-Timeout: 10007
Speech-Complete-Timeout: 2003
Start-Input-Timers: true
Save-Waveform: true
Speech-Language: en-US
Content-Length: 50

session:request1@form-level
builtin:dtmf/hsdsdfjhd
2023-04-27 17:02:50:783162 [INFO]   Process RECOGNIZE Request <82eaef9ff8ca42aa@speechrecog> [2]
2023-04-27 17:02:50:783236 [WARN]   Unknown Parameter [input-timeout] <82eaef9ff8ca42aa@gsr>
2023-04-27 17:02:50:783271 [WARN]   Unknown Parameter [noinput-timeout] <82eaef9ff8ca42aa@gsr>
2023-04-27 17:02:50:783280 [WARN]   Unknown Parameter [speech-complete-timeout] <82eaef9ff8ca42aa@gsr>
2023-04-27 17:02:50:783309 [INFO]   Init Speech Detector: frame-size=160, max-frame-count=360, output-frame-count=20, vad-mode=2, noinput-timeout=10007 ms, input-timeout=60006 ms, start-timeout=300 ms, complete-timeout=2003 ms, incomplete-timeout=3000 ms, leading-silence=300 ms, trailing-silence=300 ms, interim-results=1, start-of-input=external <82eaef9ff8ca42aa>
2023-04-27 17:02:50:783342 [INFO]   Init DTMF Detector: interdigit-timeout=5000 ms, term-timeout=10000 ms, term-char= , length=0, min-length=0, max-length=0 <82eaef9ff8ca42aa>
2023-04-27 17:02:50:783349 [INFO]   Start No-Input Timer [10007 ms] <82eaef9ff8ca42aa>
2023-04-27 17:02:50:783363 [INFO]   Open Waveform File for Writing /opt/unimrcp/var/umsgsr-82eaef9ff8ca42aa-2.wav, sampling-rate [8000]
2023-04-27 17:02:50:783982 [INFO]   Create gRPC Channel [eu-speech.googleapis.com:443] <82eaef9ff8ca42aa@gsr>
2023-04-27 17:02:50:784216 [INFO]   Set Model [latest_long] <82eaef9ff8ca42aa@gsr>
2023-04-27 17:02:50:784252 [INFO]   gRPC Streaming Recognize <82eaef9ff8ca42aa@gsr>
2023-04-27 17:02:50:787121 [INFO]   Process RECOGNIZE Response <82eaef9ff8ca42aa@speechrecog> [2]
2023-04-27 17:02:50:787148 [INFO]   State Transition IDLE -> RECOGNIZING <82eaef9ff8ca42aa@speechrecog>
2023-04-27 17:02:50:787224 [INFO]   Send MRCPv2 Data 172.18.1.178:1544 <-> 10.1.1.117:52719 [83 bytes]
MRCP/2.0 83 2 200 IN-PROGRESS
Channel-Identifier: 82eaef9ff8ca42aa@speechrecog


2023-04-27 17:02:52:360054 [INFO]   DTMF Detector State Transition NO-INPUT -> IN-PROGRESS [0 ms] <82eaef9ff8ca42aa>
2023-04-27 17:02:52:360134 [INFO]   Start Input Timer [60006 ms] <82eaef9ff8ca42aa>
2023-04-27 17:02:52:360171 [INFO]   Detected Start of Event: id=1, digit=1 <82eaef9ff8ca42aa>
2023-04-27 17:02:52:360296 [INFO]   Process START-OF-INPUT Event <82eaef9ff8ca42aa@speechrecog> [2]
2023-04-27 17:02:52:360374 [INFO]   Send MRCPv2 Data 172.18.1.178:1544 <-> 10.1.1.117:52719 [113 bytes]
MRCP/2.0 113 START-OF-INPUT 2 IN-PROGRESS
Channel-Identifier: 82eaef9ff8ca42aa@speechrecog
Input-Type: dtmf


2023-04-27 17:02:52:420082 [INFO]   Detected End of Event: id=1 duration=560 ts <82eaef9ff8ca42aa>
2023-04-27 17:02:52:420162 [INFO]   Start Inter-Digit Timer [5000 ms] <82eaef9ff8ca42aa>
2023-04-27 17:02:57:410036 [INFO]   DTMF Detector State Transition IN-PROGRESS -> COMPLETE [0 ms] <82eaef9ff8ca42aa>
2023-04-27 17:02:57:410168 [INFO]   Detector Stats: leading-silence=0 ms, input=0 ms, trailing-silence=0 ms <82eaef9ff8ca42aa>
2023-04-27 17:02:57:410274 [INFO]   Input Complete [success] size=0 bytes, dur=0 ms <82eaef9ff8ca42aa@gsr>
2023-04-27 17:02:57:410508 [INFO]   Process RECOGNITION-COMPLETE Event <82eaef9ff8ca42aa@speechrecog> [2]
2023-04-27 17:02:57:410520 [INFO]   State Transition RECOGNIZING -> RECOGNIZED <82eaef9ff8ca42aa@speechrecog>
2023-04-27 17:02:57:410580 [INFO]   Send MRCPv2 Data 172.18.1.178:1544 <-> 10.1.1.117:52719 [388 bytes]
MRCP/2.0 388 RECOGNITION-COMPLETE 2 COMPLETE
Channel-Identifier: 82eaef9ff8ca42aa@speechrecog
Completion-Cause: 000 success
Waveform-Uri:
Content-Type: application/x-nlsml
Content-Length: 187

<?xml version="1.0"?>
<result>
  <interpretation grammar="builtin:dtmf/hsdsdfjhd" confidence="1">
    <instance>1</instance>
    <input mode="dtmf">1</input>
  </interpretation>
</result>

 

I'm happy that it works, but it would be good to understand why.

 

Any thoughts/comments very welcome

 

Thanks

Ed

Arsen Chaloyan

Apr 28, 2023, 6:08:33 PM
to uni...@googlegroups.com
Hi Ed,

It is very typical to accept both speech and DTMF input. For this use case, you are supposed to activate both a speech grammar and a DTMF grammar.

The name of the builtin DTMF grammar is actually not observed. The only supported DTMF grammar is for a plain sequence of digits. For example,
  • builtin:dtmf/digits
  • builtin:dtmf/digits?length=4
  • builtin:dtmf/digits?min-length=2;max-length=6
Please also note that the DTMF handling is done entirely by the plugin. Google is not involved.
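On the client side, the NLSML body of the RECOGNITION-COMPLETE event carries both the matched grammar URI and the input mode, so the application can tell DTMF from speech. A minimal sketch of pulling those out with Python's standard library (the sample body mirrors the result shown earlier in the thread; the function name is just for illustration):

```python
import xml.etree.ElementTree as ET

SAMPLE_NLSML = """<?xml version="1.0"?>
<result>
  <interpretation grammar="builtin:dtmf/digits" confidence="1">
    <instance>1</instance>
    <input mode="dtmf">1</input>
  </interpretation>
</result>"""

def parse_nlsml(body):
    """Return (grammar URI, input mode, recognized text) from an NLSML result."""
    root = ET.fromstring(body)
    interp = root.find("interpretation")
    inp = interp.find("input")
    return interp.get("grammar"), inp.get("mode"), inp.text

grammar, mode, text = parse_nlsml(SAMPLE_NLSML)
# mode == "dtmf" tells the application the caller keyed the digit rather than spoke it
```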




--
Arsen Chaloyan
Author of UniMRCP
http://www.unimrcp.org

Girish Gopinath

May 3, 2023, 1:16:44 AM
to uni...@googlegroups.com
Hello Ed,

Apologies for the long delay. I was away from work due to personal reasons.

Looks like DTMF is working for you now. Still, if you need logs from my Asterisk and UniMRCP servers, please let me know.

Regards,
/Girish

erj

May 3, 2023, 3:57:58 AM
to UniMRCP
Hi Girish,
No worries, Arsen's reply has cleared this up for us but thanks anyway.
Regards
Ed
