MRCP/2.0 553 DEFINE-GRAMMAR 1
Channel-Identifier: fda8893c9fec4dbc@speechrecog
Content-Type: application/srgs+xml
Content-Id: request1@form-level
Content-Length: 380
<?xml version="1.0"?><grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" root="pre" mode="voice"><meta name="single-utterance" content="true"/><meta name="input-timeout" content="10000"/><meta name="speech-complete-timeout" content="2000"/><meta name="scope" content="hint"/><rule id="pre"><one-of><item>$FULLPHONENUM</item></one-of></rule></grammar>
MRCP/2.0 359 RECOGNIZE 2
Channel-Identifier: fda8893c9fec4dbc@speechrecog
Content-Type: text/uri-list
Vendor-Specific-Parameters: single-utterance=true
Cancel-If-Queue: false
Recognition-Timeout: 10000
Speech-Complete-Timeout: 2000
Start-Input-Timers: true
Save-Waveform: true
Speech-Language: en-US
Content-Length: 27
session:request1@form-level
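For anyone wanting to repeat the test with a different class token, here is a minimal Python sketch that builds an SRGS body of the same shape as the DEFINE-GRAMMAR above and reports its byte length for the Content-Length header. The function name `build_class_token_grammar` is made up for illustration, the `<meta>` elements are left out for brevity, and whether a given token is honoured depends on the server/plugin as discussed in this thread.

```python
from xml.sax.saxutils import escape, quoteattr

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_class_token_grammar(token, lang="en-US"):
    """Return an SRGS grammar whose single item is the given class token.

    Hypothetical helper: mirrors the grammar shown above, minus the <meta> tags.
    """
    return (
        '<?xml version="1.0"?>'
        f'<grammar xmlns="{SRGS_NS}" xml:lang={quoteattr(lang)} '
        'version="1.0" root="pre" mode="voice">'
        '<rule id="pre"><one-of>'
        f"<item>{escape(token)}</item>"
        "</one-of></rule></grammar>"
    )


body = build_class_token_grammar("$OOV_CLASS_DIGIT_SEQUENCE")
# Content-Length counts bytes of the body, not characters.
content_length = len(body.encode("utf-8"))
print(content_length)
print(body)
```

Swapping the token string is then the only change needed between test runs.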
Interestingly, we have also tested $TIME, and it did produce somewhat better results. Is $TIME a special case in UniMRCP, or should we expect all the Google class tokens to work, since they are just passed on by the server/plugin?
Thanks
Ed James
I've just done some testing with Google class tokens, using a prompt that reads out a 16-digit number:
Azure (No Grammar and 100% Correct)
<result>
<interpretation grammar="builtin:speech/transcribe" confidence="0.96">
<instance>4931785634782378</instance>
<input mode="speech">four nine three one seven eight five six three four seven eight two three seven eight</input>
</interpretation>
</result>
Google (No Grammar and has the digits but not the format)
<result>
<interpretation grammar="builtin:speech/transcribe" confidence="0.93">
<instance>4931 +78-563-478-2378</instance>
<input mode="speech">4931 +78-563-478-2378</input>
</interpretation>
<interpretation grammar="builtin:speech/transcribe" confidence="0.8">
<instance>for 93178 563-478-2378</instance>
<input mode="speech">for 93178 563-478-2378</input>
</interpretation>
<interpretation grammar="builtin:speech/transcribe" confidence="0.66">
<instance>49 +317-856-347-8237 8</instance>
<input mode="speech">49 +317-856-347-8237 8</input>
</interpretation>
</result>
Google ($FULLPHONENUM and missing digits and incorrect format)
<result>
<interpretation grammar="session:request1@form-level" confidence="0.86">
<instance>01785 6347 8237</instance>
<input mode="speech">01785 6347 8237</input>
</interpretation>
<interpretation grammar="session:request1@form-level" confidence="0.66">
<instance>01785 6347 827</instance>
<input mode="speech">01785 6347 827</input>
</interpretation>
<interpretation grammar="session:request1@form-level" confidence="0.65">
<instance>1785 6347 8237</instance>
<input mode="speech">1785 6347 8237</input>
</interpretation>
</result>
Google ($OOV_CLASS_DIGIT_SEQUENCE and one missing digit and correct format)
<result>
<interpretation grammar="session:request1@form-level" confidence="0.95">
<instance>493178563478237</instance>
<input mode="speech">493178563478237</input>
</interpretation>
<interpretation grammar="session:request1@form-level" confidence="0.95">
<instance>4931788563478237</instance>
<input mode="speech">4931788563478237</input>
</interpretation>
<interpretation grammar="session:request1@form-level" confidence="0.91">
<instance>4931785634788237</instance>
<input mode="speech">4931785634788237</input>
</interpretation>
</result>
So the Google class tokens do seem to affect the results, which suggests they are reaching Google. However, if this pattern can be relied upon, we would be better off not using a class token and simply reformatting the Google result obtained without tokens, since at least it contained all the digits.
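To make the "just reformat the no-grammar result" idea concrete, here is a rough Python sketch. It parses an NLSML result like the ones above, strips everything that is not a digit from each `<instance>`, and flags whether the expected 16 digits survived. The helper name `digit_candidates` and the `expected_len` parameter are my own; this is only an illustration run against the sample output pasted in this post, not a tested production approach.

```python
import re
import xml.etree.ElementTree as ET


def digit_candidates(nlsml, expected_len=16):
    """Return (digits, confidence, length_ok) for each interpretation."""
    root = ET.fromstring(nlsml)
    candidates = []
    for interp in root.iter("interpretation"):
        raw = interp.findtext("instance", default="")
        digits = re.sub(r"[^0-9]", "", raw)  # drop spaces, '+', '-', words
        confidence = float(interp.get("confidence", "0"))
        candidates.append((digits, confidence, len(digits) == expected_len))
    return candidates


# The Google "no grammar" result from this post.
SAMPLE = """<result>
<interpretation grammar="builtin:speech/transcribe" confidence="0.93">
<instance>4931 +78-563-478-2378</instance>
<input mode="speech">4931 +78-563-478-2378</input>
</interpretation>
<interpretation grammar="builtin:speech/transcribe" confidence="0.8">
<instance>for 93178 563-478-2378</instance>
<input mode="speech">for 93178 563-478-2378</input>
</interpretation>
<interpretation grammar="builtin:speech/transcribe" confidence="0.66">
<instance>49 +317-856-347-8237 8</instance>
<input mode="speech">49 +317-856-347-8237 8</input>
</interpretation>
</result>"""

for digits, confidence, length_ok in digit_candidates(SAMPLE):
    print(digits, confidence, length_ok)
```

On this sample the first and third alternatives reduce to the same 16 digits that Azure returned, while the second loses its leading digit because "four" was transcribed as "for".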
Does anyone have experience of Google class tokens producing reliably better results than having no grammar?
Thanks
Ed James
--
You received this message because you are subscribed to the Google Groups "UniMRCP" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unimrcp+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unimrcp/a075d607-113c-477b-ad99-0c8093d31972n%40googlegroups.com.
1. Can you verify that the sampling rate is 16000 Hz or higher?
2. Can you please verify the codec being used is a lossless codec? We recommend FLAC or LINEAR16
3. Please provide the entire request code sent to the API, including the entire RecognitionConfig (see https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig )
I send PCMA 8 kHz audio to the UniMRCP server, but I notice they say 16 kHz is optimal, and PCMA doesn't seem to be a listed codec, so are you transcoding?
Thanks
Ed
> 1. Can you verify that the sampling rate is 16000 Hz or higher?
The plugin supports both 8 kHz and 16 kHz. The sampling rate used in communication with Google is derived from the sampling rate negotiated via SDP offer/answer for RTP. No resampling is performed.
> 2. Can you please verify the codec being used is a lossless codec? We recommend FLAC or LINEAR16
LINEAR16.
> 3. Please provide the entire request code sent to the API, including the entire RecognitionConfig (see https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig )
> I send PCMA 8kHz audio to the uniMRCP server but I notice they're saying 16kHz is optimal and PCMA doesn't seem to be a listed codec so are you transcoding ?
PCMA is decoded to raw PCM and sent to Google at 8 kHz. Their phone_call model, which I referred to in my previous post, is trained on 8 kHz data. Naturally, 16 kHz would be preferable, but that does not apply to traditional telephony.
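For readers unfamiliar with what "decoded to raw PCM" involves, below is a small Python sketch of standard G.711 A-law expansion to 16-bit linear samples. It only illustrates the general technique and is not the plugin's actual code; the function names are chosen for this example. Note that the sample rate stays at 8 kHz, since decoding changes the encoding only.

```python
import struct


def alaw_to_linear(a_val):
    """Expand one G.711 A-law byte to a signed 16-bit linear sample."""
    a_val ^= 0x55  # undo the even-bit inversion applied by the encoder
    magnitude = (a_val & 0x0F) << 4
    segment = (a_val & 0x70) >> 4
    if segment == 0:
        magnitude += 8
    else:
        magnitude += 0x108
        if segment > 1:
            magnitude <<= segment - 1
    # After the XOR, a set top bit means a positive sample.
    return magnitude if a_val & 0x80 else -magnitude


def pcma_to_linear16(payload: bytes) -> bytes:
    """Convert a PCMA payload to little-endian 16-bit PCM (LINEAR16)."""
    samples = [alaw_to_linear(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)


pcm = pcma_to_linear16(bytes([0xD5, 0x55, 0xAA, 0x2A]))
print(struct.unpack("<4h", pcm))
```

Each input byte becomes two output bytes, so a 20 ms PCMA frame of 160 bytes yields 320 bytes of LINEAR16.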
This would be a typical config sent to Google in a similar case.
{
  "streamingConfig": {
    "config": {
      "encoding": "LINEAR16",
      "sampleRateHertz": 8000,
      "languageCode": "en-US",
      "maxAlternatives": 1,
      "speechContexts": [{"phrases": ["$OOV_CLASS_ALPHANUMERIC_SEQUENCE"]}],
      "model": "phone_call",
      "useEnhanced": true,
      "enableSpokenPunctuation": false,
      "enableSpokenEmojis": false
    },
    "interimResults": true
  }
}
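If it helps when comparing runs with different class tokens, the same structure can be produced with a short Python sketch. The field names are copied from the config above; the function name `build_streaming_config` and its parameters are assumptions for illustration, and the sketch only builds and prints JSON without calling any Google API.

```python
import json


def build_streaming_config(class_token, sample_rate_hz=8000, language="en-US"):
    """Build a config dict shaped like the one above (hypothetical helper)."""
    return {
        "streamingConfig": {
            "config": {
                "encoding": "LINEAR16",
                "sampleRateHertz": sample_rate_hz,
                "languageCode": language,
                "maxAlternatives": 1,
                # The class token is passed as a speech-context phrase.
                "speechContexts": [{"phrases": [class_token]}],
                "model": "phone_call",
                "useEnhanced": True,
                "enableSpokenPunctuation": False,
                "enableSpokenEmojis": False,
            },
            "interimResults": True,
        }
    }


cfg = build_streaming_config("$OOV_CLASS_DIGIT_SEQUENCE")
print(json.dumps(cfg, indent=2))
```

Including this output alongside the recognition results makes it easier to confirm which token and sample rate a given test actually used.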