Google Speech Adaptation Boost and Class Tokens


Arsen Chaloyan

Apr 11, 2020, 5:53:26 PM
to UniMRCP
Purpose

This post is intended to provide additional clarifications regarding the use and limitations of speech adaptation boost and class tokens with the following UniMRCP server plugins:
  • GSR 1.17.0
  • GDF 1.15.0
Speech Adaptation Boost

To start off, even in the latest Google APIs, speech adaptation boost is available only for Dialogflow v2. For Speech-to-Text, this feature is available only in v1p1beta1, not in v1. That is why the approach discussed in this post is currently applicable to GDF only. The same approach will become applicable to GSR once Google makes this feature available in v1 and we have the Google APIs upgraded.

Boost values can be set in a speech context defined in the configuration file by using the attribute name weight or boost, for example, as follows. Note that the two attribute names can be used interchangeably.

<speech-context id="custom" enable="true">
     <phrase weight="15">fair</phrase>
     <phrase weight="2">fare</phrase>
</speech-context>

The same can be specified via SRGS XML by using the attribute name weight, for example, as follows.

<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" mode="voice" root="custom">
  <rule id="custom">
    <one-of>
      <item weight="15">fair</item>
      <item weight="2">fare</item>
    </one-of>
  </rule>
</grammar>
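As a hedged illustration (the request ID, channel identifier, and byte counts below are placeholders, not taken from a real session), such an SRGS grammar is typically delivered inline in the body of a RECOGNIZE request with the content type application/srgs+xml.

MRCP/2.0 336 RECOGNIZE 1
Channel-Identifier: abcd1234@speechrecog
Content-Id: custom-boost
Content-Type: application/srgs+xml
Content-Length: 249

<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" mode="voice" root="custom">
  <rule id="custom">
    <one-of>
      <item weight="15">fair</item>
      <item weight="2">fare</item>
    </one-of>
  </rule>
</grammar>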

Refer to recommendations from Google for best practices on setting boost values.


Class Tokens

Class tokens can be used with both GSR and GDF. The following is a sample speech context defined in the configuration file which makes use of the class token $TIME.

<speech-context id="time" language="en-US" enable="false">
   <phrase>$TIME</phrase>
</speech-context>

The specified speech context can be referenced via a built-in grammar as follows.

builtin:speech/time
or
builtin:grammar/time
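For instance (a sketch; the channel identifier, request ID, and message lengths are placeholders), the built-in grammar reference can be passed in the body of a RECOGNIZE request as a URI list.

MRCP/2.0 183 RECOGNIZE 1
Channel-Identifier: abcd1234@speechrecog
Content-Type: text/uri-list
Content-Length: 19

builtin:speech/time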

The same can be specified via SRGS XML as follows.

<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" version="1.0" mode="voice" root="custom">
  <meta name="scope" content="hint"/>
  <rule id="custom">
    <one-of>
       <item>$TIME</item>
    </one-of>
  </rule>
</grammar>

Refer to Google's documentation for the list of available class tokens.


Questions and suggestions are welcome.

--
Arsen Chaloyan
Author of UniMRCP
http://www.unimrcp.org

Arsen Chaloyan

Apr 26, 2020, 2:50:57 PM
to UniMRCP
If you are interested in using speech adaptation boost with a custom GSR plugin built against v1p1beta1 API, then follow the instructions below.

First, install the stock GSR package to make sure all the latest packages/dependencies are in place.

yum install unimrcp-gsr

Then remove the stock GSR package itself; its dependencies remain installed.

rpm -e unimrcp-gsr

Upgrade Google APIs and install the custom GSR package.


The use of boost is explained earlier in this thread. The custom GSR package also supports alternate languages, another feature available in the v1p1beta1 API.


The alternate languages can be specified globally in umsgsr.xml, for example, as follows.

   <streaming-recognition
      language="en-US"
      alternate-languages="es-ES, de-DE"
      .../>

Andres Ortiz

Jul 15, 2020, 10:11:22 AM
to UniMRCP
Hi Arsen,
I am testing the GSR alternate-languages feature. It looks great from the Google transcription perspective, but I don't get any indication in the result of what the actual language is. We could use that information to tell the downstream engine to change the language and to switch the TTS voice and language. For instance:

MRCP/2.0 471 RECOGNITION-COMPLETE 2 COMPLETE
Channel-Identifier: 1ac9da9949f842c7@speechrecog
Completion-Cause: 000 success
Content-Type: application/x-nlsml
Content-Length: 286

<?xml version="1.0"?>
<result>
  <interpretation grammar="session:593bd5b2c38b168d7a4e800b72a7597b-general" confidence="0.94">
    <instance>j&apos;ai perdu ma carte de crédit</instance>
    <language>fr-FR</language>
    <input mode="speech">j&apos;ai perdu ma carte de crédit</input>
  </interpretation>
</result>

Thanks,

Andres

Arsen Chaloyan

Jul 20, 2020, 10:06:10 PM
to UniMRCP
Right, the language would need to be returned with the results. The question is in which format. We should conform to NLSML, which has a certain XML schema. The right way would be to extend the format of the instance element. I'll take this issue into consideration going forward. Thanks for the note.
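One NLSML-compatible sketch (an assumption on my part, not an implemented format) would be to carry the detected language on the instance element via the standard xml:lang attribute rather than a new child element, for example:

<?xml version="1.0"?>
<result>
  <interpretation grammar="session:593bd5b2c38b168d7a4e800b72a7597b-general" confidence="0.94">
    <instance xml:lang="fr-FR">j&apos;ai perdu ma carte de crédit</instance>
    <input mode="speech">j&apos;ai perdu ma carte de crédit</input>
  </interpretation>
</result>

Since xml:lang is defined by the XML specification itself, existing NLSML consumers would typically tolerate or ignore it.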
