UniMRCP and Speech API from Google Azure or Amazon


Gianluca Moretti

unread,
Feb 3, 2017, 12:29:38 PM
to UniMRCP
Has anyone attempted to use the Google Speech API and Google Translate with UniMRCP? I would like to carry the stream over HTTP from within the MRCP protocol.

Thank you in advance
Best regards
Gianluca

Arsen Chaloyan

unread,
Feb 3, 2017, 11:44:47 PM
to UniMRCP
Hi Gianluca,

The Google Speech API wrapper should be available in the second quarter of 2017.

This topic is in high demand nowadays. There are plans to also make the wrapper available as a service, but nothing concrete yet...

Thank you for your interest.


--
You received this message because you are subscribed to the Google Groups "UniMRCP" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unimrcp+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Arsen Chaloyan
Author of UniMRCP
http://www.unimrcp.org

Gianluca Moretti

unread,
Feb 7, 2017, 3:25:39 AM
to UniMRCP
Thank you, I'm very interested in this topic; may we keep in touch about it?

BR


Gianluca Moretti

unread,
Feb 14, 2017, 12:49:46 PM
to UniMRCP
Hi Arsen, I'm trying to get a bit ahead of the roadmap.

I plan to use the following architecture:

IVR --> UniMRCP Server --> Google Speech.

Is it correct to use UniMRCP as a server that relays only the RTP stream toward the Google Speech API?

Does the wrapper that communicates with Google Speech within UniMRCP have to be developed as a plugin?

Thank you in advance
Best regards


Arsen Chaloyan

unread,
Feb 15, 2017, 12:20:54 AM
to UniMRCP
Hi Gianluca,

Your understanding is correct. For further details, you may check out the following proposal.


Implementation of the plugin via the HTTP REST API is covered, with project completion and delivery dates already set. If you or anyone else in the group is interested in a budgetary estimate for implementation of the gRPC API or other features not included in the "basic project", contact me off-list.

Please note that any amount spent on the development of a new feature or module will be credited back to you in the form of commercial licenses offered by UniMRCP.

Questions are welcome.


Michael Levy

unread,
Apr 24, 2017, 4:36:39 PM
to UniMRCP
The link to the UniMRCP Google Speech Plugin Proposal is very helpful. Are there similar documents for other parts of the roadmap? What about a Microsoft Speech plugin? Is there a proposal for that? Is it for Azure/Bing Speech or the Windows Server Speech Platform?

Arsen Chaloyan

unread,
May 5, 2017, 2:33:09 PM
to UniMRCP
While the time for the Microsoft Bing Speech API integration is yet to come, I'd like to provide a quick update on the Google Speech API plugin.

Instead of the originally planned HTTP/REST API implementation, the gRPC streaming recognition interface is being implemented, which is a better fit for MRCP interoperation and the most efficient of the available options.
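To illustrate why a streaming interface suits MRCP better than a one-shot HTTP POST: the RTP media path yields small audio frames continuously, and a stream can forward them as they arrive instead of buffering a whole utterance. A schematic sketch of that chunking pattern (no real gRPC calls here; the recognizer callable is a stand-in):

```python
def rtp_frames(audio, frame_bytes=320):
    """Split raw audio into 20 ms frames (8 kHz, 16-bit LINEAR16)."""
    for i in range(0, len(audio), frame_bytes):
        yield audio[i:i + frame_bytes]

def stream_recognize(frames, recognizer):
    """Feed frames to a recognizer callable as they arrive,
    collecting any interim results it returns along the way."""
    results = []
    for frame in frames:
        interim = recognizer(frame)  # stand-in for a gRPC stream write/read
        if interim is not None:
            results.append(interim)
    return results

audio = b"\x00" * 3200              # 200 ms of silence
frames = list(rtp_frames(audio))
print(len(frames))                  # 10 frames of 20 ms each
```

The point is that interim results can come back while audio is still flowing, which maps naturally onto an MRCP RECOGNITION-IN-PROGRESS session.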

The plugin is already functional, and I expect a preliminary version to be available by the end of next week. If anyone is interested in testing it out, please let me know.



Joshua Gigg

unread,
May 8, 2017, 5:50:32 PM
to UniMRCP

Hi,

This is something that I would be interested in helping to test.

Arsen Chaloyan

unread,
May 8, 2017, 11:32:32 PM
to UniMRCP

Hi,

Very well, thanks. The plugin is essentially ready, and I'm going to work on the packaging next. Support for CentOS 7 will probably come first; I hope that's a distro you are comfortable with.

It will also take some time to put together basic documentation to guide you through the installation steps.

Massimo Romano

unread,
May 12, 2017, 11:32:33 AM
to UniMRCP
Hi Arsen,

I'm very interested in testing this plugin because I'm currently working on an IVR app based on FreeSWITCH. For a fast response, I need a reliable speech recognition system that works in asynchronous mode, so I think MRCPv2 <-> gRPC is the best solution. I tried both the Bing Speech API and the Google Speech API, with a strong preference for the latter in conjunction with speechContexts.

Is the plugin ready to use Google Speech Contexts?

I usually work with Ubuntu/Debian VMs on AWS,
so I'll try to compile the plugin directly from the sources.

Thank you,
Massimo



Arsen Chaloyan

unread,
May 12, 2017, 12:50:38 PM
to UniMRCP
Hi Massimo,

On Fri, May 12, 2017 at 8:10 AM, Massimo Romano <rma...@gmail.com> wrote:
Hi Arsen,

I'm very interested in testing this plugin because I'm currently working on an IVR app based on FreeSWITCH.

Thanks for your interest. Yes, the plugin would allow any MRCP-capable IVR platform, including but not limited to FreeSWITCH and Asterisk, to utilize the Google Speech Recognition service. Early test results are very promising...
 
For a fast response, I need a reliable speech recognition system that works in asynchronous mode, so I think MRCPv2 <-> gRPC is the best solution. I tried both the Bing Speech API and the Google Speech API, with a strong preference for the latter in conjunction with speechContexts.

Right, I like the idea behind gRPC and how Google provides a wide variety of interfaces, including speech, based on their gRPC and protobuf concepts.


Is the plugin ready to use Google Speech Contexts?

I've made provision for how to implement support for speech contexts in the MRCP world, where a speech context is a collection of arbitrary phrases. However, such support will not be included in the first version of the plugin. Nonetheless, believe it or not, plain speech transcription works like a charm without any grammars defined, at least for the languages we could test.
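For reference, in Google's Cloud Speech API such a collection of phrases is passed as phrase hints alongside the recognition config. A minimal sketch of a v1 REST `speech:recognize` request body (the phrases and the base64 audio placeholder are illustrative; field names follow Google's documented schema):

```python
import json

def build_recognize_request(audio_b64, phrases):
    """Build a Google Cloud Speech REST request body with phrase hints.

    `phrases` biases the recognizer toward the listed alternatives,
    which is what helps discriminate subtle but fundamental choices.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 8000,  # telephony audio
            "languageCode": "it-IT",
            "speechContexts": [{"phrases": phrases}],
        },
        "audio": {"content": audio_b64},
    }

body = build_recognize_request("<base64-audio>", ["si", "no", "forse"])
print(json.dumps(body["config"]["speechContexts"]))
```

In the MRCP world, the open question is only how such a phrase list would be conveyed from the client, e.g. via a grammar or a vendor-specific parameter.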


I usually work with Ubuntu/Debian VMs on AWS,
so I'll try to compile the plugin directly from the sources.

Red Hat 7-based binaries will hopefully be published later today for those who want to try the plugin out before an official release. Ubuntu 16.04 binaries should be ready next week.

Please note that the source of the plugin will NOT be available, as this is going to be a commercial module affordably priced with free trials available.

More details will come later...


David Taieb

unread,
May 12, 2017, 1:16:42 PM
to uni...@googlegroups.com
Hi Massimo,
I tried Google Speech but found the few seconds of delay in the response not suitable for real-time speech recognition. Did you experience shorter response times?
Thanks for sharing your experience


Massimo Romano

unread,
May 12, 2017, 6:16:12 PM
to UniMRCP

@Arsen 
OK, I'm looking forward to testing the trial version, but I'll wait for the Ubuntu release.

You said:

Where a speech context is a collection of arbitrary phrases.

I understand a speech context, from a cognitive and natural language processing point of view, as a collection of phrases related to a specific intent (see Watson Conversation, Google Api.ai or Amazon Lex).
Last month I spent about three weeks trying the Bing and Google Speech APIs to evaluate recognition accuracy in a telephony use case (8 kHz LINEAR16). I concluded that for my purposes speech contexts are important because they are good enough to discriminate between subtle but fundamental alternatives.

@David
Yes, I experienced the same timing problems with Google Speech. With the same audio sample, I registered random response times between 1200 and 3700 ms. Conversely, Bing Speech is fast and consistent, about 400-500 ms, but its recognition accuracy is lower and, for my purposes, unusable at 8 kHz.
Google's timings are not suitable for real-time speech recognition, as I think the maximum acceptable time in a natural conversation is under 2 seconds for the overall round trip. I suspect, but am not sure, that the problem is due to the Google Speech API still being in beta, or to the TensorFlow-based machine learning model behind the scenes being (very) computationally intensive.
I'm asking our (Italian) regional Google product manager for the Speech API about these issues...
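A consistent way to compare such timings across providers is to wrap each recognition call in the same harness. A minimal sketch (the `recognize` callable is a placeholder for whichever API client is being measured; the stub below just sleeps to simulate a round trip):

```python
import time
import statistics

def time_recognition(recognize, audio, runs=5):
    """Measure round-trip latency of a recognition callable, in ms.

    Repeated runs expose the variance between calls, not just the
    average, which is what matters for conversational turn-taking.
    """
    samples = []
    for _ in range(runs):
        start = time.monotonic()
        recognize(audio)
        samples.append((time.monotonic() - start) * 1000.0)
    return {"min": min(samples), "max": max(samples),
            "median": statistics.median(samples)}

# Stub recognizer standing in for a real API call (sleeps 10 ms).
stats = time_recognition(lambda audio: time.sleep(0.01), b"\x00" * 160)
print(sorted(stats))  # ['max', 'median', 'min']
```

Comparing the min/median/max spread between providers would make the "fast and consistent" versus "fast but erratic" distinction above quantifiable.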

Massimo

Arsen Chaloyan

unread,
May 12, 2017, 9:51:53 PM
to UniMRCP
Massimo,

Please see my comments below.

On Fri, May 12, 2017 at 3:16 PM, Massimo Romano <rma...@gmail.com> wrote:

@Arsen 
OK, I'm looking forward to testing the trial version, but I'll wait for the Ubuntu release.

Thanks. Ubuntu binaries will be available by the middle of next week.


You said:

Where a speech context is a collection of arbitrary phrases.

I understand a speech context, from a cognitive and natural language processing point of view, as a collection of phrases related to a specific intent (see Watson Conversation, Google Api.ai or Amazon Lex).
Last month I spent about three weeks trying the Bing and Google Speech APIs to evaluate recognition accuracy in a telephony use case (8 kHz LINEAR16). I concluded that for my purposes speech contexts are important because they are good enough to discriminate between subtle but fundamental alternatives.

Agreed, for subtle alternatives, specifying a speech context would be required. I wanted to mention that the accuracy of generic speech transcription was far beyond my expectations.

@David
Yes, I experienced the same timing problems with Google Speech. With the same audio sample, I registered random response times between 1200 and 3700 ms. Conversely, Bing Speech is fast and consistent, about 400-500 ms, but its recognition accuracy is lower and, for my purposes, unusable at 8 kHz.
Google's timings are not suitable for real-time speech recognition, as I think the maximum acceptable time in a natural conversation is under 2 seconds for the overall round trip. I suspect, but am not sure, that the problem is due to the Google Speech API still being in beta, or to the TensorFlow-based machine learning model behind the scenes being (very) computationally intensive.
I'm asking our (Italian) regional Google product manager for the Speech API about these issues...

I think geographical location also matters here. In my experiments, run from servers located in the US, the average response time is about 2 seconds.

Arsen Chaloyan

unread,
May 12, 2017, 10:00:58 PM
to UniMRCP
To users willing to test an early version of the Google Speech Recognition plugin on RedHat/CentOS 7:

please follow the instructions provided in the manual below

http://unimrcp.org/manuals/pdf/GSRRPMInstallationManual.pdf

Everything should be functional at this stage, though this is not an official release yet.

Further details will be available in the usage guide. In the meantime, questions and suggestions are welcome.

Arsen Chaloyan

unread,
May 16, 2017, 10:57:10 PM
to UniMRCP
Ubuntu 16.04 LTS binaries for the Google Speech Recognition plugin are available as well. Please follow the installation instructions below.

http://unimrcp.org/manuals/pdf/GSRDebInstallationManual.pdf

Next comes a usage guide and a corresponding page on the website. The official release is due next week, unless any blocking issues are identified.

Massimo Romano

unread,
May 17, 2017, 5:14:42 AM
to UniMRCP
Thank you Arsen,

I'm new to UniMRCP, so I'm starting to read the GSR plugin and UniMRCP developer docs.
UniMRCP is really a very interesting piece of work.

Now, I would like to run some real-time DSP analysis in a separate C pthread.
I would also like to have access both to the stream passed to your GSR plugin and to the (asynchronous?) results returned from Google Speech.
What is, in your opinion, the best way to do this with UniMRCP?

Ideally, for rapid prototyping, I would like to work in C/C++ for the real-time analysis and in Node.js for event handling.
Does UniMRCP have a mechanism like FreeSWITCH's mod_event_socket to control the execution flow from an external program?

Thank you again for your great work,

Arsen Chaloyan

unread,
May 17, 2017, 3:48:43 PM
to UniMRCP
Massimo,

Please see my comment below.

On Wed, May 17, 2017 at 2:14 AM, Massimo Romano <rma...@gmail.com> wrote:
Thank you Arsen,

I'm new to UniMRCP, so I'm starting to read the GSR plugin and UniMRCP developer docs.
UniMRCP is really a very interesting piece of work.

Thanks for your interest.
 

Now, I would like to run some real-time DSP analysis in a separate C pthread.
I would also like to have access both to the stream passed to your GSR plugin and to the (asynchronous?) results returned from Google Speech.
What is, in your opinion, the best way to do this with UniMRCP?

The Google Speech Recognition plugin is implemented for the UniMRCP server, which provides a generic MRCP v1 and v2 interface to clients. You may use the UniMRCP client stack to implement a [test] application based on your requirements.

http://unimrcp.org/index.php/solutions/client


Ideally, for rapid prototyping, I would like to work in C/C++ for the real-time analysis and in Node.js for event handling.
Does UniMRCP have a mechanism like FreeSWITCH's mod_event_socket to control the execution flow from an external program?

UniMRCP itself is not an IVR platform, but mainly an implementation of the MRCP standard, both client and server. If you are looking for Node.js integration, I suppose you could use a FreeSWITCH ESL extension for Node.js. Alternatively, you could extend the SWIG wrapper of the UniMRCP client interface.

https://github.com/unispeech/swig-wrapper

Hope this helps.


Arsen Chaloyan

unread,
May 19, 2017, 12:10:55 AM
to UniMRCP
Intermediate update on this subject.

Based on your ongoing feedback, I've uploaded an improved version of the Google Speech Recognition plugin.

Since this is still a preliminary version, package names and version numbers have not been changed. Here are a few recipes for those who want to upgrade.

CentOS 7
yum remove unimrcp-gsr
yum install unimrcp-gsr

Ubuntu 16.04 LTS
sudo apt-get remove unimrcp-gsr
sudo apt-get update
sudo apt-get install unimrcp-gsr

As a result, you will end up with the same version of the unimrcp-gsr package, with the upgraded umsgsr.so underneath.

Also, here are some hints on how to initiate recognition from well-known telephony platforms.

FreeSWITCH
<action application="play_and_detect_speech" data="ivr/ivr-welcome_to_freeswitch.wav detect:unimrcp {start-input-timers=false,no-input-timeout=5000,recognition-timeout=5000}builtin:speech/transcribe"/>

Asterisk
MRCPRecog("builtin:speech/transcribe",t=5000&b=1&ct=0.7&spl=en-US&f=beep)
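For FreeSWITCH, the action needs to live inside a dialplan extension, and the "unimrcp" profile name must match your mod_unimrcp configuration. A minimal sketch (the extension name and destination number are illustrative):

```xml
<extension name="gsr-transcribe">
  <condition field="destination_number" expression="^5000$">
    <!-- "unimrcp" must match a profile defined for mod_unimrcp -->
    <action application="play_and_detect_speech"
            data="ivr/ivr-welcome_to_freeswitch.wav detect:unimrcp {start-input-timers=false,no-input-timeout=5000,recognition-timeout=5000}builtin:speech/transcribe"/>
    <!-- play_and_detect_speech stores the NLSML result here -->
    <action application="log" data="INFO result=${detect_speech_result}"/>
  </condition>
</extension>
```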

If you have any questions or need further assistance, feel free to post your questions here or send them directly to me.