[on-asterisk] Survey: Speech Recognition - Good, Bad, Ugly?

Jim Van Meggelen

Jul 24, 2015, 8:54:07 AM
to TAUG Technical
I'm curious about people's experience with speech recognition development.

For myself, I'd like to be able to use an external service of some sort
(similar to Google's API), supply it a grammar (e.g. a list of names from a
voicemail directory), and have it be supported by a company that has solid
expertise in speech recognition (i.e. their recognition engine is robust
and gives reliable results).
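To make the "supply it a grammar" idea concrete, here is a minimal sketch that turns a voicemail directory into a grammar accepting exactly one of the listed names. It assumes the service accepts grammars in the W3C SRGS 1.0 XML format (the format used by VoiceXML platforms and many MRCP speech servers); none of the services mentioned in this thread is confirmed to accept it. The function name `build_srgs_grammar` and the sample directory are hypothetical.

```python
from xml.sax.saxutils import escape


def build_srgs_grammar(names, root_rule="directory"):
    """Return an SRGS 1.0 XML grammar that matches exactly one of `names`.

    Hypothetical helper: assumes the recognizer accepts W3C SRGS XML grammars.
    """
    # One <item> per directory entry; escape() protects &, < and > in names.
    items = "\n".join(f"      <item>{escape(name)}</item>" for name in names)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"\n'
        f'         xml:lang="en-US" mode="voice" root="{root_rule}">\n'
        f'  <rule id="{root_rule}" scope="public">\n'
        "    <one-of>\n"
        f"{items}\n"
        "    </one-of>\n"
        "  </rule>\n"
        "</grammar>\n"
    )


if __name__ == "__main__":
    # Hypothetical directory; in practice this might come from voicemail.conf or a database.
    directory = ["Jim Van Meggelen", "Leif Madsen", "John Lange"]
    print(build_srgs_grammar(directory))
```

Restricting the recognizer to a `<one-of>` list like this is what narrows the search space to a few dozen valid phrases instead of open dictation.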

I would so love to use Google's speech API, but it's not really
production-ready, as Google doesn't support it for commercial use (there's
a quota, and you can't supply a custom grammar so far as I've been able to
tell). Still, it's an excellent engine in so many ways.

There's always been Sphinx, but I have never had the impression this has
gone far beyond the academic, and from what I've seen it's not something to
be taken on lightly.

Then there are the commercial products, but I really can't figure out how
to choose amongst them.

Anybody had any good experiences they'd be willing to share?


Jim

Jim Van Meggelen

Jul 24, 2015, 9:05:40 AM
to TAUG Technical

Nabeel Jafferali

Jul 24, 2015, 10:50:06 AM
to TAUG Technical
This was posted to the asterisk-biz list a few weeks ago:

http://www.speechaas.com/#/home

--
Nabeel Jafferali

On Fri, Jul 24, 2015 at 9:05 AM, Jim Van Meggelen <jim.van...@gmail.com

Lloyd Aloysius

Jul 24, 2015, 11:19:13 AM
to Nabeel Jafferali, TAUG Technical
The following was recorded at the 2013 ClueCon conference (FreeSWITCH):

https://www.youtube.com/watch?v=viJxyyDaJoA

Waterloo-based company: http://www.vestec.com/

Lloyd

On Fri, Jul 24, 2015 at 10:49 AM, Nabeel Jafferali <nab...@jaffera.li>
wrote:

Jim Van Meggelen

Jul 24, 2015, 11:47:45 AM
to Nabeel Jafferali, TAUG Technical
It looks interesting, but it doesn't answer any questions about who is
using it, who is standing behind it, what it costs, or how it would be
supported.

OK for experimental work, perhaps, but I couldn't feel comfortable using
that in a production environment.

The thing that really blows me away about Google is how well their speech
rec works. I use it on my 'droid all the time, and it just totally stuns me
how good it is (especially in noisy conditions, which it almost seems to
ignore). I just wish they offered more formal support for it (i.e. I could
pay them and they would officially agree to support my particular
application of it). I also wish they would let me send them a valid
grammar, so that I could narrow what's valid for that API call down to a
few dozen words.
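Without grammar support, one common workaround (not something proposed in this thread) is to let the recognizer transcribe freely and then snap its output onto the closest directory entry on the client side. A minimal sketch using Python's standard `difflib` fuzzy matcher; the helper name, the sample transcript, and the 0.6 cutoff are hypothetical choices for illustration.

```python
import difflib


def match_directory_name(transcript, names, cutoff=0.6):
    """Return the directory entry closest to a free-form transcript, or None.

    Hypothetical helper: this post-filters a recognizer's unconstrained output;
    it is not a grammar and does not change what the recognizer hears.
    """
    # Compare case-insensitively, but hand back the name as spelled in the directory.
    by_lower = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(
        transcript.strip().lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[hits[0]] if hits else None


if __name__ == "__main__":
    # Hypothetical directory and misheard transcript, for illustration only.
    directory = ["Jim Van Meggelen", "Leif Madsen", "John Lange"]
    print(match_directory_name("leaf madsen", directory))  # → Leif Madsen
```

This is only an approximation of a real grammar: the engine still decodes open vocabulary, so a badly misheard name may match nothing or the wrong entry, and the cutoff would need tuning against real call audio.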

Nevertheless, thanks for the info!


On Fri, Jul 24, 2015 at 10:49 AM, Nabeel Jafferali <nab...@jaffera.li>
wrote:

Jim Van Meggelen

Jul 24, 2015, 1:44:34 PM
to Lloyd Aloysius, Nabeel Jafferali, TAUG Technical
Thanks Lloyd, that's really interesting. I've reached out to them to see
what they're up to these days.

Jim


On Fri, Jul 24, 2015 at 11:18 AM, Lloyd Aloysius <lloyd.a...@gmail.com>
wrote:

Leif Madsen

Sep 17, 2015, 10:23:24 AM
to Jim Van Meggelen, Lloyd Aloysius, Nabeel Jafferali, TAUG Technical
I did some research in this area a while ago, and came up with these three
service providers. It might not be exactly what you're looking for, but
might fill some niches.


- Clarify (http://clarify.io/)
- VoiceBase (http://www.voicebase.com/public/)
- VoiceCloud (http://www.voicecloud.com/)


On 24 July 2015 at 13:44, Jim Van Meggelen <jim.van...@gmail.com>
--
Leif Madsen.
http://www.leifmadsen.com
http://www.oreilly.com/catalog/asterisk

John Lange

Sep 17, 2015, 10:35:13 AM
to Leif Madsen, Jim Van Meggelen, Lloyd Aloysius, Nabeel Jafferali, TAUG Technical
I have no experience with it, but Azure has a speech recognition service
via an API.

http://datamarket.azure.com/dataset/bing/speechrecognition

One implementation of it is the "Speech Recognition Control" which is a
small app that installs on Windows and lets Windows developers use it for
native Windows apps.

http://datamarket.azure.com/dataset/bing/speechcontrol

I also assume that this is the speech engine that Cortana uses in
Windows 10, and that Skype uses for its real-time speech translation
services, in which case Microsoft will be investing heavily in making it
work well.

http://www.skype.com/en/translator-preview/

In my experience, the biggest factor in accurate translation is the quality
of the source audio. Calls from cell phones don't translate well, while HD
audio comes out nearly 100% accurate.
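If source audio quality matters that much, it may be worth checking what you are actually sending before it goes to a recognizer. A minimal sketch, assuming recordings are stored as WAV files, that labels a recording narrowband or wideband from its sample rate using Python's standard `wave` module. The function names are hypothetical, and sample rate is only a proxy: a narrowband call upsampled to 16 kHz still lacks the missing high frequencies.

```python
import io
import wave


def describe_bandwidth(wav_bytes):
    """Label a WAV recording narrowband or wideband from its sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as recording:
        rate = recording.getframerate()
    # Sampling at 8 kHz can only represent content up to 4 kHz (the Nyquist limit);
    # 16 kHz, the rate used by wideband codecs such as G.722, raises that ceiling to 8 kHz.
    if rate <= 8000:
        return f"narrowband ({rate} Hz)"
    if rate >= 16000:
        return f"wideband ({rate} Hz)"
    return f"intermediate ({rate} Hz)"


def make_silent_wav(rate, frames=160):
    """Build a short mono 16-bit silent WAV in memory (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * frames)
    return buf.getvalue()


if __name__ == "__main__":
    for rate in (8000, 16000):
        print(describe_bandwidth(make_silent_wav(rate)))
```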

John
--
John Lange
www.johnlange.ca