
Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx


Andre Natal

Oct 30, 2014, 7:18:56 PM
to dev-pl...@lists.mozilla.org, Sandip Kamat, Olli.Pettay
I've been researching speech recognition in Firefox for two years: first
SpeechRTC, then emscripten, and now the Web Speech API with CMU pocketsphinx
[1] embedded in the Gecko C++ layer, a project I had the luck to develop for
Google Summer of Code with the mentoring of Olli Pettay, Guilherme
Gonçalves, Steven Lee, Randell Jesup and others, and with the management of
Sandip Kamat.

The implementation already works in B2G, Fennec and all Firefox desktop
versions, and the first supported language will be English. The API and
implementation conform to the W3C standard [2]. The preference to
enable it is: media.webspeech.service.default = pocketsphinx
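
For reference, here is a minimal usage sketch of the recognition side. The
object and method names follow the W3C draft [2]; the grammar and the result
handling below are only illustrative, not taken from the patches:

  // assumes media.webspeech.service.default = pocketsphinx is set
  var recognition = new SpeechRecognition();
  var grammars = new SpeechGrammarList();
  // a tiny JSGF grammar with the words we want to recognize (illustrative)
  grammars.addFromString(
    '#JSGF V1.0; grammar commands; public <command> = open | close | call ;',
    1.0);
  recognition.grammars = grammars;
  recognition.lang = 'en-US';
  recognition.onresult = function (event) {
    // first alternative of the first result
    console.log('recognized: ' + event.results[0][0].transcript);
  };
  recognition.start();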

The patches required to achieve this are:

- Import the pocketsphinx sources into Gecko. Bug 1051146 [3]
- Embed the English models. Bug 1065911 [4]
- Change SpeechGrammarList to store grammars inside SpeechGrammar objects.
Bug 1088336 [5]
- Create a SpeechRecognitionService for Pocketsphinx. Bug 1051148 [6]


Also, there are other important features that we don't have patches for yet:
- Relax the VAD strategy to be less strict and avoid stopping in the middle
of speech when low-volume phonemes are spoken [7]
- Integrate or develop a grapheme-to-phoneme algorithm to generate
pronunciations in real time when compiling grammars [8]
- Include and build models for other languages [9]
- Continuous and word-spotting recognition [10]

The WIP repo is here [11], and this Air Mozilla video [12] plus this wiki
page [13] have more detailed info.

In this comment you can see the CPU usage on a Flame while recognition is
happening [14].

I'd love to hear your comments.

Thanks,

Andre Natal

[1] http://cmusphinx.sourceforge.net/
[2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
[5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
[6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
[7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
[8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
[9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
[10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
[11] https://github.com/andrenatal/gecko-dev
[12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/ (Jump
to 12:00)
[13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
[14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14

Nick Alexander

Oct 30, 2014, 7:37:01 PM
to dev-pl...@lists.mozilla.org
On 2014-10-30, 4:18 PM, Andre Natal wrote:
> I've been researching speech recognition in Firefox for two years. First
> SpeechRTC, then emscripten, and now Web Speech API with CMU pocketsphinx
> [1] embedded in Gecko C++ layer, project that I had the luck to develop for
> Google Summer of Code with the mentoring of Olli Pettay, Guilherme
> Gonçalves, Steven Lee, Randell Jesup plus others and with the management of
> Sandip Kamat.
>
> The implementation already works in B2G, Fennec and all FF desktop
> versions, and the first language supported will be english. The API and
> implementation are in conformity with W3C standard [2]. The preference to
> enable it is: media.webspeech.service.default = pocketsphinx

First, Andre, let me offer my congratulations on getting this project to
this point. We've talked a few times and I've always been impressed.

Can you point me at Fennec try builds? I vaguely recall that these
speech recognition approaches require large pattern matching files, and
I'd like to see what including the Speech API does to the Fennec APK
size. We're pushing pretty hard on reducing our APK size right now
because we believe it's a big barrier to entry and especially to
upgrading older devices.

Nick

smaug

Oct 30, 2014, 8:21:06 PM
to Andre Natal, Sandip Kamat
Intent to ship is too strong for this.
We first need to have the implementation landed and tested ;)

I wouldn't ship the implementation in desktop Firefox without plenty more testing.



-Olli

smaug

Oct 30, 2014, 8:24:24 PM
to Andre Natal, Sandip Kamat
On 10/31/2014 02:21 AM, smaug wrote:
> Intent to ship is too strong for this.
> We need to first have implementation landed and tested ;)
>
> I wouldn't ship the implementation in desktop FF without plenty of more testing.
>

But I guess the question is what people think about shipping pocketsphinx + the API, even if disabled by default.

Andre, we need some numbers here. How much does Pocketsphinx increase the binary size, or the download size?
When the pref is enabled, how much memory does it use on desktop, and what about on B2G?

Chris Hofmann

Oct 30, 2014, 8:46:50 PM
to dev-pl...@lists.mozilla.org
On 10/30/14 5:24 PM, smaug wrote:
> On 10/31/2014 02:21 AM, smaug wrote:
>> Intent to ship is too strong for this.
>> We need to first have implementation landed and tested ;)
>>
>> I wouldn't ship the implementation in desktop FF without plenty of
>> more testing.
>>
>
> But I guess the question is what people think about shipping the
> pocketspinx + API, even if disabled by default.
>
> Andre, we need some numbers here. How much does Pocketsphinx increase
> binary size? or download size?
> When the pref is enabled, how much does it use memory on desktop, what
> about on b2g?
>
>
This is important work, and the competition is ramping up quickly after many
years of promises about this being the year of voice recognition.
We will probably fall behind quickly if we don't get something going
here in the next year.

Can you also talk a bit about what the plan and the set of challenges look
like for expanding the supported languages, and how those would affect
the numbers Olli has asked for?

The place we really need this is B2G, but the phones are only shipping in
international markets right now, so English-only is not all that helpful.
-chofmann

Mark Hammond

Oct 30, 2014, 9:50:10 PM
On 31/10/2014 11:45 AM, Chris Hofmann wrote:
> The place we really need this is b2g, but phones are only shipping in
> international markets right now so english only is not all that helpful.

While this doesn't change the point you are making in any way, FWIW,
Firefox OS phones are on sale in Australia via one of our largest
electronics retailers:

https://www.jbhifi.com.au/phones/Outright-Mobile-Handsets/zte/zte-open-c-handset-grey/624980/

http://www.gizmodo.com.au/2014/10/jb-hi-fi-is-now-selling-australias-first-firefox-os-phone/

Nice!

Mark

Marco Chen

Oct 30, 2014, 10:28:50 PM
to Andre Natal, Sandip Kamat, Olli.Pettay, dev-pl...@lists.mozilla.org
Hi Andre,

This is nice work, and I'm looking forward to voice recognition on B2G.

Besides the final result, I'm also interested in the reasons for your migration from SpeechRTC -> emscripten -> Web Speech API.
Could you share what factors triggered these transitions? That could be a lesson learned for us.

e.g.: SpeechRTC -> voice recognition can't be performed locally.
emscripten -> performance issue? licensing issue? or something else?

Thanks,
Sincerely yours.


Chris Mills

Nov 3, 2014, 6:59:15 AM
to Marco Chen, Andre Natal, dev-pl...@lists.mozilla.org, Olli.Pettay, Sandip Kamat
Awesome to see this mail, Andre!

And remember that we already have pages set up on MDN, ready to be filled in.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

Once this is shipped, do you think we can find some time to start collaborating on these docs?

Chris Mills
Senior tech writer || Mozilla
developer.mozilla.org || MDN
cmi...@mozilla.com || @chrisdavidmills

Andre Natal

Nov 8, 2014, 11:21:41 PM
to Nick Alexander, dev-pl...@lists.mozilla.org
Thanks Nick, I appreciate your help.

I created two versions of the Fennec APK: one [1] with the English models
bundled (43.7 MB), and another [2] without them (34.6 MB). This is the
mozconfig I used [3].

Actually, I had a conversation with Jonas Sicking some months ago and we
agreed that the ideal scenario is to let the user download the package
for the language they prefer from some sort of preferences screen,
instead of shipping the models bundled in the APK.


[1]
https://www.dropbox.com/s/6snv6e3mqqcs4zi/fennec-34.0a1.en-US.android-arm.apk?dl=0
[2]
https://www.dropbox.com/s/zxxop34unj21r1s/fennec-35.0a1.en-US.android-arm.apk?dl=0
[3]
#DEBUG
#ac_add_options --enable-debug
#ac_add_options --enable-trace-malloc
#ac_add_options --enable-accessibility
#ac_add_options --enable-signmar
ac_add_options --disable-tests

# android options
ac_add_options --enable-application=mobile/android
ac_add_options --with-android-ndk="/Volumes/extra/android-ndk-r8e/"
ac_add_options --with-android-sdk="/Volumes/extra/android-sdk-macosx/platforms/android-19/"

# FOR ARM
ac_add_options --target=arm-linux-androideabi
mk_add_options MOZ_OBJDIR=./obj-arm-linux-androideabi-debug


# FOR 386
#ac_add_options --target=i386-linux-android
#mk_add_options MOZ_OBJDIR=./objdir-droid-i386
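
For completeness, the APKs above would have been produced from this mozconfig
with the standard mach workflow; the exact commands aren't in this thread, but
roughly:

  MOZCONFIG=/path/to/the/mozconfig/above ./mach build
  ./mach package   # the fennec-*.android-arm.apk ends up in the objdir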

On Thu, Oct 30, 2014 at 9:36 PM, Nick Alexander <nalex...@mozilla.com>
wrote:

> On 2014-10-30, 4:18 PM, Andre Natal wrote:
>
>> I've been researching speech recognition in Firefox for two years. First
>> SpeechRTC, then emscripten, and now Web Speech API with CMU pocketsphinx
>> [1] embedded in Gecko C++ layer, project that I had the luck to develop
>> for
>> Google Summer of Code with the mentoring of Olli Pettay, Guilherme
>> Gonçalves, Steven Lee, Randell Jesup plus others and with the management
>> of
>> Sandip Kamat.
>>
>> The implementation already works in B2G, Fennec and all FF desktop
>> versions, and the first language supported will be english. The API and
>> implementation are in conformity with W3C standard [2]. The preference to
>> enable it is: media.webspeech.service.default = pocketsphinx
>>
>
> First, Andre, let me offer my congratulations on getting this project to
> this point. We've talked a few times and I've always been impressed.
>
> Can you point me at Fennec try builds? I vaguely recall that these speech
> recognition approaches require large pattern matching files, and I'd like
> to see what including the Speech API does to the Fennec APK size. We're
> pushing pretty hard on reducing our APK size right now because we believe
> it's a big barrier to entry and especially to upgrading older devices.
>
> Nick

Andre Natal

Nov 8, 2014, 11:51:36 PM
to smaug, dev-pl...@lists.mozilla.org, Sandip Kamat
Hi Olli,


> How much does Pocketsphinx increase binary size? or download size?

In the past it was suggested that we avoid shipping the models with the
packages, and instead create a preferences panel in the apps to let the
user download the models they want.

About the size of the pocketsphinx libraries themselves, on Mac OS they add
up to ~2.3 MB [1]. I don't know what kind of compression the build system
applies when compiling/packaging, but it should be efficient enough.

[1]
MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
/usr/local/lib/libsphinxbase.a
2184 -rw-r--r-- 1 root admin 1114840 Jul 7 14:39
/usr/local/lib/libsphinxbase.a
MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
/usr/local/lib/libpocketsphinx.a
2352 -rw-r--r-- 1 root admin 1201240 Jul 7 14:52
/usr/local/lib/libpocketsphinx.a



> When the pref is enabled, how much does it use memory on desktop, what
> about on b2g?

On B2G, it uses memory only after the decoder is activated and the models
are loaded. I did a profile on a ZTE Open C; here is the report [2] and here
is the exact snapshot [3]. It seems ~21 MB is used after loading the models.

On desktop (Mac OS Nightly), the memory usage was ~11 MB.

[2] https://www.dropbox.com/s/cf1drl3thkf6mp1/memory-reports?dl=0
[3] https://www.dropbox.com/s/1rt6z9t5h30whn0/Vaani_b2g_openc.png?dl=0

Andre Natal

Nov 9, 2014, 12:34:03 AM
to chof...@mozilla.org, Sandip Kamat, dev-pl...@lists.mozilla.org
Hi Chris.

For new languages, after the decoder is integrated into Gecko, you only
need to build new models (acoustic and language), since the decoder itself
is language agnostic.

The procedure for building models is the same for every language. In broad
strokes, you need to record thousands of hours of spoken phrases covering
all the phones of the target language, from people of different genders,
ages, regions and accents; all this data is then compiled and transformed
into the acoustic model.

For the language model, you need to build a phonetic dictionary for that
language, which then allows grapheme-to-phoneme tools (such as
Phonetisaurus [1]) to generate real-time phonetic representations of the
words used in your grammar.
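
To give an idea of what such a dictionary contains, here are a few
CMUdict-style ARPAbet entries; these lines are only illustrative, not taken
from the actual models:

  open   OW P AH N
  close  K L OW Z
  call   K AO L

A grapheme-to-phoneme tool generates entries like these on the fly for any
word in a grammar that is missing from the dictionary.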

Building models is not a trivial task, and it requires close collaboration
between speech engineers and linguists.

Pocketsphinx offers some models besides English [2], and there are useful
tutorials on acoustic [3] and language [4] model creation.

Thanks,

Andre

[1] https://code.google.com/p/phonetisaurus/
[2]
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
[3] http://cmusphinx.sourceforge.net/wiki/tutorialam?s[]=acoustic&s[]=models
[4] http://cmusphinx.sourceforge.net/wiki/tutoriallm



On Thu, Oct 30, 2014 at 10:45 PM, Chris Hofmann <chof...@mozilla.com>
wrote:

> On 10/30/14 5:24 PM, smaug wrote:
>
>> On 10/31/2014 02:21 AM, smaug wrote:
>>
>>> Intent to ship is too strong for this.
>>> We need to first have implementation landed and tested ;)
>>>
>>> I wouldn't ship the implementation in desktop FF without plenty of more
>>> testing.
>>>
>>>
>> But I guess the question is what people think about shipping the
>> pocketspinx + API, even if disabled by default.
>>
>> Andre, we need some numbers here. How much does Pocketsphinx increase
>> binary size? or download size?
>> When the pref is enabled, how much does it use memory on desktop, what
>> about on b2g?
>>
>>
>
> This is important work and the competition is ramping quicky after many
> years of promises about this year being the year of voice recognition. We
> will probably fall behind quickly if we don't get something going here in
> the next year.
>
> Can you also talk a bit about what the plan and set of challenges look
> like for expanding the supported languages, and how these would impact the
> numbers ollie has asked for?
>
> The place we really need this is b2g, but phones are only shipping in
> international markets right now so english only is not all that helpful.
>
> -chofmann

Andre Natal

Nov 9, 2014, 12:40:24 AM
to Chris Mills, dev-pl...@lists.mozilla.org
Thank you Chris, sure we can do it!

Here is a straightforward page with all the objects and methods of the
Speech API we are aiming for:

https://github.com/andrenatal/webspeechapi/blob/gh-pages/index_clean.html

Maybe we can start from it.

Thanks!

Andre

Andre Natal

Nov 9, 2014, 8:12:35 AM
to Marco Chen, Sandip Kamat, Olli. Pettay, dev-pl...@lists.mozilla.org
Hi Marco.

SpeechRTC was my first attempt on the platform. In early 2013 I didn't have
enough knowledge of Gecko internals, and B2G itself was at a very early
stage (in the very beginning, Steven Lee had to send me patches just to make
gUM work properly), so the fastest path was to capture audio and stream it
online. The nice part is that Opus is pretty efficient, and Node.js plus a
speech server wrapping pocketsphinx made the whole round trip really fast.

But I knew that wasn't ideal for command-and-control / grammar recognition,
so I started researching a direct port of pocketsphinx using emscripten. It
did work, but three reasons made me move to a full C++ version:

1) the whole Speech API frontend in Gecko was ready to roll, only waiting
for a backend, and that backend, as we know, was built in C++;

2) my tests ran very well, but on the Peak [2], for example, it performed
slower than on low-end devices running Android [3];

3) with emscripten, loading the models during the decoder's creation on each
page reload ended up being very slow, and I couldn't figure out how to keep
the decoder instance alive across tabs and reloads, while in C++, thanks to
Gecko's architecture, this happens only once.

On Oct 31, 2014 12:27 AM, "Marco Chen" <mc...@mozilla.com> wrote:

> Hi Andre,
>
> It is a nice work and expect the voice recognition on B2G.
>
> Beside this final result, I am also interesting in the reason of you
> migrate from SpeechRTC -> emscripten -> Web Speech API.
> Could you also share what is the factor triggered these transition? Then
> that can be the lesson learn for us.
>
> ex: SpeechRTC -> voice recognition can't be performed on local.
> emscripten -> performance issue? or license issue? or ?
>
> Thanks,
> Sincerely yours.
>
> ------------------------------
> *From: *"Andre Natal" <ana...@gmail.com>
> *To: *dev-pl...@lists.mozilla.org, "Sandip Kamat" <ska...@mozilla.com>,
> "Olli.Pettay" <ope...@mozilla.com>
> *Sent: *Friday, October 31, 2014 7:18:06 AM
> *Subject: *Intent to ship: Web Speech API - Speech Recognition with

Andre Natal

Nov 9, 2014, 8:14:46 AM
to Marco Chen, Sandip Kamat, Olli. Pettay, dev-pl...@lists.mozilla.org
Sorry, I forgot the links:

2 - Speechrtc offline on Firefox OS (Peak): http://youtu.be/FXKXhrRDEb8

3 - Continuous speech recognition on android with poc…:
http://youtu.be/3lTtCFaQF2A

Sandip Kamat

Nov 14, 2014, 6:36:31 PM
to Andre Natal, dev-pl...@lists.mozilla.org, smaug
Hi Andre, I suggest we update the wiki with these sizes (as well as the other questions in this thread) so we can use it as a central place for this info.

-Sandip

----- Original Message -----

> From: "Andre Natal" <ana...@gmail.com>
> To: "smaug" <sm...@welho.com>
> Cc: "Sandip Kamat" <ska...@mozilla.com>, dev-pl...@lists.mozilla.org
> Sent: Saturday, November 8, 2014 8:50:44 PM
> Subject: Re: Intent to ship: Web Speech API - Speech Recognition with
> Pocketsphinx

> Hi Olli,

> > How much does Pocketsphinx increase binary size? or download size?

> In the past was suggested to avoid ship the models with packages, but yes to
> create a preferences panel in the apps to allow the user to download the
> models he wants to.

> About the size of pocketsphinx libraries itself, in mac os, they sum ~ 2.3 mb
> [1]. I don't know which type of compression the build system does when
> compiling/packaging, but should be efficient enough.

> [1]
> MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
> /usr/local/lib/libsphinxbase.a
> 2184 -rw-r--r-- 1 root admin 1114840 Jul 7 14:39
> /usr/local/lib/libsphinxbase.a
> MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
> /usr/local/lib/libpocketsphinx.a
> 2352 -rw-r--r-- 1 root admin 1201240 Jul 7 14:52
> /usr/local/lib/libpocketsphinx.a

> > When the pref is enabled, how much does it use memory on desktop, what
> > about
> > on b2g?
>

> On b2g, it uses memory only after the decoder be activated and loaded the
> models. I did a profile in Zte Open C and here is the report [2] and here
> the exact snapshot [3]. Seems ~ 21 mb is used after load the models.

> In desktop mac os Nightly, the memory usage was of ~11mb.

> [2] https://www.dropbox.com/s/cf1drl3thkf6mp1/memory-reports?dl=0
> [3] https://www.dropbox.com/s/1rt6z9t5h30whn0/Vaani_b2g_openc.png?dl=0

Sandip Kamat

Nov 14, 2014, 6:54:21 PM
to Andre Natal, dev-pl...@lists.mozilla.org, smaug
Hi Olli, in general for FxOS devices the thought is to let the OEMs decide which language models they would like to ship preloaded. That way there is a partner choice based on regions, but users could also directly download the packages they like. For now, since we are at a very early stage, we just have English support. We need help to build and test other language models in parallel.

Sandip

----- Original Message -----

> From: "Andre Natal" <ana...@gmail.com>
> To: "smaug" <sm...@welho.com>
> Cc: "Sandip Kamat" <ska...@mozilla.com>, dev-pl...@lists.mozilla.org
> Sent: Saturday, November 8, 2014 8:50:44 PM
> Subject: Re: Intent to ship: Web Speech API - Speech Recognition with
> Pocketsphinx

> Hi Olli,

> > How much does Pocketsphinx increase binary size? or download size?

> In the past was suggested to avoid ship the models with packages, but yes to
> create a preferences panel in the apps to allow the user to download the
> models he wants to.

> About the size of pocketsphinx libraries itself, in mac os, they sum ~ 2.3 mb
> [1]. I don't know which type of compression the build system does when
> compiling/packaging, but should be efficient enough.

> [1]
> MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
> /usr/local/lib/libsphinxbase.a
> 2184 -rw-r--r-- 1 root admin 1114840 Jul 7 14:39
> /usr/local/lib/libsphinxbase.a
> MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
> /usr/local/lib/libpocketsphinx.a
> 2352 -rw-r--r-- 1 root admin 1201240 Jul 7 14:52
> /usr/local/lib/libpocketsphinx.a

> > When the pref is enabled, how much does it use memory on desktop, what
> > about
> > on b2g?
>

> On b2g, it uses memory only after the decoder be activated and loaded the
> models. I did a profile in Zte Open C and here is the report [2] and here
> the exact snapshot [3]. Seems ~ 21 mb is used after load the models.

> In desktop mac os Nightly, the memory usage was of ~11mb.

> [2] https://www.dropbox.com/s/cf1drl3thkf6mp1/memory-reports?dl=0
> [3] https://www.dropbox.com/s/1rt6z9t5h30whn0/Vaani_b2g_openc.png?dl=0


Andre Natal

Nov 19, 2014, 2:16:50 AM
to chof...@mozilla.org, dev-pl...@lists.mozilla.org
Chris,

I was discussing with the CMU Sphinx project leads, and we can build models
from audiobooks as well.

This approach saves a lot of time and improves quality, since the
narration is accurate and clear.

We are currently defining a way to create Hindi and Brazilian Portuguese
models.

Thanks

Andre
On Oct 30, 2014 5:47 PM, "Chris Hofmann" <chof...@mozilla.com> wrote:

> On 10/30/14 5:24 PM, smaug wrote:
>
>> On 10/31/2014 02:21 AM, smaug wrote:
>>
>>> Intent to ship is too strong for this.
>>> We need to first have implementation landed and tested ;)
>>>
>>> I wouldn't ship the implementation in desktop FF without plenty of more
>>> testing.
>>>
>>>
>> But I guess the question is what people think about shipping the
>> pocketspinx + API, even if disabled by default.
>>
>> Andre, we need some numbers here. How much does Pocketsphinx increase
>> binary size? or download size?
>> When the pref is enabled, how much does it use memory on desktop, what
>> about on b2g?
>>
>>
>
> This is important work and the competition is ramping quicky after many
> years of promises about this year being the year of voice recognition. We
> will probably fall behind quickly if we don't get something going here in
> the next year.
>
> Can you also talk a bit about what the plan and set of challenges look
> like for expanding the supported languages, and how these would impact the
> numbers ollie has asked for?
>
> The place we really need this is b2g, but phones are only shipping in
> international markets right now so english only is not all that helpful.
>
> -chofmann