about some differences between PocketSphinx and Kaldi

oren...@gmail.com

Oct 25, 2016, 8:20:31 PM

to kaldi-help

If you are developing a commercial app that needs ASR support, you can get it almost out of the box with PocketSphinx. This is especially true if you use the command-line tool pocketsphinx_continuous(.exe). Four points:

1. PocketSphinx comes with a high-quality, trained, generic US-English acoustic model. If you need general US-English ASR support in your app, you don't need to create your own model, and you actually shouldn't, unless you need a more domain-specific model.

2. PocketSphinx works in one of three possible "modes": grammar, KWS (keyword spotting), and statistical LM. Creating a grammar or KWS file is very simple. In many scenarios, creating a statistical LM is also very simple: you can upload your text to the online lmtool (www.speech.cs.cmu.edu/tools/lmtool-new.html), and it will generate bigrams and trigrams for you.

3. pocketsphinx_continuous.exe is pre-configured. You can use command-line arguments to configure it further, but in many scenarios you don't need to, so you don't need a deep understanding of how ASR works. You also get probability scores for the hypothesis.

4. When you ship your app, you simply add the few PocketSphinx binaries.
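To make points 2 and 3 concrete, here is a minimal sketch in Python that assembles a pocketsphinx_continuous command line for each of the three modes. The helper name `build_command` and all file names are hypothetical. The flags (`-jsgf`, `-kws`, `-lm`, `-dict`, `-inmic`, `-infile`) are the ones I know from the 5prealpha release of the tool, so check them against the help output of your own build before relying on them.

```python
import shlex

# Flag that selects each search mode. These are assumed to match the
# pocketsphinx_continuous build in use (5prealpha naming).
MODE_FLAGS = {"grammar": "-jsgf", "kws": "-kws", "lm": "-lm"}


def build_command(mode, search_file, dictionary=None, audio_file=None,
                  exe="pocketsphinx_continuous"):
    """Return an argv list for one recognition mode (hypothetical helper)."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = [exe]
    # Read from a file if one is given, otherwise from the microphone.
    cmd += ["-infile", audio_file] if audio_file else ["-inmic", "yes"]
    # Grammar file, keyphrase list, or statistical LM, depending on the mode.
    cmd += [MODE_FLAGS[mode], search_file]
    # A custom pronunciation dictionary usually accompanies a custom LM.
    if dictionary:
        cmd += ["-dict", dictionary]
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    for mode, path in [("grammar", "commands.gram"),
                       ("kws", "keyphrases.list"),
                       ("lm", "app.lm")]:
        print(shlex.join(build_command(mode, path)))
```

The resulting list can be passed to `subprocess.run` as is. With none of the optional flags given, the tool falls back to its bundled US-English acoustic model, which is the "pre-configured" behavior described in point 3.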

My question is how Kaldi compares to PocketSphinx on these points. I understand that Kaldi is a big and complex system targeted at the research community, and I know that it is built from numerous binaries and scripts. But I would like to ask whether it is possible to use it in a simple way, as with PocketSphinx, especially when you don't need to create your own acoustic model or a special configuration.

Daniel Povey

Oct 25, 2016, 8:24:26 PM

to kaldi-help

Kaldi is geared more towards people who need high-quality ASR and are building their own models for their own domain. It is definitely not as simple to use as PocketSphinx, and in particular the Windows support is nowhere near as easy.

Dan

oren...@gmail.com

Oct 26, 2016, 10:08:57 AM

to kaldi-help

Can you give me some more details? Is the English acoustic model that comes with Kaldi of high quality, and can I use it instead of creating my own model? Or can I use the PocketSphinx acoustic model with Kaldi?

Jan Trmal

Oct 26, 2016, 11:33:11 AM

to kaldi-help

You can browse the egs directory to get a sense of what WER Kaldi achieves on a given corpus. Usually, it is on par with the state-of-the-art results reported by IBM or MSR. I don't have any idea what the performance of the PocketSphinx models is.
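For readers unfamiliar with the metric: WER (word error rate) is the word-level edit distance between the reference transcript and the recognizer output (substitutions + deletions + insertions), divided by the number of reference words. Below is a minimal sketch in Python. It is not Kaldi's own scoring code, and the function name and whitespace tokenization are assumptions for illustration only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.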

You should understand, however, that while PocketSphinx is geared towards embedded applications (and is able to work in fixed-point math, which is important for many small devices), Kaldi does not support this mode of operation (no one has implemented it yet).