If you develop a commercial app and need ASR support, you can get it almost out of the box with PocketSphinx. This is especially true if you use the command-line tool pocketsphinx_continuous(.exe). Four points:
1. PocketSphinx comes with a high-quality, trained, generic US-English acoustic model. If you need general US-English ASR support in your app, you don't need to create your own model, and you actually shouldn't unless you need a more domain-specific one.
2. PocketSphinx works in one of three possible "modes": grammar, KWS (keyword spotting), and statistical LM. Creating a grammar or KWS file is very simple. In many scenarios creating a statistical LM is also very simple: you can upload your text to the online lmtool (www.speech.cs.cmu.edu/tools/lmtool-new.html), and it will generate bigrams and trigrams for you.
3. pocketsphinx_continuous.exe is pre-configured. You can use command-line arguments to configure it further, but in many scenarios you don't need to, so you don't need a deep understanding of how ASR works. You also get probability scores for the hypotheses.
4. When you ship your app, you simply add the few PocketSphinx binaries.
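To make points 2 and 3 concrete, here is a minimal Python sketch of how an app might assemble the pocketsphinx_continuous command line for each of the three modes. The flag names (-jsgf, -kws, -lm, -dict, -infile, -inmic) are the ones I remember from the PocketSphinx 5prealpha tool and should be checked against the help output of your build; the helper build_command and all file names are hypothetical and only for illustration.

```python
# Sketch only: builds the argument list, it does not run the recognizer.
# Flag names are assumed from PocketSphinx 5prealpha; verify them locally.
MODE_FLAGS = {
    "grammar": "-jsgf",  # JSGF grammar file
    "kws": "-kws",       # keyword-spotting list file
    "lm": "-lm",         # statistical language model, e.g. from lmtool
}


def build_command(mode, model_file, wav_file=None, dict_file=None,
                  binary="pocketsphinx_continuous"):
    """Return the argv list for one recognition mode (hypothetical helper)."""
    if mode not in MODE_FLAGS:
        raise ValueError("unknown mode: %s" % mode)
    cmd = [binary]
    # Read from a WAV file if one is given, otherwise from the microphone.
    if wav_file:
        cmd += ["-infile", wav_file]
    else:
        cmd += ["-inmic", "yes"]
    cmd += [MODE_FLAGS[mode], model_file]
    # A custom pronunciation dictionary is optional; without it the
    # tool falls back to its default dictionary.
    if dict_file:
        cmd += ["-dict", dict_file]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, e.g. as downloaded from lmtool.
    print(" ".join(build_command("lm", "1234.lm",
                                 wav_file="test.wav", dict_file="1234.dic")))
```

The resulting list can be passed to subprocess.run as is. No acoustic model is specified, because the default generic US-English model from point 1 is used.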
My question is how Kaldi compares to PocketSphinx on these points. I do understand that Kaldi is a big and complex system targeted at the research community, and I know that it is built from numerous binaries and scripts. But I would like to ask whether it is possible to use it in a simple way, as with PocketSphinx, especially when you don't need to create your own acoustic model or a special configuration.