John
Jun 25, 2010, 4:04:15 PM
to VocalKit
Hey All,
I apologize in advance for such a long post, and in case there is
documentation I have missed that already answers all of this. I was
wondering if we could discuss, or if someone has information about,
the various files that VocalKit consumes during initialization and how
they relate to the various dictionaries included in the sample.
THE FILES
In the source code I checked out a couple of days ago, I see the
following three files and one directory being passed into the
VKController during initialization (i.e. in
VocalKitTestViewController:viewDidLoad()):
/model/pocketsphinx.conf
/model/lm/en_US/cmu07a.dic
/model/lm/en_US/wsj0vp.5000.DMP
/model/hmm/hub4wsj_sc_8k/
FILE: pocketsphinx.conf
This seems to be a text file containing key/value pairs used to
configure pocketsphinx. I have been able to figure out the behavior
of some of the keys below. Please correct and expand on my notes.
========================
-fwdflat no /* UNKNOWN */
-bestpath no /* if yes, will run bestpath (Dijkstra) search over word lattice (3rd pass); not sure what a word lattice is or how the shortest path through it would affect the outcome and performance */
-nfft 512 /* UNKNOWN */
-lowerf 1 /* UNKNOWN */
-upperf 4000 /* UNKNOWN */
-samprate 8000 /* I assume this is the sampling rate of the audio recording that is analyzed */
-nfilt 20 /* UNKNOWN */
-transform dct /* UNKNOWN */
-round_filters no /* UNKNOWN */
-remove_dc yes /* UNKNOWN */
/* Are there other options not listed by default? */
========================
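For anyone who wants to experiment with these values, here is a small Python sketch that reads "-flag value" lines like the ones above and works out a couple of numbers they imply. It assumes one flag per line and treats the /* */ notes as my own annotations rather than real file syntax. Going only by the names, it also assumes that -lowerf, -upperf and -nfilt describe a bank of triangular filters spaced evenly on the mel scale between those edge frequencies; the mel formula is a common textbook one and may not be exactly what pocketsphinx uses.

```python
import math
import re
from typing import Dict, List

# The settings quoted above, with a couple of the /* */ notes left in.
SAMPLE_CONF = """
-fwdflat no
-bestpath no /* 3rd pass */
-nfft 512
-lowerf 1
-upperf 4000
-samprate 8000 /* sampling rate */
-nfilt 20
-transform dct
-round_filters no
-remove_dc yes
"""


def parse_conf(text: str) -> Dict[str, str]:
    """Parse '-flag value' lines into {'flag': 'value'}; other lines are skipped."""
    # Remove /* ... */ annotations (assumed to be notes, not real config syntax).
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("-"):
            settings[fields[0][1:]] = fields[1]
    return settings


def hz_to_mel(f: float) -> float:
    # Common mel-scale formula; assumed, not confirmed to be pocketsphinx's.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_edges(lowerf: float, upperf: float, nfilt: int) -> List[float]:
    """Edge frequencies (Hz) for nfilt triangular filters spaced evenly in mel.

    Adjacent filters share edges, so nfilt filters need nfilt + 2 points.
    """
    lo, hi = hz_to_mel(lowerf), hz_to_mel(upperf)
    step = (hi - lo) / (nfilt + 1)
    return [mel_to_hz(lo + i * step) for i in range(nfilt + 2)]


if __name__ == "__main__":
    conf = parse_conf(SAMPLE_CONF)
    rate, nfft = float(conf["samprate"]), int(conf["nfft"])
    print("Nyquist limit (Hz):", rate / 2)       # 4000.0, same as -upperf
    print("FFT bin spacing (Hz):", rate / nfft)  # 15.625
    edges = mel_edges(float(conf["lowerf"]), float(conf["upperf"]), int(conf["nfilt"]))
    print("filter edges (Hz):", [round(e) for e in edges])
```

With -samprate 8000, the highest frequency the audio can represent is 4000 Hz (half the sampling rate), which is presumably why -upperf is set to 4000.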
FILE: cmu07a.dic
This seems to be a text file that lists every word in the US English
dictionary, along with some sort of phonetic notation denoting its
pronunciation. The pocketsphinx documentation refers to it as a
"pronunciation dictionary (lexicon) input file." Is this file used
for determining text from speech or for generating speech from text?
Is there a link with information about the phonetic notation used? Is
this file interchangeable with /model/lm/raven/0407.dic when switching
to the RAVEN dictionary?
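As far as I can tell, each line is a word followed by its phones, and the notation looks like the phone set of the CMU Pronouncing Dictionary, with alternate pronunciations marked like WORD(2). That format is my reading of the file, not something I found documented. A minimal Python sketch that loads such a file into a word-to-pronunciations map, assuming whitespace-separated fields (the sample entries are made up in that style, not copied from cmu07a.dic):

```python
import re
from typing import Dict, List

# Hypothetical entries in the CMU-dictionary style; not copied from cmu07a.dic.
SAMPLE_DIC = """\
HELLO HH AH L OW
READ R IY D
READ(2) R EH D
"""

# An alternate pronunciation is assumed to be marked as WORD(n), e.g. READ(2).
_VARIANT = re.compile(r"^(.+)\((\d+)\)$")


def load_dic(text: str) -> Dict[str, List[List[str]]]:
    """Map each word to the list of phone sequences it can be pronounced as."""
    prons: Dict[str, List[List[str]]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, phones = fields[0], fields[1:]
        match = _VARIANT.match(word)
        if match:
            word = match.group(1)  # fold READ(2) into READ
        prons.setdefault(word, []).append(phones)
    return prons


if __name__ == "__main__":
    for word, variants in load_dic(SAMPLE_DIC).items():
        print(word, variants)
```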
FILE: wsj0vp.5000.DMP
It seems this is a binary file used to set the "word trigram language
model input file." Can anyone explain this or point me to
documentation? Is this file related to, or interchangeable with,
/lm/raven/0407.lm when switching to the RAVEN dictionary, even though
the latter is a text file with a different extension?
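My guess (unconfirmed) is that 0407.lm is in the ARPA text format commonly used for n-gram models, and that a .DMP file is a binary form of the same kind of model. If so, a word trigram model stores log10 probabilities for one-, two- and three-word sequences, plus "backoff" weights that are applied when a longer sequence is not listed. This Python sketch parses a tiny made-up model in that format and scores a word given the two words before it, using the standard ARPA backoff rule:

```python
from typing import Dict, Tuple

Ngram = Tuple[str, ...]

# A tiny made-up trigram model in ARPA text format (not taken from 0407.lm).
SAMPLE_LM = """\\data\\
ngram 1=4
ngram 2=3
ngram 3=1

\\1-grams:
-1.0 <s> -0.5
-0.7 nevermore -0.3
-0.8 quoth -0.2
-0.9 </s>

\\2-grams:
-0.3 <s> quoth -0.1
-0.2 quoth nevermore -0.4
-0.4 nevermore </s>

\\3-grams:
-0.1 <s> quoth nevermore

\\end\\
"""


def parse_arpa(text: str) -> Dict[Ngram, Tuple[float, float]]:
    """Map each n-gram to (log10 probability, log10 backoff weight)."""
    table: Dict[Ngram, Tuple[float, float]] = {}
    order = 0  # 0 means "not inside an \N-grams: section yet"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            continue
        if line == "\\end\\":
            break
        if order == 0 or not line:
            continue
        fields = line.split()
        prob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The backoff weight is optional; a missing one means log10(1) = 0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        table[words] = (prob, backoff)
    return table


def log10_prob(table: Dict[Ngram, Tuple[float, float]], w1: str, w2: str, w3: str) -> float:
    """log10 P(w3 | w1 w2), backing off to shorter histories when needed."""
    if (w1, w2, w3) in table:
        return table[(w1, w2, w3)][0]
    # Trigram unseen: add the backoff weight of the (w1, w2) context, if listed.
    context_backoff = table.get((w1, w2), (0.0, 0.0))[1]
    return context_backoff + _bigram_log10_prob(table, w2, w3)


def _bigram_log10_prob(table: Dict[Ngram, Tuple[float, float]], w2: str, w3: str) -> float:
    if (w2, w3) in table:
        return table[(w2, w3)][0]
    context_backoff = table.get((w2,), (0.0, 0.0))[1]
    # Raises KeyError if w3 is not in the vocabulary at all.
    return context_backoff + table[(w3,)][0]


if __name__ == "__main__":
    lm = parse_arpa(SAMPLE_LM)
    print(log10_prob(lm, "<s>", "quoth", "nevermore"))    # listed trigram
    print(log10_prob(lm, "quoth", "nevermore", "</s>"))   # backs off to a bigram
```

In the second call, "quoth nevermore </s>" is not listed, so the score is the backoff weight of "quoth nevermore" (-0.4) plus the bigram "nevermore </s>" (-0.4).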
DIRECTORY: hub4wsj_sc_8k
According to the documentation, this is the directory containing
acoustic model files. The "hmm" in the configuration flag and in the
file-system path leads me to believe the files in this directory are
related to Hidden Markov Models. Can anyone explain these files
and/or give a high-level explanation of Hidden Markov Models as they
relate to speech recognition?
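As a very rough illustration of the general idea, and not of how the files in hub4wsj_sc_8k are actually organized: a sound unit can be modeled as a few hidden states, each with probabilities of moving to the next state and of producing a given observation, and decoding means finding the most likely state sequence for the observed audio frames. The Python sketch below runs the standard Viterbi algorithm over a made-up three-state, left-to-right model of the word "yes". The audio frames are reduced to the symbols a, b and c purely for illustration; all states, symbols and probabilities are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Probs = Dict[str, Dict[str, float]]


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -infinity instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(obs: Sequence[str], states: Sequence[str], start_p: Dict[str, float],
            trans_p: Probs, emit_p: Probs) -> Tuple[float, List[str]]:
    """Return (log probability, state sequence) of the most likely path."""
    # best[s] holds the best (log score, path) of any path ending in state s so far.
    best = {s: (_log(start_p.get(s, 0.0)) + _log(emit_p[s][obs[0]]), [s]) for s in states}
    for o in obs[1:]:
        step = {}
        for s in states:
            # Pick the predecessor that gives the highest score for entering s.
            score, path = max(
                (best[prev][0] + _log(trans_p[prev].get(s, 0.0)), best[prev][1])
                for prev in states
            )
            step[s] = (score + _log(emit_p[s][o]), path + [s])
        best = step
    return max(best.values())


# Hypothetical left-to-right model: each state may repeat or move one step right.
STATES = ["Y", "EH", "S"]
START = {"Y": 1.0}
TRANS = {"Y": {"Y": 0.6, "EH": 0.4}, "EH": {"EH": 0.6, "S": 0.4}, "S": {"S": 1.0}}
EMIT = {
    "Y": {"a": 0.7, "b": 0.2, "c": 0.1},
    "EH": {"a": 0.1, "b": 0.8, "c": 0.1},
    "S": {"a": 0.1, "b": 0.1, "c": 0.8},
}

if __name__ == "__main__":
    score, path = viterbi(list("aabbc"), STATES, START, TRANS, EMIT)
    print(path, math.exp(score))
```

For the frames a a b b c, the best path stays in Y for two frames, EH for two, then moves to S, which is how the decoder "aligns" audio frames to parts of a word.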
PROPER SUBSTITUTIONS AND MODIFICATION
In this group, I have seen references to two dictionaries: the WSJ one
and the one based on the poem "The Raven." However, there are three
directories under the model directory where the dictionaries seem to
be stored: raven, wsj and en_US. Is en_US a third dictionary or a
shared resource of some sort? Which files should I swap during
initialization to change which dictionary my app uses? How would I
generate my own dictionary?
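My assumption, not confirmed anywhere in this thread, is that the .dic file and the language model are meant to be swapped as a matched pair (like 0407.dic and 0407.lm), since the recognizer presumably can only output words it has a pronunciation for. Under that assumption, a quick sanity check for a custom pair is that every word in the language model also appears in the dictionary. A minimal Python sketch, using made-up file contents and treating the sentence markers <s> and </s> as handled elsewhere:

```python
from typing import Set

# Made-up contents for a hypothetical custom dictionary / language-model pair.
CUSTOM_DIC = """\
NEVERMORE N EH V ER M AO R
QUOTH K W OW TH
"""

CUSTOM_LM = """\\data\\
ngram 1=4

\\1-grams:
-0.6 <s>
-0.6 NEVERMORE
-0.6 QUOTH
-0.6 RAVEN

\\end\\
"""


def dic_words(text: str) -> Set[str]:
    """Words defined in a .dic file, with variant markers like WORD(2) stripped."""
    words = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            words.add(fields[0].split("(")[0])
    return words


def lm_unigrams(text: str) -> Set[str]:
    """Words listed in the \\1-grams: section of an ARPA-format model."""
    words, in_unigrams = set(), False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("\\"):
            # Any section marker ends the previous section.
            in_unigrams = line == "\\1-grams:"
            continue
        if in_unigrams and line:
            words.add(line.split()[1])
    return words


def missing_pronunciations(dic_text: str, lm_text: str) -> Set[str]:
    """LM words with no dictionary entry, ignoring markers such as <s>."""
    return {w for w in lm_unigrams(lm_text) - dic_words(dic_text) if not w.startswith("<")}


if __name__ == "__main__":
    print(missing_pronunciations(CUSTOM_DIC, CUSTOM_LM))
```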
Thanks, and sorry again for such a lengthy post,
-john