1) It appears this would require truly real-time continuous
dictation, which I'm not sure pocketsphinx is up for. There is some
code to support continuous dictation, but I know it can eat a lot of
CPU time. And while it can do "continuous" dictation, the response
won't be real-time: decoding a single word on an iPhone 4 rarely comes
in under a second. Also, I imagine that if the notes were held for a
while, they would probably get misrecognized a bit. (There's a rough
sketch of what the decode loop looks like after this list.)
2) It would need a custom dictionary, which should be pretty easy to
generate. You don't have to do any acoustic training or anything, just
create a custom language model:
http://cmusphinx.sourceforge.net/wiki/languagemodelhowto
(There's a toy corpus/dictionary example after this list.)
3) You do need all the code. Well, you might not strictly need all of
it, but trimming it out would be more effort than it's worth.
4) You may be able to do this entirely based on pitch with visual
feedback. It would slightly restrict your feature set, but if you're
interested in real-time results it should be very feasible. Sure, if
you said Do out of pitch, Do would not light up, but that visual
feedback may provide the training on its own, just because the person
is saying one thing and seeing another light up. I really think this
is the direction to go in. But your vision in that document was
substantially over my head, and I'm not sure of all of the use cases
you want to cover. (A minimal pitch-detection sketch follows the list.)
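
For point 1, here's roughly the shape of a continuous-dictation loop
in the pocketsphinx C API, just so we're talking about the same thing.
This is a sketch, not working VocalKit code: the model paths and the
read_audio()/update_ui() helpers are made up, and the ps_* signatures
have shifted a bit between releases, so check them against your headers.

    #include <pocketsphinx.h>

    size_t read_audio(int16 *buf, size_t max); /* hypothetical: pull PCM from the mic    */
    void   update_ui(const char *hyp);         /* hypothetical: light up the matched note */

    void dictate_loop(void)
    {
        cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
            "-hmm",  "model/hmm/en_US",  /* acoustic model (placeholder path)   */
            "-lm",   "solfege.lm",       /* custom language model (see point 2) */
            "-dict", "solfege.dic",      /* custom dictionary (see point 2)     */
            NULL);
        ps_decoder_t *ps = ps_init(config);

        int16 buf[512];
        int32 score;

        ps_start_utt(ps, NULL);
        for (;;) {
            size_t n = read_audio(buf, 512);
            if (n == 0)
                break;
            ps_process_raw(ps, buf, n, FALSE, FALSE);
            /* You can ask for a partial hypothesis mid-utterance, but on an
               iPhone 4 this is where the roughly one-second-per-word latency
               shows up. */
            const char *hyp = ps_get_hyp(ps, &score, NULL);
            if (hyp)
                update_ui(hyp);
        }
        ps_end_utt(ps);
        ps_free(ps);
    }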
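
For point 2, the build described in that howto can literally be this
small. Write a little corpus of the "sentences" you expect and feed it
to the LM tools described on that page; you get back a .lm and a .dic
along these lines. The pronunciations here are my guesses, so verify
them against cmudict:

    solfege.corpus (one expected utterance per line):
        DO
        RE
        MI
        FA
        SOL
        LA
        TI
        DO RE MI

    solfege.dic (phone spellings are guesses -- double-check them):
        DO   D OW
        RE   R EY
        MI   M IY
        FA   F AA
        SOL  S OW L
        LA   L AA
        TI   T IY

Those two files are what the -lm and -dict arguments in the sketch
above would point at.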
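
And for point 4, a pitch detector is tiny compared to a recognizer. A
naive autocorrelation version like the sketch below runs comfortably in
real time; a real app would do it with an FFT (e.g. the Accelerate
framework) plus some smoothing, but the shape is the same. This
function and its thresholds are my own invention, not anything from
VocalKit:

    #include <stddef.h>

    /* Naive autocorrelation pitch estimate: returns the strongest
       candidate fundamental in Hz, or 0 if nothing was found.  The
       search range of ~70 Hz to ~1000 Hz covers a singing voice. */
    float estimate_pitch(const float *samples, size_t n, float sample_rate)
    {
        size_t min_lag = (size_t)(sample_rate / 1000.0f);
        size_t max_lag = (size_t)(sample_rate / 70.0f);
        size_t best_lag = 0;
        float  best_corr = 0.0f;

        for (size_t lag = min_lag; lag < max_lag && lag < n; lag++) {
            float corr = 0.0f;
            for (size_t i = 0; i + lag < n; i++)
                corr += samples[i] * samples[i + lag];
            if (corr > best_corr) {
                best_corr = corr;
                best_lag  = lag;
            }
        }
        return best_lag ? sample_rate / (float)best_lag : 0.0f;
    }

Once you have a frequency you can snap it to the nearest scale degree
and light up Do/Re/Mi accordingly, which is exactly the "say one thing,
see another light up" feedback I mean.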
If you want to talk with me about your application's usability and how
to make it pitch-based, email me off-list!
Brian King