Not at all. Open criticism is welcome in all forms.
Internally we use Kaldi in a slightly different form from the open project.
The G2P interface follows the same basic layout as in my GitHub repo, but it has actually been updated since then.
I am interested in trying to do this the 'right way', so I will lay out a simple blueprint of the current model:
- Python bindings based on pybindgen [this is pure Python and is included in the project; we also use this for Kaldi] (a rough usage sketch follows this list)
- Python command-line tool to train a model
- Python command-line tool to produce new pronunciations
- Optionally takes a reference dictionary
- Simple webserver that loads one or more G2P models and their associated reference dictionaries
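
For a sense of how the bindings are intended to be used, here is a rough sketch; the module, class, and method names are placeholders I am using for illustration, not the actual interface:

    # Hypothetical usage sketch only -- binding and method names are assumptions.
    from g2p import G2PModel  # pybindgen-generated module (name assumed)

    model = G2PModel("input.en-US.fst")  # a trained G2P model
    # Ask for the 3 best pronunciation hypotheses for an OOV word.
    for phones, score in model.phoneticize("kaldi", nbest=3):
        print("kaldi", score, " ".join(phones))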
I have been doing some work on refactoring the C++ API in response to other requests [as well as to provide a more robust response to your original criticism].
In the simplest case, during the network building phase we call the G2P [either via the server or the script] with a word list, e.g.:
- $ get-pronunciations --wordlist words.wl --model input.en-US.fst --reference input.en-US.lex --output app.en-US.lex
- Server accepts a curl request.
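
For the server path, a request from Python could look something like the sketch below; the endpoint, parameter names, and JSON response format are placeholders rather than the actual API:

    # Placeholder client sketch; endpoint and parameters are not the real server API.
    import json
    import urllib.parse
    import urllib.request

    params = urllib.parse.urlencode({
        "word": "tensorflow",   # OOV to phoneticize
        "model": "en-US",       # which loaded model/reference pair to use
        "nbest": 2,             # hypothetical n-best parameter
    })
    with urllib.request.urlopen("http://localhost:8080/g2p?" + params) as resp:
        print(json.load(resp))  # e.g. a list of scored pronunciations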
Reference words are looked up; OOVs fall through to the model. Optional parameters can be provided to (a) generate additional pronunciations for reference words, or (b) return n-best results for OOVs.
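
In rough pseudocode, that policy amounts to the following; `model_phoneticize` stands in for whatever the bindings or script expose, so treat the names as placeholders:

    # Sketch of the lookup policy described above, not the actual implementation.
    def get_pronunciations(words, reference, model_phoneticize,
                           augment_reference=False, nbest=1):
        """reference maps word -> list of pronunciations already in the lexicon."""
        output = {}
        for word in words:
            if word in reference:
                prons = list(reference[word])
                if augment_reference:
                    # Option (a): also generate model hypotheses for reference words.
                    prons += model_phoneticize(word, nbest)
            else:
                # OOVs always go to the model; option (b) asks for n-best.
                prons = model_phoneticize(word, nbest)
            output[word] = prons
        return output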
There is nothing particularly 'interesting' about any of this, but it keeps things fairly sane and self-contained.
If you have some thoughts about this and/or specific requests about ideal interaction with public Kaldi patterns, I would very much like to leverage them this time around.
I have carte blanche at work with regard to this project, so I see this as a good excuse to provide something back where we are benefiting.
Best, Joe