Hey, I think, if I understand your mission correctly, that this would be quite easy to achieve, especially if you have a phonemic language, as you mention. What I would suggest is to train an acoustic model using the graphemes (letters) of the language. That can be done by altering the pronunciations in the lexicon so that each one is just the word's letters expanded. E.g.
word -> w o r d
bubblegum -> b u b b l e g u m
... and so forth.
You can set all the individual letters as the non-silence phones in the dictionary (nonsilence_phones.txt). The other files in the dict would, I think, remain just the same.
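To make this concrete, here is a minimal sketch of that dict preparation in Python, assuming you have a plain word list (one word per line) called words.txt; the input file name is made up, but lexicon.txt and nonsilence_phones.txt are the usual Kaldi dict files:

# make_grapheme_dict.py - sketch: expand every word into its letters to build
# a grapheme lexicon, and collect the letters as the non-silence phone set.
letters = set()
with open("words.txt", encoding="utf-8") as wordlist, \
     open("lexicon.txt", "w", encoding="utf-8") as lexicon:
    for line in wordlist:
        word = line.strip()
        if not word:
            continue
        graphemes = list(word)              # "bubblegum" -> b u b b l e g u m
        letters.update(graphemes)
        lexicon.write(word + " " + " ".join(graphemes) + "\n")
# every letter seen in the word list becomes a non-silence phone
with open("nonsilence_phones.txt", "w", encoding="utf-8") as phones:
    for letter in sorted(letters):
        phones.write(letter + "\n")

You would still add the usual silence/OOV entries (silence_phones.txt, optional_silence.txt and e.g. an <UNK> entry in the lexicon) by hand, exactly as in a normal dict dir.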
You can think of the G-fst as just a heavily subword-tokenized language model, taken all the way down to individual characters. Some research has been done on this for Kaldi, most notably by Peter Smit at Aalto University. You would need to alter the L-fst to handle subword-tokenized units, which can be done with the code here ->
https://github.com/aalto-speech/subword-kaldi. You would then have to alter the text corpus: say your corpus is the words "I am a dog", you would add a boundary marker after every letter that is not followed by a word boundary, giving "I a+ m a d+ o+ g". This is just one of four possible marking styles; the others are listed in the paper referenced in the Git repository. To train the LM you can use KenLM or any other Kaldi-supported tool. Just note that you will need a longer n-gram context than normal, perhaps at least a 6-gram.
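Here is a rough sketch of that corpus marking in Python (file names are again made up, and it only implements the one marking style described above):

# mark_corpus.py - sketch: split each word into characters and append "+" to
# every character that is NOT followed by a word boundary.
# "I am a dog" -> "I a+ m a d+ o+ g"
with open("corpus.txt", encoding="utf-8") as text, \
     open("corpus.chars.txt", "w", encoding="utf-8") as out:
    for line in text:
        tokens = []
        for word in line.split():
            chars = list(word)
            tokens.extend(c + "+" for c in chars[:-1])  # marker on all but the last
            tokens.append(chars[-1])                    # last char ends the word
        out.write(" ".join(tokens) + "\n")

With KenLM the LM training on that output would then be something along the lines of lmplz -o 6 < corpus.chars.txt > char_lm.arpa; treat the order and any pruning settings as something to tune on your own data.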
If you successfully compile a decoding graph with this setup, it will output characters along with the boundary markers, and you can alter the "wer_output_filter" file found in the local dir to turn those back into words for scoring. It is a list of sed commands that are applied to the hypotheses, and it is called by score.sh.
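For the marking style above, the filter essentially just has to glue every marked character back onto whatever follows it, so a single sed expression along the lines of s/+ //g in that file should get you most of the way; take that as a sketch, since the exact expressions depend on which marking style you end up using.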
In about two weeks I will be posting my thesis code in a user-friendly script in this repository ->
https://github.com/cadia-lvl/samromur-asr. It already has most of these steps implemented for subword ASR modelling but requires some cleanup.
Hope this helps.
DEM