Hi,
In steps/train_deltas.sh, there is a context_opts option that sets the context width and central position.
Suppose I rewrite phones.txt as a biphone inventory (e.g. A+A 1, A+B 2, ...), update the related files (nonsilence_phones.txt, text, etc.) to match, initialize a model with steps/train_mono.sh, and then do state tying with steps/train_deltas.sh using context-width=1 and central-position=0. Is the resulting model still a biphone model?
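To make the relabeling concrete, here is a minimal Python sketch of the idea. It is only an illustration under assumptions: the helper names (to_biphones, build_symbol_table), the choice to put the left neighbor first in "L+C", and the boundary phone used for the first position are all hypothetical and not taken from any Kaldi script.

```python
from itertools import product


def to_biphones(phones, boundary="SIL"):
    """Relabel a monophone sequence as left-context biphones 'L+C'.

    Assumption: the first phone takes `boundary` as its left context.
    """
    out = []
    left = boundary
    for p in phones:
        out.append(f"{left}+{p}")
        left = p
    return out


def build_symbol_table(phone_set):
    """Enumerate every 'L+C' pair and number the pairs from 1.

    Id 0 is kept for <eps>, as in a Kaldi-style phones.txt.
    """
    table = {"<eps>": 0}
    for i, (l, c) in enumerate(product(phone_set, repeat=2), start=1):
        table[f"{l}+{c}"] = i
    return table


if __name__ == "__main__":
    # Print the table in "symbol id" form, one entry per line.
    for sym, idx in build_symbol_table(["A", "B"]).items():
        print(sym, idx)
    print(to_biphones(["A", "B", "B"], boundary="A"))
```

Each compound symbol is then treated as an ordinary "monophone" by the training scripts, so with context-width=1 the tree only ever sees one compound unit at a time.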
I have tried it, and the WER is about the same as that of a normal biphone model, with less than 1% difference. Also, with less state tying (more leaves) and more training data, the WER could be lower.
I came up with this idea while playing with the scripts, but I am not sure whether it is a reasonable way to build a biphone (or other CD) model just because the results look good.
Is there any way to verify it?
Any suggestions would be appreciated, thanks!