I am trying to create a lookahead graph for my already trained, position-independent Kaldi model. First I followed mkgraph_lookahead.sh, and it gets stuck on the composition of Ha.fst and CL_$N_$P.fst.
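As far as I can tell, the step where it hangs is roughly the one below (a simplified sketch of the H-level stage; the file names, $N/$P and the scale value are placeholders from my setup, not the exact script text):

# build Ha.fst from the tree and transition model
make-h-transducer --disambig-syms-out=disambig_tid.int --transition-scale=1.0 \
  ilabels_${N}_${P} tree final.mdl > Ha.fst
# compose with the context-dependent lexicon and determinize; this is where it hangs
fsttablecompose Ha.fst CL_${N}_${P}.fst | \
  fstdeterminizestar --use-log=true | \
  fstrmsymbols disambig_tid.int > HCLr.fst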
There are similar issues:
- In one it is suggested to use a chain model, but I am already using a chain model.
- In https://github.com/kaldi-asr/kaldi/issues/4143 it is suggested to omit fstdeterminizestar.
Then I tried to narrow down the issue. I observed that with some mini lexicons I can create HCLr.fst and the script completes successfully.
Here are the different mini lexicons and their results:
- Words are "GO" and "I" ("I" has a single-phone pronunciation): can't complete.
GO G OW
I AY
<UNK> SPN
- Words are "GO" and "IS": completes successfully.
GO G OW
IS IH Z
<UNK> SPN
- I tried a small hack by changing the pronunciation of "I" as below, and it completes successfully.
GO G OW
I AY AY
<UNK> SPN
I thought it was related to single-phone pronunciations. However, the lexicon below also gets stuck, while it completes successfully if I remove any one of the words.
AS AE Z
AS EH Z
AT AE T
FOR F AO R
FOR F ER0
FOR F R ER0
<UNK> SPN
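For reference, each mini-lexicon test was set up roughly like this (directory names, the --self-loop-scale value and the grammar are placeholders from my setup, so treat this as a sketch rather than the exact commands):

# build a lang directory from the mini lexicon
utils/prepare_lang.sh data/local/dict_mini "<UNK>" data/local/lang_tmp_mini data/lang_mini
# (a small G.fst / grammar is also placed in data/lang_mini before the next step)
# rerun lookahead graph creation with the already trained tree and model
utils/mkgraph_lookahead.sh --self-loop-scale 1.0 data/lang_mini exp/chain/tdnn exp/chain/tdnn/graph_mini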
I think it is related to transitions introduced by specific phonetic contexts, which make HCL nondeterminizable for my tree and model. Even with single-word lexicons containing only "I AY" or "I AY AY", the latter completes while the former fails. The former has the extra contexts below (you can find the full lists in the attached files):
SPN/AY/SIL
SPN/AY/SPN
SPN/AY/<eps>
SIL/AY/SIL
SIL/AY/SPN
SIL/AY/<eps>
<eps>/AY/SIL
<eps>/AY/SPN
<eps>/AY/<eps>
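For reference, context symbols like these can be dumped from the ilabels file written by fstcomposecontext, e.g. with fstmakecontextsyms (the paths below are placeholders from my setup):

# make a readable symbol table for the context-dependent phone labels
fstmakecontextsyms data/lang_mini/phones.txt graph_mini/ilabels_${N}_${P} > graph_mini/context_syms.txt
# list all contexts involving AY
grep '/AY/' graph_mini/context_syms.txt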
My question is: is it possible to make HCL determinizable? Omitting fstdeterminizestar makes the output HCLr.fst much bigger (it has redundancy and slows down decoding).
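For comparison, this is roughly what I mean by omitting the determinization (again a simplified sketch):

# same composition, but without fstdeterminizestar
fsttablecompose Ha.fst CL_${N}_${P}.fst | fstrmsymbols disambig_tid.int > HCLr_nodet.fst
# the non-determinized result has many more states and arcs
fstinfo HCLr_nodet.fst | grep -E '# of (states|arcs)'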
I can share my model and tree if it is necessary.