Can I use grammar-fst with monophone acoustic models?

Ziye Fan

Jul 4, 2019, 11:16:07 AM
to kaldi-help
Dear developers, hello. I know that grammar-fst (mentioned here) is designed to work with left-biphone acoustic
models, but is it possible to use it with monophone systems? If so, what modifications
are needed, and where should I start?

Thanks.

Daniel Povey

Jul 4, 2019, 11:17:30 AM
to kaldi-help
It would work for monophone systems without modification, IIRC.


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/3fbaf9c1-f6fa-4c16-b5cb-1bdc7a19faa8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ziye Fan

Jul 5, 2019, 4:17:54 AM
to kaldi-help
Thanks for the information, Dan. The documentation mentions a "left-context-phone-set". How does it relate to the context window in a left-biphone system?

I tried mini_librispeech/s5/local/grammar/simple_demo.sh with the exp/mono model, and mkgraph.sh told me that "when doing grammar decoding, you can only build graphs for left-biphone trees."
After commenting out that check, it still fails like this:
......
--> data/lang_nosp_grammar1/L.fst is olabel sorted
--> data/lang_nosp_grammar1/L_disambig.fst is olabel sorted
--> SUCCESS [validating lang directory data/lang_nosp_grammar1]
tree-info exp/mono/tree
tree-info exp/mono/tree
fstpushspecial
fstminimizeencoded
fstdeterminizestar --use-log=true
fsttablecompose data/lang_nosp_grammar1/L_disambig.fst data/lang_nosp_grammar1/G.fst
fstisstochastic data/lang_nosp_grammar1/tmp/LG.fst
-0.0274324 -0.0283582
[info]: LG not stochastic.
fstcomposecontext --nonterm-phones-offset=364 --context-size=1 --central-position=0 --read-disambig-syms=data/lang_nosp_grammar1/phones/disambig.int --write-disambig-syms=data/lang_nosp_grammar1/tmp/disambig_ilabels_1_0.int data/lang_nosp_grammar1/tmp/ilabels_1_0.84902 data/lang_nosp_grammar1/tmp/LG.fst
ERROR (fstcomposecontext[5.5]:main():fstcomposecontext.cc:155) Grammar-fst graph creation only supports models with left-biphone context.  (--nonterm-phones-offset option was supplied).

[ Stack-Trace: ]
0   libkaldi-base.dylib                 0x00000001055d2a6f kaldi::KaldiGetStackTrace() + 63
1   libkaldi-base.dylib                 0x00000001055d27e2 kaldi::MessageLogger::LogMessage() const + 354
2   fstcomposecontext                   0x0000000104601c98 kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&) + 24
3   fstcomposecontext                   0x00000001046017d5 main + 1845
4   libdyld.dylib                       0x00007fff6197d3d5 start + 1
5   ???                                 0x0000000000000008 0x0 + 8

kaldi::KaldiFatalErrorERROR: FstHeader::Read: Bad FST header: standard input
mv: rename data/lang_nosp_grammar1/tmp/ilabels_1_0.84902 to data/lang_nosp_grammar1/tmp/ilabels_1_0: No such file or directory
fstisstochastic data/lang_nosp_grammar1/tmp/CLG_1_0.fst
ERROR: FstHeader::Read: Bad FST header: data/lang_nosp_grammar1/tmp/CLG_1_0.fst
ERROR (fstisstochastic[5.5]:ReadFstKaldiGeneric():kaldi-fst-io.cc:53) Reading FST: error reading FST header from data/lang_nosp_grammar1/tmp/CLG_1_0.fst

[ Stack-Trace: ]
0   libkaldi-base.dylib                 0x00000001053bfa6f kaldi::KaldiGetStackTrace() + 63
1   libkaldi-base.dylib                 0x00000001053bf7e2 kaldi::MessageLogger::LogMessage() const + 354
2   libkaldi-fstext.dylib               0x0000000104fc1358 kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&) + 24
3   libkaldi-fstext.dylib               0x0000000104fc1830 fst::ReadFstKaldiGeneric(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, bool) + 960
4   fstisstochastic                     0x00000001043e80fd main + 285
5   libdyld.dylib                       0x00007fff6197d3d5 start + 1

kaldi::KaldiFatalError[info]: CLG not stochastic.
make-h-transducer --nonterm-phones-offset=364 --disambig-syms-out=exp/mono/grammar1/disambig_tid.int --transition-scale=1.0 data/lang_nosp_grammar1/tmp/ilabels_1_0 exp/mono/tree exp/mono/final.mdl
ERROR (make-h-transducer[5.5]:Input():kaldi-io.cc:756) Error opening input stream data/lang_nosp_grammar1/tmp/ilabels_1_0

[ Stack-Trace: ]
0   libkaldi-base.dylib                 0x000000010f43ea6f kaldi::KaldiGetStackTrace() + 63
1   libkaldi-base.dylib                 0x000000010f43e7e2 kaldi::MessageLogger::LogMessage() const + 354
2   libkaldi-util.dylib                 0x000000010f195928 kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&) + 24
3   libkaldi-util.dylib                 0x000000010f19f728 kaldi::Input::Input(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) + 152
4   make-h-transducer                   0x000000010ce2b157 main + 439
5   libdyld.dylib                       0x00007fff6197d3d5 start + 1
6   ???                                 0x0000000000000007 0x0 + 7

kaldi::KaldiFatalError

Where should I look? Any help would be greatly appreciated.



Daniel Povey

Jul 5, 2019, 12:19:39 PM
to kaldi-help
You could try modifying the check to:

if (!((context_width == 2 && central_position == 1) || context_width == 1)) {
  // Neither left-biphone nor monophone: keep raising the existing error here.
}



Daniel Povey

Jul 5, 2019, 12:20:11 PM
to kaldi-help
.. at fstcomposecontext.cc:154.  I'll merge if it works.
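
The relaxed condition can be sketched as a standalone predicate. This is only an illustration: `GrammarContextAllowed` is a hypothetical helper, not a Kaldi function, and it assumes that left-biphone means a context width of 2 with central position 1 and that monophone means a context width of 1. As the error reported further down the thread shows, passing this check does not by itself make the rest of graph creation monophone-aware.

```cpp
#include <cassert>

// Hypothetical helper (not part of Kaldi): returns true when the tree's
// context settings are either left-biphone or monophone.
bool GrammarContextAllowed(int context_width, int central_position) {
  // Basic sanity check on the arguments.
  assert(context_width >= 1);
  assert(central_position >= 0 && central_position < context_width);

  // Left-biphone: the window is [previous phone, current phone].
  const bool is_left_biphone = (context_width == 2 && central_position == 1);
  // Monophone: the window is just [current phone].
  const bool is_monophone = (context_width == 1);
  return is_left_biphone || is_monophone;
}
```

With this predicate, a triphone tree (width 3, central position 1) is still rejected, while width 1 and the width-2/position-1 combination pass.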

Ziye Fan

Jul 7, 2019, 10:50:56 PM
to kaldi-help
Thanks, Dan. I modified fstcomposecontext.cc:154 as you advised, but there is still an error in make-h-transducer:
make-h-transducer --nonterm-phones-offset=364 --disambig-syms-out=exp/mono/grammar1/disambig_tid.int --transition-scale=1.0 data/lang_nosp_grammar1/tmp/ilabels_1_0 exp/mono/tree exp/mono/final.mdl
ERROR (make-h-transducer[5.5]:GetHmmAsFsa():hmm-utils.cc:41) Context size mismatch, ilabel-info [from context FST is 2, context-dependency object expects 1

It seems that the "ilabel_info" object written by fstcomposecontext is created by "ComposeContextLeftBiphone" rather than "ComposeContext", and the former assumes a context window of size 2. Is it OK to use "ComposeContext" instead in the monophone case?



Daniel Povey

Jul 7, 2019, 11:08:39 PM
to kaldi-help
Probably getting that to work correctly would require a bit of work.  I'll wait until there is a good reason to do it.
In your case you could just build a biphone tree.  Scripts like train_deltas.sh and train_sat.sh can take options like:

--context-opts "--context-width 2 --central-position 1" 

which will give you a left-biphone system.
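
To make the two options concrete, the sketch below computes the phone window that a tree with a given context width and central position would look at for one phone in a sequence. It is an illustration only: `ContextWindow` is a hypothetical function, the phone ids are made up, and padding out-of-range positions with 0 is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the window of phones around phones[i] for a
// tree with the given context width and central position. Positions that
// fall outside the sequence are filled with 0 (an assumption of this sketch).
std::vector<int> ContextWindow(const std::vector<int>& phones, int i,
                               int context_width, int central_position) {
  assert(context_width >= 1);
  assert(central_position >= 0 && central_position < context_width);
  assert(i >= 0 && i < static_cast<int>(phones.size()));

  std::vector<int> window(context_width, 0);
  for (int k = 0; k < context_width; ++k) {
    // Slot k of the window holds the phone at offset (k - central_position).
    const int j = i + k - central_position;
    if (j >= 0 && j < static_cast<int>(phones.size())) window[k] = phones[j];
  }
  return window;
}
```

With width 2 and central position 1 the window is the previous phone followed by the current one, which is what "left-biphone" means here; with width 1 and central position 0 it is the current phone alone.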



Ziye Fan

Jul 8, 2019, 10:25:18 PM
to kaldi-help
I tried biphone system with train_deltas.sh and it worked. Thank you!

I'm a little confused about two concepts: "left-context phones" in grammar-fst and "left-biphone context" in a biphone system. Are they the same, or related? If we don't use a biphone system, can "left-context phones" simply be ignored in grammar-fst? If not, what is their purpose? Also, what is the purpose of "#nonterm_reenter"?

What I want is to make grammar-fst work on a monophone ASR system, so that I can add new lexicon entries at run time and enable a class-based LM. To do that, I'm trying to understand the mechanism of grammar-fst and then modify it. Is there any documentation or code I should start with?

Thank you for your help!


Daniel Povey

Jul 8, 2019, 10:37:15 PM
to kaldi-help
The framework would have been substantially simpler in a monophone system; the left-context phones would not be needed.  In fact, it would be possible to accomplish the monophone case using only OpenFst's ReplaceFst, or a much simpler version of the GrammarFst object that does the same as OpenFst's ReplaceFst, but has state-ids that are pairs of integers like GrammarFst does.  (This saves memory as you don't have to store huge state tables or cached copies of FSTs).


Dan
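
To illustrate the "state-ids that are pairs of integers" idea, here is a minimal sketch that packs an FST-instance index and a state index within that FST into one 64-bit id. The bit layout (instance in the high 32 bits, state in the low 32 bits) and the names `MakeStateId`, `InstanceOf`, and `BaseStateOf` are assumptions made for this example, not a description of the actual GrammarFst code.

```cpp
#include <cassert>
#include <cstdint>

// A combined state-id: (which FST instance, which state inside that FST).
using StateId = int64_t;

// Packs the pair into a single 64-bit integer (layout assumed for this sketch).
StateId MakeStateId(int32_t instance_id, int32_t base_state) {
  assert(instance_id >= 0 && base_state >= 0);
  return (static_cast<int64_t>(instance_id) << 32) |
         static_cast<int64_t>(base_state);
}

// Recovers the FST-instance index from the high 32 bits.
int32_t InstanceOf(StateId s) { return static_cast<int32_t>(s >> 32); }

// Recovers the state index within that FST from the low 32 bits.
int32_t BaseStateOf(StateId s) {
  return static_cast<int32_t>(s & static_cast<int64_t>(0xFFFFFFFF));
}
```

Because the pair is decoded arithmetically, no table mapping combined states back to (instance, state) has to be stored, which is the memory saving mentioned above.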



Ziye Fan

Jul 9, 2019, 10:00:30 PM
to kaldi-help
Thanks. Will try it


Xiaobo Li

Sep 5, 2019, 6:39:17 AM
to kaldi-help
Hi Dan,
How can grammar-fst be made to work with triphone acoustic models?



Daniel Povey

Sep 5, 2019, 7:07:30 AM
to kaldi-help
The grammar-fst framework doesn't work with triphone models.
Decoding with triphone models and dynamic grammars is significantly more complicated.
It is doable using OpenFst's lookahead-fst mechanism, but complex to set up.



Vishay Raina

Mar 5, 2021, 9:30:18 AM
to kaldi-help
I tried to follow this, as I also have a monophone nnet3 acoustic model. I used it to align 100 hours of data and used those alignments to build a
biphone tree with train_deltas.sh. When I run mkgraph.sh, it fails with the following error (incompatible tree and model?):

make-h-transducer --nonterm-phones-offset=355 --disambig-syms-out=exp/new_tree/extvocab_nosp_top/disambig_tid.int --transition-scale=1.0 data/lang_nosp_basevocab//tmp/ilabels_2_1 exp/new_tree//tree exp/new_tree//final.mdl 
ERROR (make-h-transducer[5.5.0~1-2b62]:TupleToTransitionState():transition-model.cc:262) TransitionModel::TupleToTransitionState, tuple not found. (incompatible tree and model?)

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x82c) [0x7f99acbca2aa]
make-h-transducer(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x21) [0x4035b5]
/opt/kaldi/src/lib/libkaldi-hmm.so(kaldi::TransitionModel::TupleToTransitionState(int, int, int, int) const+0xf2) [0x7f99ad55deb0]
/opt/kaldi/src/lib/libkaldi-hmm.so(kaldi::GetHmmAsFsa(std::vector<int, std::allocator<int> >, kaldi::ContextDependencyInterface const&, kaldi::TransitionModel const&, kaldi::HTransducerConfig const&, std::unordered_map<std::pair<int, std::vector<int, std::allocator<int> > >, fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl<float> >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl<float> > > > >*, kaldi::HmmCacheHash, std::equal_to<std::pair<int, std::vector<int, std::allocator<int> > > >, std::allocator<std::pair<std::pair<int, std::vector<int, std::allocator<int> > > const, fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> >, fst::VectorState<fst::ArcTpl<fst::TropicalWeightTpl<float> >, std::allocator<fst::ArcTpl<fst::TropicalWeightTpl<float> > > > >*> > >*)+0x61f) [0x7f99ad567a54]
/opt/kaldi/src/lib/libkaldi-hmm.so(kaldi::GetHTransducer(std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, kaldi::ContextDependencyInterface const&, kaldi::TransitionModel const&, kaldi::HTransducerConfig const&, std::vector<int, std::allocator<int> >*)+0x5fd) [0x7f99ad56826d]
make-h-transducer(main+0x4e4) [0x402f9a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f99ac282830]
make-h-transducer(_start+0x29) [0x4029e9]


Here the final.mdl file used was the nnet3 AM, not the GMM model created by train_deltas.sh. (Does graph building fail because the number of targets is different?)

I could run make-h-transducer with the GMM final.mdl file instead (it would probably work, because it has the same num-pdfs as the tree), but would I then have to retrain my nnet3 AM?

Maybe I have misunderstood this completely.

Daniel Povey

Mar 5, 2021, 9:56:33 AM
to kaldi-help
If you want to use the neural net model you'll have to build the graph with its tree.
Might be easier to just build a left-biphone neural-net model and set the number of tree leaves very small.



朱冰清

May 29, 2024, 9:26:50 AM
to kaldi-help

I'm a bit puzzled about left-context-phones too. Is there a way to use grammar-fst in systems that don't have left-context, say, like in TLG, without changing the original grammar-fst code? Or, in systems without left-context, can we just go ahead with the grammar-fst code, ignoring left-context-phones and just assuming they consist of only #nonterm_bos?


I hope my previous message was clear, but to expand on that—if I'm aiming to implement dynamic expansion using grammar-fst within a TLG system, could I potentially fabricate a left_context_phones.txt that only contains #nonterm_bos, and then proceed to construct the decoding graph using the mini_librispeech/s5 recipe? An additional question is about handling the #nonterm symbols in my tokens list: should I treat them as regular tokens, disambiguation symbols, or is there some other specific way they need to be handled? 


Any suggestions or references for making such modifications would be invaluable. 

It would be great if you can help me! Thank you in advance!


Best,
