Can lattice-align-words-lexicon work with prefix multi-pronunciation?

Zhang Ge

Jul 25, 2015, 5:27:25 AM
to kaldi...@googlegroups.com

Hi,

 

In my lexicon, some words have more than one pronunciation, and one pronunciation can be a prefix of another. For example:

cannot k_B ae_I n_I aa_I t_E

cannot k_B ae_I n_I aa_I

I think that LatticeLexiconWordAligner::ComputationState::TakeTransition will always choose the second pronunciation when it is called from LatticeLexiconWordAligner::ProcessWordTransitions, because the second one is matched first, as soon as the phone sequence “k_B ae_I n_I aa_I” has been read. The “t_E” would then be left behind and block subsequent alignment.

Am I misunderstanding something? Is it possible to get a correct alignment without removing one of the pronunciations?
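
To make the concern concrete, here is a minimal toy sketch of a greedy matcher that commits to the first pronunciation it can match. It is not Kaldi's code; `LEXICON` and `greedy_align` are hypothetical names made up for this illustration, and the lexicon is just the two entries above.

```python
# Toy illustration only: this is NOT how Kaldi implements word alignment.
# LEXICON and greedy_align are hypothetical names used for this sketch.
LEXICON = [
    ("cannot", ["k_B", "ae_I", "n_I", "aa_I", "t_E"]),
    ("cannot", ["k_B", "ae_I", "n_I", "aa_I"]),
]


def greedy_align(phones, lexicon):
    """Scan left to right and commit to the first pronunciation that matches.

    Returns (words, leftover). A non-empty leftover means the matcher got
    stuck on phones that do not form any pronunciation.
    """
    words, pos = [], 0
    while pos < len(phones):
        # Grow the candidate span one phone at a time, so the shortest
        # matching pronunciation always wins.
        for end in range(pos + 1, len(phones) + 1):
            match = next((w for w, p in lexicon if p == phones[pos:end]), None)
            if match is not None:
                words.append(match)
                pos = end
                break
        else:
            # No span starting at pos matched any pronunciation.
            return words, phones[pos:]
    return words, []


words, leftover = greedy_align(["k_B", "ae_I", "n_I", "aa_I", "t_E"], LEXICON)
print(words, leftover)
```

With such a matcher the shorter entry is consumed first and “t_E” is stranded, which is the behaviour I am worried about.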

 

Ge

Daniel Povey

Jul 25, 2015, 3:38:06 PM
to kaldi-help
Firstly, lattice-align-words-lexicon is designed for the case where you are not using word-boundary tags in your lexicon. Since you are using them (_B, _I, _E, etc.), you should just use lattice-align-words, which is easier and probably faster.
Secondly, yes: lattice-align-words-lexicon can handle this case fine. The algorithm is more sophisticated than that.
Dan

prczh...@gmail.com

Jul 26, 2015, 10:33:22 PM
to kaldi-help, dpo...@gmail.com
I've read the code of lattice-align-words-lexicon again. It really can handle this case.
I think both pronunciations create an arc, but when the longer one is the one I want, the state reached through the shorter one is not ViableIfAdvanced.
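
A toy sketch of that idea, under loud assumptions: this is not Kaldi's implementation, and `LEXICON`, `is_viable` and `all_alignments` are hypothetical names invented here. It tries every pronunciation that matches and drops a branch when the phones left over could not begin any lexicon entry, which is only a rough analogue of the viability check.

```python
# Toy illustration only: NOT Kaldi's algorithm or data structures.
# LEXICON, is_viable and all_alignments are hypothetical names.
LEXICON = [
    ("cannot", ("k_B", "ae_I", "n_I", "aa_I", "t_E")),
    ("cannot", ("k_B", "ae_I", "n_I", "aa_I")),
]


def is_viable(phones, pos, lexicon):
    """True if phones[pos:] is empty or could still begin some pronunciation."""
    rest = tuple(phones[pos:])
    if not rest:
        return True
    for _, pron in lexicon:
        n = min(len(rest), len(pron))
        if rest[:n] == pron[:n]:
            return True
    return False


def all_alignments(phones, lexicon, pos=0):
    """Return every segmentation of phones[pos:] into lexicon pronunciations."""
    if pos == len(phones):
        return [[]]
    results = []
    for word, pron in lexicon:
        end = pos + len(pron)
        if tuple(phones[pos:end]) != pron:
            continue  # this pronunciation does not match here
        if not is_viable(phones, end, lexicon):
            continue  # dead end: the leftover phones cannot start any word
        for tail in all_alignments(phones, lexicon, end):
            results.append([(word, pron)] + tail)
    return results


PHONES = ["k_B", "ae_I", "n_I", "aa_I", "t_E"]
alignments = all_alignments(PHONES, LEXICON)
print(alignments)
```

In this sketch both entries match at the start of the phone sequence, but the branch through the shorter one is abandoned because a lone “t_E” cannot start a word, so only the longer pronunciation survives.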
Thanks.

Ge

On Sunday, July 26, 2015 at 3:38:06 AM UTC+8, Dan Povey wrote: