Can lattice-align-words-lexicon handle multiple pronunciations where one is a prefix of another?

Zhang Ge

Jul 25, 2015, 5:27:25 AM

to kaldi...@googlegroups.com

Hi,

In my lexicon some words have more than one pronunciation, and one pronunciation is a prefix of another. For example:

cannot k_B ae_I n_I aa_I t_E

cannot k_B ae_I n_I aa_I

I think that LatticeLexiconWordAligner::ComputationState::TakeTransition will always pick the second pronunciation when it is called from LatticeLexiconWordAligner::ProcessWordTransitions, because the second one matches first, as soon as the phone sequence “k_B ae_I n_I aa_I” has been read. The “t_E” would then be left over and block the rest of the alignment.

Am I missing something? Is it possible to get a correct alignment without removing one of the pronunciations?

Ge

Daniel Povey

Jul 25, 2015, 3:38:06 PM

to kaldi-help

Firstly, lattice-align-words-lexicon is designed for the case where you are not using word-boundary tags in your lexicon. Since you are using them (_B, _I, _E, etc.), you should just use lattice-align-words, which is easier and probably faster.
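To illustrate why word-boundary tags make the job easier, here is a minimal Python sketch, not Kaldi code. It assumes the usual position suffixes (_B begin, _I internal, _E end, and _S for a single-phone word) and shows that word boundaries can be read off a tagged phone sequence without consulting a lexicon. The function name `split_words` and the phone `ay_S` are made up for the example; the real tool works on lattices and reads the phone roles from a word-boundary file.

```python
def split_words(phones):
    """Group position-tagged phones into one list per word.

    Simplified illustration only: a word is assumed to end at any phone
    tagged _E (word-final) or _S (single-phone word).
    """
    words, current = [], []
    for phone in phones:
        current.append(phone)
        if phone.endswith(("_E", "_S")):
            words.append(current)
            current = []
    if current:
        # Phones were left over without a closing _E or _S tag.
        raise ValueError("incomplete word at end of sequence: %r" % current)
    return words


# "cannot" (long pronunciation from the question) followed by a
# hypothetical single-phone word.
SEQUENCE = ["k_B", "ae_I", "n_I", "aa_I", "t_E", "ay_S"]
CHUNKS = split_words(SEQUENCE)
```

Here `CHUNKS` holds two groups: the five phones of "cannot" and the single phone `ay_S`.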

Secondly, yes; lattice-align-words-lexicon can handle this case fine. The algorithm is more sophisticated than that.

Dan


prczh...@gmail.com

Jul 26, 2015, 10:33:22 PM

to kaldi-help, dpo...@gmail.com

I've read the code of lattice-align-words-lexicon again. It really can handle this case.

I think both pronunciations create an arc, but when the longer one is the one I want, the shorter one is not ViableIfAdvanced.
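The idea can be sketched in a few lines of Python. This is only a toy illustration on a flat phone sequence, not the Kaldi implementation, which works on lattices and uses its own state and viability bookkeeping. The names `align_greedy` and `align`, and the second lexicon entry "go", are assumptions made up for the example. The greedy version commits to the first pronunciation that matches and gets stuck on the leftover “t_E”; the second version tries every matching pronunciation and drops the ones that cannot be continued.

```python
SHORT = ("k_B", "ae_I", "n_I", "aa_I")
LONG = ("k_B", "ae_I", "n_I", "aa_I", "t_E")

# Toy lexicon: the shorter pronunciation is listed first on purpose.
LEXICON = {
    "cannot": [SHORT, LONG],
    "go": [("g_B", "ow_E")],  # hypothetical second word
}

WORDS = ["cannot", "go"]
PHONES = ["k_B", "ae_I", "n_I", "aa_I", "t_E", "g_B", "ow_E"]


def align_greedy(words, phones, lexicon):
    """Commit to the first matching pronunciation of each word."""
    out, pos = [], 0
    for word in words:
        match = None
        for pron in lexicon[word]:
            if tuple(phones[pos:pos + len(pron)]) == pron:
                match = pron
                break
        if match is None:
            return None  # stuck: no pronunciation fits at this position
        out.append((word, match))
        pos += len(match)
    return out if pos == len(phones) else None


def align(words, phones, lexicon):
    """Try every matching pronunciation; keep only ones that can be completed."""
    if not words:
        return [] if not phones else None
    word, rest = words[0], words[1:]
    for pron in lexicon[word]:
        n = len(pron)
        if tuple(phones[:n]) == pron:
            tail = align(rest, phones[n:], lexicon)
            if tail is not None:  # this choice leads to a full alignment
                return [(word, pron)] + tail
    return None


GREEDY_RESULT = align_greedy(WORDS, PHONES, LEXICON)
FULL_RESULT = align(WORDS, PHONES, LEXICON)
```

With this input, `GREEDY_RESULT` is `None` because the short pronunciation leaves “t_E” unconsumed, while `FULL_RESULT` pairs "cannot" with the longer pronunciation.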