lattice-align-words-lexicon outputting non-linear lattices

remi....@gmail.com

Mar 3, 2016, 7:27:05 AM

to kaldi-help

I'm trying to get a word alignment in the same format as the output of ali-to-phones --write-lengths.

prons-to-wordali seems to do that, but its help message says it is deprecated, so I'm trying to use the same approach as get_train_ctm. This is what I have so far:

linear-to-nbest "ark:gunzip -c $dir/ali.1.gz|" "ark:utils/sym2int.pl --map-oov $oov -f 2- $lang/words.txt < $sdata/1/text |" '' '' ark:- | \
lattice-align-words-lexicon $lang/phones/align_lexicon.int $model ark:- ark:- | \
nbest-to-prons $model ark:- -

But then I get warnings like:

WARNING (nbest-to-prons:CompactLatticeToWordProns():lattice-functions.cc:1011) Lattice is not linear: num-arcs = 2
WARNING (nbest-to-prons:main():nbest-to-prons.cc:86) Format conversion failed for utterance npr-2013008-20130801_atc_02_00021
WARNING (nbest-to-prons:CompactLatticeToWordProns():lattice-functions.cc:1011) Lattice is not linear: num-arcs = 2
WARNING (nbest-to-prons:main():nbest-to-prons.cc:86) Format conversion failed for utterance npr-2013008-20130801_atc_02_00022
WARNING (lattice-align-words-lexicon:WordAlignLatticeLexicon():word-align-lattice-lexicon.cc:1015) [Lattice has input epsilons and/or is not input-deterministic (in Mohri sense)]-- i.e. lattice is not deterministic.  Word-alignment may be slow and-or blow up in memory.

Also, is this the right way to get what I want, or is there a simpler way of doing it?

remi....@gmail.com

Mar 3, 2016, 11:28:27 AM

to kaldi-help, remi....@gmail.com

Actually, I think the output format of nbest-to-prons is exactly what I need; however, I still have the issue with the lattices not being linear.

Daniel Povey

Mar 3, 2016, 2:17:00 PM

to kaldi-help, Rémi Francis

As it happens, in the last few days I have been working on a program called lattice-arc-post that probably does what you need. I just pushed it to the 'chain' branch (it will be merged to master next time I merge; currently chain is usually ahead of master).

Dan

remi....@gmail.com

Mar 10, 2016, 12:38:59 PM

to kaldi-help, remi....@gmail.com, dpo...@gmail.com

The thing is, my lattices here are supposed to be linear, and I'd like to get a normal CTM like the one nbest-to-prons would output, but on some segments lattice-align-words-lexicon makes the lattice non-linear.

Example of the output of lattice-align-words-lexicon:

npr-2013008-20130801_atc_02_00021 
3 - 0,0,4_16_18_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17 
1 <eps> 0,0,4_16_18_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17 
4 - 0,0,4_16_18 
2 <eps> 0,0,4_16_18 
5 - 0,0,3_12_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_18 
4 <eps> 0,0,4_16_18 
5 <eps> 0,0,3_12_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_18 
5 

This segment isn't a great one: it is 1 second long and contains only one word, "-", which represents silence.

I'm using position-independent phones, so I think lattice-align-words-lexicon gets confused by the optional silence and duplicates the arcs.

Do you think this can happen on non-pathological utterances? I suspect it could happen whenever a transcript contains a word that represents silence and uses the same phone as the optional silence.

Daniel Povey

Mar 10, 2016, 1:44:30 PM

to Rémi Francis, kaldi-help

You can insert lattice-1best before nbest-to-prons to fix this.
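As a rough sketch, assuming the same variables as the pipeline in the first message ($dir, $sdata, $lang, $oov and $model), the change would look something like the following. The wrapper function name ali_to_prons is made up for illustration; only the extra lattice-1best stage differs from the original command.

```shell
# Hypothetical wrapper around the pipeline from the first message.
# Assumes $dir, $sdata, $lang, $oov and $model are already set, as in that post.
ali_to_prons() {
  linear-to-nbest "ark:gunzip -c $dir/ali.1.gz|" \
    "ark:utils/sym2int.pl --map-oov $oov -f 2- $lang/words.txt < $sdata/1/text |" \
    '' '' ark:- | \
  lattice-align-words-lexicon $lang/phones/align_lexicon.int $model ark:- ark:- | \
  lattice-1best ark:- ark:- | \
  nbest-to-prons $model ark:- -
}
```

With the variables set, calling ali_to_prons > prons.txt would write the nbest-to-prons output to a file; lattice-1best keeps a single path per utterance, so nbest-to-prons sees a linear lattice again.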

Your case is pretty unusual (no word-pos-dependent phones, having a word representing silence).

Dan

remi....@gmail.com

Mar 11, 2016, 6:31:03 AM

to kaldi-help, remi....@gmail.com, dpo...@gmail.com

Thanks, it works.

By the way, is there a paper you would recommend that explains the benefits of position-dependent phones?

Daniel Povey

Mar 11, 2016, 2:57:23 PM

to Rémi Francis, kaldi-help

We don't have a paper. It's not a huge difference, but it was enough to justify adding the feature.

Rémi Francis

May 13, 2016, 7:40:54 AM

to kaldi-help, remi....@gmail.com, dpo...@gmail.com

I ran the test: with position-independent phones I got a WER of 9.96, whereas with position-dependent phones I got 9.41.

Is this the kind of improvement you expect? It is quite a bit more than I thought.
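To put the two numbers side by side: the gap is 0.55 absolute, which is roughly 5.5% relative to the position-independent result. A small sketch of that arithmetic (the function name rel_wer_reduction is made up for illustration):

```shell
# Relative WER reduction in percent: (baseline - new) / baseline * 100
rel_wer_reduction() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.2f\n", (b - n) / b * 100 }'
}

rel_wer_reduction 9.96 9.41   # prints 5.52
```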

Daniel Povey

May 13, 2016, 1:45:04 PM

to Rémi Francis, kaldi-help, Rémi Francis

It's within the range we expect. It's probably language dependent (etc.).
Dan

Karel Veselý

Jul 24, 2018, 12:14:55 PM

to kaldi-help

Aha, okay, so...

'lattice-align-words-lexicon', given a 'linear' lattice (1best or nbest) as input, may produce a 'non-linear' lattice as output.
It is not a 'bug', it is a 'feature'.

Good to know, I just used this piece of info ;)
Thanks,
Karel
