lattice-align-words-lexicon outputting non linear lattices


remi....@gmail.com

Mar 3, 2016, 7:27:05 AM
to kaldi-help
I'm trying to get a word alignment in the same format as the output of ali-to-phones --write-lengths.
prons-to-wordali seems to do that, but its help message says it is deprecated, so I'm trying to do the same thing as get_train_ctm. This is what I have so far:
linear-to-nbest "ark:gunzip -c $dir/ali.1.gz|" "ark:utils/sym2int.pl --map-oov $oov -f 2- $lang/words.txt < $sdata/1/text |" '' '' ark:- | \
lattice-align-words-lexicon $lang/phones/align_lexicon.int $model ark:- ark:- | \
nbest-to-prons $model ark:- -

But then I get stuff like:
WARNING (nbest-to-prons:CompactLatticeToWordProns():lattice-functions.cc:1011) Lattice is not linear: num-arcs = 2
WARNING (nbest-to-prons:main():nbest-to-prons.cc:86) Format conversion failed for utterance npr-2013008-20130801_atc_02_00021
WARNING (nbest-to-prons:CompactLatticeToWordProns():lattice-functions.cc:1011) Lattice is not linear: num-arcs = 2
WARNING (nbest-to-prons:main():nbest-to-prons.cc:86) Format conversion failed for utterance npr-2013008-20130801_atc_02_00022
WARNING (lattice-align-words-lexicon:WordAlignLatticeLexicon():word-align-lattice-lexicon.cc:1015) [Lattice has input epsilons and/or is not input-deterministic (in Mohri sense)]-- i.e. lattice is not deterministic.  Word-alignment may be slow and-or blow up in memory.

Also, am I going about this the right way, or is there a simpler way to do it?

remi....@gmail.com

Mar 3, 2016, 11:28:27 AM
to kaldi-help, remi....@gmail.com
Actually, I think the output format of nbest-to-prons is exactly what I need. However, I still have the issue with the lattices not being linear.

Daniel Povey

Mar 3, 2016, 2:17:00 PM
to kaldi-help, Rémi Francis
As it happens, in the last few days I have been working on a program called lattice-arc-post that probably does what you need.  I just pushed it to the 'chain' branch (it will be merged to master next time I merge; currently chain is usually ahead of master).

Dan



remi....@gmail.com

Mar 10, 2016, 12:38:59 PM
to kaldi-help, remi....@gmail.com, dpo...@gmail.com
The thing is that my lattices here are supposed to be linear, and I'd like to get a normal ctm like the one nbest-to-prons would output, but on some segments lattice-align-words-lexicon makes the lattice non-linear.
Example of the output of lattice-align-words-lexicon:
npr-2013008-20130801_atc_02_00021 
0 3 - 0,0,4_16_18_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17 
0 1 <eps> 0,0,4_16_18_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17_17 
1 4 - 0,0,4_16_18 
1 2 <eps> 0,0,4_16_18 
2 5 - 0,0,3_12_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_18 
3 4 <eps> 0,0,4_16_18 
4 5 <eps> 0,0,3_12_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_10_18 

This segment isn't great: it is 1 second long and contains a single word, "-", which represents a silence.
I'm using position-independent phones, so I think lattice-align-words-lexicon gets confused by the optional silence and duplicates the arcs.
Do you think this can happen on non-pathological utterances? I suspect it could happen whenever a transcript contains a word that represents silence and uses the same phone as the optional silence.
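
To see which utterances are affected, something like the following could be run on the text-format output. This is only a minimal sketch: the function name list_nonlinear_utts is made up for illustration, and it assumes the layout shown above (an utterance-id line, then arc lines with at least three fields starting with the source state, then a final-state line, with a blank line between lattices). It prints every utterance in which some state has more than one outgoing arc, which is the condition behind the "Lattice is not linear: num-arcs = 2" warning.

```shell
# Hypothetical helper (name is illustrative): list utterances whose
# text-format lattice has a state with more than one outgoing arc.
# Assumed input layout: key line, arc lines "<from> <to> <label> [...]",
# final-state lines with at most two fields, blank line between lattices.
list_nonlinear_utts() {
  awk '
    function report() {
      if (utt != "" && branching) print utt
      branching = 0
      split("", narcs)              # portable way to clear the array
    }
    NF == 0   { report(); utt = ""; next }   # blank line ends a lattice
    utt == "" { utt = $1; next }             # first line is the utterance id
    NF >= 3   { if (++narcs[$1] == 2) branching = 1 }  # count arcs per source state
    END       { report() }
  '
}

# Possible usage (aligned.lats is a placeholder file name):
#   lattice-copy ark:aligned.lats ark,t:- | list_nonlinear_utts
```

On the example above it would print npr-2013008-20130801_atc_02_00021, because state 0 has two outgoing arcs (to states 3 and 1).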

Daniel Povey

Mar 10, 2016, 1:44:30 PM
to Rémi Francis, kaldi-help
You can insert lattice-1best before nbest-to-prons to fix this.
Your case is pretty unusual (no word-pos-dependent phones, having a word representing silence).
Dan
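
For reference, the pipeline from the first message with lattice-1best inserted might look like the sketch below. The wrapper function ali_to_prons is hypothetical and only there so the pipeline can be defined without running it; the commands and arguments are the ones from the original post, and $dir, $lang, $sdata, $model and $oov are assumed to be set as they were there.

```shell
# Hypothetical wrapper (name is illustrative). Assumes $dir, $lang, $sdata,
# $model and $oov are already set as in the original command.
ali_to_prons() {
  linear-to-nbest "ark:gunzip -c $dir/ali.1.gz|" \
    "ark:utils/sym2int.pl --map-oov $oov -f 2- $lang/words.txt < $sdata/1/text |" \
    '' '' ark:- | \
  lattice-align-words-lexicon $lang/phones/align_lexicon.int $model ark:- ark:- | \
  lattice-1best ark:- ark:- | \
  nbest-to-prons $model ark:- -
}
```

lattice-1best keeps a single path through each word-aligned lattice, so nbest-to-prons receives the linear input it requires.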

remi....@gmail.com

Mar 11, 2016, 6:31:03 AM
to kaldi-help, remi....@gmail.com, dpo...@gmail.com
Thanks, it works.
By the way, do you have any paper to recommend that explains the benefits of position dependent phones?

Daniel Povey

Mar 11, 2016, 2:57:23 PM
to Rémi Francis, kaldi-help
We don't have a paper.  It's not a huge difference but enough to add the feature.

Rémi Francis

May 13, 2016, 7:40:54 AM
to kaldi-help, remi....@gmail.com, dpo...@gmail.com
I ran the test: with position-independent phones I got a WER of 9.96, whereas with position-dependent phones I got 9.41.
Is that the kind of improvement you expect? It is quite a bit more than I thought.
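
To put those two numbers in relative terms, here is a small sketch of the arithmetic. The helper name relative_wer_reduction is made up for illustration; it computes (baseline - new) / baseline * 100.

```shell
# Hypothetical helper (name is illustrative): relative WER reduction in percent.
relative_wer_reduction() {
  LC_ALL=C awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%.2f\n", (base - new) / base * 100 }'
}

relative_wer_reduction 9.96 9.41   # prints 5.52
```

So the absolute difference of 0.55 corresponds to roughly a 5.5% relative reduction.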

Daniel Povey

May 13, 2016, 1:45:04 PM
to Rémi Francis, kaldi-help, Rémi Francis
It's within the range we expect. It's probably language dependent (etc.).
Dan

Karel Veselý

Jul 24, 2018, 12:14:55 PM
to kaldi-help
Aha, okay, so...
'lattice-align-words-lexicon', given a 'linear' lattice (1best or nbest) as input, may produce a 'non-linear' lattice as output.
It is not a 'bug', it is a 'feature'.

Good to know, I just used this piece of info ;)
Thanks,
Karel