Lattice to CTM conversion WER


Sana Khamekhem

Sep 26, 2016, 1:39:19 PM
to kaldi-help
I have generated lat.1.gz after a decoding stage using Kaldi.
The best WER is obtained using: beam=17, lattice_beam=8.0, acoustic scale=0.0883 and lmwt=11.
I would like to convert this lattice to a CTM (the best path) with confidence scores.
The code is the following:
    $cmd $dir/scoring/logctm/get_ctm.5.log \
      set -o pipefail '&&' mkdir -p $dir/score_5/ '&&' \
      lattice-to-ctm-conf --decode-mbr=true --inv-acoustic-scale=5 "ark:gunzip -c $dir/lat.*.gz|" - \| \
      utils/int2sym.pl -f 5 $lang/words.txt \
      '>' $dir/score_5/5.ctm || exit 1;
But the WER does not match the one I get from rescoring the lattice with the parameters cited above.
How can I get the best WER using lattice-to-ctm-conf?

Daniel Povey

Sep 26, 2016, 3:11:56 PM
to kaldi-help
The --inv-acoustic-scale should be set to the same as what you set the
lmwt to be. Also set --decode-mbr=false to get exactly the same
number.
The times in the ctm might not be quite right because there is no
lattice-align-words or lattice-align-words-lexicon in there.
Dan

Sana Khamekhem

Sep 26, 2016, 10:03:36 PM
to kaldi-help, dpo...@gmail.com
The new code is:
    lmw=20
    $cmd $dir/scoring/logctm/get_ctm.$lmw.log \
      set -o pipefail '&&' mkdir -p $dir/score_$lmw/ '&&' \
      lattice-to-ctm-conf --decode-mbr=false --inv-acoustic-scale=$lmw "ark:gunzip -c $dir/lat.*.gz|" - \| \
      utils/int2sym.pl -f 5 $lang/words.txt \
      '>' $dir/score_$lmw/$lmw.ctm || exit 1;

But I still don't get the same WER as when rescoring the lattice, and two utterances are missing from the generated CTM.
I'm testing this using Kaldi and Eesen (two systems).
What is wrong in my code?

Daniel Povey

Sep 26, 2016, 10:14:55 PM
to Sana Khamekhem, kaldi-help
If some utterances are missing from the CTM, it could just be because
the utterances were decoded as empty, i.e. nothing there; the CTM
format doesn't have a natural representation for empty utterances.
I'm not sure how to fix this. I tend to avoid dealing too much
with the NIST scoring tools as that stuff is pretty complicated.

Dan
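
One way to check whether empty decodes explain the gap is to compare the utterance IDs in a reference transcript against the first column of the CTM. The sketch below assumes a Kaldi-style text file with the utterance ID as the first field of every line. The function name missing_utts, the file names, and the toy data are made up for illustration and do not come from the thread.

```shell
#!/bin/sh
# Print utterance IDs that appear in a reference file but not in a CTM.
# Assumption: the utterance ID is the first field of every line in both files.
missing_utts() {
  ref=$1
  ctm=$2
  # First file (the CTM): record every utterance ID seen.
  # Second file (the reference): print IDs that were never recorded.
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
       !($1 in seen)       { print $1 }' "$ctm" "$ref"
}

# Toy data (made up): utt2 has a reference but produced no CTM lines.
tmp=$(mktemp -d)
printf 'utt1 hello world\nutt2 yes\nutt3 foo\n' > "$tmp/text"
printf 'utt1 1 0.00 0.30 hello 0.98\nutt1 1 0.30 0.40 world 0.91\nutt3 1 0.10 0.25 foo 0.87\n' > "$tmp/5.ctm"

missing=$(missing_utts "$tmp/text" "$tmp/5.ctm")
echo "$missing"
rm -rf "$tmp"
```

Any ID listed this way has no words in the CTM, so it is worth checking whether the scoring step counts it as all deletions or skips it entirely.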



star633669

Apr 5, 2017, 9:46:36 PM
to kaldi-help, dpo...@gmail.com

Hi Dan,

Can I use lattice-align-words or lattice-align-words-lexicon before lattice-to-ctm-conf?

I would appreciate any help.

On Tuesday, September 27, 2016 at 3:11:56 AM UTC+8, Dan Povey wrote:

Daniel Povey

Apr 5, 2017, 9:48:25 PM
to star633669, kaldi-help
Yes.

Daniel Povey

Apr 6, 2017, 1:24:09 PM
to 王星月, kaldi-help
If you do lattice-1best within that pipeline, all the confidences will be one, because lattice-to-ctm-conf works from lattice posteriors, and if you retain only one path in the lattice then the posterior for that path will be one. You have to remove lattice-1best.
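
A sketch of the same pipeline with both lattice-1best stages removed, so that word alignment and confidence estimation both see the full lattice. The directory and model paths and the variable name ctm_pipeline are placeholders, not values from the thread, and the command is only assembled and printed here rather than executed.

```shell
#!/bin/sh
# Placeholder paths (hypothetical, adjust to your own setup).
lmwt=16
dir=exp/tri3/decode_test    # decode directory containing lat.*.gz
lang=data/lang              # lang directory with words.txt and phones/
model=exp/tri3/final.mdl    # model used for decoding

# Word-align the full lattice, then compute confidences from its posteriors.
# There is no lattice-1best stage: keeping only one path would force every
# confidence to 1.
ctm_pipeline="lattice-align-words-lexicon $lang/phones/align_lexicon.int $model \
'ark:gunzip -c $dir/lat.*.gz|' ark:- | \
lattice-to-ctm-conf --decode-mbr=false --inv-acoustic-scale=$lmwt ark:- - | \
utils/int2sym.pl -f 5 $lang/words.txt > $dir/score_$lmwt/$lmwt.ctm"

# Print the assembled pipeline for review.
printf '%s\n' "$ctm_pipeline"
```

To run it for real, create the score_$lmwt output directory first and pass the string to bash -c from the top of a Kaldi recipe directory, where utils/ is available.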


On Thu, Apr 6, 2017 at 7:08 AM, 王星月 <u001...@gmail.com> wrote:
When I use lattice-align-words-lexicon in this case:

    for lmw in 16 ; do

      $cmd  $dir/scoring/logctm/get_ctm.$lmw.log \
        set -o pipefail '&&' mkdir -p $dir/score_$lmw/ '&&' \
        lattice-1best "ark:gunzip -c $lats|" ark:- \| \
        lattice-align-words-lexicon $lang/phones/align_lexicon.int $model ark:- ark:- \| \
        lattice-1best ark:- ark:- \| \
        lattice-to-ctm-conf --decode-mbr=false --inv-acoustic-scale=$lmw ark:- - \| \
        utils/int2sym.pl -f 5  $lang/words.txt  \
        '>' $dir/score_$lmw/$lmw.ctm || exit 1;
    done

All of the confidence scores are printed as "1.00".
Before that, I used only lattice-to-ctm-conf to get confidence scores, but the times in the CTM were not quite right.
I want both accurate times in the CTM and confidence scores.
What should I do to achieve this?

I would appreciate any help.