One best hypothesis Lattice to CTM

Sana Khamekhem

Oct 3, 2016, 6:52:26 AM
to kaldi-help
Hi all,
I have generated CTM files from lattices. I tried different lm-scale and acwt values, but none of the results matches the best WER obtained from decode.sh.
I would like to get the one-best transcription without using the align-word-lexicon command.
I have also tried the option decode-mbr=true, but it does not give the best result.
Is there a parameter or another command that gives the best alignment?
The CTM conversion script is the following:
lm=(1 3 5 8 10)
for l in "${lm[@]}"
do
mkdir -p $dir/scoringlm$l
mkdir -p $dir/scorelm$l
mkdir -p $dir/scoringlm$l/logctm
$cmd ACWT=$min_acwt:$max_acwt $dir/scoringlm$l/logctm/get_ctm.ACWT.log \
mkdir -p $dir/scorelm$l/score_ACWT/ '&&' \
lattice-to-ctm-conf-kaldi --decode-mbr=false --inv-acoustic-scale=ACWT --lm-scale=$l "ark:gunzip -c $dir/lat.1.gz|" - \| \
utils/int2sym.pl -f 5  $lang/words.txt \
'>' $dir/scorelm$l/score_ACWT/$name.ctm || exit 1;
done





Daniel Povey

Oct 3, 2016, 2:35:22 PM
to kaldi-help
You probably want the pipeline lattice-1best (with suitable LMWT),
then lattice-align-words or lattice-align-words-lexicon, then
nbest-to-ctm.
You should also figure out what the script that decode.sh calls (i.e.
local/score.sh) is doing.
Dan
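
The pipeline named above can be sketched roughly as follows. This is only a sketch under stated assumptions, not a recipe confirmed in the thread: the function name `one_best_ctm` and its arguments are hypothetical, the model is assumed to sit one level above the decode directory, the lang directory is assumed to contain `phones/word_boundary.int`, and the Kaldi binaries and `utils/` are assumed to be reachable from the current directory. The layout follows the usual `steps/get_ctm.sh` pattern.

```shell
# Hypothetical helper: 1-best CTM (no confidences) from the lattices in a decode dir.
# Usage: one_best_ctm <lang-dir> <decode-dir> <lmwt> <word-ins-penalty> > out.ctm
one_best_ctm() {
  lang=$1; dir=$2; lmwt=$3; wip=$4
  model=$dir/../final.mdl   # assumption: model lives one level above the decode dir

  # Pick the best path with the same LM weight and word penalty as the scoring run.
  lattice-1best --lm-scale="$lmwt" --word-ins-penalty="$wip" \
      "ark:gunzip -c $dir/lat.*.gz|" ark:- |
    # Align arcs to whole words so each word gets a start time and a duration.
    lattice-align-words "$lang/phones/word_boundary.int" "$model" ark:- ark:- |
    # Write "utt channel start duration word-id" lines.
    nbest-to-ctm ark:- - |
    # Map the integer word ids in field 5 back to symbols.
    utils/int2sym.pl -f 5 "$lang/words.txt"
}
```

Calling it with the LMWT and word insertion penalty of the best-scoring setup should, under these assumptions, give the same word sequence as the hypothesis that local/score.sh scored.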
> --
> You received this message because you are subscribed to the Google Groups
> "kaldi-help" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kaldi-help+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Sana Khamekhem

Oct 3, 2016, 2:44:49 PM
to kaldi...@googlegroups.com
But nbest-to-ctm does not provide confidence scores, and I need these scores for a post-processing step.
And lattice-to-ctm-conf gives high scores even when the word is not correct (score = 1).
Is there a way to extract a confidence score per recognized token?
For example:
AHTD3A0002_Para2_1 AHTD3A0002_Para2_1 1.10 1.39 aaAlaBtoBaaElaBbaE 0.94 
AHTD3A0002_Para2_1 AHTD3A0002_Para2_1 2.49 0.79 aaAlaBaeEkhMyaBraMteA 0.81 
AHTD3A0002_Para2_1 AHTD3A0002_Para2_1 3.28 1.44 shMheMdaMtaA 1.00 
AHTD3A0002_Para2_1 AHTD3A0002_Para2_1 4.72 0.24 faMyaA 1.00
The word shMheMdaMtaA is not correct.
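
A post-processing step of the kind mentioned here could, for instance, flag tokens whose confidence falls below a threshold. The sketch below is only an illustration: the function name `low_confidence_words` and the threshold are hypothetical, and it assumes the six-column CTM layout shown in the example (confidence in field 6).

```shell
# Hypothetical helper: print CTM lines whose confidence (field 6) is below a threshold.
# Usage: low_confidence_words <ctm-file> <threshold>
low_confidence_words() {
  # "+ 0" forces a numeric comparison; lines without a confidence field are skipped.
  awk -v thr="$2" 'NF >= 6 && $6 + 0 < thr + 0 { print }' "$1"
}
```

With the four example lines above and a threshold of 0.9, only the token scored 0.81 is printed; the incorrect word scored 1.00 is not caught, which is exactly the limitation being asked about.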


--
Best regards,
   Sana Khamekhem Jemni
_________________________________________
PhD Student 
MIRACL Laboratory
ENIS
_________________________________________

Daniel Povey

Oct 3, 2016, 2:55:20 PM
to kaldi-help
Those confidence scores are not very well calibrated as they just
reflect the posterior in the lattice. Getting good confidences is
very hard. Karel has a script somewhere but for now I think it's for
"advanced users only".
lattice-to-ctm-conf with --decode-mbr=false should give you just the
lattice 1-best output so should be the same as the regular scoring
script if you use the best inv-acoustic-scale (corresponding to the
LMWT of the best setup used in scoring) and the same word penalty.
You need to understand what the regular scoring script is doing.

Dan
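
One way to line up lattice-to-ctm-conf with the regular scoring script, as described above, is sketched below. It is a sketch under assumptions rather than a confirmed recipe: the function name `one_best_ctm_conf` and its arguments are hypothetical, the model location and `phones/word_boundary.int` are assumed, and the word insertion penalty is applied with lattice-add-penalty before the CTM is generated.

```shell
# Hypothetical helper: 1-best CTM with lattice-posterior confidences.
# Usage: one_best_ctm_conf <lang-dir> <decode-dir> <lmwt> <word-ins-penalty> > out.ctm
one_best_ctm_conf() {
  lang=$1; dir=$2; lmwt=$3; wip=$4
  model=$dir/../final.mdl   # assumption: model lives one level above the decode dir

  # Apply the same word insertion penalty that the scoring script used.
  lattice-add-penalty --word-ins-penalty="$wip" \
      "ark:gunzip -c $dir/lat.*.gz|" ark:- |
    # Word-align the lattice so that the reported times are meaningful.
    lattice-align-words "$lang/phones/word_boundary.int" "$model" ark:- ark:- |
    # Viterbi 1-best (no MBR), with the acoustic scale set to 1/LMWT.
    lattice-to-ctm-conf --decode-mbr=false --inv-acoustic-scale="$lmwt" ark:- - |
    # Map the integer word ids in field 5 back to symbols.
    utils/int2sym.pl -f 5 "$lang/words.txt"
}
```

The LMWT and penalty passed in should be the ones of the best-scoring setup; sweeping an extra lm-scale on top of the inverse acoustic scale changes the effective weighting and would not be expected to reproduce the scored output.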

Jan Trmal

Oct 3, 2016, 2:58:04 PM
to kaldi-help
Also, the differences in scoring can arise because the decode.sh output was scored in a different way (compute-wer) than the CTM (for which I assume you use sctk).
y.



Sana Khamekhem

Oct 4, 2016, 9:58:57 AM
to kaldi-help

I use compute-wer to compute the WER after converting the CTM file to a .tra file.
Can I convert the .tra file generated from the best path to CTM format, without scores?
The code used for the best path is the following (LMWT=19):
$cmd LMWT=$min_lmwt:$max_lmwt $dir/scoring/log/best_path.LMWT.log \
 lattice-scale --inv-acoustic-scale=LMWT "ark:gunzip -c $dir/lat.*.gz|" ark:- \| \
 lattice-add-penalty --word-ins-penalty=$word_ins_penalty ark:- ark:- \| \
 lattice-best-path --word-symbol-table=$symtab \
 ark:- ark,t:$dir/scoring/LMWT.tra || exit 1;
I just want to align this output to CTM format.
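
The CTM-to-transcript conversion mentioned above is not shown in the thread. A minimal sketch of that direction, assuming a word-level CTM whose lines are grouped by utterance and whose word is in field 5, could look like the following; the function name `ctm_to_text` is hypothetical. The opposite direction cannot be done by reformatting alone, because a .tra file carries no timing information, so the times have to come from the lattice or an alignment.

```shell
# Hypothetical helper: collapse a word-level CTM into one transcript line per utterance.
# Usage: ctm_to_text <ctm-file>
ctm_to_text() {
  awk '
    NF < 5 { next }                  # skip blank or malformed lines
    $1 != utt {                      # first word of a new utterance
      if (utt != "") print line      # flush the previous utterance
      utt = $1; line = $1
    }
    { line = line " " $5 }           # append the word (field 5)
    END { if (utt != "") print line }
  ' "$1"
}
```

The result has the "utterance-id word word ..." layout that compute-wer reads in text mode, so it can be compared line by line with the hypothesis the scoring script produced for the same LMWT.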


Sreelakshmi K R

Feb 21, 2018, 12:05:33 AM
to kaldi-help
@Yenda Sir, can you be more specific? I know this is an old post, but I am facing the same problem. score_kaldi.sh computes the WER using compute-wer. In local/score.sh I get several CTM files, one per LMWT, but none of them corresponds to the best WER given by score_kaldi.sh. How can I use sctk to find the CTM file that corresponds to the best WER?

Jan Trmal

Feb 21, 2018, 6:38:38 AM
to kaldi-help
Your question is very general and you don't provide any details. I'm not even sure I understand what your question is, so I'm not able to help.
If you score the same decoding directory using two different methods (in your case probably compute-wer and sctk) and expect the same results, check the following:
a) the references (i.e. data/dataset/text and data/dataset/stm, or whatever you use as the reference for sctk scoring) are equivalent
b) the postprocessing of the ctm and of the text is equivalent
c) in both cases you are using MBR decoding (or not using it; both results have to be obtained with the same mbr flag)
d) sctk, or more specifically sclite, can be made much more lenient by flags that allow fragments to match (i.e. with no penalty) any word starting with the same letters; for example, run- would match anything from the set runner, running, runt, runaway...
e) sclite can be used to do so-called time-mediated scoring, which will probably give you very different results

If you are not able to find the problem, ask the proper way, i.e. describe your setup, your expectations, how the observation differs from the expectation...
Make sure you have verified that none of the previous suggestions is the culprit.
It's also possible that no one will be able to help you; sctk is a very specific, grumpy and old piece of software.
y.
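
Before comparing the two scoring methods, it helps to know which LMWT the compute-wer run actually preferred, so that the CTM for that same setting is the one handed to sctk. The sketch below is an illustration only: the function name `best_wer_file` is hypothetical, and it assumes the scoring script left `wer_*` files in the decode directory, each containing a `%WER <value> [ ... ]` line as written by compute-wer.

```shell
# Hypothetical helper: print the wer_* file with the lowest %WER in a decode directory.
# Usage: best_wer_file <decode-dir>
best_wer_file() {
  awk '
    /^%WER/ {
      wer = $2 + 0                   # second field is the WER value
      if (best_file == "" || wer < best) { best = wer; best_file = FILENAME }
    }
    END { if (best_file != "") print best_file }
  ' "$1"/wer_*
}
```

If the files follow a `wer_<LMWT>` or `wer_<LMWT>_<penalty>` naming scheme (an assumption about the setup), the suffix of the printed file name gives the LMWT and word insertion penalty to use when picking or regenerating the CTM.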
  


Sana Khamekhem

Feb 22, 2018, 4:19:34 AM
to kaldi...@googlegroups.com
As I understand it, you would like to get the CTM file for the best path. So you should scale your lattice, then generate the CTM file like this:

lattice-scale --acoustic-scale=16 --ascale-factor=0.1  "ark:gunzip -c $dir/lat.1.gz|" "ark:|gzip -c > $dir/lat.scaled.gz"   || exit 1;

Here 16 is the LMWT that gives the best WER.

Then:

$cmd $dir/scoringctm/log/get_ctm.log \
mkdir -p $dir/scorectm/score/ '&&' \
lattice-to-ctm-conf "ark:gunzip -c $dir/lat.scaled.gz|" - \| \
utils/int2sym.pl -f 5 $symtab \
'>' $dir/scorectm/score/$name.ctm || exit 1;

Try this and let me know if it works.

