About confidence score issue

5406...@qq.com

Jun 20, 2018, 9:33:03 PM

to kaldi-help

Hi All,

I am trying to get a per-word confidence score. When preparing the data, I put some wave files into egs/thchs30/s5/thchs30_openslr/data_thchs30/test

Test wave list:

A0_101.wav and A2_*.wav (junk waves, whose content is not in words.txt)

A8_*.wav (correct waves, whose content is in words.txt)

I used the command below to get the confidence scores. The results for the junk waves and the correct waves show no apparent difference: the scores are all close to 1.0.

i) command:

./lattice-push ark:"gunzip -c lat.1.gz |" ark:- |
  ./lattice-align-words-lexicon ./align_lexicon.int ./40.mdl ark:- ark:- |
  ./lattice-to-ctm-conf --acoustic-scale=0.0769 --frame-shift=0.01 --print-silence=true ark:- - |
  ./int2sym.pl -f 5 ./words.txt 2>/dev/null

ii) test result:

refer to attachment: scores.log.txt

BTW, is there any documentation that explains the fields below? (I cannot understand the meaning of the fields in red; is the field in green the confidence score?)

A08_101 1 0.00 0.94 最小风 1.00

Could anyone help explain why the junk waves get such high scores, and how to improve this?

Best regards,

Chenjiang

scores.log.txt

Daniel Povey

Jun 20, 2018, 9:38:05 PM

to kaldi-help

I assume the lattices were not generated from a chain model (if they
were, the acoustic scale is very wrong).
The confidences won't always be very good, they are just derived from
the lattice posterior. They will be particularly poor if the language
model doesn't contain a lot of short words (which could act as a kind
of filler model).
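
To make "derived from the lattice posterior" concrete, here is a toy sketch in Python. It is not how lattice-to-ctm-conf is implemented; the function name, the path list, and the scores are all invented for illustration, and each "path" is reduced to a single word with one acoustic log-likelihood and one LM log-probability. The confidence of a word is taken as the share of posterior mass on paths that contain it.

```python
import math

def word_confidence(paths, word, acoustic_scale):
    """Toy confidence: posterior mass of the paths labelled `word`.

    `paths` is a list of (word, acoustic_loglike, lm_logprob) tuples.
    This is a hypothetical stand-in for a lattice, not a Kaldi structure.
    """
    # Combined log-score of each path, with the acoustic part scaled down.
    scores = [acoustic_scale * ac + lm for (_, ac, lm) in paths]
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    matching = sum(w for (p, _, _), w in zip(paths, weights) if p == word)
    return matching / total

# A lattice with no competing hypothesis gives confidence 1.0,
# no matter how poorly the audio fits the word.
only_one = [("a", -900.0, -2.0)]

# With a competitor, the confidence depends on the acoustic scale.
two_paths = [("a", -100.0, -2.0), ("b", -113.0, -2.0)]

print(word_confidence(only_one, "a", 0.0769))
print(word_confidence(two_paths, "a", 1.0))
print(word_confidence(two_paths, "a", 0.0769))
```

In this toy setting, a word with no competing path always gets confidence 1.0, which is one way to picture why out-of-vocabulary audio can still score highly when the language model offers few short alternatives. It also shows why the acoustic scale matters: a larger scale sharpens the posterior, a smaller one flattens it.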
Yes, the last field is the confidence; the fields in red are "channel"
(normally 1, 2, A or B), start-time, duration.
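
The field layout described above can be sketched as a small Python parser. This is only an illustration for reading the output line quoted in the question; the names CtmEntry and parse_ctm_line are made up here, and calling the first field the utterance ID is an assumption based on that sample line.

```python
from typing import NamedTuple

class CtmEntry(NamedTuple):
    utterance: str     # first field (assumed here to be the utterance ID)
    channel: str       # normally 1, 2, A or B
    start: float       # start time in seconds
    duration: float    # duration in seconds
    word: str          # word symbol (after int2sym.pl mapping)
    confidence: float  # last field: the confidence

def parse_ctm_line(line: str) -> CtmEntry:
    """Split one whitespace-separated CTM line with a confidence column."""
    utt, channel, start, dur, word, conf = line.split()
    return CtmEntry(utt, channel, float(start), float(dur), word, float(conf))

# The sample line from the original question.
entry = parse_ctm_line("A08_101 1 0.00 0.94 最小风 1.00")
print(entry.word, entry.start, entry.duration, entry.confidence)
```

A line with a different number of fields raises a ValueError in this sketch, which is a deliberate choice to surface malformed lines rather than skip them silently.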

Dan

chenji...@gmail.com

Jun 21, 2018, 3:12:38 AM

to kaldi-help

Hi Dan,

Thanks for your reply!

1. We used a GMM-HMM model to generate the lattices. This is not a chain model, is that correct?

According to this document, http://kaldi-asr.org/doc/chain.html, a chain model is a type of DNN-HMM model.

2. If a GMM-HMM model is not a chain model, how can I get the confidence score?

Best Wishes,

Jiang Chen

On Thursday, June 21, 2018 at 9:38:05 AM UTC+8, Dan Povey wrote:
