Explanation of confidence scores

Sana Khamekhem

Dec 11, 2018, 6:43:28 AM

to kaldi-help

Hi all,

I have performed decoding using two lexical unit levels (words and sub-words) and observed that the confidence score of the same word differs between the two.

For example, with word units: "someone" : 0.98

With sub-word units: "some" : 0.76, "one" : 0.76 --> "someone" : 0.76 (the average)

I would like to understand the explanation behind this behaviour.

Thank you in advance.

Daniel Povey

Dec 11, 2018, 1:55:50 PM

to kaldi-help

To really understand it you'd have to think about the algorithms underlying the whole process. But the simple explanation is that if you decode with sub-word units, there are more alternative choices for the transcript, which might not be allowed in a word-based system.
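A toy sketch of that explanation follows. It assumes a confidence is a posterior probability obtained by normalizing the total scores of competing hypotheses. Kaldi's lattice-based confidences work along these lines, but over full lattices with time alignments rather than a short list. The hypothesis strings and scores are invented for illustration, and a token's position in the hypothesis stands in for its time alignment. The best path scores the same in both systems, yet the extra segmentations allowed by sub-word units take probability mass away from it:

```python
import math
from collections import defaultdict

def hypothesis_posteriors(scores):
    """Normalize total log-scores of competing hypotheses into posteriors (log-sum-exp)."""
    m = max(scores.values())
    total = sum(math.exp(s - m) for s in scores.values())
    return {h: math.exp(s - m) / total for h, s in scores.items()}

def token_confidences(scores):
    """Confidence of (position, token) = summed posterior of hypotheses containing it there.
    Position is a simplification standing in for the time alignment used in real lattices."""
    conf = defaultdict(float)
    for hyp, p in hypothesis_posteriors(scores).items():
        for pos, tok in enumerate(hyp.split()):
            conf[(pos, tok)] += p
    return dict(conf)

# Hypothetical word-level system: only whole words can compete.
word_scores = {"someone": -10.0, "sometime": -14.0}
# Hypothetical sub-word system: same best path, plus extra segmentations.
subword_scores = {"some one": -10.0, "some won": -11.5, "sum one": -12.0, "some time": -14.0}

word_conf = token_confidences(word_scores)
sub_conf = token_confidences(subword_scores)
print(round(word_conf[(0, "someone")], 3))  # 0.982
print(round(sub_conf[(0, "some")], 3), round(sub_conf[(1, "one")], 3))  # 0.902 0.825
```

In this toy case the sub-word confidences are lower only because more alternatives share the probability mass, not because the best path got worse.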


Sana Khamekhem

Dec 11, 2018, 2:30:54 PM

to kaldi-help

Thank you for your response, Dan, but how do more alternatives affect the confidence scores of sub-words?

Actually, I think that the sub-word LM weights influence these scores. Is that correct?
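To make the question concrete, here is a toy sketch. It assumes each hypothesis's total log-score is its acoustic score plus a weighted LM score, and that the confidence is the posterior after normalizing over hypotheses. The parameter `lm_weight` and all scores are hypothetical, not Kaldi options or real outputs (Kaldi typically applies an acoustic scale instead, which plays an equivalent role up to an overall scaling of the scores). Under these assumptions, changing the LM weighting changes the posterior of the best hypothesis even though the ranking stays the same:

```python
import math

def posterior_of(target, hyps, lm_weight):
    """Posterior of `target` when total score = acoustic + lm_weight * LM (hypothetical model)."""
    totals = {h: am + lm_weight * lm for h, (am, lm) in hyps.items()}
    m = max(totals.values())
    z = sum(math.exp(s - m) for s in totals.values())
    return math.exp(totals[target] - m) / z

# Hypothetical (acoustic log-score, LM log-score) pairs for competing sub-word hypotheses.
hyps = {
    "some one": (-8.0, -2.0),
    "some won": (-8.5, -4.0),
    "sum one": (-9.0, -5.0),
}

for w in (0.5, 1.0):
    print(w, round(posterior_of("some one", hyps, w), 3))
```

Because "some one" has the best LM score here, giving the LM more weight widens its margin over the alternatives and raises its posterior.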
