Explanation of confidence scores


Sana Khamekhem

Dec 11, 2018, 6:43:28 AM
to kaldi-help
Hi all,

I decoded with two different lexical unit levels (words and sub-words), and I observed that the confidence score of the same word differs between the two.
For example:
    word level:      "someone" : 0.98
    sub-word level:  "some" : 0.76, "one" : 0.76  -->  "someone" : 0.76 (the average)
I would like to understand the reason for this behaviour.

Thank you in advance.

Daniel Povey

Dec 11, 2018, 1:55:50 PM
to kaldi-help
To really understand it you'd have to think about the algorithms underlying the whole process.  But the simple explanation is that if you decode with sub-word units, there are more alternative choices for the transcript, which might not be allowed in a word-based system.
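
A toy illustration of that point (a sketch under stated assumptions, not Kaldi's actual confidence computation): assume the confidence of a unit is the total posterior probability of the competing paths that contain it, and that posteriors come from normalizing path scores. All path names and log-scores below are made up for illustration. The sub-word system admits extra competing paths such as "some on" that the word-based system cannot produce, so the probability mass on the correct units is diluted.

```python
import math

def path_posteriors(log_scores):
    """Turn path log-scores into posteriors that sum to 1 (a softmax)."""
    top = max(log_scores.values())
    unnorm = {path: math.exp(s - top) for path, s in log_scores.items()}
    total = sum(unnorm.values())
    return {path: v / total for path, v in unnorm.items()}

def unit_confidence(posteriors, unit):
    """Sum the posterior of every path that contains the given unit."""
    return sum(p for path, p in posteriors.items() if unit in path)

# Hypothetical word-level alternatives: only whole words can compete.
word_paths = {("someone",): -1.0, ("summon",): -5.0}

# Hypothetical sub-word alternatives: more combinations are allowed,
# including ones that do not correspond to any word in the word lexicon.
subword_paths = {
    ("some", "one"): -1.0,
    ("some", "on"): -2.5,
    ("sum", "one"): -2.5,
    ("sum", "on"): -5.0,
}

word_post = path_posteriors(word_paths)
sub_post = path_posteriors(subword_paths)

word_conf = unit_confidence(word_post, "someone")
some_conf = unit_confidence(sub_post, "some")
one_conf = unit_confidence(sub_post, "one")
sub_avg = (some_conf + one_conf) / 2  # averaged as in the question above

print(f"word-level   'someone': {word_conf:.2f}")
print(f"sub-word avg 'someone': {sub_avg:.2f}")
```

With these made-up scores the best path is the same in both systems, yet the averaged sub-word confidence comes out lower, simply because more alternatives share the probability mass.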



Sana Khamekhem

Dec 11, 2018, 2:30:54 PM
to kaldi-help
Thank you for your response, Dan, but how do more alternatives affect the confidence scores of the sub-words?
I suspect that the sub-word LM weights influence these scores. Is that correct?