inferring quality of transcription from confidence scores (or other indicators...)

Aug 20, 2017, 4:45:18 PM
to kaldi-help

This is not really related to Kaldi, but some speech technologists might have a few hints.
The problem is the old one of inferring the quality of a transcription (I mean a rough idea of the quality) from the transcription itself. Exploiting the word confidence scores (computed from the lattices by MBR decoding) is hard: they tell you something, but not everything. Say an average word confidence score < 0.7 usually means a bad transcription (high WER, > 50%) and > 0.9 a good one (the WER is usually reasonably low according to my observations, e.g. < 30%). But the vast majority of those scores fall between 0.7 and 0.9, where every quality of transcription can be observed. Even a transcription decoded with the wrong language (basically 100% WER) might have reasonably good confidence scores.
I was thinking of calibrating those scores with a logistic regression model, annotating a bunch of transcriptions simply as good or bad. But I'm not sure what the variables of this model would be. I was thinking of the number of words in the transcription and the language of the transcription, for example, but I'm not at all sure that any meaningful pattern can be extracted.
Does anyone have any ideas, or has anyone tried something like this before?
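
To make the calibration idea concrete, here is a rough sketch in plain Python of what such a model could look like. Everything in it is an assumption for illustration: the feature set (mean confidence, minimum confidence, fraction of words below 0.5, log word count), the 0.5 cut-off, the hand-rolled gradient descent, and above all the toy confidence lists, which are made up and not taken from any real decoder output. In practice the per-word confidences would come from the decoder and the good/bad labels from manual annotation.

```python
import math

def utterance_features(confidences):
    """Summarise the per-word confidences of one transcript (hypothetical feature set)."""
    n = len(confidences)
    mean = sum(confidences) / n
    low_frac = sum(c < 0.5 for c in confidences) / n  # 0.5 is an arbitrary cut-off
    return [mean, min(confidences), low_frac, math.log(n)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_good(w, b, confidences):
    """Estimated probability that a transcript would be annotated as 'good'."""
    x = utterance_features(confidences)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up toy data: per-word confidences of transcripts labelled good (1) or bad (0).
good = [[0.95, 0.90, 0.92, 0.97], [0.88, 0.93, 0.91, 0.90, 0.96], [0.90, 0.94, 0.89]]
bad = [[0.60, 0.40, 0.70, 0.55], [0.50, 0.65, 0.45, 0.30, 0.70], [0.62, 0.48, 0.58]]

X = [utterance_features(c) for c in good + bad]
y = [1] * len(good) + [0] * len(bad)
w, b = fit_logistic(X, y)

p_high = predict_good(w, b, [0.95, 0.93, 0.90])
p_low = predict_good(w, b, [0.50, 0.40, 0.60])
print(p_high, p_low)
```

A categorical variable such as the language could be added as one-hot features in the same vector. Whether any of these features separates transcripts in the 0.7 to 0.9 range is exactly the open question, so this only shows the mechanics.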

Daniel Povey

Aug 20, 2017, 4:47:19 PM
to kaldi-help
A good method is to take a weak system and a strong system and see how
much difference there is in the transcripts. There is a very good
correlation between that and the word error rate. (George Saon may
have a paper about this; he used GMM systems before and after fMLLR
since that was before DNNs). You can choose which type of weak system
you want, e.g. a GMM system, or one with a weak language model.
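
As a minimal sketch of the comparison described above, assuming both systems' outputs are available as plain whitespace-tokenised text: the word-level edit distance between the two transcripts, normalised by the length of the strong system's output, gives a disagreement rate that can serve as a proxy for WER. The function names and the normalisation choice are illustrative assumptions, not something taken from Kaldi or from the paper mentioned.

```python
def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # delete a word from ref
                            curr[j - 1] + 1,           # insert a word from hyp
                            prev[j - 1] + (r != h)))   # substitute, or match at cost 0
        prev = curr
    return prev[-1]

def disagreement_rate(strong_text, weak_text):
    """Edit distance between the two outputs, normalised by the strong transcript's length."""
    strong = strong_text.split()
    weak = weak_text.split()
    if not strong:
        return 1.0 if weak else 0.0
    return word_edit_distance(strong, weak) / len(strong)

rate = disagreement_rate("the cat sat on the mat", "the cat sat on a mat")
print(rate)
```

A high rate suggests the audio is hard for both systems, so the strong system's transcript is likely to have a high WER as well. Any threshold on the rate would have to be tuned on data with known WER.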


Aug 20, 2017, 5:14:45 PM
to kaldi-help
You mean that if the edit distance between the two transcripts (from the weak and the strong system) is high, then the transcript from the strong one is supposed to be good?
Then again, this assumes running two transcriptions, which might not be feasible in my case, even though one of the two could at least be done faster.

Daniel Povey

Aug 20, 2017, 5:16:00 PM
to Armando, kaldi-help
No, I mean that if the edit distance is high, then the transcripts will be worse (higher WER).

Nickolay Shmyrev

Aug 21, 2017, 4:43:49 AM
to kaldi-help
Something like this, probably:

Bhiksha Raj, Rita Singh and James Baker
