Word confidence versus symbol confidence


farhad khalafi

Oct 12, 2018, 12:40:11 AM
to tesseract-ocr
I am totally puzzled by how the confidence reported at the word level relates to the confidences assigned to the characters of that same word.

I used the attached TIFF image to recognize a simple MICR line of a check. 

The recognized text had two words:

495096 700000b01b205xX0eL00007010717

The confidence percentiles for the two words were 59% and 38%, respectively.

The confidence percentiles for the characters of the first word were (rounded):

4    97%
9    99%
5    100%
0    100%
9    99%
6    96%

I would like to know how, with such high confidence scores for the individual characters, the word-level confidence can come out at 59%.
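To make the mismatch concrete, here is a minimal sketch that combines the six rounded symbol scores listed above in three simple ways. The aggregation rules (minimum, mean, product) are assumptions chosen for illustration only; nothing in this thread confirms that Tesseract derives the word score by any of them.

```python
import math

# Rounded symbol confidences (percent) reported above for the word "495096".
symbol_conf = [("4", 97), ("9", 99), ("5", 100), ("0", 100), ("9", 99), ("6", 96)]
reported_word_conf = 59  # word-level confidence reported above (percent)

scores = [conf for _, conf in symbol_conf]

# Hypothetical aggregation rules, not taken from Tesseract itself.
aggregates = {
    "minimum": min(scores),
    "mean": sum(scores) / len(scores),
    "product": 100 * math.prod(s / 100 for s in scores),
}

for name, value in aggregates.items():
    gap = value - reported_word_conf
    print(f"{name:8s} {value:6.2f}%  ({gap:.2f} points above the word score)")
```

All three candidates land far above 59% (the minimum is 96%, the mean is 98.5%, and the product is roughly 91%), so the word score does not appear to be a simple function of the rounded symbol scores.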

I ran this test using the fast trained data for English, with no training of my own. I am not worried about the accuracy; I am just curious about how to interpret the confidence scores.

Thanks!



Check.tif

Soumik Ranjan Dasgupta

Oct 12, 2018, 3:20:51 AM
to tesser...@googlegroups.com
Could you tell me how you got the confidence percentiles? I would like to know that. :)

--
Regards,
Soumik Ranjan Dasgupta

farhad khalafi

Oct 12, 2018, 11:16:04 AM
to tesseract-ocr
    /// <summary>
    /// Gets the confidence percentile for the current element at the specified level.
    /// </summary>
    public float GetConfidence(PageIteratorLevel level)
    {
        return TessApi.TessResultIteratorConfidence(Handle, level);
    }


I use a custom .NET layer somewhat similar to Tesseract.net.
The Handle is returned from TessBaseAPIGetIterator() after a page is recognized, and level selects the element type (PageIteratorLevel.Symbol for characters, PageIteratorLevel.Word for words).