Word confidence versus symbol confidence

farhad khalafi

Oct 12, 2018, 12:40:11 AM

to tesseract-ocr

I am puzzled by how the confidence reported at the word level relates to the confidences assigned to the characters of that same word.

I used the attached TIFF image to recognize a simple MICR line of a check.

The recognized text had two words:

495096 700000b01b205xX0eL00007010717

The confidence percentiles for the two words were 59% and 38%, respectively.

The confidence percentiles for the characters of the first word were (rounded):

4 97%

9 99%

5 100%

0 100%

9 99%

6 96%

I would like to know how, with such high confidence scores for the individual characters, the word-level confidence can come out at only 59%.
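To make the mismatch concrete, here is a small sketch that combines the six character scores above in three obvious ways. The three aggregations (mean, minimum, product) are only my guesses at what one might expect a word score to be; nothing here is taken from Tesseract's code, and the class and method names are made up for illustration.

```csharp
using System;
using System.Linq;

public static class ConfidenceCheck
{
    // Symbol-level confidences reported for the word "495096" (rounded, in percent).
    public static readonly double[] Symbols = { 97, 99, 100, 100, 99, 96 };

    // Hypothetical aggregation 1: arithmetic mean of the character scores.
    public static double Mean(double[] c) => c.Average();

    // Hypothetical aggregation 2: the weakest character decides the word score.
    public static double Min(double[] c) => c.Min();

    // Hypothetical aggregation 3: treat each score as an independent probability,
    // multiply them, and convert the result back to a percentage.
    public static double Product(double[] c) =>
        100.0 * c.Aggregate(1.0, (acc, x) => acc * x / 100.0);

    public static void Main()
    {
        Console.WriteLine($"mean    = {Mean(Symbols):F1}");
        Console.WriteLine($"min     = {Min(Symbols):F1}");
        Console.WriteLine($"product = {Product(Symbols):F1}");
    }
}
```

The mean is 98.5, the minimum is 96, and the product is about 91.3, so none of these simple combinations comes anywhere near the 59% reported for the word.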

I ran this test using the fast training data for English, with no training of my own. I am not worried about the accuracy; I am just curious about how to interpret the confidence scores.

Thanks!

Check.tif

Soumik Ranjan Dasgupta

Oct 12, 2018, 3:20:51 AM

to tesser...@googlegroups.com

Could you tell me how you got the confidence percentiles? I would like to know that. :)

--

Regards,

Soumik Ranjan Dasgupta

farhad khalafi

Oct 12, 2018, 11:16:04 AM

to tesseract-ocr

        /// <summary>
        /// Gets the confidence percentile for the current element at the specified level.
        /// </summary>
        public float GetConfidence(PageIteratorLevel level)
        {
            return TessApi.TessResultIteratorConfidence(Handle, level);
        }

I use a custom .NET layer somewhat similar to Tesseract.net.

The Handle is the result iterator returned from TessBaseAPIGetIterator() after a page has been recognized, and level selects the element type (PageIteratorLevel.Symbol for characters, PageIteratorLevel.Word for words).
