Cast word confidence to success rate?

emre

Aug 5, 2011, 3:29:46 AM
to tesseract-ocr
Hi. I am getting the word confidences in the HTML output of the Tesseract
OCR engine, but the confidences are negative numbers. How can I get a
success rate for a word from these negative confidence values?


Thanks

Dmitri Silaev

Aug 6, 2011, 2:49:47 AM
to tesser...@googlegroups.com
Use this formula:

confidence = min(100, max(0, 100 + 5*certainty))

where "confidence" is the value you need and "certainty" is the value
returned by Tesseract.
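
For anyone who wants this as code, here is a minimal Python sketch of the formula above. The function name certainty_to_confidence is made up for illustration and is not part of the Tesseract API; how you obtain the certainty value depends on your own setup.

```python
def certainty_to_confidence(certainty: float) -> float:
    """Map a Tesseract certainty to a 0-100 confidence.

    Hypothetical helper: it only implements
    confidence = min(100, max(0, 100 + 5*certainty)).
    """
    return min(100.0, max(0.0, 100.0 + 5.0 * certainty))


if __name__ == "__main__":
    # A few sample certainties, including one below -20 that gets clamped to 0.
    for certainty in (0.0, -4.0, -20.0, -30.0):
        print(certainty, certainty_to_confidence(certainty))
```

A certainty of 0 maps to 100, -4 maps to 80, and anything at or below -20 maps to 0.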

Warm regards,
Dmitri Silaev
www.CustomOCR.com

> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesser...@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-oc...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

Andy Hotmail

Aug 6, 2011, 11:04:37 AM
to tesser...@googlegroups.com
Hi Dmitri

You kindly added me to the group when it wouldn't let me subscribe. It now
won't let me unsubscribe so if you could unsubscribe me please it would be
much appreciated.

Thank You

Andy Syme

Dmitri Silaev

Aug 6, 2011, 3:13:41 PM
to tesser...@googlegroups.com
Hi Andy,

Unfortunately I can't help you - I'm not in charge of moderating
this forum. Please ask the official moderators about this.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

Sven Pedersen

Aug 6, 2011, 3:33:58 PM
to tesser...@googlegroups.com
This should help -- have you tried these methods?
http://groups.google.com/support/bin/answer.py?answer=46608
--Sven

--
``All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

Yunus Emre Cavusoglu

Aug 8, 2011, 2:42:26 AM
to tesser...@googlegroups.com
Thank you very much Dmitri

Gunasekaran Velu

Apr 8, 2015, 1:44:54 AM
to tesser...@googlegroups.com
Hi Dmitri

Does your formula apply only to negative confidence scores, or to all values?

I am getting a confidence score of 215 (a positive value) for the word "Name". Is that correct, or do I need to apply some calculation to it?

Looking forward to your reply.


Regards
Guna

Dmitri Silaev

Apr 8, 2015, 4:37:13 AM
to tesser...@googlegroups.com
It seems you're confusing "certainty" and "confidence" here. Please pay close attention to what you're writing or rephrase your question. The formula itself cannot produce values outside the [0, 100] range.

Best regards,
Dmitri Silaev
www.CustomOCR.com

Gunasekaran Velu

Apr 8, 2015, 6:21:39 AM
to tesser...@googlegroups.com
Really sorry for the mistake.

I am getting a certainty value of 215 (a positive value) from Tesseract for the text "Name".

Is your formula applicable to this certainty value?

Kindly do the needful.

Regards
Guna

Dmitri Silaev

Apr 10, 2015, 5:52:04 AM
to tesser...@googlegroups.com
That formula is not mine. It is derived from the code in TessBaseAPI::AllWordConfidences() or LTRResultIterator::Confidence():


confidence = min(100, max(0, 100 + 5*certainty))

A quick look at the comments in the code shows that:
- "confidence": ...should be interpreted as a percent probability (0.0f-100.0f)
- "certainty" in the formula: ...is the min (worst) certainty of the individual blobs in the word.
- "certainty" of the blob: ...is a number in [-20, 0] indicating the classifier certainty of the choice. In terms of probability, certainty = 20 (k log p) where k is defined as above to normalize -k log p to the range [0, 1].

So it seems that, in the end, "confidence" is the "p" of the worst blob, expressed as a percentage. It therefore lies in [0, 100], while "certainty" is clearly defined to be in [-20, 0].
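
To make the ranges above concrete, here is a rough Python sketch. It assumes you already have the per-blob certainties as plain floats; the names in_documented_range and word_confidence are invented for this example and do not exist in Tesseract.

```python
from typing import Iterable

# Range quoted from the code comments: blob certainty lies in [-20, 0].
CERTAINTY_MIN = -20.0
CERTAINTY_MAX = 0.0


def in_documented_range(certainty: float) -> bool:
    """Return True if the certainty lies in the documented [-20, 0] range."""
    return CERTAINTY_MIN <= certainty <= CERTAINTY_MAX


def word_confidence(blob_certainties: Iterable[float]) -> float:
    """Confidence of a word from the min (worst) certainty of its blobs.

    Hypothetical helper applying min(100, max(0, 100 + 5*certainty)).
    """
    certainties = list(blob_certainties)
    if not certainties:
        raise ValueError("a word needs at least one blob certainty")
    worst = min(certainties)
    return min(100.0, max(0.0, 100.0 + 5.0 * worst))


if __name__ == "__main__":
    print(word_confidence([-0.5, -3.0, -1.0]))  # worst blob is -3.0
    print(in_documented_range(215.0))           # the value reported in this thread
```

Note that the clamp turns any positive certainty into a confidence of 100, so it hides a value like 215. Checking the raw certainty against the documented range before converting is one way to notice such a case.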

Why you're getting a positive certainty value for your word is a question for the developers. You can submit your test case to https://code.google.com/p/tesseract-ocr/issues/list to find out the answer.

Gunasekaran Velu

Apr 11, 2015, 2:50:59 AM
to tesser...@googlegroups.com
Thank you sir.

I will check that.

Regards
Guna