Confidence value for each character


hvthaibk

Jun 28, 2009, 5:45:56 PM
to tesseract-ocr
Hi,

I am wondering if there is any way to get the confidence value for each
recognized character in tesseract.

Looking forward to your replies.

Thanks

Thai

Ray Smith

Jun 29, 2009, 9:16:19 PM
to tesser...@googlegroups.com
You can use TessBaseAPI::TesseractExtractResult, but you will have to hack the code a bit to do it, as it is a protected member. If we can correct the way ocropus uses tesseract, we can make this a useful single public member that anyone can use.
Ray.
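
An alternative to editing the library source is to derive from the class and re-export the protected member with a using-declaration. The sketch below only illustrates that C++ technique: LibraryApi and its ExtractResult helper are made-up stand-ins, not the real TessBaseAPI declaration, whose signature is not given in this thread.

```cpp
#include <vector>

// Stand-in for a library class. This is NOT the real TessBaseAPI;
// the member name and return type are hypothetical.
class LibraryApi {
 protected:
  // Hypothetical protected helper: one cost value per recognized character.
  static std::vector<float> ExtractResult() { return {1.5f, 2.5f, 4.0f}; }
};

// Deriving from the class and naming the member in a public
// using-declaration makes it callable from outside, without
// touching the library's own source file.
class ExposedApi : public LibraryApi {
 public:
  using LibraryApi::ExtractResult;
};
```

With this in place, ExposedApi::ExtractResult() can be called from ordinary code, while LibraryApi::ExtractResult() still cannot.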

Yury Tarasievich

Jun 30, 2009, 5:27:18 AM
to tesser...@googlegroups.com
Ray Smith wrote:
> You can use TessBaseAPI::TesseractExtractResult, but you will have to
> hack the code a bit to do it, as it is a protected member. If we can
> correct the way ocropus uses tesseract, we can make this a useful single
> public member that anyone can use.

That would be useful functionality, yes.
I have wondered before whether tesseract could be
made to use, e.g., confidence thresholds if needed.


hvthaibk

Jul 3, 2009, 9:43:36 AM
to tesseract-ocr
Thank you very much for your reply.

I have tried using TessBaseAPI::TesseractExtractResult, but the
confidence values turn out to be exactly the same for all characters
belonging to one word. Is there any way to get the confidence value
for each character independently of its neighbors?

Moreover, the confidence values are usually above 100. Is something
wrong here, given that tesseract is supposed to produce confidence
values in the range 0-100 only?

Thai


Ray Smith

Jul 3, 2009, 7:22:49 PM
to tesser...@googlegroups.com
OK, you are right. The individual character confidences are lost deep in the guts of the code in 2.04.
There is a change in 3.00 to allow them to be passed back to the API level.

The function in baseapi.cpp that converts the ratings for ocropus is incorrect. Maybe this explains why ocropus doesn't do so well.
In the call to rating_to_cost, it should be using best_choice->certainty() instead of rating(), and rating_to_cost itself should compute 100 + 5*rating instead of 100 + rating. This gives a better scale, with 100 = best and 0 = worst.

Ray.
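
The rescaling Ray describes can be sketched as a small standalone function. This is only an illustration, not the code in baseapi.cpp: the name CertaintyToConfidence is made up, it assumes certainty is 0 for the best match and increasingly negative for worse ones, and the clamping to 0-100 is an added assumption so the result stays in the range mentioned above.

```cpp
// Hypothetical helper, not part of tesseract: maps a certainty value
// to a 0-100 confidence using the 100 + 5*x scaling from this thread.
// Assumption: certainty is 0 for the best match and negative otherwise,
// so -20 and below map to 0.
constexpr int CertaintyToConfidence(float certainty) {
  return certainty >= 0.0f     ? 100   // clamp at the best score
         : certainty <= -20.0f ? 0     // clamp at the worst score
                               : static_cast<int>(100.0f + 5.0f * certainty);
}
```

For example, a certainty of -4 maps to 80 and a certainty of -10 maps to 50. Feeding in an unscaled positive rating instead of a certainty would push 100 + rating past 100, which is consistent with the values above 100 reported earlier in the thread.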

hvthaibk

Jul 6, 2009, 7:14:58 PM
to tesseract-ocr
Thank you very much, Ray, that solves the problem. I am looking forward
to version 3.00.

Thai