Different outputs from the same image while using Tesseract 3.05


Canberk Ozdemir

Sep 28, 2017, 10:47:06 AM
to tesseract-ocr
Hello all,

I couldn't find any post related to this subject. Is it possible for Tesseract to give slightly different outputs for the same image across different runs (with the same configuration)?

On some noisy scanned documents, I get a few different characters in the output from run to run. I would say about 95% of the characters are the same.

Regards,

Canberk

David Sixela

Sep 28, 2017, 11:20:50 AM
to tesseract-ocr
I had the same problem, getting different outputs between 3.04 and 3.05. The only solution I've found was to upgrade to Tesseract 4.0.

Canberk Ozdemir

Sep 29, 2017, 1:59:07 AM
to tesseract-ocr
Hello David,

Yesterday, a colleague of mine figured out that when the Tesseract variable classify_enable_learning is disabled, Tesseract no longer gives different results between runs.

I think these variables are listed with their descriptions here: http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version
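
In case it helps anyone trying this, here is a minimal Python sketch of how the variable could be switched off from the command line. It assumes the `tesseract` binary is on the PATH, that your build accepts `-c name=value` for this variable, and that `stdout` works as the output base. The function names and the file name `scan.png` are made up for illustration; they are not part of Tesseract.

```python
import subprocess


def build_tesseract_command(image_path, output_base, enable_learning=False):
    """Build a tesseract CLI call that sets classify_enable_learning.

    Assumption: the installed tesseract accepts `-c name=value` for this variable.
    """
    value = "1" if enable_learning else "0"
    return [
        "tesseract",
        image_path,
        output_base,
        "-c",
        f"classify_enable_learning={value}",
    ]


def run_ocr(image_path, enable_learning=False):
    """Run tesseract once and return the recognized text.

    Assumption: "stdout" as the output base makes tesseract print the text
    instead of writing a file.
    """
    cmd = build_tesseract_command(image_path, "stdout", enable_learning)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def runs_are_identical(image_path, runs=3, enable_learning=False):
    """OCR the same image several times and report whether all outputs match."""
    outputs = {run_ocr(image_path, enable_learning) for _ in range(runs)}
    return len(outputs) == 1
```

Calling `runs_are_identical("scan.png")` on one of the noisy scans, once with `enable_learning=True` and once with the default `False`, should show whether the variable is really what causes the differences on a given setup.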

Regards

David Sixela

Sep 30, 2017, 1:25:29 PM
to tesseract-ocr
Hi Canberk,

Thanks for your answer, I didn't know about that variable.