Tesseract 3.02 fails to identify some characters

Thilina Jayathilaka

May 19, 2017, 1:01:43 PM
to tesseract-ocr

I'm working on a C++ project where I need to OCR some text fields. I'm using the Tesseract 3.02 C++ API to do this, but the OCR results differ from the text in the images.

The following image reads as "31 SW19 SQU" when I call GetUTF8Text():


"31 SW19 SQU"


and the following image as "31 SW19 3OU".

 "31 SW19 3OU"


One problem is that Tesseract correctly identifies the first character as "3", but reads the same character in "3QU" as "S" in the first image.

Can someone explain why Tesseract misreads these images, or offer any guidance on fixing the issue?

akhil katpally

May 19, 2017, 2:34:58 PM
to tesseract-ocr
I don't know exactly why it is recognizing the text incorrectly, but here is what I would suggest trying:
Try Tesseract 4.0 with the neural network engine; I found it much better than the original Tesseract.
Look at the bounding box of each character; that may give you an idea of what is going wrong.
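
For the bounding-box suggestion, a rough sketch along these lines might help. It walks the recognition result symbol by symbol with the C++ API's ResultIterator and prints each character with its confidence and box. SymbolResult, lowConfidenceSymbols, collectSymbols, the USE_TESSERACT macro, and the threshold of 60 are names and values made up for this illustration, not part of Tesseract, and the image path is whatever you pass on the command line. The Tesseract part is only compiled when USE_TESSERACT is defined, and the build line in the comment assumes the usual library names.

```cpp
// Sketch: list each recognized symbol with its confidence and bounding box.
// Assumed build line: g++ -DUSE_TESSERACT symbols.cpp -ltesseract -llept
#include <cstdio>
#include <string>
#include <vector>

// One recognized character (hypothetical helper type, not part of Tesseract).
struct SymbolResult {
    std::string text;
    float confidence;              // Tesseract reports 0..100
    int left, top, right, bottom;  // bounding box in image pixels
};

// Hypothetical helper: return the symbols whose confidence is below `threshold`.
std::vector<SymbolResult> lowConfidenceSymbols(
        const std::vector<SymbolResult>& symbols, float threshold) {
    std::vector<SymbolResult> flagged;
    for (const SymbolResult& s : symbols) {
        if (s.confidence < threshold) flagged.push_back(s);
    }
    return flagged;
}

#ifdef USE_TESSERACT
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

// Run recognition on the image already set on `api` and collect every symbol.
std::vector<SymbolResult> collectSymbols(tesseract::TessBaseAPI& api) {
    std::vector<SymbolResult> symbols;
    if (api.Recognize(0) != 0) return symbols;
    tesseract::ResultIterator* it = api.GetIterator();
    if (it == nullptr) return symbols;
    do {
        char* text = it->GetUTF8Text(tesseract::RIL_SYMBOL);
        if (text == nullptr) continue;
        SymbolResult s;
        s.text = text;
        s.confidence = it->Confidence(tesseract::RIL_SYMBOL);
        it->BoundingBox(tesseract::RIL_SYMBOL,
                        &s.left, &s.top, &s.right, &s.bottom);
        symbols.push_back(s);
        delete[] text;  // GetUTF8Text returns a new[]-allocated string
    } while (it->Next(tesseract::RIL_SYMBOL));
    delete it;
    return symbols;
}

int main(int argc, char** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <image>\n", argv[0]);
        return 1;
    }
    Pix* image = pixRead(argv[1]);
    if (image == nullptr) {
        std::fprintf(stderr, "could not read %s\n", argv[1]);
        return 1;
    }
    tesseract::TessBaseAPI api;
    if (api.Init(NULL, "eng") != 0) {
        std::fprintf(stderr, "could not initialize Tesseract\n");
        pixDestroy(&image);
        return 1;
    }
    api.SetImage(image);
    std::vector<SymbolResult> symbols = collectSymbols(api);
    for (const SymbolResult& s : symbols) {
        std::printf("'%s' conf=%.1f box=(%d,%d)-(%d,%d)\n", s.text.c_str(),
                    s.confidence, s.left, s.top, s.right, s.bottom);
    }
    // 60 is an arbitrary cut-off chosen for this sketch.
    std::printf("%zu symbol(s) below confidence 60\n",
                lowConfidenceSymbols(symbols, 60.0f).size());
    api.End();
    pixDestroy(&image);
    return 0;
}
#endif
```

If the box around the misread "3" or "Q" is clipped, merged with a neighbouring character, or has a noticeably lower confidence than the rest, that points at segmentation or image quality rather than the character classifier itself.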