[ask] unrecognized text in particular layout


denny.m...@gmail.com

Jun 25, 2016, 2:45:51 PM
to tesseract-ocr
hi all,

could anyone kindly explain why the text "YTOVWG" in ocr3.jpg is not recognized
but can be recognized in ocr4.jpg ?

thank you

>tesseract ocr3.jpg f1  -l eng
Tesseract Open Source OCR Engine v3.02 with Leptonica

>type f1.txt
Booking Details
Booking Reference (PNR):

>tesseract ocr4.jpg f1  -l eng
Tesseract Open Source OCR Engine v3.02 with Leptonica

>type f1.txt
YTOVWG
Attachments: ocr3.jpg, ocr4.jpg

Tom Morris

Jun 26, 2016, 12:21:04 PM
to tesseract-ocr
On Saturday, June 25, 2016 at 2:45:51 PM UTC-4, denny.m...@gmail.com wrote:

could anyone kindly explain why the text "YTOVWG" in ocr3.jpg is not recognized
but can be recognized in ocr4.jpg ?

As a guess, I'd say the mixed fonts on a single line and the large gap before the booking reference make it think that part of the line is a graphic rather than text.

Having said that, all the text gets recognized perfectly with 3.05dev. I'd upgrade to 3.04 and see if that fixes the problem.
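
If upgrading isn't possible right away, one thing that might be worth trying is overriding the automatic page segmentation, since the guess above is that layout analysis is what drops the reference. The sketch below is only an illustration, not a confirmed fix: the helper names `build_tesseract_cmd` and `run_ocr` are made up for this example, it assumes the `tesseract` executable is on the PATH, and it uses the `-psm` spelling from the 3.x command line (later releases spell it `--psm`). Whether mode 6 ("assume a single uniform block of text") actually recovers "YTOVWG" from ocr3.jpg is untested.

```python
import shutil
import subprocess


def build_tesseract_cmd(image, out_base, lang="eng", psm=None):
    """Build the argument list for a 3.x-style tesseract invocation.

    Hypothetical helper for illustration; psm=None keeps the default
    (automatic) page segmentation.
    """
    cmd = ["tesseract", image, out_base, "-l", lang]
    if psm is not None:
        # Assumption: 3.x spelling of the flag; newer releases use --psm.
        cmd += ["-psm", str(psm)]
    return cmd


def run_ocr(image, out_base, **kwargs):
    """Run tesseract and return the text it wrote to <out_base>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_cmd(image, out_base, **kwargs), check=True)
    with open(out_base + ".txt", encoding="utf-8") as fh:
        return fh.read()
```

Calling `run_ocr("ocr3.jpg", "f1")` and then `run_ocr("ocr3.jpg", "f1", psm=6)` and comparing the two results would show whether the segmentation mode makes any difference on the installed version.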

Tom 