Different behaviours with the same (part of) image.


Carlo

Jun 6, 2016, 2:58:45 PM
to tesseract-ocr
Hi All,

I am running OCR and have noticed some strange behaviour. With the image "test1.bmp" (attached), the engine returns:

DOP Z IT001E34343434 (with a "Z" instead of ":")

With "test2.bmp" (attached), which is a wider crop of the same image than test1.bmp, the engine returns:

maggiore
DOP : IT001E34343434 (correct!)

Why does the engine read a "Z" instead of ":" in the first image, but correctly read ":" in the second?

Many thanks.
Best Regards.

Carlo
test1.bmp
test2.bmp

Allistair

Jun 6, 2016, 3:58:49 PM
to tesser...@googlegroups.com
I don't have a technical explanation, but I can confirm that Tesseract is sensitive to the padding around the words you are trying to recognize (perhaps something to do with its page segmentation). In my experience, it's best to make sure the text has enough white space around it.
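One way to add that white space before running OCR is sketched below. It assumes Python with the Pillow library, which nobody in the thread used. The 20-pixel border is a hypothetical value to tune per image, and the function name `pad_white` is purely illustrative:

```python
from PIL import Image, ImageOps

def pad_white(image: Image.Image, border: int = 20) -> Image.Image:
    """Return a grayscale copy of `image` with a uniform white margin.

    `border` (in pixels) is a hypothetical default; adjust it for your images.
    """
    # Convert to 8-bit grayscale so that fill value 255 always means white.
    gray = image.convert("L")
    # ImageOps.expand adds `border` pixels on every side, filled with `fill`.
    return ImageOps.expand(gray, border=border, fill=255)

if __name__ == "__main__":
    # Stand-in for a tight crop like test1.bmp: a small all-black strip.
    crop = Image.new("L", (120, 20), 0)
    padded = pad_white(crop)
    print(padded.size)  # (160, 60)
```

With a real file you could call `pad_white(Image.open("test1.bmp")).save("test1_padded.bmp")` and then run `tesseract test1_padded.bmp stdout` to compare the result against the unpadded crop.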

Ashish Goel

Jun 6, 2016, 11:59:39 PM
to tesseract-ocr
I usually resize the image to correct errors like this.

For example, for your test1.bmp I ran:

 convert test1.bmp -resize 400% testnew.bmp

I used ImageMagick to resize the image. After that, Tesseract recognized the ':' correctly.
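For anyone preprocessing in Python rather than with the `convert` command, a roughly similar 4x enlargement is sketched below. It assumes the Pillow library. The LANCZOS filter is my own choice, not ImageMagick's: `convert` selects its own default filter, so the pixels will not match exactly. The function name `upscale` is hypothetical:

```python
from PIL import Image

def upscale(image: Image.Image, factor: int = 4) -> Image.Image:
    """Enlarge `image` by an integer factor, similar in spirit to `-resize 400%`."""
    # Convert to grayscale first: Pillow resizes bilevel ("1") and palette ("P")
    # images with nearest-neighbour sampling, which gives blocky, jagged glyphs.
    gray = image.convert("L")
    width, height = gray.size
    # LANCZOS is an assumed filter choice; ImageMagick's default may differ.
    return gray.resize((width * factor, height * factor), Image.LANCZOS)

if __name__ == "__main__":
    # Stand-in for test1.bmp: a small white strip.
    strip = Image.new("L", (50, 10), 255)
    print(upscale(strip).size)  # (200, 40)
```

Calling `upscale(Image.open("test1.bmp")).save("testnew.bmp")` would play the same role as the command above.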

Sometimes, though, resizing introduces other errors, such as detecting spaces that don't exist. I am still trying to figure out how to avoid that.