Pretty bad results for non-dictionary words

May 5, 2016, 8:23:13 AM

to tesseract-ocr

I am trying to recognize product codes in images.

The results with tesseract 3.04.00 are pretty bad. Even a primitive example (see attachment) does not work.

Instead of "ABC-DEF" I get "AECVDEF".

The example works flawlessly in gocr, but I guess I'm just using the wrong settings or something similar in tesseract.

Or is tesseract in general not designed to recognize "random strings" and I should rather use another tool (recommendations?) for this case?

ocr.jpg

May 6, 2016, 12:41:06 AM

to tesseract-ocr

If you resize with convert from ImageMagick (or any other tool):

convert ocr.jpg -resize 150% ocr2.jpg

then

tesseract ocr2.jpg ocr2 ; cat ocr2.txt

gives

ABC-DEF

ocr2.jpg
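
The two commands above can be wrapped into one step. This is only a minimal sketch: it assumes ImageMagick's `convert` and `tesseract` are both on the PATH and that the input file name has an extension. The function names `scaled_name` and `ocr_upscaled` are made up for illustration, and the 150% default is simply the value that worked for this image, not a general recommendation.

```shell
#!/bin/sh
# Sketch: upscale an image with ImageMagick, then OCR the resized copy.
# scaled_name and ocr_upscaled are hypothetical helper names.

# Derive the name of the resized copy, e.g. ocr.jpg + 150 -> ocr_150.jpg
scaled_name() {
    src=$1
    pct=$2
    base=${src%.*}    # file name without its extension
    ext=${src##*.}    # extension only
    printf '%s_%s.%s\n' "$base" "$pct" "$ext"
}

# Resize by the given percentage (default 150), run tesseract, print the text.
ocr_upscaled() {
    src=$1
    pct=${2:-150}
    dst=$(scaled_name "$src" "$pct")
    convert "$src" -resize "${pct}%" "$dst" || return 1
    # tesseract appends .txt to the output base name by itself
    tesseract "$dst" "${dst%.*}" || return 1
    cat "${dst%.*}.txt"
}
```

Calling `ocr_upscaled ocr.jpg` would then be equivalent to the two commands above, except that the resized copy is named `ocr_150.jpg` instead of `ocr2.jpg`.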

May 12, 2016, 5:39:23 PM

to tesseract-ocr

Hi Rolf,

thank you for your response.

Is this the "right" way? I read that I should use proper settings in tesseract rather than preprocess the image manually.

Are smaller texts a problem in general?
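
For illustration, the kind of settings meant here might look like the sketch below. It has not been tried on the attached image, so whether it fixes the "AECVDEF" misread is an open question. It uses two Tesseract 3.04 command-line options: `-psm 7` (treat the image as a single text line) and `-c tessedit_char_whitelist=...` (restrict output to the listed characters). Newer Tesseract releases spell the first option `--psm`. The helper name `ocr_code_line` and the assumption that product codes consist only of A-Z, 0-9 and "-" are mine.

```shell
#!/bin/sh
# Sketch: OCR a single-line product code with Tesseract 3.04-style options.
# ocr_code_line is a hypothetical helper name.
# Assumption: codes use only A-Z, 0-9 and "-"; adjust the whitelist otherwise.
CODE_CHARS='ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-'

ocr_code_line() {
    img=$1
    out=${2:-code}
    # -psm 7: treat the image as a single text line
    # -c tessedit_char_whitelist=...: only emit the listed characters
    tesseract "$img" "$out" -psm 7 -c "tessedit_char_whitelist=$CODE_CHARS" || return 1
    cat "$out.txt"
}
```

Usage would be `ocr_code_line ocr.jpg`, which writes `code.txt` and prints its contents.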

May 14, 2016, 11:52:24 AM

to tesseract-ocr

On Thursday, May 12, 2016 at 5:39:23 PM UTC-4, Christian Koch wrote:

Are smaller texts a problem in general?

Tom
