Pretty bad results for non-dictionary words

Christian Koch

May 5, 2016, 8:23:13 AM
to tesseract-ocr
I am trying to recognize product codes in images.
The results in tesseract 3.04.00 are pretty bad. Even a primitive example (see attachment) doesn't work.

Instead of "ABC-DEF" I get "AECVDEF".

The example works flawlessly in gocr, so I guess I'm just using the wrong settings or something similar in tesseract.
Or is tesseract in general not designed to recognize "random strings", so that I should rather use another tool (recommendations?) for this case?
ocr.jpg

Rolf Mertig

May 6, 2016, 12:41:06 AM
to tesseract-ocr
If you resize with convert from ImageMagick (or any other tool):
convert ocr.jpg -resize 150% ocr2.jpg
then 
tesseract ocr2.jpg ocr2 ; cat ocr2.txt
gives 
ABC-DEF
ocr2.jpg
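
A minimal sketch of the two steps above as a script, in case the 150% figure needs adapting to other images. The helper `scale_percent` and the pixel heights (20 and 30) are hypothetical values chosen only so that the result matches the 150% used above; they are not measurements of ocr.jpg. The convert and tesseract calls are the ones from this post and only run if both tools and the input file are present.

```shell
#!/bin/sh
# Hypothetical helper: percentage needed to scale text from its current
# pixel height to a target pixel height (integer arithmetic, rounded up).
scale_percent() {
    current=$1
    target=$2
    echo $(( (target * 100 + current - 1) / current ))
}

# Assumed values for illustration only; measure your own image.
pct=$(scale_percent 20 30)

# Same commands as in the post above, guarded so the script is a no-op
# when ImageMagick, tesseract, or the input image is missing.
if command -v convert >/dev/null 2>&1 \
    && command -v tesseract >/dev/null 2>&1 \
    && [ -f ocr.jpg ]; then
    convert ocr.jpg -resize "${pct}%" ocr2.jpg
    tesseract ocr2.jpg ocr2
    cat ocr2.txt
fi
```

Whether a given scale factor helps depends on the image, so it may be worth trying a few values and comparing the output.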

Christian Koch

May 12, 2016, 5:39:23 PM
to tesseract-ocr
Hi Rolf,

thank you for your response.
Is this the "right" way? I read that I should use proper settings in tesseract rather than do manual preprocessing.

Are smaller texts a problem in general?

Tom Morris

May 14, 2016, 11:52:24 AM
to tesseract-ocr
On Thursday, May 12, 2016 at 5:39:23 PM UTC-4, Christian Koch wrote:

Are smaller texts a problem in general?