Poor results from simple images


Brian Craig

Nov 2, 2016, 2:23:53 PM

to tesseract-ocr

I'm attempting to parse some data from screenshots of a mobile game:

Since all the text is in predetermined areas I can easily grab the individual numbers to feed to tesseract:

The top image is recognized fine, but for the bottom image I get the following output:

1 EM)

That is with -psm 6 or 7; without either option I just get EM)
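For reference, the invocation can be sketched in Python roughly as below. This is a minimal sketch, not the exact script: the helper names `tesseract_command` and `run_ocr` and the file name `number_crop.png` are made up for illustration. It assumes the `tesseract` binary is on the PATH and is a 3.x release, where the page segmentation flag is spelled `-psm` (newer releases spell it `--psm`). Mode 6 assumes a single uniform block of text and mode 7 treats the image as a single text line.

```python
import subprocess


def tesseract_command(image_path, psm=None):
    """Build the argument list for a tesseract call that prints to stdout.

    `psm` is the page segmentation mode; None leaves tesseract's default.
    """
    cmd = ["tesseract", image_path, "stdout"]
    if psm is not None:
        # Assumption: tesseract 3.x flag spelling; 4.x and later use "--psm".
        cmd += ["-psm", str(psm)]
    return cmd


def run_ocr(image_path, psm=None):
    """Run tesseract on one image and return the recognized text."""
    result = subprocess.run(
        tesseract_command(image_path, psm),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "number_crop.png" is a hypothetical file name for one cropped number.
    for mode in (None, 6, 7):
        print(mode, " ".join(tesseract_command("number_crop.png", mode)))
```

Calling `run_ocr("number_crop.png", psm=7)` would then return whatever tesseract prints for that crop.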

As you can see, I've already done some preprocessing: greyscale conversion and inversion. I've also tried scaling the image, which sometimes helps and sometimes makes things worse.

At this point I'm at a loss; I've done as much as I can think of to improve the results. The only remaining explanation I can come up with is that the game uses an unusual font that Tesseract doesn't fully understand. If so, are there any good resources for training Tesseract on a new font? I'm pretty much a newbie when it comes to Tesseract.

Does anyone have any other suggestions?

-Brian

Kristofer Johansson

Nov 18, 2016, 10:54:55 AM

to tesseract-ocr

Hello!

I am also very much a newbie, but one thing I think could help a lot is making the image pure black and white (binarisation) rather than greyscale.

You could do this with Leptonica, which seems to be commonly used with Tesseract.

(See attachment for example)
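To show what binarisation means in practice, here is a minimal sketch in plain Python. It is not Leptonica's API and not how the attachment was produced; it only illustrates the idea using Otsu's method, one common way to pick a global threshold. The function names `otsu_threshold` and `binarise` are made up for this example, and the image is assumed to be a list of rows of 8-bit grey values (0 to 255).

```python
def otsu_threshold(pixels):
    """Pick the grey level that best separates dark and light pixels.

    Implements Otsu's method: try every threshold and keep the one that
    maximises the between-class variance of the two resulting groups.
    """
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their grey values
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # nothing in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # everything is in the dark class; no further splits
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarise(pixels, threshold):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[255 if value > threshold else 0 for value in row] for row in pixels]


if __name__ == "__main__":
    # Tiny made-up greyscale patch: dark values on the left, light on the right.
    patch = [[10, 12, 200], [11, 210, 205]]
    t = otsu_threshold(patch)
    print(t, binarise(patch, t))
```

A real image would first have to be loaded into that list-of-rows form with an imaging library, and the result saved back out before passing it to Tesseract.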

Hope that helps!

ocr_in.tif
