Reading game screenshots, completely lost.


david...@gmail.com

Sep 13, 2016, 2:41:29 AM
to tesseract-ocr

I just started using Tesseract today with PyTesseract. I'm trying to have it read text from a game. It seems easy enough, but I'm completely stuck. The picture above is the one I was trying to read; I thought it was a good test since there's just a black background and nothing to really mistake for a character. I can't get this to read at all, though. If I scan in English it doesn't recognize the Japanese. If I scan in Japanese it inserts kanji in place of English letters. If I use both it still misses a bunch of the writing. I've tried cropping down to just the dialog box, but then it only reads the last 2 lines. If I shrink the image it only reads the first half of the top two lines. Changing to grayscale did nothing, and forcing the image to full contrast (either black or white pixels) made it worse. Also, when scanning just the dialog box it got basically every kanji wrong, skipped other characters entirely, and read ら as a 6. Is this a font issue? Does anyone know some tricks that could help with this? I'd appreciate any suggestions, since I can't seem to get anywhere.

david...@gmail.com

Sep 14, 2016, 8:19:01 PM
to tesseract-ocr

I'm not sure if I'm supposed to reply to my own question, but I figured I'd share some progress in case someone can give me an idea of where to go from here. I changed the image to grayscale and then used win32api to get the color of the pixels in the words (I probably should've just used Paint.NET). I've seen people say Tesseract does better with black text on white backgrounds, so I made a function that checks each pixel to see if it's within a tolerance of the value I got for the words. The tolerance is needed because the words seem to have a weird 8-bit bevel effect: the pixels are lighter in the center of symbols and darker towards the edges. When a pixel fell within the tolerance I changed its value to pure black, and all pixels that didn't were changed to white. I also scaled the image up to 4x its original size, which helped with the reading a bit.
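A minimal sketch of that tolerance check and the 4x upscale in plain Python, working on a flat list of grayscale values; the function names and parameters (`target`, `tolerance`) are illustrative, not from the original code:

```python
def binarize_with_tolerance(pixels, target, tolerance):
    """Map grayscale values (0-255) within `tolerance` of `target`
    to pure black (0) and everything else to pure white (255)."""
    return [0 if abs(p - target) <= tolerance else 255 for p in pixels]

def upscale_row_4x(pixels):
    """Nearest-neighbour 4x upscale of one row of pixels."""
    return [p for p in pixels for _ in range(4)]

# Bright text pixels near the sampled word colour become ink;
# everything else becomes background.
row = binarize_with_tolerance([200, 10, 190], target=195, tolerance=65)
# → [0, 255, 0]
```

In practice a library like Pillow (`Image.point` for the mapping, `Image.resize` for the upscale) would do the same thing far faster than a per-pixel Python loop.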

The results of the Tesseract scan were:
剛坤剛艶

縄〔廿鵬 Ti…曹
B震廿ー尋 鵬〔 2

典肛亦立叫匠
ー席 叫換立『

     

技~ アイテムをえらぶウィンドウを
問いていると~ 時間が止まります。
ゅっ くりと華糞りゃくを祖りながら
バ ト ルがで耆ます ゥ

A worry I have now is that the tolerance I needed was 65! Grayscale only has an intensity value between 0 and 255, as I understand it, so in order to get the words to show up I had to accept pixels across a range of 130, just over half the intensity spectrum. If I move from this to a busier screen with sprites on it, I'd imagine a bunch of pixels would get turned black while I'm trying to get the writing to clear up, and separating the actual text from the noisy garbage you get trying to read a black splotch is going to be hard. Is there any way to do this?

david...@gmail.com

Sep 16, 2016, 12:48:01 AM
to tesseract-ocr
Well, again, I'm not sure if I'm supposed to reply to my own topic, but as I suspected, using the original method on that selection screen leads to complete garbage: too much of the image is left over, and Tesseract literally gives up and returns an empty string. So I made another filter to pass the image through. With 2 filters applied it can now read relatively reliably, but specks are still left that it tries to read. The first picture is the original image and the other is that image after the 2 filters.



The results of the tesseract scan were:

」ブ「.'~ー ' .'~ー ' .'~ー ' .'~ー ' .'~ー ' .'~ー '

鱒` 私、 お藁り見に来たんだ。

` ねえ、 あなたこの町の人でしょ ?
一人じゃ面臼くないもん。


The first line is obviously it trying to read the specks left over at the top. Aside from an extra kanji and an apostrophe, the reading is right on. I tried it on another screenshot with different sprites and got:

)“シナ「とつせゅつへ` こうふんして
` ねつけなかったんで しょ?
ま 、 建国千年のお祭りだから
無理ないけど ・・・・・・


Which is again pretty close: the )" is extra, the つ in the first line is actually a う, and the へ is actually a べ, which are easy to mix up. But the problem is that the filters are really specific to this game at the moment, and I was hoping to keep them more generalized. Also, there's no way to really tell if a kanji in the reading has been placed there in error. I want to make a program that periodically tries to read text from the game as I'm playing and performs some functions on it. Any ideas? I may just end up looking into another route; this one seemed the simplest, but the errors could mess up the functionality I'm trying to achieve.

Rich Taylor

Sep 16, 2016, 11:12:01 AM
to tesseract-ocr
You're not lost - doing quite well I think.

Tesseract OCR only really reads black text on white background, so your approach of
processing the image to get that is good (and would fix about 1/2 of the other issues
which people report here...)

The original text is white with a black drop-shadow (offset right and down).
So, process the image so that white original pixels become black in the OCR image
and anything darker becomes white. (This may be what you've done already.) This is
a combination of inversion and binarization.
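That inversion-plus-binarization can be done in a single pass: anything at or above a cutoff (the white text) becomes ink, everything darker becomes background. A sketch in plain Python; the threshold value of 128 is an assumption for illustration, not something from this thread:

```python
def invert_and_binarize(pixels, threshold=128):
    """Bright pixels (the white game text) -> black ink (0);
    darker pixels (background, drop-shadow) -> white (255).
    `threshold` is an assumed cutoff, tune per game."""
    return [0 if p >= threshold else 255 for p in pixels]

# White text at 250 becomes ink; the dark shadow at 30 becomes background.
row = invert_and_binarize([250, 30, 200])
# → [0, 255, 0]
```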

These characters are fairly blocky - due to the low-res original art. If they are still blocky
after conversion to b/w then you may be able to fill in the blocks by using a dilate and
erode sequence (standard image-processing ops, look them up...) to fill the gaps somewhat
intelligently. This may help the recognition rates.
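A dilate-then-erode sequence (morphological closing) can be sketched in plain Python on a grid where 0 is ink and 255 is background. This toy version uses a 4-neighbour cross as the structuring element; a real library such as OpenCV (`cv2.dilate`/`cv2.erode`, or `cv2.morphologyEx` with `MORPH_CLOSE`) lets you choose the kernel shape and size:

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def dilate(grid):
    """Grow ink (0) into each pixel's 4-neighbourhood."""
    h, w = len(grid), len(grid[0])
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 0:
                out[y][x] = 0
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        out[ny][nx] = 0
    return out

def erode(grid):
    """Keep ink only where all in-bounds 4-neighbours are also ink."""
    h, w = len(grid), len(grid[0])
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            keep = grid[y][x] == 0
            if keep:
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] != 0:
                        keep = False
            out[y][x] = 0 if keep else 255
    return out

def close_gaps(grid):
    """Morphological closing: dilate, then erode."""
    return erode(dilate(grid))

# A one-pixel gap in a stroke gets filled:
print(close_gaps([[0, 0, 255, 0, 0]]))
# → [[0, 0, 0, 0, 0]]
```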

I can think of two approaches to address the specks at the top - either a noise elimination
image processing step or, maybe, a windowed approach to binarization. The simplest
binarization technique is the one you are already using - a fixed threshold value for deciding
black or white. A more complex approach is to vary the threshold value based on a window of
surrounding pixel values. Research "Sauvola binarization" for details on a proven algorithm.
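For reference, Sauvola's rule computes a per-pixel threshold T = m * (1 + k * (s/R - 1)) from the local mean m and standard deviation s of a window around each pixel, so flat regions (low s) get a lower threshold than high-contrast text regions. A sketch of the per-window formula in plain Python, with the commonly cited defaults k = 0.2 and R = 128 (these values are conventions from the literature, not from this thread):

```python
import statistics

def sauvola_threshold(window, k=0.2, R=128):
    """Sauvola's local threshold for one pixel, given a flat list of
    grayscale values from the window surrounding that pixel."""
    m = statistics.fmean(window)
    s = statistics.pstdev(window)
    return m * (1 + k * (s / R - 1))

# A flat window (no contrast) gets a threshold well below its mean,
# so uniform splotches are not carved up into noise:
flat = sauvola_threshold([100] * 9)       # 100 * (1 + 0.2 * (0 - 1)) = 80.0
# A high-contrast window (text edge) gets a threshold near its mean:
edgy = sauvola_threshold([0, 255] * 4 + [0])
```

Libraries implement this efficiently with integral images (e.g. `skimage.filters.threshold_sauvola`), so you rarely need to hand-roll the windowing loop.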

It's nicer to figure out what image processing is needed without extensive programming work.
Once you know what operations/algorithms are needed then you can call them from a
(hopefully) free and easy to use (and debugged) library (ex. OpenCV?). To experiment
like this I use the demo program for Accusoft's ScanFix library - it lets you process images
with a sequence of pretty low-level ops. There are probably other "image processing
laboratory" apps available. A paint program or viewer (Paint.NET, IrfanView) can do a lot
of these processing ops, but often not in a way that gives you access to the low-level
details (like the choice of binarization algorithm, etc.).

Finally, no OCR system is perfect - if your project requires perfect OCR then maybe rethink it
(Or buy a commercial OCR engine that can recognize 99+%, though, still not perfect...)

- Rich
