Two images created in the same way. One works, the other doesn't.

Jonas Pfannschmidt

Apr 20, 2016, 2:42:58 AM
to tesseract-ocr
Hi,

I'm trying to automate UI tests using OCR. The goal is a test script with lines like "click_text('Reports')" that automatically click the "Reports" button.

It works quite well ... sometimes. I've attached two sample screen captures. The text in 'works.png' is recognized reasonably well, but 'fails.png' returns only garbage. Both images were created programmatically in the same way (capture the screen, resize by a factor of 4, convert to greyscale). Does anybody know why one works and the other doesn't?

Best Regards,
Jonas
fails.png
works.png
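For readers who want to reproduce the preprocessing described above, here is a minimal pure-Python sketch. It assumes the screen capture is already available as a list of rows of (r, g, b) tuples and uses nearest-neighbour enlargement (the post doesn't say which resampling was used). The greyscale step applies the ITU-R 601-2 luma weights that Pillow documents for its "L" mode conversion, in an integer approximation. The function names are hypothetical and not taken from Jonas's code.

```python
# Hypothetical sketch of the preprocessing described above, in pure Python.
# Assumption: the screen capture is a list of rows of (r, g, b) tuples, 0..255.

def upscale_nearest(rows, factor=4):
    """Enlarge an image by an integer factor using nearest-neighbour sampling
    (each pixel is repeated `factor` times horizontally and vertically)."""
    out = []
    for row in rows:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def to_greyscale(rgb_rows):
    """Convert (r, g, b) pixels to 0..255 grey levels with the ITU-R 601-2
    luma weights, using integer arithmetic (the result is rounded down)."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in rgb_rows]


if __name__ == "__main__":
    capture = [[(255, 255, 255), (0, 0, 0)],
               [(128, 128, 128), (200, 30, 30)]]
    # Same order as in the post: resize by a factor of 4, then convert to greyscale.
    grey = to_greyscale(upscale_nearest(capture, factor=4))
    print(len(grey), len(grey[0]))  # 8 8
    print(grey[7][7])               # 80
```

With nearest-neighbour enlargement the order of the two steps makes no difference; an interpolating filter such as bicubic would instead add intermediate grey levels along the edges of the text.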

Tom Morris

Apr 20, 2016, 12:59:35 PM
to tesseract-ocr
Well, for one thing, the image that works has a lot more dark text on it, whereas the one that fails not only has less text, but some of its text is greyed out.

At the end of the day, Tesseract is going to be working on a bitonal image. Since you've got a non-traditional application, I'd think you'd want to control as much of the image preprocessing as possible, to make sure it's done in a way that's appropriate for your application. So rather than stopping at greyscale, you should threshold and convert all the way down to bitonal.

Tom 
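Tom doesn't name a particular thresholding method. One common way to pick a single global threshold is Otsu's method, which chooses the grey level that best separates the image's histogram into two classes. The sketch below illustrates that choice in pure Python, assuming a greyscale image stored as a list of rows of 0..255 integers; the function names are hypothetical. For a UI whose colours are known in advance, a hand-picked fixed threshold (passed via `threshold=`) is another option.

```python
def otsu_threshold(grey_rows):
    """Pick a global threshold t in 0..255 with Otsu's method.

    Pixels <= t form one class and pixels > t the other; t is chosen to
    maximise the between-class variance of the grey-level histogram.
    """
    hist = [0] * 256
    for row in grey_rows:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels at or below t
    sum_low = 0  # sum of grey levels at or below t
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue  # no pixels in the low class yet
        if w_high == 0:
            break     # every pixel is in the low class; nothing left to split
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def to_bitonal(grey_rows, threshold=None):
    """Map each pixel to 0 (black) or 255 (white) around the threshold."""
    if threshold is None:
        threshold = otsu_threshold(grey_rows)
    return [[255 if px > threshold else 0 for px in row] for row in grey_rows]
```

Because this is one threshold for the whole image, greyed-out text that is closer in value to the background than to the dark text may come out white, so it is worth checking the bitonal result by eye for this kind of screenshot.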

Jonas Pfannschmidt

Apr 22, 2016, 7:14:36 AM
to tesseract-ocr
Thanks, Tom. I tried your suggestion and it does work better. The Python code for converting the image is here if anyone is interested: https://github.com/JonasPf/ocr_testtool/blob/master/captest/ocr.py

While the problem is solved, I would still like to understand it a bit better. In 'fails.png' it doesn't even recognize the first row (Timesheet, Categories, ...), but in 'works.png' it does, even though this part is the same in both images. I tried cutting out only that part of 'fails.png' and, surprisingly, the text was recognized! So something elsewhere in 'fails.png' throws it off so badly that it fails on text it would otherwise recognize. Any idea what that something is?
fails2.png
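The cropping experiment can be repeated programmatically with a sketch like the one below, assuming a greyscale image stored as a list of rows. It follows the same (left, top, right, bottom) half-open box convention as Pillow's Image.crop; the `crop` helper is hypothetical, and the box coordinates in the example are made up and would have to be measured from the actual screenshot.

```python
def crop(rows, left, top, right, bottom):
    """Return columns left..right-1 of rows top..bottom-1 as a new image."""
    return [list(row[left:right]) for row in rows[top:bottom]]


if __name__ == "__main__":
    # A 4x6 toy greyscale image; each value encodes its own (row, column).
    image = [[10 * y + x for x in range(6)] for y in range(4)]
    # Hypothetical box for a "first row" strip: full width, top two rows.
    strip = crop(image, 0, 0, 6, 2)
    print(strip)  # [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]
```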

Jonas Pfannschmidt

Apr 22, 2016, 1:35:12 PM
to tesseract-ocr
So I did a few more tests and found that I can improve the results by adding a black box (see attached image). Combining my findings with Tom's answer, I conclude that Tesseract probably chooses the wrong threshold because the image doesn't have enough contrast.

Thanks again!
fails3.png
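The black-box workaround can be sketched as follows, assuming a greyscale image stored as a list of rows of 0..255 integers. Under Jonas's contrast hypothesis, the point is simply to guarantee that the image contains some pure black. The `add_black_box` helper is hypothetical, and the box position and size are arbitrary choices, not values from the thread.

```python
def add_black_box(rows, left, top, right, bottom):
    """Return a copy of the image with the half-open box
    [left, right) x [top, bottom) filled with black (0).
    Coordinates outside the image are clipped to its bounds."""
    out = [list(row) for row in rows]  # copy so the input stays unchanged
    for y in range(max(top, 0), min(bottom, len(out))):
        row = out[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = 0
    return out


if __name__ == "__main__":
    grey = [[200] * 3 for _ in range(3)]
    # Hypothetical box: a 2-pixel-wide, 1-pixel-high strip in the top-left corner.
    print(add_black_box(grey, 0, 0, 2, 1))
    # [[0, 0, 200], [200, 200, 200], [200, 200, 200]]
```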

Tom Morris

Apr 25, 2016, 1:05:54 PM
to tesseract-ocr
Yup, the adaptive thresholder is designed to work with printed pages. That's why I suggested you threshold to bitonal yourself rather than stopping at greyscale.

Tom