Extracting black & white text from image

Edoardo Conti

Aug 23, 2019, 11:42:48 PM

to tesseract-ocr

I am using Tesseract to extract sparse numbers from an image for a poker application I am working on. I have tweaked the settings a bit and am getting decent results, but several numbers I need are still missing. Specifically, I am missing all the player numbers (the 1-6 labels in the small circles) and the small $ values ($0.05, $0.15, $0.37, etc.). I think the issue is that the image contains both black and white text.

Any advice on preprocessing I could do to improve this or settings to change in tesseract would be appreciated.

Code below:

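from PIL import Image
import pytesseract

# Placeholder file name; `path` was not defined in the original post
path = 'table.png'

# Convert to 8-bit grayscale before OCR
img = Image.open(path).convert('L')

# --psm 11 treats the image as sparse text; the whitelist limits output to digits, '$' and '.'
print(pytesseract.image_to_string(img, lang='eng',
    config='--psm 11 -c tessedit_char_whitelist=0123456789$.'))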

And output:

$ python test.py
08

$0.02$0.05

$1.50

$4.12

$2.56

3

$2.39

$4.33

$1.52

Clint William Theron

Aug 24, 2019, 10:24:17 AM

to tesser...@googlegroups.com

Did you try inverting the image, like the attached image? Maybe convert it to grayscale too, like so:
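
For readers without the attachments, the suggestion might look roughly like the sketch below. It is only an illustration, not the code behind the attached images: the helper names `grayscale_and_inverted` and `ocr_both_polarities` are made up here, and the config string is copied from the original post. The idea is to run OCR once on the grayscale image and once on its inverse, so that both the dark and the light text get a pass where they appear dark on a light background.

```python
from PIL import Image, ImageOps


def grayscale_and_inverted(img):
    """Return (grayscale, inverted grayscale) copies of a PIL image."""
    gray = img.convert('L')            # 8-bit grayscale, as in the original post
    inverted = ImageOps.invert(gray)   # light pixels become dark and vice versa
    return gray, inverted


def ocr_both_polarities(path,
                        config='--psm 11 -c tessedit_char_whitelist=0123456789$.'):
    """Run tesseract on the image and on its inverse; return both outputs."""
    # Imported here so the helper above can be used without tesseract installed
    import pytesseract

    gray, inverted = grayscale_and_inverted(Image.open(path))
    normal_text = pytesseract.image_to_string(gray, lang='eng', config=config)
    inverted_text = pytesseract.image_to_string(inverted, lang='eng', config=config)
    return normal_text, inverted_text
```

Calling `ocr_both_polarities('table.png')` (with `'table.png'` standing in for the real screenshot) returns two strings; numbers missing from one pass may show up in the other, though that depends on the image.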
