I am using Tesseract to extract sparse numbers from an image for a poker application I am working on. I have tweaked the settings a bit and am getting decent results, but I am still missing several numbers that I need. Specifically, I am missing all the player numbers (the 1 - 6 labels in the small circles) and the small $ values ($0.05, $0.15, $0.37, etc.). I suspect the issue is that the image contains both black and white text.
Any advice on preprocessing I could do to improve this or settings to change in tesseract would be appreciated.
Code below:
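from PIL import Image
import pytesseract

# `path` is the location of the screenshot to read
# Load the image and convert it to 8-bit grayscale
img = Image.open(path).convert('L')
# --psm 11 is sparse-text mode; the whitelist limits output to digits, '$' and '.'
print(pytesseract.image_to_string(
    img, lang='eng',
    config='--psm 11 -c tessedit_char_whitelist=0123456789$.'))

And output: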
$ python test.py
08
$0.02$0.05
$1.50
$4.12
$2.56
3
$2.39
$4.33
$1.52
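
One preprocessing idea I could try, given my guess about mixed black and white text, is to upscale the image, binarize it, and run OCR on both the result and its inverse, so that each text colour appears as dark-on-light in one of the two passes. This is an untested sketch rather than a known fix: the helper names (make_variants, ocr_both) and the scale and threshold defaults are my own assumptions and would need tuning.

```python
from PIL import Image, ImageOps


def make_variants(img, scale=3, threshold=128):
    """Return (binarized, inverted) copies of an image, upscaled by `scale`."""
    gray = img.convert('L')
    # Upscale so the small labels have more pixels per glyph.
    big = gray.resize((gray.width * scale, gray.height * scale), Image.BICUBIC)
    # Binarize: pixels brighter than `threshold` become white, the rest black.
    binary = big.point(lambda p: 255 if p > threshold else 0)
    # The inverted copy turns light-on-dark text into dark-on-light text.
    return binary, ImageOps.invert(binary)


def ocr_both(img, config='--psm 11 -c tessedit_char_whitelist=0123456789$.'):
    """Run Tesseract on both variants and return the two result strings."""
    import pytesseract  # imported here so make_variants works without it
    return [pytesseract.image_to_string(v, lang='eng', config=config)
            for v in make_variants(img)]
```

Calling ocr_both(Image.open(path)) would give two strings to compare or merge; whether this actually recovers the circled player numbers is something I would have to check on the real screenshot.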