OCR Failing to Consistently Recognize the Single Digit in My Screenshot


Sean Connell

May 7, 2019, 2:49:52 AM
to tesseract-ocr
Currently my program searches the screen for the picture of the word "Opponents", then moves a bit and takes a picture of the number below it.

When I use pytesseract to convert the image of the single number to a string, it often returns random characters such as "va" or "a" instead of the number, which makes the line that converts the string to an integer fail.

Attached is a screenshot of what my program is currently capturing. Any ideas on how to better pre-process the screenshot so it is read more accurately would be much appreciated, as I'm new to this.

Part of the code I'm using:

import pyautogui
import time
import pytesseract
import cv2
import imutils
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r'S:\Tesseract\tesseract.exe'

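# Poll the screen until the "Opponents" label is found.
opponent = None
while opponent is None:
    opponent = pyautogui.locateOnScreen(r'S:\Downloads 2\oppon.png', grayscale=True, confidence=.8)
print(opponent)

# Shift the label's box (left, top, width, height) onto the number below it.
x, y, w, h = opponent
x = x + 40
y = y + 14
w = w - 72

# The loop above guarantees opponent was found, so no extra None check is needed.
im2 = pyautogui.screenshot(r'S:\Downloads 2\my_screenshot2.png', region=(x, y, w, h))
# Enlarge the capture. Image.LANCZOS is the filter formerly aliased as
# Image.ANTIALIAS, which was removed in Pillow 10.
newim2 = im2.resize((500, 360), Image.LANCZOS)
newim2.save(r'S:\Downloads 2\my_screenshot2.png', "PNG", optimize=True)

# OCR the number; int() raises ValueError if Tesseract returns letters instead.
loopTest = pytesseract.image_to_string(newim2, config='--psm 8 --oem 3')
print(loopTest)
loopTest = int(loopTest)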
Attachments: my_screenshot2.png, oppon.png

Lorenzo Bolzani

May 7, 2019, 5:04:47 AM
to tesser...@googlegroups.com
Hi, try inverting the images (so you get black text on white) and use --psm 6 or 7.

Increasing contrast may also help. 
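A minimal sketch of this suggestion with Pillow's ImageOps, assuming the capture shows light text on a dark background (which is why inversion helps); prepare_for_ocr is a hypothetical helper name, not part of pytesseract or Pillow:

```python
from PIL import Image, ImageOps


def prepare_for_ocr(img: Image.Image) -> Image.Image:
    """Turn a light-text-on-dark capture into dark text on a white background."""
    gray = ImageOps.grayscale(img)           # single luminance channel ("L" mode)
    inverted = ImageOps.invert(gray)         # light-on-dark -> dark-on-light
    return ImageOps.autocontrast(inverted)   # stretch darkest pixel to 0, lightest to 255
```

It would then be called as pytesseract.image_to_string(prepare_for_ocr(im2), config='--psm 7'). autocontrast only stretches the range already present; if the digit is still faint, a stronger contrast boost or a hard threshold may be needed.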


Lorenzo 


--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/039f25c3-d211-406e-8012-92a8bbe7edf5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sean Connell

May 7, 2019, 9:09:15 AM
to tesseract-ocr
Thanks a bunch for the response. How would I go about inverting the image and increasing the contrast, though? Sorry, I'm still learning how all this works.

Zdenko Podobny

May 7, 2019, 9:52:21 AM
to tesser...@googlegroups.com
Modify the last part of your code to this:

# invert image and convert to grayscale
inverted = PIL.ImageOps.invert(newim2).convert('LA')
loopTest = (pytesseract.image_to_string(
    inverted, config=tessdata_dir_config + '--psm 8 --oem 3'))
print(loopTest)
loopTest = int(loopTest)

Do not forget to import PIL.ImageOps and set up tessdata_dir_config as described in the pytesseract docs[1].


Zdenko




Sean Connell

May 7, 2019, 12:00:56 PM
to tesseract-ocr
Thank you, I'll give that a go and see if it works any better. Is it worth trying to increase the contrast as well?



Zdenko Podobny

May 7, 2019, 12:41:33 PM
to tesser...@googlegroups.com
This change was sufficient. BTW: I use the data from the tessdata_best repository[1].


Sean Connell

May 7, 2019, 1:57:18 PM
to tesseract-ocr
So I added this line of code:

tessdata_dir_config = r'--tessdata-dir "S:\Tesseract\tessdata"'

and downloaded the English data from the link you provided, but now I get an error (see attached picture).
Attachment: pythonw_2019-05-07_13-54-48.png

Zdenko Podobny

May 7, 2019, 2:47:07 PM
to tesser...@googlegroups.com
You need to add a space at the end of tessdata_dir_config, because later you append another string of options to it:

tessdata_dir_config = r'--tessdata-dir "S:\Tesseract\tessdata" '
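The trailing space is all that is needed. If the config is assembled from several pieces, a slightly more defensive sketch is to join the pieces with spaces instead of concatenating them; build_config is a hypothetical helper, not part of pytesseract:

```python
def build_config(*options):
    """Join Tesseract command-line options with single spaces so no separator is lost."""
    return " ".join(opt.strip() for opt in options if opt.strip())


tessdata_dir_config = r'--tessdata-dir "S:\Tesseract\tessdata"'

# Plain concatenation glues the options together: ...tessdata"--psm 8 --oem 3
broken = tessdata_dir_config + '--psm 8 --oem 3'
# Joining keeps them separate whether or not a piece already ends in a space.
fixed = build_config(tessdata_dir_config, '--psm 8 --oem 3')
```

The result is passed as before, e.g. pytesseract.image_to_string(inverted, config=fixed).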

Zdenko



Sean Connell

May 7, 2019, 3:21:57 PM
to tesseract-ocr
Thanks a bunch, it works now. The only issue is with the number 1: for some reason it thinks nothing is there.
Attachments: my_screenshot2.png, pythonw_2019-05-07_15-20-03.png

Lorenzo Bolzani

May 7, 2019, 3:26:32 PM
to tesser...@googlegroups.com
This is where you need to improve contrast.


You need to play a little with PIL to find out what works best for your data.
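One way to experiment systematically is sketched below with Pillow's ImageEnhance.Contrast followed by a hard black/white threshold. The contrast factors and the threshold of 128 are arbitrary starting values, not recommendations, and contrast_variants is a hypothetical helper:

```python
from PIL import Image, ImageEnhance, ImageOps


def contrast_variants(img, factors=(1.5, 2.0, 3.0), threshold=128):
    """Return {factor: binarized image} for several contrast boosts to compare."""
    gray = ImageOps.grayscale(img)
    variants = {}
    for factor in factors:
        boosted = ImageEnhance.Contrast(gray).enhance(factor)  # 1.0 = unchanged
        # Hard threshold to pure black/white; tune `threshold` for your captures.
        variants[factor] = boosted.point(lambda p: 255 if p >= threshold else 0)
    return variants
```

Each variant can then be OCRed, e.g. for f, im in contrast_variants(im2).items(): print(f, pytesseract.image_to_string(im, config='--psm 8')), keeping whichever setting reads your sample captures (including the 1) correctly.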


Lorenzo


Zdenko Podobny

May 7, 2019, 3:26:50 PM
to tesser...@googlegroups.com
Probably because it is recognized as "l" instead of "1", and you cannot convert a letter to an integer.
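To keep a misread like this from crashing the int() line, one option (a sketch, not something Tesseract or pytesseract provides) is to map a few look-alike letters to digits and return None when the result is still not numeric. LOOKALIKES and parse_digit are hypothetical names, and the mapping itself is an assumption to adjust for your font:

```python
# Characters Tesseract may return in place of a digit (assumed mapping; adjust for your font).
LOOKALIKES = str.maketrans({"l": "1", "I": "1", "|": "1", "O": "0", "o": "0"})


def parse_digit(text):
    """Convert raw OCR output to an int, or return None if it is still not a number."""
    cleaned = text.strip().translate(LOOKALIKES)  # strip() drops newlines and form feeds
    return int(cleaned) if cleaned.isdecimal() else None
```

The caller can then retry the capture when it gets None, e.g. value = parse_digit(loopTest), instead of letting int() raise.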

Zdenko



Sean Connell

May 7, 2019, 3:41:17 PM
to tesseract-ocr
I see. Should I fool around with the contrast, or is there a way to make it recognize only numbers when reading the image?
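Tesseract has a config variable, tessedit_char_whitelist, that restricts recognition to the listed characters, and pytesseract passes it through with -c in the config string. A sketch, assuming a Tesseract version whose LSTM engine honours the whitelist (4.1 or later, as far as I know; check with tesseract --version). DIGITS_ONLY and ocr_digits are hypothetical names:

```python
# Restrict recognition to 0-9 via Tesseract's tessedit_char_whitelist variable.
# Assumption: your Tesseract build honours the whitelist with the LSTM engine.
DIGITS_ONLY = "--psm 8 --oem 3 -c tessedit_char_whitelist=0123456789"


def ocr_digits(img, extra_config=""):
    """Run pytesseract on img, allowing only the characters 0-9."""
    import pytesseract  # imported here so DIGITS_ONLY can be reused without pytesseract
    # extra_config is e.g. tessdata_dir_config; pieces are joined with single spaces.
    config = " ".join(part for part in (extra_config.strip(), DIGITS_ONLY) if part)
    return pytesseract.image_to_string(img, config=config).strip()
```

For a one-digit number, --psm 10 (treat the image as a single character) may also be worth trying in place of --psm 8.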



Zdenko Podobny

May 7, 2019, 4:04:19 PM
to tesser...@googlegroups.com
Sorry, I misread the Python message: Tesseract actually did not find anything there (which in the end is the same as if there were an "l" ;-) )

First of all, I would stop that idiotic resizing. AFAIR, Tesseract works best with letters 13-30 px in size. Yours are 240!
Try passing the original captured image instead.
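The direct fix here is to OCR the original capture (im2) rather than the enlarged newim2. If a capture's native text size were ever outside the 13-30 px range, one hedged alternative to a fixed 500x360 resize is to scale by the measured text height. This sketch assumes dark text on a light background (i.e. already inverted); target_px=24 and threshold=128 are arbitrary choices, and normalize_glyph_height is a hypothetical helper:

```python
from PIL import Image, ImageOps


def normalize_glyph_height(img, target_px=24, threshold=128):
    """Rescale img so its dark text is roughly target_px pixels tall."""
    gray = ImageOps.grayscale(img)
    # Mark text pixels (darker than threshold) as 255 so getbbox() finds them.
    mask = gray.point(lambda p: 255 if p < threshold else 0)
    bbox = mask.getbbox()
    if bbox is None:
        return img  # no dark pixels found; leave the capture as it is
    text_height = bbox[3] - bbox[1]
    scale = target_px / text_height
    new_size = (max(1, round(img.width * scale)), max(1, round(img.height * scale)))
    return img.resize(new_size, Image.LANCZOS)
```

Scaling by the text's own height keeps the result in a predictable size range regardless of how big the captured region is.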


Zdenko



Sean Connell

May 7, 2019, 4:36:20 PM
to tesseract-ocr
Yeah, my bad. It seems that just removing the resize code made it able to detect the 1 as well. Thanks for all the help.
