Not very accurate in reading simple equations

Raquel Natali

Nov 7, 2021, 1:37:41 AM
to tesseract-ocr
I am trying to read a simple equation from an image. I have tried preprocessing the image, but the result is still poor: sometimes Tesseract does not detect the math operator at all, and sometimes it reads the + as a 4.

Can someone give me a tip? I have already tried a lot of things.

Here is my current code:

            import cv2
            import numpy as np
            import pytesseract

            img = cv2.imread("image.png")
            # Upscale by 1.2x, then convert to grayscale.
            img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            # Otsu threshold (inverted), then invert back with bitwise_not.
            img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
            img = cv2.bitwise_not(img)
            # Erode, then dilate, each with a 1x1 kernel.
            kernel = np.ones((1, 1), np.uint8)
            img = cv2.erode(img, kernel, iterations=1)
            kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 1))
            img = cv2.dilate(img, kernel, iterations=1)
            # Treat the image as a single raw line and allow only digits, + and =.
            content = pytesseract.image_to_string(img, lang="eng+equ", config="--psm 13 -c tessedit_char_whitelist=0123456789+=")
            print(content) # prints 23431 instead of 23 + 31
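
To make the failure easier to spot while experimenting, the OCR result can be checked against the expected shape of the equation. This is only a minimal sketch: `parse_equation` and `EQUATION_RE` are hypothetical names, not part of pytesseract, and it assumes the captcha is always of the form "number + number", optionally followed by "=".

```python
import re

# Hypothetical helper, not part of pytesseract.
# Assumes the equation is always "<digits> + <digits>", optionally ending in "=".
EQUATION_RE = re.compile(r"^\s*(\d+)\s*\+\s*(\d+)\s*=?\s*$")


def parse_equation(text):
    """Return (left, right) as ints, or None if the text is not 'a + b'."""
    match = EQUATION_RE.match(text)
    if match is None:
        # The operator is missing or was read as something else (e.g. a digit).
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_equation("23 + 31"))  # (23, 31)
print(parse_equation("23431"))    # None
```

A `None` result flags the cases described above, where the + is dropped or read as a 4, so different preprocessing settings can be compared on how often they produce a parseable result.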

Here is the image: captcha.png. Thank you!