I am not able to get the *** characters recognized properly


arunc...@gmail.com

Mar 2, 2019, 1:32:34 AM
to tesseract-ocr
I want to extract the text along with the *** symbols. I tried the following code:

import cv2
import pytesseract
import numpy as np


def image_resize(image, width=None, height=None, inter=cv2.INTER_LINEAR):
    # grab the current image size
    (h, w) = image.shape[:2]

    # if both the width and height are None, return the original image
    if width is None and height is None:
        return image

    if width is None:
        # scale by height, keeping the aspect ratio
        r = height / float(h)
        dim = (int(w * r), height)
    else:
        # scale by width, keeping the aspect ratio (height is ignored if both are given)
        r = width / float(w)
        dim = (width, int(h * r))

    # resize with the requested interpolation method:
    # INTER_LINEAR or INTER_CUBIC suit enlarging, INTER_AREA suits shrinking
    return cv2.resize(image, dim, interpolation=inter)


img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)
if img is None:
    raise FileNotFoundError("could not read test.jpg")
img = image_resize(img, height=4000)


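# each Tesseract variable needs its own -c flag, written as NAME=VALUE with no spaces
config = ('--psm 11 -c textord_heavy_nr=0 -c textord_noise_area_ratio=100 '
          '-c textord_max_noise_size=154')
print(pytesseract.image_to_string(img, config=config))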
[Attached image: test.jpg]
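The `config` string handed to pytesseract ends up as separate command-line arguments to tesseract, so each variable needs its own `-c NAME=VALUE` flag with no spaces around `=`; a string like `-c a=0 b =100` does not set `b`. A minimal sketch of a hypothetical helper, `build_tesseract_config`, that assembles such a string from a dict (the variable values are the ones from the post; `shlex.split` is used here only to show how the string breaks into arguments):

```python
import shlex


def build_tesseract_config(psm, variables):
    """Build a Tesseract config string with one -c flag per variable.

    Hypothetical helper: Tesseract expects each parameter as a separate
    "-c NAME=VALUE" argument, with no spaces around '='.
    """
    parts = [f"--psm {int(psm)}"]
    for name, value in variables.items():
        parts.append(f"-c {name}={value}")
    return " ".join(parts)


config = build_tesseract_config(
    11,
    {
        "textord_heavy_nr": 0,
        "textord_noise_area_ratio": 100,
        "textord_max_noise_size": 154,
    },
)
print(config)
# splitting shows that every -c stays paired with exactly one NAME=VALUE token
print(shlex.split(config))
```

The resulting string can then be passed as `config=` to `pytesseract.image_to_string`.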

estel...@gmail.com

Mar 2, 2019, 7:42:05 PM
to tesseract-ocr
I have a similar issue.
The only thing that helped me: the confidence level for those "words" is very low (about 0), so I could filter them out (which was acceptable in my case).
The same issue arises when more than three dots follow normal text.
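Per-word confidences are available from `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`, which returns parallel lists under keys such as `"text"` and `"conf"`. A minimal sketch of the filtering described above, assuming that dict shape; the function names and the threshold of 30 are hypothetical choices, not values from the thread. The filter itself is a plain function, so it can be tried on hand-made data without Tesseract installed:

```python
def filter_low_confidence(data, min_conf=30.0):
    """Keep only words whose Tesseract confidence is at least min_conf.

    `data` is assumed to have the shape of image_to_data's Output.DICT result:
    parallel lists under "text" and "conf". Non-word rows (page, block, line)
    have empty text and conf == -1, so they are dropped as well.
    """
    kept = []
    for text, conf in zip(data["text"], data["conf"]):
        # conf may be an int, float, or numeric string depending on the version
        if text.strip() and float(conf) >= min_conf:
            kept.append(text)
    return kept


def ocr_confident_words(image, min_conf=30.0):
    """Run Tesseract on an image and drop near-zero-confidence "words"."""
    import pytesseract  # imported lazily so filter_low_confidence works without it

    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    return filter_low_confidence(data, min_conf)


# hand-made sample shaped like image_to_data output (hypothetical values)
sample = {
    "text": ["", "Total", "***", "....", "42"],
    "conf": [-1, 96.5, 0.0, 1.2, 91],
}
print(filter_low_confidence(sample))
```

As noted in the post, this only helps when dropping those low-confidence tokens is acceptable; it removes the garbage rather than recovering the symbols.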

On Saturday, March 2, 2019 at 17:02:34 UTC+10:30, arunc...@gmail.com wrote:

Juan Carlos Moreno Rogel

Mar 7, 2019, 2:19:25 PM
to tesseract-ocr
I was able to get better results by experimenting with the page segmentation mode (--psm):

tesseract --psm 12 -l eng file.jpg output
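From Python, the command above corresponds to `pytesseract.image_to_string(img, lang="eng", config="--psm 12")`. Mode 11 is "sparse text" and mode 12 is "sparse text with OSD". A minimal sketch of a hypothetical helper, `compare_psm_modes`, that runs the same image through several modes so the outputs can be compared side by side; the `ocr` parameter is an assumption added so the helper can be exercised without Tesseract:

```python
def compare_psm_modes(image, modes=(11, 12), lang="eng", ocr=None):
    """Run OCR on one image with several --psm values and collect the results.

    `ocr` defaults to pytesseract.image_to_string; any callable accepting
    (image, lang=..., config=...) can be substituted, e.g. for testing.
    """
    if ocr is None:
        import pytesseract  # imported lazily; only needed for real OCR runs

        ocr = pytesseract.image_to_string
    results = {}
    for psm in modes:
        # one Tesseract run per page segmentation mode
        results[psm] = ocr(image, lang=lang, config=f"--psm {psm}")
    return results
```

With Tesseract and OpenCV installed, something like `for psm, text in compare_psm_modes(cv2.imread("file.jpg")).items(): print(psm, repr(text))` prints each mode's output, which makes it easy to see which mode keeps the `***` runs.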