OCR not behaving well on clean image | tesseract


boyapally srikanth

May 10, 2022, 8:12:54 AM
to tesseract-ocr

I have been working on a project that involves extracting text from an image. From my research, Tesseract is one of the best libraries available, so I decided to use it along with OpenCV, which I need for image manipulation.

I have been experimenting a lot with the Tesseract engine, but it does not seem to give the expected results. I have attached a sample image for reference. The output I got is:

1] =501 [

Instead, the expected output is:

TM10-50%L

What I have done so far:

  • Remove noise
  • Adaptive threshold
  • Send it to the Tesseract OCR engine

Are there any other suggestions to improve the algorithm?

Thanks in advance.

Snippet of the code:

import sys

import cv2
import numpy as np
import pytesseract
from PIL import Image

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: python ocr_simple.py image.jpg')
        sys.exit(1)
    # Read image path from command line
    imPath = sys.argv[1]
    # Load the image as grayscale
    gray = cv2.imread(imPath, 0)
    # Blur to remove noise
    blur = cv2.GaussianBlur(gray, (9, 9), 0)
    # Binarize with an adaptive threshold
    thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 5, 3)
    # Run OCR on the binarized image
    text = pytesseract.image_to_string(thresh)
    print(text)
6itc_cleanup.jpg

Zdenko Podobny

May 10, 2022, 1:14:19 PM
to tesser...@googlegroups.com
You need to crop the text area:
6itc_cleanup_cropped.jpg
tesseract 6itc_cleanup_cropped.jpg - --dpi 300
TH10-50%L

Zdenko
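
The cropping suggested above may well have been done by hand; the reply does not say. As one possible way to automate it, here is a minimal sketch that assumes dark text on a light background with nothing else dark in the image. The function names text_bounding_box and ocr_cropped, the threshold of 128 and the 10-pixel margin are all illustrative assumptions, not part of the original thread. The "--dpi 300" option mirrors the command line in the reply.

```python
import numpy as np


def text_bounding_box(gray, dark_threshold=128, margin=10):
    """Return (top, bottom, left, right) of the area containing dark pixels.

    Assumes dark text on a light background. dark_threshold and margin are
    illustrative values, not tuned for any particular image.
    """
    dark = gray < dark_threshold
    rows = np.flatnonzero(dark.any(axis=1))  # row indices with dark pixels
    cols = np.flatnonzero(dark.any(axis=0))  # column indices with dark pixels
    if rows.size == 0:
        # No dark pixels found: fall back to the full image
        return 0, gray.shape[0], 0, gray.shape[1]
    # Pad the box by `margin` pixels, clamped to the image borders
    top = max(int(rows[0]) - margin, 0)
    bottom = min(int(rows[-1]) + 1 + margin, gray.shape[0])
    left = max(int(cols[0]) - margin, 0)
    right = min(int(cols[-1]) + 1 + margin, gray.shape[1])
    return top, bottom, left, right


def ocr_cropped(path):
    """Crop the image at `path` to its dark region and run Tesseract on it."""
    # Imported here so the cropping helper can be used without these packages
    import cv2
    import pytesseract

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    top, bottom, left, right = text_bounding_box(gray)
    cropped = gray[top:bottom, left:right]
    # Pass the same DPI hint as the command-line call in the reply
    return pytesseract.image_to_string(cropped, config="--dpi 300")
```

Calling ocr_cropped('6itc_cleanup.jpg') would then run OCR on the cropped region only. If the image contains borders, logos or other dark areas besides the text, this simple bounding box will include them, and a manual crop or a contour-based selection is likely to work better.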


On Tue, May 10, 2022 at 14:12, boyapally srikanth <srikanthbo...@gmail.com> wrote: