Unable to identify image for number 5 using Eng trained data

vinayb...@gmail.com

Oct 27, 2018, 2:52:45 AM

to tesseract-ocr

I am using Pytesseract to recognise an image of the number 5, and I'm stunned that even after applying various filters such as GaussianBlur and threshold, plus dilation and erosion to remove noise, it is still not able to identify the digit.

I am using the eng traineddata by default. I'm not sure where I am going wrong. Do I need to include any other training file here?

Filters Tried:

1: cv2.threshold(cv2.GaussianBlur(img, (9, 9), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],

2: cv2.threshold(cv2.GaussianBlur(img, (7, 7), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],

3: cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],

4: cv2.threshold(cv2.medianBlur(img, 5), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],

5: cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],

6: cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2),

7: cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2),
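The filter variants listed above could be compared in a single loop, roughly as sketched below. This is only an illustration, not the poster's actual script: the helper names `tesseract_config` and `try_filters`, the dictionary keys, and the choice of three variants are assumptions. OpenCV and pytesseract are imported inside the function so that the config helper can be used without them installed.

```python
def tesseract_config(psm=6, oem=1):
    """Build the option string passed to pytesseract's `config` argument."""
    return f"--oem {oem} --psm {psm}"


def try_filters(image_path, config):
    """Run a few of the listed preprocessing variants through Tesseract.

    Returns a dict mapping a (hypothetical) variant name to the recognised text.
    """
    # Imported lazily: both packages must be installed to call this function.
    import cv2
    import pytesseract

    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(image_path)

    otsu = cv2.THRESH_BINARY + cv2.THRESH_OTSU
    candidates = {
        # Variant 3 from the list: 5x5 Gaussian blur, then Otsu threshold.
        "gaussian_5_otsu": cv2.threshold(
            cv2.GaussianBlur(img, (5, 5), 0), 0, 255, otsu)[1],
        # Variant 5: 3x3 median blur, then Otsu threshold.
        "median_3_otsu": cv2.threshold(
            cv2.medianBlur(img, 3), 0, 255, otsu)[1],
        # Variant 7: 3x3 median blur, then adaptive Gaussian threshold.
        "median_3_adaptive": cv2.adaptiveThreshold(
            cv2.medianBlur(img, 3), 255,
            cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2),
    }
    return {
        name: pytesseract.image_to_string(image, lang="eng", config=config).strip()
        for name, image in candidates.items()
    }
```

For a crop that contains a single glyph, `--psm 10` (treat the image as a single character) may be worth trying, e.g. `try_filters(path, tesseract_config(psm=10))`; whether it helps on these particular images is untested here.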

Training Data:

eng.traineddata

Original Image: See Attached

five_filter_5.jpg

five_filter_7.jpg

Vinod Gattani

Oct 27, 2018, 4:48:37 AM

to tesser...@googlegroups.com

I used this command:

tesseract five_filter_5.jpg ocr.txt --oem 1 --psm 6 -l eng

I used "eng.traineddata" from the tessdata_best repo.

It gave "5" in ocr.txt.
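For anyone wanting to reproduce the command above from Python, a minimal sketch follows. The function names and the `tessdata_dir` parameter are assumptions for illustration; `--tessdata-dir` is the CLI option for pointing Tesseract at a folder of traineddata files, such as a local copy of tessdata_best. The sketch writes to `stdout` instead of a file; note that the CLI treats the second argument as an output base name and appends `.txt` to it.

```python
import subprocess


def build_tesseract_cmd(image_path, tessdata_dir=None, oem=1, psm=6, lang="eng"):
    """Assemble the argv list for the tesseract CLI, printing text to stdout."""
    cmd = ["tesseract", image_path, "stdout"]
    if tessdata_dir:
        # Folder containing e.g. eng.traineddata from tessdata_best (assumed path).
        cmd += ["--tessdata-dir", tessdata_dir]
    cmd += ["--oem", str(oem), "--psm", str(psm), "-l", lang]
    return cmd


def run_tesseract(image_path, tessdata_dir=None):
    """Run tesseract and return the recognised text (requires tesseract on PATH)."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, tessdata_dir),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```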

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ec25b1e1-c9f3-4743-b2fd-6efdd2a978f6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

vinayb...@gmail.com

Oct 27, 2018, 6:16:04 AM

to tesseract-ocr

Can you try this newly attached image of the letter "M"?

m.PNG

Vinod Gattani

Oct 27, 2018, 7:52:34 AM

to tesser...@googlegroups.com

It gave "|" as the text.

When the image is resized to 50*50, the text is "N\". You should check whether the font used in the image is among the fonts on which the English traineddata was trained.

Thanks

Vinay Babu

Oct 27, 2018, 8:12:14 AM

to tesser...@googlegroups.com

Well, it doesn't seem to be a problem with font training. I captured the same image again without skew and it worked perfectly. I'm not sure why Tesseract doesn't work with slightly skewed text in images.
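If skew is the cause, one option is to straighten the image before OCR. The sketch below is one possible approach, not something confirmed in this thread: it estimates the tilt from the principal axis of the foreground pixels and rotates by that angle. It assumes dark text on a light background and a text line that is clearly wider than it is tall; for a single, roughly square glyph the estimate is unlikely to be meaningful. The function names are hypothetical, and the pure-Python moment calculation is slow on large images.

```python
import math


def estimate_skew_degrees(points):
    """Estimate the tilt of (x, y) foreground pixels from their principal axis.

    Coordinates follow the image convention: x grows right, y grows down.
    """
    n = len(points)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Orientation of the principal axis from second-order central moments.
    return math.degrees(0.5 * math.atan2(2.0 * cov_xy, var_x - var_y))


def deskew(gray):
    """Rotate a grayscale image so the estimated text axis becomes horizontal."""
    # Imported lazily: OpenCV and NumPy are only needed for the rotation step.
    import cv2
    import numpy as np

    # Assumption: dark text on a light background, so invert while thresholding.
    binary = cv2.threshold(
        gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    ys, xs = np.nonzero(binary)
    angle = estimate_skew_degrees(list(zip(xs.tolist(), ys.tolist())))

    h, w = gray.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    return cv2.warpAffine(
        gray, matrix, (w, h),
        flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
```

The deskewed array can then be passed to `pytesseract.image_to_string` in place of the original image.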

vinayb...@gmail.com

Oct 27, 2018, 8:15:44 AM

to tesseract-ocr

Here is another image where the text is skewed and Tesseract fails to recognise it.

On Saturday, October 27, 2018 at 12:22:45 PM UTC+5:30, vinayb...@gmail.com wrote:

mericarclean1.PNG
