Fails to recognize short codes


Daniele

Mar 27, 2023, 3:00:39 PM
to tesseract-ocr
Hi,
Using a third-party program (NormCap) that uses Tesseract internally, I tried to recognize this code:

[image not preserved]

The result was a mess; no letters were recognized.
Running tesseract directly from the CLI also gave a garbled result. I can't work out which parameters I should use to get the right output.

Can someone help me?
Thanks!

nguyen ngoc hai

Mar 29, 2023, 6:50:52 AM
to tesseract-ocr
I think you may need to preprocess your image before sending it to Tesseract.

For example, after these preprocessing steps (the images of each step were not preserved): gray_image, blur1, otsu, erosion, blur. The output for several PSM modes:
```
SINGLE_LINE
6KDYT?79M"

AUTO
6KDYT?79M"

RAW_LINE
6KDYT79M

SPARSE_TEXT_OSD
6KDYT?79M"

SINGLE_WORD
6KDYT79M
```
As you can see, two of the PSM modes, RAW_LINE and SINGLE_WORD, give the correct result (6KDYT79M).
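The three noisy results differ from the clean ones only by stray punctuation ("?" and a trailing quote). As a rough sketch, assuming the code only ever contains uppercase Latin letters and digits (an assumption, not something stated in the thread), a simple post-filter would bring all five outputs to the same string. `clean_code()` is a hypothetical helper, not part of Tesseract or NormCap.

```python
import re

# Assumption: the code consists only of uppercase letters A-Z and digits 0-9.
DISALLOWED = re.compile(r"[^A-Z0-9]")


def clean_code(raw: str) -> str:
    """Remove every character outside A-Z and 0-9 (e.g. stray '?' or '"')."""
    return DISALLOWED.sub("", raw.strip())


# OCR outputs copied from the PSM comparison in this thread.
outputs = {
    "SINGLE_LINE": '6KDYT?79M"',
    "AUTO": '6KDYT?79M"',
    "RAW_LINE": "6KDYT79M",
    "SPARSE_TEXT_OSD": '6KDYT?79M"',
    "SINGLE_WORD": "6KDYT79M",
}

for mode, text in outputs.items():
    print(f"{mode:<16}{text!r:<14} -> {clean_code(text)}")
```

This only strips characters; it cannot fix a genuine misread (for example an "O" recognized as "0"), so good preprocessing and a suitable PSM still matter.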

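Here is the full code in Python:

```python
import cv2
import numpy as np

# cv2_show() (display helper) and get_text() (OCR helper) are my own
# functions; their definitions are not included in this post.

image_org = cv2.imread("unnamed.png")
height, width = image_org.shape[:2]

# calculate the amount of pixels to crop from each border (10%)
x_border = int(width * 0.1)
y_border = int(height * 0.1)

image = image_org[y_border:height - y_border, x_border:width - x_border]
cv2_show("image", image, 600)

# convert to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2_show("gray_image", gray_image, 600)

# strong blur to smooth out noise before thresholding
blur1 = cv2.GaussianBlur(gray_image, (21, 21), 0)
cv2_show("blur1", blur1, 600)

# global thresholding: Otsu picks the threshold automatically,
# THRESH_BINARY_INV turns dark pixels white and light pixels black
_, otsu = cv2.threshold(blur1, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2_show("otsu", otsu, 800)

# erode with a 3x3 kernel, which thins the white strokes
kernel = np.ones((3, 3), np.uint8)
erosion = cv2.erode(otsu, kernel, iterations=1)
cv2_show("erosion", erosion, 800)

# light blur to soften the jagged edges left by thresholding and erosion
blur = cv2.GaussianBlur(erosion, (5, 5), 0)
cv2_show("blur", blur, 600)

# invert back (255 - blur) before OCR and print each result
results = get_text(255 - blur)
for result in results:
    print(result[0][0])
    print(result[1][0])
```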
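The `get_text()` helper called in the code above is not shown in the post. A minimal hypothetical stand-in could run Tesseract once per page segmentation mode. This sketch assumes the mode names printed in the thread correspond to Tesseract's numeric `--psm` values (as listed by `tesseract --help-psm`) and that `pytesseract` plus the `tesseract` binary are installed. It returns `(mode, text)` tuples, which is not necessarily the structure the original helper returns.

```python
# Hypothetical stand-in for get_text(); the original helper is not shown.
# Mode names from the thread mapped to Tesseract's numeric --psm values.
PSM_MODES = {
    "SINGLE_LINE": 7,       # treat the image as a single text line
    "AUTO": 3,              # fully automatic page segmentation, no OSD
    "RAW_LINE": 13,         # single text line, bypassing Tesseract-specific hacks
    "SPARSE_TEXT_OSD": 12,  # sparse text with orientation and script detection
    "SINGLE_WORD": 8,       # treat the image as a single word
}


def get_text(image, modes=PSM_MODES, ocr=None):
    """Run OCR once per PSM and return a list of (mode_name, text) tuples."""
    if ocr is None:
        # Assumption: pytesseract and the tesseract binary are installed.
        import pytesseract
        ocr = pytesseract.image_to_string
    results = []
    for name, psm in modes.items():
        text = ocr(image, config=f"--psm {psm}")
        results.append((name, text.strip()))
    return results
```

Called as `get_text(255 - blur)`, it returns one `(mode, text)` pair per mode, so the printing loop would unpack `mode, text` rather than index `result[0][0]` and `result[1][0]`. The `ocr` parameter exists only so that a different OCR function can be swapped in.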

Hope it helps.
Regards
Hai

nguyen ngoc hai

Mar 29, 2023, 7:53:15 AM
to tesser...@googlegroups.com
Forgot to check if the images were properly attached. Here they are:

[Images of each step, not preserved: image, gray_image, blur1, otsu, erosion, blur]






Daniele

Mar 29, 2023, 8:11:42 AM
to tesseract-ocr
Cool!
Thank you very much!

Daniele