Pytesseract used with captcha images unable to recognize characters with lines on top


Rutanshu Jhaveri

May 7, 2018, 1:07:40 AM
to tesseract-ocr
script.py

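import cv2
import pytesseract
from PIL import Image
cv2.imwrite(filename, imgOP)  # filename and imgOP are defined earlier in script.py
text = pytesseract.image_to_string(Image.open(filename))
print(text)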

For the attached Output1.png, pytesseract prints the following to the console:

PGKQKf

instead of the expected

PGKQKT

For the second image, Output2.png, nothing is printed to the console, although the output should be

KEEXZ1

All of these images were processed with a medianBlur filter and thresholding to remove noise and lines. However, thick lines sometimes survive, which leads to output like the above.


Output1.png
Output2.png

Lorenzo Bolzani

May 7, 2018, 3:06:39 AM
to tesser...@googlegroups.com
Try to get rid of all the noise and lines: you can denoise before binarization, or use component analysis. Then remove the white border so all the fragments have the same size.
Try to do this with GIMP and see if it helps before coding it.

Then try

psm=8

it means "single word" (this should fix the problem with the second image).
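
With pytesseract, the page segmentation mode can be passed through the config argument of image_to_string. A minimal sketch follows; the helper names psm_config and read_single_word are made up for illustration, and the legacy_dash switch is an assumption for older 3.x builds that may expect the single-dash spelling of the flag.

```python
def psm_config(psm=8, legacy_dash=False):
    """Build the Tesseract option string for a page segmentation mode.

    legacy_dash=True emits "-psm N", which some older 3.x builds may expect.
    """
    flag = "-psm" if legacy_dash else "--psm"
    return f"{flag} {psm}"


def read_single_word(path, psm=8):
    """OCR one image file, treating it as a single word by default."""
    # Imported here so psm_config can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    text = pytesseract.image_to_string(Image.open(path), config=psm_config(psm))
    return text.strip()
```

Calling read_single_word("Output2.png") would then run the second image in single-word mode.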

If you are using version 3.05, also use a whitelist to limit the characters to uppercase letters. Also try recognizing the same images a few times in a loop: you should see accuracy increase as the adaptive learning kicks in.
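
A whitelist can be set through the tessedit_char_whitelist variable in the same config string. The sketch below uses hypothetical helper names (whitelist_config, read_with_whitelist). Note that the expected result for Output2.png, KEEXZ1, contains a digit, so an uppercase-only whitelist may need digits added for these captchas.

```python
import string


def whitelist_config(chars=string.ascii_uppercase, psm=8):
    """Build a Tesseract option string with a character whitelist."""
    return f"--psm {psm} -c tessedit_char_whitelist={chars}"


def read_with_whitelist(path, chars=string.ascii_uppercase):
    """OCR one image file, restricting the output to the given characters."""
    # Imported here so whitelist_config can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    config = whitelist_config(chars)
    return pytesseract.image_to_string(Image.open(path), config=config).strip()
```

For captchas that mix letters and digits, something like read_with_whitelist(path, string.ascii_uppercase + string.digits) would be the variant to try.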

I also suggest using tesserocr for the Python bindings (pytesseract invokes an external process every time and is very slow).
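
A rough sketch of what this could look like with tesserocr, reusing one API instance across several passes over the same image. The helper names read_repeatedly and most_common_reading and the passes=3 default are assumptions for illustration. Because pytesseract starts a fresh process per call, any adaptive state would presumably only carry over when a single instance is kept alive like this; whether accuracy actually improves will depend on the engine and the images.

```python
from collections import Counter
import string


def most_common_reading(readings):
    """Return the most frequent non-empty string, or "" if there is none."""
    counts = Counter(r for r in readings if r)
    return counts.most_common(1)[0][0] if counts else ""


def read_repeatedly(path, passes=3, whitelist=string.ascii_uppercase):
    """Recognize the same image several times with one tesserocr instance."""
    # Imported here so most_common_reading works without tesserocr installed.
    from tesserocr import PSM, PyTessBaseAPI

    readings = []
    # Single-word mode, with one instance kept open for all passes.
    with PyTessBaseAPI(psm=PSM.SINGLE_WORD) as api:
        api.SetVariable("tessedit_char_whitelist", whitelist)
        for _ in range(passes):
            api.SetImageFile(path)
            readings.append(api.GetUTF8Text().strip())
    return readings
```

Taking most_common_reading(read_repeatedly("Output1.png")) is one simple way to settle on a single answer when the passes disagree.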


Bye

Lorenzo



