PyTesseract not recognizing decimal points

Andrew

Oct 6, 2020, 12:12:33 AM
to tesseract-ocr
As per my question on StackOverflow:  PyTesseract not recognizing decimals

I'm using PyTesseract to recognise text in table cells. When it comes to drug doses with decimal points, the OCR fails to recognise the period character (.), though it is accurate for everything else. I'm using Tesseract v5.0.0-alpha.20200328 on Windows 10.

My pre-processing consists of upscaling by 400% using cubic interpolation, conversion to black and white, dilation and erosion, morphology, and blurring. I've tried a decent number of combinations of these (as well as each on its own), and none of them has got the decimal point recognised.

I've tried various --psm values as well as a character whitelist. I believe the font is Segoe UI.
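For readers following along, here is a minimal sketch of what passing a page segmentation mode and a character whitelist to PyTesseract can look like. The helper names, the `--psm 7` default (single text line) and the whitelist characters are illustrative assumptions, not the values from the attached code.

```python
def build_tesseract_config(psm: int = 7, whitelist: str = "0123456789.mg") -> str:
    """Build a Tesseract config string (hypothetical helper).

    psm and whitelist defaults are assumptions chosen for dose strings
    such as "2.5mg"; adjust them to the actual cell contents.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def ocr_cell(image_path: str, config: str) -> str:
    """Run OCR on one table-cell image and return the stripped text."""
    # Imported lazily so the config helper works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path), config=config).strip()
```

Usage would be along the lines of `ocr_cell("cell.png", build_tesseract_config())`, where `cell.png` is a placeholder file name.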

Before processing:  S87rd.png

After processing:  OFjoL.png

PyTesseract output: 25mg »p

Processing code attached

code.py.txt

Shree Devi Kumar

Oct 6, 2020, 4:06:39 AM
to tesseract-ocr
Have you tried cropping the image to remove the arrowhead to see if that improves the result?
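A rough sketch of that idea, assuming the arrowhead sits at the right-hand edge of the cell and takes up a fixed fraction of its width (both are assumptions; the thread does not say where the arrowhead is). The function names and the 15% default are hypothetical.

```python
def crop_box(width: int, height: int, trim_right_fraction: float = 0.15) -> tuple:
    """Return a (left, upper, right, lower) box that drops the right-hand strip.

    trim_right_fraction is an assumed share of the width occupied by the
    arrowhead; tune it for the real table cells.
    """
    if not 0 <= trim_right_fraction < 1:
        raise ValueError("trim_right_fraction must be in [0, 1)")
    right = width - round(width * trim_right_fraction)
    return (0, 0, right, height)


def ocr_without_arrowhead(image_path: str, trim_right_fraction: float = 0.15) -> str:
    """Crop away the assumed arrowhead region, then run PyTesseract."""
    # Imported lazily so crop_box can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path)
    box = crop_box(image.width, image.height, trim_right_fraction)
    return pytesseract.image_to_string(image.crop(box)).strip()
```

If the arrowhead is somewhere else in the cell, only `crop_box` needs to change; the OCR call stays the same.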

Andrew

Oct 19, 2020, 5:49:33 AM
to tesseract-ocr
Fixed! Thank you, your suggestion worked.