Tesseract OCR won't read the letters in the attached chart


JJ

Dec 11, 2018, 2:39:14 AM
to tesseract-ocr

Hi

I have been trying to get the Tesseract OCR API and command-line tool to recognize and locate the letters in the attached picture, with no success.

I have modified the image by adding blur and/or sharpening it, with no luck. To me, it doesn't seem like it should be that challenging.

Does anyone have any idea why?

I am using version 4.0 of the SDK.
ImageLetterMatchImage#1 (3).jpg

Zdenko Podobny

Dec 13, 2018, 3:49:44 PM
to tesser...@googlegroups.com
I am afraid you first need to implement some text detection algorithm for images like this. There is too much noise.

You wrote nothing about the language you use. After a quick search on the internet I found, e.g., this[1] solution in Python, which correctly identified the text regions in your image.
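
To illustrate the "detect first, then OCR" idea in general terms (this is not the linked solution, only a minimal sketch), the snippet below finds bounding boxes of connected foreground blobs. It assumes the image has already been binarized into rows of 0/1 values; the function name `find_regions` and the `min_area` noise threshold are hypothetical choices for illustration.

```python
from collections import deque


def find_regions(grid, min_area=2):
    """Return (top, left, bottom, right) boxes of 4-connected foreground
    blobs in a binarized image given as rows of 0/1 values."""
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not grid[y][x] or seen[y][x]:
                continue
            # Flood-fill one blob, tracking its extent and pixel count.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right, area = y, x, y, x, 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Assumption: very small blobs are noise rather than letters.
            if area >= min_area:
                boxes.append((top, left, bottom, right))
    return boxes
```

Each returned box could then be cropped from the original image and passed to Tesseract on its own, for example with the command-line page segmentation mode for a single character (`--psm 10`). Whether that is enough for a chart this noisy is something you would have to try.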


On Tue, Dec 11, 2018 at 8:39, JJ <joaquin....@gmail.com> wrote:
text_detect.png
Figure_1.png

JJ

Dec 18, 2018, 4:55:53 PM
to tesseract-ocr

Thanks a lot.

Actually, Tesseract 3.0 does recognize it. Unfortunately, the C# API was based on Tesseract 3.1, so I had to write a P/Invoke wrapper for Tesseract 3.0, and that solved the problem.