tesseract-ocr is not identifying the words from the attached file


rahul...@gmail.com

Jun 28, 2016, 5:47:24 AM
to tesseract-ocr
Hello,

tesseract-ocr is not recognizing the words in the attached .png file.
Please let me know what the reason could be.

With Best Regards,
Rahul
tempTess.png

Allistair

Jun 28, 2016, 5:51:26 AM
to tesser...@googlegroups.com
The font is very unusual: it has pixellated edges, the characters are close together, and it is far too small. You might also want to try a different page segmentation mode, since the image contains a big blob of non-text.
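To make the page segmentation suggestion concrete, here is a minimal sketch of building a Tesseract command line with an explicit mode. The helper name `build_tesseract_command` and the output base name `out` are my own, purely for illustration. Mode 6 assumes a single uniform block of text and mode 7 treats the image as a single text line. The flag is spelled `-psm` in Tesseract 3.x and `--psm` in later releases, so check which one your installed version accepts.

```python
def build_tesseract_command(image_path, output_base, psm=6, legacy_flag=False):
    """Return the argument list for a Tesseract CLI call with an explicit
    page segmentation mode (hypothetical helper, not part of Tesseract)."""
    if not isinstance(psm, int) or psm < 0:
        raise ValueError("psm must be a non-negative integer")
    # Tesseract 3.x spells the option -psm; later versions use --psm.
    flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image_path, output_base, flag, str(psm)]


# Treat the (ideally cropped) image as a single line of text.
cmd = build_tesseract_command("tempTess.png", "out", psm=7)
print(" ".join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`; Tesseract then writes the recognized text to `out.txt`.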


Allistair

Jun 28, 2016, 5:52:46 AM
to tesser...@googlegroups.com
Also, the magenta colour on black could be problematic. Try outputting the intermediate Tesseract image to see how your picture gets turned into the input for Tesseract (there was a post just last week on how to do this); that will help you understand whether you should pre-process your image. For example, if this is always the position of the patient's name, you could easily crop out a top rectangle containing just the text before passing it to Tesseract.
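As a rough illustration of that kind of pre-processing, here is a pure-Python sketch that crops a top rectangle, turns bright-coloured text on black into black text on white, and enlarges the result. All function names and the toy 3x3 image are assumptions made up for this example. In practice you would do the same steps with an imaging library on the real file. The binarization uses the maximum colour channel rather than a luminance formula, because saturated magenta (255, 0, 255) has a fairly low luminance and can end up close to the black background after a plain greyscale conversion.

```python
def crop(pixels, left, top, right, bottom):
    """Return rows top..bottom-1 and columns left..right-1 of the image."""
    return [row[left:right] for row in pixels[top:bottom]]


def binarize_bright_on_dark(pixels, threshold=128):
    """Map coloured-on-black RGB pixels to black text (0) on white (255).

    The maximum channel is used so that saturated colours such as magenta
    count as bright foreground.
    """
    return [[0 if max(r, g, b) >= threshold else 255 for (r, g, b) in row]
            for row in pixels]


def upscale(gray, factor):
    """Enlarge a greyscale image by an integer factor (nearest neighbour)."""
    out = []
    for row in gray:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


MAGENTA, BLACK = (255, 0, 255), (0, 0, 0)
image = [
    [BLACK, MAGENTA, BLACK],        # row that holds the text
    [BLACK, BLACK, BLACK],
    [MAGENTA, MAGENTA, MAGENTA],    # stands in for the non-text blob
]

text_strip = crop(image, 0, 0, 3, 1)           # keep only the top row
clean = upscale(binarize_bright_on_dark(text_strip), 2)
print(clean)
```

The threshold of 128 and the scale factor of 2 are arbitrary starting points; the right values depend on the actual image.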