Unable to recognise the text with the traineddata


koushik v

Jul 20, 2016, 9:07:11 AM
to tesseract-ocr
Hi,

I took a screenshot of text set in Helvetica Neue, which can be seen in the attachment. Then I trained Tesseract using the Helvetica and Helvetica Neue fonts available on the Mac and generated a traineddata file. I hoped that Tesseract would recognise the text perfectly using the trained data, but it did not work, as can be seen in the screenshot.

Can anyone suggest what I am missing to make this work?

Thank you 
extract0.981179.png
Screen Shot 2016-07-20 at 2.05.28 PM.png

ShreeDevi Kumar

Jul 22, 2016, 3:30:53 AM
to tesser...@googlegroups.com

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


ashish goel

Jul 26, 2016, 12:52:34 AM
to tesser...@googlegroups.com
Instead of retraining the font, you should focus on pre-processing the image. One option that worked in this particular case was resizing the image.

I ran the following, and Tesseract was able to read the image:
$ convert a.png -resize 170% b.png
$ tesseract b.png stdout -l eng --tessdata-dir /usr/share/tesseract-ocr/tessdata

Error in pixGenHalftoneMask: pix too small: w = 301, h = 53
Kaushik
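
If you want to try other scale factors from a script rather than by hand, here is a rough Python sketch that wraps the same two commands. The function names (build_commands, resize_and_ocr) are made up for illustration, and it assumes the ImageMagick convert and tesseract binaries are on your PATH. 170% is only the value that happened to work for this image, not a recommended setting.

```python
import shutil
import subprocess


def build_commands(src, dst, scale_percent=170, lang="eng", tessdata_dir=None):
    """Return the ImageMagick and Tesseract command lines as argument lists."""
    # Same as: convert a.png -resize 170% b.png
    resize_cmd = ["convert", src, "-resize", f"{scale_percent}%", dst]
    # Same as: tesseract b.png stdout -l eng [--tessdata-dir DIR]
    ocr_cmd = ["tesseract", dst, "stdout", "-l", lang]
    if tessdata_dir:
        ocr_cmd += ["--tessdata-dir", tessdata_dir]
    return resize_cmd, ocr_cmd


def resize_and_ocr(src, dst, **kwargs):
    """Upscale src into dst, run Tesseract on dst, and return the recognised text."""
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    resize_cmd, ocr_cmd = build_commands(src, dst, **kwargs)
    subprocess.run(resize_cmd, check=True)
    # Messages such as the pixGenHalftoneMask one go to stderr, so only
    # the recognised text ends up in stdout.
    result = subprocess.run(ocr_cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

Calling resize_and_ocr("a.png", "b.png", tessdata_dir="/usr/share/tesseract-ocr/tessdata") should be equivalent to the two commands above. You can loop over scale_percent values to see which one gives the cleanest result for your screenshots.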


Ashish
