Unable to recognise the text with the traineddata

koushik v

Jul 20, 2016, 9:07:11 AM

to tesseract-ocr

Hi,

I took a screenshot of text set in Helvetica Neue, which can be seen in the attachment. Then I trained Tesseract using the Helvetica and Helvetica Neue fonts available on the Mac and generated a traineddata file. I expected Tesseract to recognise the text perfectly with the trained data, but it did not, as can be seen in the screenshot.

Can anyone suggest what I am missing to make this work?

Thank you

extract0.981179.png

Screen Shot 2016-07-20 at 2.05.28 PM.png

ShreeDevi Kumar

Jul 22, 2016, 3:30:53 AM

to tesser...@googlegroups.com

https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/fc74c251-3603-4fbc-9aa2-6e1bea96fc75%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ashish goel

Jul 26, 2016, 12:52:34 AM

to tesser...@googlegroups.com

Instead of retraining for the font, you should focus on pre-processing the image. One option that worked in this particular case was resizing the image.

I ran the following, and tesseract was able to read the image (its output follows the commands):
$ convert a.png -resize 170% b.png
$ tesseract b.png stdout -l eng --tessdata-dir /usr/share/tesseract-ocr/tessdata

Error in pixGenHalftoneMask: pix too small: w = 301, h = 53
Kaushik

Ashish
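
For anyone who wants to script the two commands above, here is a minimal Python sketch. It only wraps the same `convert` and `tesseract` invocations quoted in this thread; the function names (`build_resize_cmd`, `build_ocr_cmd`, `resize_and_ocr`) are hypothetical helpers made up for illustration, and it assumes both tools are installed and on the PATH. The 170% default is simply the value that worked for this one image, not a general recommendation.

```python
import shlex
import subprocess


def build_resize_cmd(src, dst, percent=170):
    """Argument list for ImageMagick's convert, as used in this thread."""
    return ["convert", src, "-resize", f"{percent}%", dst]


def build_ocr_cmd(image, lang="eng", tessdata_dir=None):
    """Argument list for tesseract, writing recognised text to stdout."""
    cmd = ["tesseract", image, "stdout", "-l", lang]
    if tessdata_dir:
        # Optional: point tesseract at a specific tessdata directory.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def resize_and_ocr(src, dst, percent=170, lang="eng", tessdata_dir=None):
    """Resize src into dst, then OCR dst and return the recognised text.

    Raises subprocess.CalledProcessError if either tool exits non-zero.
    Only stdout is returned; anything the tools write to stderr is
    captured but not included in the result.
    """
    subprocess.run(build_resize_cmd(src, dst, percent), check=True)
    result = subprocess.run(
        build_ocr_cmd(dst, lang, tessdata_dir),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Dry run: print the commands without executing them.
    print(shlex.join(build_resize_cmd("a.png", "b.png")))
    print(shlex.join(build_ocr_cmd(
        "b.png", tessdata_dir="/usr/share/tesseract-ocr/tessdata")))
```

Calling `resize_and_ocr("a.png", "b.png")` would run both steps for real. Keeping the command construction separate from the execution makes it easy to try a few different percentages and compare the recognised text.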
