ImageMagick convert to tesseract-ocr fail!


Gianfranco Cecconi

Dec 11, 2016, 12:07:46 PM
to tesseract-ocr
Hi All,
I am a new Tesseract user, so forgive me if my question is naive.

My problem is similar to what is described here. I generate perfect, high-resolution text using ImageMagick's convert command-line tool and then pass the result as input to Tesseract, but the recognition quality is very poor: lowercase "w" becomes uppercase, uppercase "X" becomes lowercase "h", and so on. I've tested a few fonts (including OCR-A), used different color spaces, configured Tesseract to ignore the language dictionaries, etc., but I can't find settings that guarantee a seamless conversion. I haven't done any training yet, though.
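The exact commands aren't shown in the thread, so the following is only a hypothetical Python sketch of the pipeline as described: render text with ImageMagick's convert, then run tesseract with the dictionaries disabled. The font name, point size, density, and white border are illustrative assumptions, not the poster's settings; the border reflects Tesseract's own quality advice that text touching the image edge can be misread. load_system_dawg and load_freq_dawg are the Tesseract parameters that turn off the word lists, assuming a build whose -c option accepts them. The helper names are made up for this example.

```python
import os
import shlex
import shutil
import subprocess
from typing import List

# Hypothetical default font; use a font name that ImageMagick knows on your system.
DEFAULT_FONT = "DejaVu-Sans"


def render_cmd(text: str, out_png: str, font: str = DEFAULT_FONT,
               pointsize: int = 48, density: int = 300, border: int = 20) -> List[str]:
    """Build an ImageMagick `convert` command rendering `text` black on white."""
    return [
        "convert",
        "-density", str(density),      # must precede label: to affect rendering
        "-background", "white",
        "-fill", "black",
        "-font", font,
        "-pointsize", str(pointsize),
        f"label:{text}",               # the text to draw
        "-bordercolor", "white",
        "-border", str(border),        # white margin so glyphs don't touch the edge
        out_png,
    ]


def ocr_cmd(in_png: str, out_base: str, lang: str = "eng",
            disable_dictionaries: bool = True) -> List[str]:
    """Build a `tesseract` command; output is written to `out_base`.txt."""
    cmd = ["tesseract", in_png, out_base, "-l", lang]
    if disable_dictionaries:
        # Turn off the system and frequent-word dictionaries (assumed supported via -c).
        cmd += ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]
    return cmd


def run_pipeline(text: str, workdir: str) -> str:
    """Render `text`, OCR it, and return Tesseract's output (needs both tools on PATH)."""
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    png = os.path.join(workdir, "sample.png")
    base = os.path.join(workdir, "sample")
    subprocess.run(render_cmd(text, png), check=True)
    subprocess.run(ocr_cmd(png, base), check=True)
    with open(base + ".txt", encoding="utf-8") as fh:
        return fh.read().strip()


if __name__ == "__main__":
    # Only print the commands; running them depends on the local installation.
    for cmd in (render_cmd("Wax Box", "sample.png"), ocr_cmd("sample.png", "sample")):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Comparing the string returned by run_pipeline with the input text is one way to reproduce confusions like the w/X ones reported above and to check whether a given font or border size changes the result.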

What am I missing? Is it a matter of training? In your experience, have you found a setup that guarantees error-free recognition while keeping the text human-readable and using a non-copyrighted font?

Thanks!

Giacecco

Allistair

Dec 11, 2016, 12:16:35 PM
to tesser...@googlegroups.com
It is usually helpful to see the image you are providing to Tesseract.

