When I generate a TIFF from a text file with jTessBoxEditor, all the complex conjunct letters in my language (Oriya) are broken down into their component letters in the TIFF image. Here is a screenshot: http://imgur.com/GTY7wt7
The left side shows how the text should look, and the right side is the output from jTessBoxEditor. Each letter on the left corresponds to its counterpart on the right. The generated box file has the correct characters, but the image data is incorrect because the TIFF is wrong. So when I use the generated traineddata file, the simple letters are detected fine but the complex letters are misrecognized.
Any suggestions?
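For anyone wondering how the box file can be right while the image is wrong: a conjunct is stored as several code points joined by the virama, and it only becomes a single glyph when the text renderer shapes it. Here is a minimal Python sketch of that, not a fix. The sample conjunct (KA + VIRAMA + SSA) and the helper names `describe` and `has_conjunct` are my own choices for illustration, not anything from jTessBoxEditor.

```python
import unicodedata

# ORIYA SIGN VIRAMA: joins consonants into a conjunct when the text is shaped.
ORIYA_VIRAMA = "\u0B4D"


def describe(text):
    """Return (hex code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def has_conjunct(text):
    """True if a virama sits between two other characters (a likely conjunct)."""
    return any(ch == ORIYA_VIRAMA for ch in text[1:-1])


# Hypothetical sample: KA + VIRAMA + SSA, drawn as one glyph by a shaping renderer.
sample = "\u0B15\u0B4D\u0B37"

for code, name in describe(sample):
    print(code, name)
print("contains conjunct:", has_conjunct(sample))
```

If the renderer skips shaping, those three code points are drawn one after another, which would look like the right-hand side of the screenshot even though the text itself (and so the box file) is unchanged.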
Nguyen (the program's creator) says the problem is with Java, so I've decided to use qtboxcreator to create the boxes and let jTessBoxEditor handle the subsequent work.
Thank you for the input. I'll surely check it out when I get a break.