Put them on the blacklist with the variable tessedit_char_blacklist (search the forum if you do not know how).

In my simple testing, I find this is the most common problem. Is there a way to instruct tesseract not to use those glyphs without limiting it to ASCII? I use tesseract 3.01, BTW.
How did you format your config file? I tried adding the following line and it doesn't seem to work:

tessedit_char_blacklist fi
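For what it's worth, a Tesseract config file is plain text with one "variable value" pair per line, and the config name goes last on the command line. Below is a minimal Python sketch that writes such a file and builds the command without running it. It assumes the goal is to blacklist the ligature glyphs U+FB01 and U+FB02 themselves (the plain letters f and i would blacklist those two letters instead). The helper functions and the file names noligatures, page.tif and page are made up for illustration, and none of this has been tested against 3.01.

```python
from pathlib import Path


def write_blacklist_config(path, chars):
    """Write a one-line Tesseract config file that blacklists `chars`."""
    # Config files hold one "variable value" pair per line,
    # separated by whitespace. UTF-8 keeps non-ASCII glyphs intact.
    config = Path(path)
    config.write_text(f"tessedit_char_blacklist {chars}\n", encoding="utf-8")
    return config


def build_command(image, output_base, config_path):
    """Return the argv list; config files come last on the command line."""
    return ["tesseract", str(image), str(output_base), str(config_path)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cfg = write_blacklist_config("noligatures", "\ufb01\ufb02")
    print(build_command("page.tif", "page", cfg))
```

Depending on the version, Tesseract may look for the config name under tessdata/configs before treating it as a path, so it is worth trying both locations.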
-----
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
You received this message because you are subscribed to a topic in the Google Groups "tesseract-ocr" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/tesseract-ocr/jO_4ZMMK9xw/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to tesseract-oc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
On Sunday, April 1, 2012 5:16:59 AM UTC-4, klo wrote:
Yes, I'm doing something similar in Python. Do you know of a list of ligatures so I can convert them to ASCII? I know fi and fl are the most popular, but there are probably many more.
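One possible starting point for such a list: Unicode keeps seven Latin ligatures in the Alphabetic Presentation Forms block, U+FB00 through U+FB06 (ff, fi, fl, ffi, ffl, long-s-t, st), and each has a compatibility decomposition into plain letters. A minimal sketch, assuming those seven are the ones of interest; the name expand_ligatures is made up for illustration. It builds the table with unicodedata.normalize on just those code points, so the rest of the text is left untouched.

```python
import unicodedata

# Latin ligatures in the Unicode "Alphabetic Presentation Forms" block
# (U+FB00..U+FB06): ff, fi, fl, ffi, ffl, long-s-t, st.
# NFKC maps each one to its plain-letter expansion.
LIGATURES = {
    chr(cp): unicodedata.normalize("NFKC", chr(cp))
    for cp in range(0xFB00, 0xFB07)
}

# Translation table: single-character keys, multi-character replacements.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace each Latin ligature with its plain-letter expansion."""
    return text.translate(_TABLE)
```

Running NFKC over the whole string would also work, but it changes other compatibility characters too (superscripts, full-width forms), which may not be wanted. Characters such as æ and œ are treated as letters in their own right and have no decomposition, so they would need a hand-written mapping.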