Tesseract does not recognize monospaced font


Albrecht Hilker

Sep 27, 2017, 4:54:22 PM
to tesseract-ocr

Hello


I have the word "CONFIGURATION" in an image.
But Tesseract recognizes it as "CONF I GURATION".


Whenever a word contains the letter "I" or the digit "1", Tesseract splits it into two or even three words.


This happens with both Tesseract 3.03 and 3.05.
I use page segmentation mode 6 (-psm 6).


I trained the traineddata myself.
I told Tesseract that the font is monospaced, but it has no effect.


My font_properties file contains:
FontName@M 0 0 1 0 0
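For reference, each line of a font_properties file has the form `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, so the line above sets only the fixed-pitch flag. The small parser below is a hypothetical helper (`parse_font_properties` is not part of Tesseract) that just makes the field order explicit:

```python
from typing import NamedTuple


class FontProperties(NamedTuple):
    name: str
    italic: bool
    bold: bool
    fixed_pitch: bool
    serif: bool
    fraktur: bool


def parse_font_properties(line: str) -> FontProperties:
    """Parse one font_properties line: <fontname> <italic> <bold> <fixed> <serif> <fraktur>."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    name, *flags = parts
    bits = [int(f) for f in flags]
    if any(b not in (0, 1) for b in bits):
        raise ValueError(f"flags must be 0 or 1: {line!r}")
    # Convert the five 0/1 flags to booleans in declaration order.
    return FontProperties(name, *(bool(b) for b in bits))


props = parse_font_properties("FontName@M 0 0 1 0 0")
print(props.fixed_pitch)  # True: the font is declared monospaced
```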


The Tesseract documentation says that Tesseract recognizes monospaced fonts.
But in my case it does NOT.
The gaps on either side of the "I" are wider than the gaps between the other characters,
and Tesseract misinterprets them as separators between two words.
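The effect described above can be illustrated with a toy model. This is NOT Tesseract's actual segmentation code, and the cell width and ink widths are hypothetical. It places each character in a fixed 10-pixel cell, gives a narrow "I" less ink than the other letters, and compares a rule that breaks words on a large ink-to-ink gap with a rule that breaks only when the distance between character-cell centers exceeds 1.5 times the pitch:

```python
def make_glyphs(text, pitch=10, narrow="I1"):
    """Lay out text in fixed-width cells; returns (char, left_x, right_x) per glyph.

    Hypothetical ink widths: narrow glyphs are 2 px wide, all others 8 px,
    each centered in its cell. A space leaves its cell empty.
    """
    glyphs = []
    for i, ch in enumerate(text):
        if ch == " ":
            continue  # empty cell, but the position still advances
        ink = 2 if ch in narrow else 8
        left = i * pitch + (pitch - ink) // 2
        glyphs.append((ch, left, left + ink))
    return glyphs


def split_words(glyphs, is_break):
    """Group consecutive glyphs into words using a pairwise break rule."""
    words = [glyphs[0][0]]
    for prev, cur in zip(glyphs, glyphs[1:]):
        if is_break(prev, cur):
            words.append(cur[0])
        else:
            words[-1] += cur[0]
    return words


def gap_rule(threshold):
    # Proportional-style rule: break when the blank space between inks is large.
    return lambda prev, cur: cur[1] - prev[2] > threshold


def pitch_rule(pitch):
    # Fixed-pitch rule: break only when at least one whole cell is skipped.
    center = lambda g: (g[1] + g[2]) / 2
    return lambda prev, cur: center(cur) - center(prev) > 1.5 * pitch


g = make_glyphs("CONFIGURATION")
print(split_words(g, gap_rule(3)))     # ['CONF', 'I', 'GURAT', 'I', 'ON']
print(split_words(g, pitch_rule(10)))  # ['CONFIGURATION']
```

In this toy, the gap between an ordinary letter and the narrow "I" is 5 px, against 2 px between ordinary letters, so the gap rule splits the word, while the cell centers stay exactly one pitch apart and the pitch rule keeps it whole.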


Can anybody please point me in the right direction?


Is there a configuration setting for recognition that I must change?
Or is there something I must change when building the traineddata?


What is this problem called, so that I can search for further discussion of the topic?
