How to fix word split error


Albrecht Hilker

Sep 16, 2014, 9:07:08 PM
to tesser...@googlegroups.com
Hello

I have the following problem:
Tesseract splits a word into two words.

The image below shows the thresholded image with the recognition results.
The yellow rectangles show the detected words.

The detected text is "Total $1 9,55" instead of "Total $19,55".

It is clearly wrong that Tesseract detects a word boundary between the "1" and the "9".
I see this error very frequently.

Do any of the hundreds of undocumented settings define the minimum width of a space character?
Or is there a way to tell the word chopper that a space must be at least as wide as another character in the same column?
Word_Chopper_Error.gif

Andrzej

Jan 8, 2015, 9:21:34 AM
to tesser...@googlegroups.com
Hello,

You must modify the value of the parameter textord_dotmatrix_gap (the default is 3); on the other
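
For readers who want to try this suggestion, here is a minimal sketch of passing a different value for textord_dotmatrix_gap to the tesseract command-line tool from Python. It assumes the installed Tesseract build accepts the `-c VAR=VALUE` option and has the `tesseract` binary on PATH. The file name `receipt.png`, the helper names, and the value 5 are hypothetical placeholders, not values recommended in this thread; the right value has to be found by experiment on your own images.

```python
import shlex
import subprocess


def build_tesseract_command(image_path, gap_pixels, output_base="stdout"):
    """Build a tesseract CLI call that overrides textord_dotmatrix_gap.

    gap_pixels is an illustrative value; the thread only states that
    the default is 3.
    """
    if gap_pixels < 0:
        raise ValueError("gap_pixels must be non-negative")
    return [
        "tesseract",
        str(image_path),
        output_base,
        "-c",  # set a single config variable for this run
        f"textord_dotmatrix_gap={gap_pixels}",
    ]


def run_ocr(image_path, gap_pixels):
    """Run tesseract and return the recognized text (needs tesseract on PATH)."""
    cmd = build_tesseract_command(image_path, gap_pixels)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Print the command only; call run_ocr() once tesseract is installed.
    print(shlex.join(build_tesseract_command("receipt.png", 5)))
```

Trying a few values and comparing the output for a line such as "Total $19,55" is probably the quickest way to see whether this parameter affects the word split in your case.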