How to Configure the Size of the Gap Used to Determine Separation Between Words


Dave Wood

Nov 27, 2019, 1:27:57 PM
to tesseract-ocr
I need to be able to adjust the size of the gap that Tesseract uses to determine the separation between words.

Here is my setup:

- Tesseract for Windows, version 5.0.0, from UB-Mannheim
- Image cleaning and resizing using OpenCV (I have put a lot of effort into getting this as good as I can)
- Parameters: --psm 6 --oem 1 (LSTM engine)
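For anyone reproducing this setup, here is a minimal sketch of how those two parameters end up on the tesseract command line when called from a script. It assumes the tesseract executable is on PATH; the helper names build_tesseract_cmd and run_tesseract are hypothetical. Passing "stdout" as the output base makes the CLI print the text instead of writing a .txt file.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_cmd(image_path: str, psm: int = 6, oem: int = 1,
                        exe: str = "tesseract") -> List[str]:
    """Hypothetical helper: argv for a run that prints text to stdout."""
    # "stdout" as the output base sends the recognized text to standard output.
    return [exe, image_path, "stdout", "--psm", str(psm), "--oem", str(oem)]


def run_tesseract(image_path: str) -> Optional[str]:
    """Hypothetical helper: run Tesseract if it is on PATH, else return None."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    cmd = build_tesseract_cmd(image_path, exe=exe)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(build_tesseract_cmd("OneOfThree.png")))
```

Building the argument list separately from running it keeps the exact flags easy to print and compare between experiments.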

In my case, I need that gap to be a bit smaller than the one Tesseract seems to use. For example, take the following image:
[attached image: OneOfThree.png]
For this image, Tesseract returns "1of3", essentially treating this as one word with no spaces.
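To make the idea of a "gap threshold" concrete, here is a toy sketch of splitting a binarized text line into words purely by the width of blank column runs. This is an illustration of the general concept only, not Tesseract's algorithm; word_spans and min_word_gap are hypothetical names, and the image is assumed to be a non-empty list of pixel rows with 1 for ink.

```python
from typing import List, Tuple


def word_spans(binary: List[List[int]], min_word_gap: int) -> List[Tuple[int, int]]:
    """Split a binarized text line into words by blank-column gap width.

    Hypothetical helper for illustration only; it does not reproduce
    Tesseract's internals. `binary` is a list of pixel rows (1 = ink),
    and a blank run of at least `min_word_gap` columns ends a word.
    Returns (start, end) column ranges, end exclusive.
    """
    width = len(binary[0])
    # A column counts as ink if any row has a foreground pixel there.
    ink = [any(row[x] for row in binary) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None     # first ink column of the current word
    last_ink = -1    # most recent ink column seen
    for x, has_ink in enumerate(ink):
        if not has_ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_word_gap:
            # The blank run before x is wide enough: close the current word.
            spans.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans


if __name__ == "__main__":
    # Toy "1 of 3" line: glyph, 2 blank cols, "o", 1 blank col, "f", 2 blank cols, glyph.
    line = [
        [1, 1, 0, 0, 1, 0, 1, 0, 0, 1],
        [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
    ]
    print(word_spans(line, min_word_gap=2))  # [(0, 2), (4, 7), (9, 10)]
    print(word_spans(line, min_word_gap=3))  # [(0, 10)], i.e. "1of3"
```

With a threshold of 2 columns the toy line splits into "1", "of", "3"; raising it to 3 merges everything into one word, which is the kind of result described above.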

It seems that the configuration parameters whose names start with "tosp_" should be the ones that control this inter-word spacing, but I have experimented extensively with them without any effect. Perhaps these parameters are only relevant to the legacy engine.
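For reference, tosp_ settings can be passed either as repeated "-c name=value" options or as a config file with one "name value" pair per line. The sketch below only shows the mechanics of both forms; per the post, such settings made no visible difference with --oem 1. The parameter name tosp_min_sane_kn_sp and the value 1.0 are examples only, not a recommendation; check the real names on your build with `tesseract --print-parameters`. The helpers tosp_flags and config_file_text are hypothetical.

```python
from typing import Dict, List


def tosp_flags(params: Dict[str, object]) -> List[str]:
    """Hypothetical helper: turn {name: value} into '-c name=value' CLI args."""
    flags: List[str] = []
    for name, value in params.items():
        # Guard against typos: only accept names in the tosp_ family.
        if not name.startswith("tosp_"):
            raise ValueError(f"not a tosp_ parameter: {name}")
        flags += ["-c", f"{name}={value}"]
    return flags


def config_file_text(params: Dict[str, object]) -> str:
    """Hypothetical helper: same settings as config-file lines 'name value'."""
    return "".join(f"{name} {value}\n" for name, value in params.items())


if __name__ == "__main__":
    # Example name and value only; verify with `tesseract --print-parameters`.
    params = {"tosp_min_sane_kn_sp": 1.0}
    cmd = ["tesseract", "OneOfThree.png", "stdout", "--psm", "6", "--oem", "1"]
    print(" ".join(cmd + tosp_flags(params)))
    print(config_file_text(params), end="")
```

The config-file form is handy when many values are varied at once; the file path can be given as a trailing argument to the tesseract command.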

So can anyone tell me how to configure the size of the gap between words when using the LSTM engine?

Thanks,

Dave