-c textord_min_linesize 3.25 in Tesseract 4 gives an error message


Martin Jenniges

Nov 11, 2018, 9:39:56 AM
to tesser...@googlegroups.com

Hello,


I found the following tip for Tesseract, but when I pass this parameter as -c textord_min_linesize 3.25 in Tesseract 4, I get an error message. What is wrong?



Example 3: Line Size

Command

tesseract image.jpg outputfilename config

Command Line Arguments

None

Config Settings

textord_min_linesize 3.25

Notes
  • textord_min_linesize seems to have an effect on the line heights detected by Tesseract when it performs layout analysis on the image. The default value for this setting is 1.25.
  • When set to 3.25, the "broken" line problem in the original baseline output is corrected.  Lower settings (for example, 3.0) do not correct the "broken" lines.  
  • This setting causes other character recognition errors.
  • The text in the output that is highlighted in red is again correctly contained on a single line.
  • The words highlighted in blue include extra characters that are the result of "noise" (specks and imperfections in the image). None of these have been corrected, but no new ones have appeared.
  • Lines between "paragraphs" now appear in somewhat odd locations.  Again, there are NO lines between paragraphs on the source image.
  • The garbage words at the end of the page do not appear.
  • A small number of errors in individual words that appear in the original output were corrected, a few other incorrect words changed (but were still incorrect), and a small number of correct words are now incorrect. These have been highlighted in purple.
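The tip above applies the setting through a config file, not through -c. In a Tesseract config file each line is a parameter name and a value separated by whitespace, so "textord_min_linesize 3.25" is valid there, while -c expects the name=value form. A minimal Python sketch that writes such a file and builds the matching command line; the config file name "linesize" and the helper names write_config and build_command are hypothetical, chosen only for illustration:

```python
import shlex
import tempfile
from pathlib import Path

# Setting from the tip; in a config file each line is "name value"
# separated by whitespace (no "=", unlike the -c option).
SETTINGS = {"textord_min_linesize": "3.25"}


def write_config(directory, name, settings):
    """Write a Tesseract-style config file and return its path."""
    path = Path(directory) / name
    lines = [f"{key} {value}" for key, value in settings.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def build_command(image, outputbase, config):
    """Argument list for: tesseract image outputbase configfile."""
    # Config file names go last, after the image and the output base.
    return ["tesseract", image, outputbase, str(config)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_config(tmp, "linesize", SETTINGS)
        print(cfg.read_text(encoding="utf-8"), end="")
        print(shlex.join(build_command("image.jpg", "outputfilename", cfg)))
```

Writing the file next to the image and passing its path avoids having to install it into Tesseract's own configs directory.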

Zdenko Podobny

Nov 12, 2018, 7:02:15 AM
to tesser...@googlegroups.com
What error message do you get?
Please share your image for testing too.

Zdenko


On Sun, 11 Nov 2018 at 15:39, Martin Jenniges <martinj...@skynet.be> wrote:

Martin Jenniges

Nov 13, 2018, 7:59:40 AM
to tesser...@googlegroups.com
Hello,

My command line is:

tesseract 3tmp3.png.tif s3.txt -l deu --oem 1 --psm 6 -c textord_min_linesize 3.25

and I get:

read_params_file: Can't open 3.25
Missing = in configvar assignment

Martin
3tmp3.png.tif
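Both messages come from the same command line. The sketch below is a simplified model of how the Tesseract 4 CLI appears to split its arguments (an assumption for illustration, not the real C++ parser; split_args and bad_c_values are hypothetical names). It shows that -c consumes exactly one following token, so "textord_min_linesize" is taken as the assignment and rejected for lacking "=", while the leftover "3.25" ends up treated as a config file name, which matches "Can't open 3.25":

```python
# Simplified model of the Tesseract 4 command-line split (an assumption,
# not the real parser): options are read until the image and output base
# are known and the next token no longer starts with "-".
VALUE_OPTIONS = {"-l", "--oem", "--psm", "--tessdata-dir"}


def split_args(argv):
    """Return (image, outputbase, options, c_values, config_files)."""
    image = outputbase = None
    options, c_values = [], []
    i = 0
    while i < len(argv) and (outputbase is None or argv[i].startswith("-")):
        arg = argv[i]
        if arg == "-c" and i + 1 < len(argv):
            c_values.append(argv[i + 1])  # -c takes exactly ONE token
            i += 1
        elif arg in VALUE_OPTIONS and i + 1 < len(argv):
            options.append((arg, argv[i + 1]))
            i += 1
        elif image is None:
            image = arg
        elif outputbase is None:
            outputbase = arg
        i += 1
    # Whatever is left over is treated as config file names.
    return image, outputbase, options, c_values, argv[i:]


def bad_c_values(c_values):
    """-c values without "=", which Tesseract rejects."""
    return [v for v in c_values if "=" not in v]


if __name__ == "__main__":
    cmd = ("3tmp3.png.tif s3.txt -l deu --oem 1 --psm 6 "
           "-c textord_min_linesize 3.25").split()
    *_, c_values, configs = split_args(cmd)
    print("config files:", configs)  # ['3.25'] -> "Can't open 3.25"
    print("bad -c values:", bad_c_values(c_values))  # ['textord_min_linesize']
```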

Vinod Gattani

Nov 13, 2018, 8:01:54 AM
to tesser...@googlegroups.com
The correct command is:

tesseract 3tmp3.png.tif s3.txt -l deu --oem 1 --psm 6 -c "textord_min_linesize=3.25"
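With the name=value form the whole assignment is a single argument. When calling Tesseract from Python, passing an argument list to subprocess avoids shell quoting entirely. A sketch under that assumption; the helpers c_option and tesseract_command are hypothetical, and the output base is "s3" because Tesseract appends ".txt" itself (so "s3.txt" would produce s3.txt.txt):

```python
import os
import shlex
import shutil
import subprocess


def c_option(name, value):
    """One -c override as two argv entries: the flag and "name=value"."""
    return ["-c", f"{name}={value}"]


def tesseract_command(image, outputbase, lang="deu", oem=1, psm=6, **params):
    """Argument list for the corrected command from this thread."""
    cmd = ["tesseract", image, outputbase,
           "-l", lang, "--oem", str(oem), "--psm", str(psm)]
    for name, value in params.items():
        cmd += c_option(name, value)
    return cmd


if __name__ == "__main__":
    cmd = tesseract_command("3tmp3.png.tif", "s3", textord_min_linesize=3.25)
    # No shell is involved, so "textord_min_linesize=3.25" needs no quotes.
    print(shlex.join(cmd))
    # Run only when the binary and the sample image are actually available.
    if shutil.which("tesseract") and os.path.exists(cmd[1]):
        subprocess.run(cmd, check=True)
```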

Martin Jenniges

Nov 13, 2018, 8:32:46 AM
to tesser...@googlegroups.com