Unclear error message when running tesstrain.sh


David Maung

Sep 16, 2019, 10:56:48 AM
to tesseract-ocr
Hello,

I attempted to run the following command:

src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only --noextract_font_properties --langdata_dir ~/tesstutorial/langdata --tessdata_dir ~/tesstutorial/tesseract/tessdata --output_dir ~/tesstutorial/engtrain

(This is copied from the TessTutorial section of the document Training Tesseract 4.00.)

Everything seems to go fine until it (spuriously?) writes an error message to the log file:

Rendered page 3355 to file /tmp/eng-2019-09-14.GmB/eng.Arial_Italic.exp0.tif
Rendered page 3370 to file /tmp/eng-2019-09-14.GmB/eng.Arial_Bold_Italic.exp0.tif
ERROR: Program text2image failed. Abort.
Rendered page 3367 to file /tmp/eng-2019-09-14.GmB/eng.Arial.exp0.tif
Rendered page 3356 to file /tmp/eng-2019-09-14.GmB/eng.Arial_Italic.exp0.tif
...

After this, training continues and then ends without copying anything out of the /tmp directory. In my case, it generated only 7 of the 8 box files (eng.Courier_New.exp0.box is missing), as a listing of /tmp/eng-2019-09-14.GmB shows:

dmaung@Rhinegeist1:~/Tesseract-git/tesseract$ ls -1 /tmp/eng-2019-09-14.GmB/
eng.Arial_Bold.exp0.box
eng.Arial_Bold.exp0.tif
eng.Arial_Bold_Italic.exp0.box
eng.Arial_Bold_Italic.exp0.tif
eng.Arial.exp0.box
eng.Arial.exp0.tif
eng.Arial_Italic.exp0.box
eng.Arial_Italic.exp0.tif
eng.Courier_New_Bold.exp0.box
eng.Courier_New_Bold.exp0.tif
eng.Courier_New_Bold_Italic.exp0.box
eng.Courier_New_Bold_Italic.exp0.tif
eng.Courier_New.exp0.tif
eng.Courier_New_Italic.exp0.box
eng.Courier_New_Italic.exp0.tif
tesstrain.log
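
For anyone trying to reproduce this, a quick way to spot which font lost its box file is a small shell helper like the one below. This is only a sketch: the function name missing_boxes is made up for illustration, and it assumes every rendered .tif should have a .box file with the same base name in the same directory.

```shell
# Hypothetical helper (not part of tesstrain.sh): print every .tif in a
# directory that has no .box file with the same base name.
missing_boxes() {
  dir="$1"
  for tif in "$dir"/*.tif; do
    [ -e "$tif" ] || continue      # the glob matched nothing
    box="${tif%.tif}.box"          # expected companion box file
    [ -e "$box" ] || basename "$tif"
  done
}
```

Called as missing_boxes /tmp/eng-2019-09-14.GmB against the listing above, it should print eng.Courier_New.exp0.tif, which points at Courier New as the font whose text2image run failed.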

Can anyone suggest how to debug what is causing text2image to fail or how to get around it?
David



