Tesstutorial fails to generate lstmf files


Khangaroo

Nov 5, 2019, 1:02:30 AM
to tesseract-ocr
Hi. I'm trying to create a fine-tuned model for Tesseract, but the tesstrain.sh script always appears to fail on "Phase E: Generating lstmf files". I get a rather vague error message for each Tesseract instance:

Failed to read pages from /tmp/eng-2019-11-04.YNl/eng.Century_Schoolbook_L_Bold_Italic.exp0.tif
Error during processing.

I ran strace on one of the failed commands from tesstrain.sh and one line in particular stuck out:

openat(AT_FDCWD, "/tmp/eng-2019-11-04.YOY/eng.NimbusSanNovDSemBol.exp0.uzn", O_RDONLY) = -1 ENOENT (No such file or directory)

The only code I could find in the entire repository that references uzn files reads them; nothing writes them. Is there any way around this?

Shree Devi Kumar

Nov 5, 2019, 3:50:39 AM
to tesseract-ocr
It fails with the latest code.

Try with an older commit.


Shree Devi Kumar

Nov 5, 2019, 3:51:39 AM
to tesseract-ocr
Search Google for uzn; there are utilities that generate these files.

Khangaroo

Nov 6, 2019, 10:41:01 PM
to tesseract-ocr
Reverting commit 94d0f77f56bb9123c4c33c97125e76e7bdb73159 worked. Thank you!
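
For anyone wanting to reproduce this, here is a minimal sketch of reverting a single commit in a local source checkout. The helper name revert_commit and the checkout path in the usage comment are assumptions for illustration, not something from the tesseract repository, and the revert may not apply cleanly on code newer than what was current in this thread.

```shell
#!/bin/sh
# Sketch: revert one commit in a local git checkout.
# Usage: revert_commit <checkout-dir> <commit-hash>
# revert_commit is a hypothetical helper name, not part of tesseract.
revert_commit() {
    repo_dir="$1"
    commit="$2"
    # --no-edit keeps git's default "Revert ..." message without opening an editor.
    git -C "$repo_dir" revert --no-edit "$commit"
}

# Example (the checkout location is an assumption; adjust to your own):
# revert_commit "$HOME/tesstutorial/tesseract" 94d0f77f56bb9123c4c33c97125e76e7bdb73159
```

After the revert, tesseract and its training tools would need to be rebuilt and reinstalled with whatever build steps were used originally before rerunning tesstrain.sh.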

prashant patnaik

Nov 12, 2019, 3:07:07 AM
to tesseract-ocr
I am also getting the same issue:
Tesseract Open Source OCR Engine v5.0.0-alpha-547-g9b46 with Leptonica
Page 1
Page 1
Page 1
Page 1
Page 1
Failed to read pages from /tmp/eng-2019-11-12.sAl/eng.Century_Schoolbook_L_Bold.exp0.tif
Error during processing.
Failed to read pages from /tmp/eng-2019-11-12.sAl/eng.Century_Schoolbook_L_Medium.exp0.tif
Failed to read pages from /tmp/eng-2019-11-12.sAl/eng.Century_Schoolbook_L_Italic.exp0.tif
Error during processing.
Error during processing.
Failed to read pages from /tmp/eng-2019-11-12.sAl/eng.Arial_Bold.exp0.tif
Failed to read pages from /tmp/eng-2019-11-12.sAl/eng.Arial.exp0.tif
Error during processing.
ERROR: Program tesseract failed. Abort.
prashant1@ubuntu:~/tesstutorial/tesseract$ ERROR: Program tesseract failed. Abort.
Error during processing.
ERROR: Program tesseract failed. Abort.
Failed to read pages from /tmp/eng-2019-11-12.sAl/eng.Century_Schoolbook_L_Bold_Italic.exp0.tif
Error during processing.
ERROR: Program tesseract failed. Abort.
ERROR: Program tesseract failed. Abort.
Failed to read pages from /tmp/eng-2019-11-12.sAl/eng.Arial_Italic.exp0.tif
ERROR: Program tesseract failed. Abort.
Error during processing.
Failed to read pages from /tmp/eng-2019-11-12.sAl/eng.Arial_Bold_Italic.exp0.tif
Error during processing.
ERROR: Program tesseract failed. Abort.
ERROR: Program tesseract failed. Abort.



Can you please tell me which version you reverted to? Do you have a link to that version so that I can download it?

Thanks
Prashant
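
The output above is interleaved, apparently because several tesseract instances write to the terminal at once, which makes it hard to see which images failed. As a rough sketch, assuming the log has been saved to a file, the affected .tif paths could be listed once each like this. The function name failed_tifs and the file name tesstrain.log are hypothetical; the matched message text is copied from the output above.

```shell
#!/bin/sh
# Sketch: list each .tif that tesseract reported as unreadable, once.
# Reads log text on stdin. failed_tifs is a hypothetical helper name.
failed_tifs() {
    grep -o 'Failed to read pages from .*\.tif' |
        sed 's/^Failed to read pages from //' |
        LC_ALL=C sort -u
}

# Example (tesstrain.log is a hypothetical saved copy of the output):
# failed_tifs < tesstrain.log
```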