Tesseract v3.05.02 Training Error During Processing


James Lipham

Jul 1, 2018, 4:13:27 PM
to tesseract-ocr
Good afternoon all!

I'm running Tesseract v3.05.02 on macOS Sierra (installed via Homebrew), and I'm trying to train it on a custom dataset of fairly small images that are programmatically generated from a dot matrix display.

When running 
tesseract eng.dmd.exp0.tif eng.dmd.box nobatch box.train

I get the following information:

Tesseract Open Source OCR Engine v3.05.02 with Leptonica
Page 1
Detected 27 diacritics
Error during processing.

No additional information is printed to the console, so I really don't know what my error could be. I've verified that the TIF image doesn't have an alpha channel, and the box file appears to be in the appropriate format.

Has anyone run into this before? I'm thinking it's something absurdly simple. I've attached both the TIF and box files I'm using.

Thank you very very much!

-- James
eng.dmd.exp0.tif
eng.dmd.box

James Lipham

Jul 2, 2018, 8:40:26 AM
to tesseract-ocr
I have also updated the image to have everything as the same font/size/etc, but still, tesseract just says "Error during processing." with seemingly zero information as to why.

Has anyone ever experienced this? If I can't find out anything else, I guess I'll have to step through the page processing code and add a bunch of printf statements to see where tesseract is blowing up, which seems like overkill.

-- James

Quan Nguyen

Jul 2, 2018, 7:51:07 PM
to tesseract-ocr
Wrong filename format. The box file should be named `eng.dmd.exp0.box` so that its base name matches the image, `eng.dmd.exp0.tif`.
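To make the naming rule concrete, here is a minimal shell sketch that derives the box filename from the image name and checks whether that file exists. It assumes the rule is simply "same name as the image, with the extension replaced by `.box`". The helper names `expected_box_name` and `check_box_file` are made up for illustration and are not part of Tesseract.

```shell
#!/bin/sh
# Hypothetical helper: print the box filename expected for an image,
# assuming the rule is "image name with its extension replaced by .box".
expected_box_name() {
    # ${1%.*} strips the shortest trailing ".something" (the image extension).
    printf '%s.box\n' "${1%.*}"
}

# Hypothetical helper: report whether the expected box file exists.
check_box_file() {
    image=$1
    box=$(expected_box_name "$image")
    if [ -f "$box" ]; then
        echo "ok: found $box"
    else
        echo "missing: expected $box for $image"
        return 1
    fi
}
```

For the files in this thread, `expected_box_name eng.dmd.exp0.tif` prints `eng.dmd.exp0.box`, so the fix would be `mv eng.dmd.box eng.dmd.exp0.box` before rerunning the training command. The second argument to `tesseract` is the output base name, so it would probably also make sense to pass `eng.dmd.exp0` there rather than `eng.dmd.box`.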