Expected output of LSTMTRAINING

tc...@zips.uakron.edu

Jan 7, 2019, 11:03:45 AM

to tesseract-ocr

Hey all,

After some wrangling, I've been able to get Tesseract to train successfully on my dataset (i.e. the lstmtraining application runs to completion without critical errors).

However, it's not clear in the wiki what exactly the output of lstmtraining is. In the output directory I set for training output, there are two files present after training: base_checkpoint and basetrain.txt.

Are these the expected output files or is there something I'm missing? Is the traineddata file modified for use with recognition or is that file only used for training?

Thanks,

-Tim

Shree Devi Kumar

Jan 7, 2019, 11:08:48 AM

to tesser...@googlegroups.com

You need to convert the checkpoint to a traineddata file.

Please see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files
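For reference, the conversion step looks roughly like the sketch below. It assumes the checkpoint is the base_checkpoint file mentioned above; the directory names, the starter traineddata path, and the output file name are hypothetical placeholders, so substitute your own. The guard only skips the command when lstmtraining or the checkpoint is not available.

```shell
# Hypothetical paths; adjust to your own layout.
OUTPUT_DIR="./output"                      # directory given to lstmtraining for its output
CHECKPOINT="$OUTPUT_DIR/base_checkpoint"   # checkpoint file named in this thread
STARTER="./lso/lso.traineddata"            # assumed: the traineddata that training was started with
FINAL="$OUTPUT_DIR/lso.traineddata"        # assumed output name, chosen to match -l lso

if command -v lstmtraining >/dev/null 2>&1 && [ -f "$CHECKPOINT" ]; then
  # --stop_training converts the checkpoint into a traineddata file for recognition
  lstmtraining --stop_training \
    --continue_from "$CHECKPOINT" \
    --traineddata "$STARTER" \
    --model_output "$FINAL"
else
  echo "lstmtraining or $CHECKPOINT not found; nothing converted"
fi
```

The file written to --model_output is the one to use for recognition.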


Timothy Snyder

Jan 7, 2019, 11:40:03 AM

to tesser...@googlegroups.com

Great! Thanks, Shree. I totally missed that section.


tc...@zips.uakron.edu

Jan 7, 2019, 12:58:36 PM

to tesseract-ocr

So I was able to get a traineddata file from lstmtraining, but I have encountered a new error. When I try to run Tesseract against an image as follows:

tesseract ../test.png out -l lso --oem 1 --psm 7

I get the following error:

Failed to read boxes from ../test.png

Any insight as to what may be going wrong here? Note: I trained the model on 15 images (each with 8 text lines) in a single font for 800 iterations. Right now I'm testing with a small dataset. I have a much larger dataset to train with once I have the process better figured out.

tc...@zips.uakron.edu

Jan 7, 2019, 3:12:56 PM

to tesseract-ocr

Never mind. It seems it wasn't working because I wasn't explicitly setting the --tessdata-dir flag to the correct tessdata directory on my system.
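For anyone hitting the same thing, the working invocation looks something like the sketch below. The tessdata path is a hypothetical placeholder; point it at whichever directory actually contains lso.traineddata. The guard only skips the command when tesseract or the model file is missing.

```shell
# Hypothetical location; use the directory that holds lso.traineddata on your system.
TESSDATA_DIR="/path/to/tessdata"
IMAGE="../test.png"

if command -v tesseract >/dev/null 2>&1 && [ -f "$TESSDATA_DIR/lso.traineddata" ]; then
  # Same command as before, with the tessdata directory given explicitly
  tesseract "$IMAGE" out --tessdata-dir "$TESSDATA_DIR" -l lso --oem 1 --psm 7
else
  echo "tesseract or $TESSDATA_DIR/lso.traineddata not found; skipping"
fi
```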
