
How can I train the 22MB eng model?

TheComplete BookOfMormon

Apr 17, 2025, 9:27:37 AM
to tesseract-ocr
I am using the following 22 MB eng.traineddata in my app, and it is working very well:
https://github.com/tesseract-ocr/tessdata/blob/main/eng.traineddata

There are some corner cases that I thought I could fix by training the model on this data:
https://github.com/TheBookOfMormon/TheCompleteBookOfMormon/tree/master/Data/Sources/1830PalmyraEdition/03-OCRTraining/1830PalmyraEdition

I tried training this 22 MB file, but that doesn't work because it is the integer version of the model.

I then tried this 15 MB file from tessdata_best:
https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata

It's a year older, and even after I've trained it, its results aren't as good as those from the 22 MB file. In fact, even without any training, this 15 MB "best" file gives me worse results than the 22 MB file.

Where can I get the trainable version of the 22 MB file?