Reduce the weight of eng.traineddata using only one font

Brais Gabín Moreira

Sep 11, 2016, 8:02:54 AM
to tesseract-ocr
I'm using Tesseract to recognize text in some screenshots. I'm building this into an Android app, so ~20 MB of traineddata is a lot of weight. I know the font used in those screenshots.

How can I reproduce the steps used to generate eng.traineddata? I want to use the same data: text, dictionary, patterns, etc. Once I have that, I'll strip out all the "useless" fonts and add the one I want.

Quan Nguyen

Sep 12, 2016, 8:18:50 AM
to tesseract-ocr
You could consider using one of the older versions of the eng.traineddata file; one of them is only 3 MB.

Brais Gabín Moreira

Sep 12, 2016, 11:22:11 AM
to tesseract-ocr
Wow! This file works as well as the 20 MB one! (At least in my case.)

Anyway, it would be great to know the steps to generate one of those files.
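
For context, here is a rough sketch of what a single-font run might look like, assuming the `tesstrain.sh` script from the Tesseract training tools and a local checkout of the langdata repository (which holds the training text and word lists). The font name and all paths are hypothetical placeholders, and the flags should be checked against the installed version of the script. The snippet only builds and prints the command; it does not run it.

```python
import shlex


def build_tesstrain_command(font, langdata_dir, tessdata_dir, fonts_dir,
                            output_dir, lang="eng"):
    """Return the argument list for a single-font tesstrain.sh run.

    Nothing is executed here; the caller decides whether to run it.
    """
    return [
        "tesstrain.sh",
        "--lang", lang,
        "--langdata_dir", langdata_dir,   # training text, word lists, etc.
        "--tessdata_dir", tessdata_dir,   # existing tessdata directory
        "--fonts_dir", fonts_dir,         # where the font file is installed
        "--fontlist", font,               # only the one font we care about
        "--output_dir", output_dir,       # where the new traineddata lands
    ]


# All values below are hypothetical placeholders.
command = build_tesstrain_command(
    font="Roboto",
    langdata_dir="./langdata",
    tessdata_dir="./tessdata",
    fonts_dir="/usr/share/fonts",
    output_dir="./trained_output",
)

# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in command))
```

Passing a single font to `--fontlist` is what would keep the result small, since the script renders training images only for the fonts listed there.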

Quan Nguyen

Sep 12, 2016, 8:33:28 PM
to tesseract-ocr