I'm using Tesseract to recognize text in some screenshots. I'm building this into an Android app, so ~20 MB of traineddata is a lot of weight. I know the font used in those screenshots.
How can I reproduce the steps used to generate eng.traineddata? I want to start from the same inputs: training text, dictionary, patterns, etc. Once I have that, I'll strip out all the "useless" fonts and add the one I want.