How can I train eng.traineddata by myself?

larry

Jul 21, 2014, 6:13:15 PM
to tesser...@googlegroups.com
Where are the training sources (box files, wordlist, font_properties, ...) for eng.traineddata?
And how can I retrain eng.traineddata myself?

Thanks!

Victoria A.

Jul 24, 2014, 8:47:34 AM
to tesser...@googlegroups.com
Unpack the eng.traineddata file with the combine_tessdata command (its -u option). I did so and found many -dawg files, along with the unicharset and unicharambigs files. Use the dawg2wordlist command to convert the -dawg files back into wordlist, freqlist, etc. However, you cannot recover the font_properties and .box files.
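
As a rough sketch of the unpacking steps above, the snippet below only builds and prints the two command lines; it does not run them. The `eng.` output prefix, the choice of `eng.word-dawg` as the dawg to dump, and the helper name `unpack_commands` are assumptions for illustration. The same pattern should apply to the other -dawg files.

```python
import shlex


def unpack_commands(traineddata="eng.traineddata", prefix="eng."):
    """Build the argv lists for unpacking and for dumping one dawg.

    The helper name and default paths are illustrative assumptions.
    """
    # combine_tessdata -u writes each component as <prefix><component name>.
    unpack = ["combine_tessdata", "-u", traineddata, prefix]
    # dawg2wordlist takes: unicharset file, dawg file, output wordlist file.
    dump = [
        "dawg2wordlist",
        prefix + "unicharset",
        prefix + "word-dawg",
        prefix + "wordlist",
    ]
    return [unpack, dump]


# Dry run: print the commands so they can be checked before executing them.
for argv in unpack_commands():
    print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run(argv, check=True)` once the Tesseract training tools are on the PATH.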

The only way you can extend eng.traineddata is by adding new words to the wordlist, new ambiguity rules to the unicharambigs file, or bigram rules to the bigram file. You cannot unpack the .box files and add a few more, the way you can when training a new language.
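
For the wordlist case, a minimal sketch of the round trip might look like this, assuming the components were unpacked with an `eng.` prefix. The helper names `merge_words` and `rebuild_commands` are hypothetical; the commands are only built, not executed. New words presumably need to use only characters already present in the unicharset.

```python
def merge_words(existing, new_words):
    """Append new words to an existing word list, skipping blanks and duplicates."""
    seen = set()
    merged = []
    for word in list(existing) + list(new_words):
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


def rebuild_commands(traineddata="eng.traineddata", prefix="eng."):
    """Build the argv lists that turn the wordlist back into a dawg and repack it.

    The helper name and default paths are illustrative assumptions.
    """
    # wordlist2dawg takes: wordlist file, output dawg file, unicharset file.
    to_dawg = [
        "wordlist2dawg",
        prefix + "wordlist",
        prefix + "word-dawg",
        prefix + "unicharset",
    ]
    # combine_tessdata -o overwrites the named component inside the traineddata file.
    repack = ["combine_tessdata", "-o", traineddata, prefix + "word-dawg"]
    return [to_dawg, repack]


words = merge_words(["cat", "dog"], ["dog", "tesseract", ""])
print(words)
for argv in rebuild_commands():
    print(" ".join(argv))
```

Work on a backup copy of eng.traineddata, since the -o option modifies the file in place.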