Information about the fonts and word list used to prepare the tam.traineddata from Google


sibi kanagaraj

Apr 7, 2015, 12:27:51 AM
to tesser...@googlegroups.com
Dear all,

I have been using Tamil trained data downloaded from the Google repo and also from the VietOCR SourceForge project (thanks to Shree).

Sometimes I use each one on its own, and sometimes I combine them using a +. But I have been facing a problem.

The data from VietOCR comes with files such as tam.font_properties, which shows which fonts were used, and tam.txt (0, 1, 2, 3, 4), which gives an idea of the text used for training. The data from Google does not come with these files.

Is there a way to find these files for the Google data?

-Sibi

sibi kanagaraj

Apr 7, 2015, 5:13:23 AM
to tesser...@googlegroups.com
Update:

Though I have not been able to find the fonts and word list, I have been able to unpack the trained data and see the various files it contains.

Command used:

combine_tessdata -u tam.traineddata /home/sibi/Desktop/Training\ /traceback/tam.

Documents that helped me: Quan Nguyen's reply here and the combine_tessdata manual.
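
For anyone repeating the unpack step, here is a minimal Python sketch that wraps the same `combine_tessdata -u` call and then lists whatever component files appear under the output prefix. The helper names (`unpack_command`, `list_components`, `unpack_traineddata`) are made up for illustration, and the sketch assumes `combine_tessdata` is on the PATH. It only shows which files were unpacked; it does not by itself recover the fonts or the training text.

```python
import subprocess
from pathlib import Path


def unpack_command(traineddata, out_prefix):
    """Build the argument list for `combine_tessdata -u <traineddata> <prefix>`."""
    return ["combine_tessdata", "-u", str(traineddata), str(out_prefix)]


def list_components(out_prefix):
    """Return the sorted names of files that start with the prefix's last part.

    For a prefix like 'unpacked/tam.' this lists every file in 'unpacked/'
    whose name begins with 'tam.'.
    """
    prefix = Path(out_prefix)
    return sorted(
        p.name for p in prefix.parent.iterdir() if p.name.startswith(prefix.name)
    )


def unpack_traineddata(traineddata, out_prefix):
    """Unpack a traineddata file and return the names of the unpacked files.

    Assumes the combine_tessdata tool is installed and on the PATH.
    """
    Path(out_prefix).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(unpack_command(traineddata, out_prefix), check=True)
    return list_components(out_prefix)
```

Calling `unpack_traineddata("tam.traineddata", "unpacked/tam.")` would print nothing itself; the returned list can be compared against the files shipped with the VietOCR data to see which ones are missing from the Google package.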

But my question still stands:

How can I find out which font families were trained and what training set was used?

-Sibi