naga raja
Mar 2, 2010, 3:12:02 AM
to indi...@googlegroups.com, tesser...@googlegroups.com
Hi guys,
This is Nagaraja. I am training Tesseract-OCR for the Tamil language.
Several questions have come up while training Tesseract for Tamil, and I have listed them below. Please reply.
1) I installed tesseract-ocr from source and the installation succeeded, but when I test it I get this error:
unable to load "/usr/local/share/tessdata/eng.unicharset"
> I tried downloading the English tessdata and placing it in the tessdata folder, but still no luck.
> I am using tesseract 2.04 on Ubuntu 9.04.
2) Then I created the 8 files of Tamil training data: 5 were created with debayan's Training-Tesseract GUI, and I created the other 3 myself.
> The error I get is: X characters in inttemp whereas Y characters in tam.unicharset
> Although I searched, I could not find any proper documentation apart from a single issue.
3) How does tesseract-ocr recognize text? In some images each character may be a different size and a different font, so while training, do I need to train for every font at every size?
4) Can we change tesseract's output format to something other than .txt?
5) Although I googled the questions above, I could not find complete answers or documentation. Once my doubts are cleared, I will write complete documentation on training tesseract-ocr, which should help other people (especially beginners).
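For questions 1 and 2, a small sanity check of the tessdata folder may help narrow things down. This is a minimal sketch, not an official Tesseract tool. It assumes the Tesseract 2.x layout of eight files per language (`<lang>.DangAmbigs`, `<lang>.freq-dawg`, `<lang>.inttemp`, `<lang>.normproto`, `<lang>.pffmtable`, `<lang>.unicharset`, `<lang>.user-words`, `<lang>.word-dawg`) and that the first line of a `.unicharset` file holds the number of characters; verify both against your version. The helper names `missing_files` and `unicharset_counts` are hypothetical. Because `inttemp` is binary, the sketch cannot compare it against the unicharset; it only reports missing files and whether the unicharset is internally consistent.

```python
import os
import sys

# Assumption: the eight per-language files used by Tesseract 2.x.
# Adjust this list if your version expects a different set.
REQUIRED_SUFFIXES = [
    "DangAmbigs", "freq-dawg", "inttemp", "normproto",
    "pffmtable", "unicharset", "user-words", "word-dawg",
]


def missing_files(tessdata_dir, lang):
    """Return the expected <lang>.<suffix> paths not present in tessdata_dir."""
    missing = []
    for suffix in REQUIRED_SUFFIXES:
        path = os.path.join(tessdata_dir, f"{lang}.{suffix}")
        if not os.path.isfile(path):
            missing.append(path)
    return missing


def unicharset_counts(path):
    """Return (declared, actual): the count on the first line of a
    unicharset file and the number of non-empty lines that follow it."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    declared = int(lines[0].strip())
    actual = sum(1 for line in lines[1:] if line.strip())
    return declared, actual


if __name__ == "__main__":
    # Defaults mirror the path from the error message and the Tamil code "tam".
    tessdata = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/share/tessdata"
    lang = sys.argv[2] if len(sys.argv) > 2 else "tam"
    for path in missing_files(tessdata, lang):
        print("missing:", path)
    uset = os.path.join(tessdata, f"{lang}.unicharset")
    if os.path.isfile(uset):
        declared, actual = unicharset_counts(uset)
        print(f"{uset}: header says {declared}, file lists {actual}")
```

Saved under a hypothetical name such as `check_tessdata.py`, it could be run as `python check_tessdata.py /usr/local/share/tessdata eng` to see which English files the error in question 1 is pointing at, or with `tam` for the Tamil set.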
Thanks and Regards,
T.Nagaraja