Tesseract OCR for Sinhala language


Kus WikzSL

Jun 30, 2015, 7:06:06 AM
to tesser...@googlegroups.com
Hi All,
I am currently doing my undergraduate project. It includes an OCR component for the Sinhala language (the primary language of Sri Lanka).
I hope to do it using Tesseract, but the problem is that there is no training data for Sinhala. Can anyone describe how to train Tesseract for a new language? I followed https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 , but I still have no idea how to do it.

P.S. Also, OCR for Sinhala works properly in Google Docs. I think it is also done with Tesseract. Does anyone know how to get its training data file?

Regards,
Kus
 

Jim O'Regan

Jun 30, 2015, 7:52:26 AM
to tesser...@googlegroups.com
On 30 June 2015 at 11:46, Kus WikzSL <spkm...@gmail.com> wrote:
> Hi All,
> I am currently doing my undergraduate project. It includes an OCR component
> for the Sinhala language (the primary language of Sri Lanka).
> I hope to do it using Tesseract, but the problem is that there is no training
> data for Sinhala. Can anyone describe how to train Tesseract for a
> new language?

I think 'sin' is Sinhala:
https://github.com/tesseract-ocr/tessdata/blob/master/sin.traineddata?raw=true
(it was added a few days ago).
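A minimal sketch of using the downloaded file. It assumes Tesseract 3.x is on PATH and that sin.traineddata has been copied into Tesseract's tessdata directory (where that is depends on your installation). The helper names `build_tesseract_cmd` and `ocr_sinhala` are hypothetical; the command line follows Tesseract's usual `tesseract IMAGE OUTPUTBASE -l LANG` form, which writes the recognized text to OUTPUTBASE.txt.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_tesseract_cmd(image: str, output_base: str, lang: str = "sin") -> List[str]:
    # Tesseract 3.x CLI form: tesseract IMAGE OUTPUTBASE -l LANG
    # "sin" must match the file name sin.traineddata in the tessdata directory.
    return ["tesseract", image, output_base, "-l", lang]


def ocr_sinhala(image: str, output_base: str = "out") -> str:
    # Hypothetical helper: run Tesseract on one image and return the text.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_cmd(image, output_base), check=True)
    # Tesseract appends ".txt" to the output base name and writes UTF-8 text.
    return Path(output_base + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Only prints the command; call ocr_sinhala("page.png") to actually run it.
    print(build_tesseract_cmd("page.png", "out"))
```

Keeping the command construction separate from the subprocess call makes it easy to check which language file will be requested before running OCR on real scans.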

--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you