Tesseract OCR for the Sinhala language

Kus WikzSL

Jun 30, 2015, 7:06:06 AM

to tesser...@googlegroups.com

Hi all,
I am currently working on my undergraduate project, which includes an OCR component for the Sinhala language (the primary language of Sri Lanka).
I hope to use Tesseract for it, but the problem is that there is no trained data for Sinhala. Can anyone help by describing how to train Tesseract for a
new language? I followed https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3, but I still have no idea how to do it.

P.S. OCR in Google Docs also works properly for Sinhala. I think it is also done with Tesseract. Does anyone know how to get its training data file?

Regards,
Kus

Jim O'Regan

Jun 30, 2015, 7:52:26 AM

to tesser...@googlegroups.com

On 30 June 2015 at 11:46, Kus WikzSL <spkm...@gmail.com> wrote:
> Hi all,
> I am currently working on my undergraduate project, which includes an OCR
> component for the Sinhala language (the primary language of Sri Lanka).
> I hope to use Tesseract for it, but the problem is that there is no trained data
> for Sinhala. Can anyone help by describing how to train Tesseract for a
> new language?

I think 'sin' is Sinhala:
https://github.com/tesseract-ocr/tessdata/blob/master/sin.traineddata?raw=true
(it was added a few days ago).
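
In case it helps, here is a rough Python sketch of how that file might be used once downloaded. It only relies on the standard command-line form `tesseract <image> <outputbase> -l <lang>`. The helper names (`fetch_traineddata`, `build_command`, `run_ocr`) and the file names are made up for illustration, and the tessdata directory is an assumption: it depends on where your Tesseract install looks for language data.

```python
import shutil
import subprocess
import urllib.request
from pathlib import Path

# URL quoted in the message above.
SIN_URL = "https://github.com/tesseract-ocr/tessdata/blob/master/sin.traineddata?raw=true"


def fetch_traineddata(tessdata_dir, lang="sin", url=SIN_URL):
    """Download <lang>.traineddata into tessdata_dir unless it is already there."""
    # tessdata_dir is an assumption: use the directory your install reads from.
    target = Path(tessdata_dir) / f"{lang}.traineddata"
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(target))
    return target


def build_command(image_path, output_base, lang="sin"):
    """Build the command line: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="sin"):
    """Run Tesseract and return the recognized text, or None if it is not installed."""
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(build_command(image_path, output_base, lang), check=True)
    # Tesseract writes its plain-text result to <outputbase>.txt.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

For example, `run_ocr("page.png", "page_out")` would run `tesseract page.png page_out -l sin` and read back `page_out.txt`; both file names are placeholders.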

--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
