Sinhala trained data

Thilina Mendis

Feb 23, 2015, 9:43:01 PM

to tesser...@googlegroups.com

Hi guys, does anyone know whether there is a sinhala.traineddata file for Sinhala fonts? I need to use one. Please share if you have any files for Sinhala fonts :)

Thanks

Thilina Mendis

Ruwanka De Silva

Feb 24, 2015, 5:20:29 AM

to tesser...@googlegroups.com

Hi,

There is no Sinhalese traineddata file; no one has published trained data for Tesseract yet. There is a Sinhala OCR developed by UCSC, but their traineddata file is not accessible. You can find a Sinhalese traineddata file in this Sinhala OCR, but it lacks accuracy. I am planning to train Tesseract for Sinhalese (especially for the letters in old newspapers, which don't have exact fonts). I'll post here if I succeed with training Sinhalese. If anyone knows how to train Tesseract for Sinhalese with high accuracy, please comment here or share training files.

Regards.

Manusha Dilan

Nov 10, 2015, 4:10:48 AM

to tesseract-ocr

The thing you may be searching for is this: https://github.com/tesseract-ocr/tessdata/blob/master/sin.traineddata
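
In case it helps, here is a rough sketch of how that file could be used once it has been downloaded into a tessdata folder. It assumes the tesseract command-line tool is installed and on your PATH, and that your version accepts the -l and --tessdata-dir options in this position. The function names, "page.png", and the folder path are only made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, tessdata_dir=None, lang="sin"):
    """Build the argument list for a Tesseract CLI run.

    "sin" matches the file name sin.traineddata; tessdata_dir is the
    (hypothetical) folder that file was saved to.
    """
    cmd = ["tesseract", str(image_path), str(output_base), "-l", lang]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


def run_sinhala_ocr(image_path, output_base, tessdata_dir=None):
    """Run Tesseract and return the recognized text from <output_base>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, output_base, tessdata_dir)
    subprocess.run(cmd, check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Only prints the command; call run_sinhala_ocr() to actually run it.
    print(" ".join(build_tesseract_command("page.png", "out", "/path/to/tessdata")))
```

Accuracy will still depend on how close your scans are to the fonts the file was trained on, so old newspaper print may need its own training.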