Training data file for Sinhala language


Kus WikzSL

Jul 6, 2015, 4:44:07 AM
to tesser...@googlegroups.com
Hi,
  I hope to create an OCR application for the Sinhala language (the first language of Sri Lanka), but there is no training data file for Sinhala. I followed https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 but I don't understand even how to start. Could someone explain it better, or at least tell me how to start?

* Google Docs works properly for Sinhala OCR. Is there any service I could call to get a Sinhala training data file?

Regards,
Tharakaz

Jim O'Regan

Jul 6, 2015, 5:05:07 AM
to tesser...@googlegroups.com
On 6 July 2015 at 09:44, Kus WikzSL <spkm...@gmail.com> wrote:
> Hi,
> I hope to create an OCR application for the Sinhala language (first language
> of Sri Lanka). But there is no training data file for the Sinhala language.

As of 11 days ago, yes there is:
https://github.com/tesseract-ocr/tessdata/blob/master/sin.traineddata?raw=true
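
For anyone finding this thread later, here is a minimal sketch of fetching that file and running Tesseract with it from Python. It assumes the `tesseract` executable is on your PATH and that you copy the file into whatever tessdata directory your install reads language files from. The helper names are hypothetical, not part of Tesseract, and the URL is the one quoted above, which may have moved since.

```python
import shutil
import subprocess
import urllib.request
from pathlib import Path

# URL quoted in the reply above; it may have changed since it was posted.
SIN_URL = "https://github.com/tesseract-ocr/tessdata/blob/master/sin.traineddata?raw=true"


def download_traineddata(tessdata_dir, url=SIN_URL):
    """Save sin.traineddata into tessdata_dir and return the file path."""
    target = Path(tessdata_dir) / "sin.traineddata"
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


def build_ocr_command(image_path, output_base, lang="sin"):
    """Build the command line: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="sin"):
    """Run Tesseract; it writes the recognized text to <output_base>.txt."""
    subprocess.run(build_ocr_command(image_path, output_base, lang), check=True)
    return Path(str(output_base) + ".txt")
```

Usage would look roughly like `download_traineddata("/path/to/tessdata")` followed by `run_ocr("page.png", "page")`, where the tessdata path and image name are placeholders for your own setup.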


--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

Kusan Wikz

Jul 6, 2015, 5:19:34 AM
to tesser...@googlegroups.com
Thanks for replying, Jim. I didn't know that Tesseract had published a Sinhala training data file. :-)



