Tesseract on .Net > End Line Character

Ruwanthaka Ranasinghe

Sep 18, 2012, 12:52:48 AM
to tesser...@googlegroups.com
Hi All,

I'm training a new language pack for the Sinhala language (Sri Lanka).

It works as expected, but at the end of every line the last word/character is joined to the first word of the next line, with no space added between them.

Can anyone help me with this?

Thanks in advance

Ruwanthaka

Lahiru Himash Madusanka

Sep 18, 2012, 6:02:53 AM
to tesser...@googlegroups.com
I have developed a language pack for Sinhala. It will be released soon. :)


--
Lahiru Himash Madusanka
-------------------------------------------------
http://119sinhala.blogspot.com
http://about.me/himesh

Gaara Sabaku

Sep 18, 2012, 8:20:20 PM
to tesser...@googlegroups.com
Can you send me an example of your training images, just one or two? They are the most critical part of your technique.

You probably need more distance between characters along the x axis in your training files.
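
One rough way to check that spacing is to read the box file that accompanies a training image and measure the horizontal gap between neighbouring boxes. The sketch below assumes the Tesseract 3 box format of one `symbol left bottom right top page` entry per line, with the origin at the bottom-left of the image. The names `parse_box_lines` and `horizontal_gaps` are made up for illustration and are not part of Tesseract.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_lines(text: str) -> List[Box]:
    """Parse box-file text, assuming 'symbol left bottom right top page' per line."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        symbol, *numbers = parts
        try:
            left, bottom, right, top, page = (int(n) for n in numbers)
        except ValueError:
            continue  # skip lines whose coordinates are not integers
        boxes.append(Box(symbol, left, bottom, right, top, page))
    return boxes


def horizontal_gaps(boxes: List[Box]) -> List[int]:
    """Pixel gap between each box and the next one on the same text line."""
    gaps = []
    for prev, cur in zip(boxes, boxes[1:]):
        same_page = prev.page == cur.page
        # Boxes whose vertical extents overlap are treated as the same text line.
        same_line = prev.bottom < cur.top and cur.bottom < prev.top
        if same_page and same_line:
            gaps.append(cur.left - prev.right)
    return gaps


# Hypothetical box data: two symbols on one line, a third on the line below.
sample = "a 10 100 20 120 0\nb 22 100 30 120 0\nc 10 60 20 80 0\n"
gaps = horizontal_gaps(parse_box_lines(sample))
print(gaps)
```

Very small or negative gaps would suggest the glyphs touch or overlap in the training image; what counts as "too small" depends on the font size and is not something this sketch decides.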

Ruwanthaka Ranasinghe

Sep 18, 2012, 11:04:09 PM
to tesser...@googlegroups.com
Thanks for your reply.
Please find attached the images I use as training samples.

Ruwanthaka Ranasinghe

  0714803855 / 0112702810

Sin.cn.02.tif
Sin.cn.03.tif

Lahiru Himash Madusanka

Sep 19, 2012, 1:01:30 AM
to tesser...@googlegroups.com
I don't think you can train Tesseract for Sinhala with this kind of image.
I have failed when training with images like these.

Ruwanthaka Ranasinghe

Sep 19, 2012, 1:38:18 AM
to tesser...@googlegroups.com
Anyway, it is currently working as expected, with accuracy of about 90% or more.
The only major issue I face is the end-of-line issue I pointed out earlier.
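
If the joining happens when the recognized text is flattened into a single string (an assumption, since the thread does not establish where the space is lost), one possible workaround is to put the space back at each line break in post-processing. A minimal sketch, with `join_ocr_lines` as a hypothetical helper name:

```python
def join_ocr_lines(text: str) -> str:
    """Flatten multi-line OCR output, keeping one space at each line break."""
    lines = (line.strip() for line in text.splitlines())
    # Drop empty lines so blank rows do not produce doubled spaces.
    return " ".join(line for line in lines if line)


# Placeholder text standing in for recognized output.
flattened = join_ocr_lines("word1 word2\nword3\n\n word4 ")
print(flattened)
```

This does not help if the words are already merged inside the recognizer's own output; in that case the cause is more likely in the training data.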

Lahiru Himash Madusanka

Sep 19, 2012, 3:01:42 AM
to tesser...@googlegroups.com
It will work with this image. But with other images that were not
used for training, it gives some conjunction errors. I was not able
to overcome this line error.

Ruwanthaka Ranasinghe

Sep 19, 2012, 4:53:33 AM
to tesser...@googlegroups.com
This is interesting. I used more than 13 such files (all like the ones above) as training samples,
and it now works fine with above 90% accuracy.

Ruwanthaka Ranasinghe

Sep 19, 2012, 4:54:28 AM
to tesser...@googlegroups.com
Can you send me the sample image with its BOX file?

Lahiru Himash Madusanka

Sep 19, 2012, 6:06:19 AM
to tesser...@googlegroups.com
I'm in trouble, @Ruwanthaka. While I was trying to upload these files,
they all got deleted by accident. :( The recovery attempt failed because
of low space on my HDD. I'm totally lost. I only have my image files left.

Lahiru Himash Madusanka

Sep 19, 2012, 6:07:36 AM
to tesser...@googlegroups.com
Trying to retrain with all my images :(

sheeyam shellvacumar

Jun 22, 2014, 7:43:27 AM
to tesser...@googlegroups.com, ruwan...@gmail.com
Hi,

Does Tesseract support Sinhala? How do you train it? I am confused, so please help me.

Thanks
Sheeyam

Nick White

Jun 27, 2014, 4:24:15 PM
to tesser...@googlegroups.com, ruwan...@gmail.com
Hi Sheeyam, sorry for not replying to your emails sooner.

On Sun, Jun 22, 2014 at 04:43:27AM -0700, sheeyam shellvacumar wrote:
> Does Tesseract support sinhala. How do u guys train them ??? Actually i am
> confused help me

It looks like some people have trained Tesseract for Sinhala; see
http://www.ucsc.cmb.ac.lk/sdu/research.html &
http://192.248.22.122/ocrsinhala/

However as far as I can see they aren't sharing their .traineddata
file, or the source files for it. It would be a good idea to contact
them and ask if they can share those with you, and with the
community more broadly, so we can potentially improve things in the
future.

If they don't respond, instructions on training Tesseract are on the
wiki:
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

Nick