Article on tesseract training


Nick White

Feb 25, 2013, 2:02:51 PM
to tesser...@googlegroups.com
Hi tesseract folks,

Just a note to let you know that an article I wrote about training
tesseract for Ancient Greek has now been published. It is intended to
be generally useful for anyone training tesseract for other
languages too, so anybody thinking about training may well find it
worth perusing.

Find it here:
http://eutypon.gr/eutypon/pdf/e2012-29/e29-a01.pdf

Any questions or comments would be warmly received.

Nick

Sriranga(78yrsold)

Feb 25, 2013, 10:34:16 PM
to tesser...@googlegroups.com
Nick,
It would be nice to have the PDF in English as well, since the present one is in Ancient Greek, for the benefit of users of other languages.
With regards,
-sriranga(79yrs)

--
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

---
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



Nick White

Feb 26, 2013, 5:42:02 AM
to tesser...@googlegroups.com
The PDF I gave a link to is written in English. It's about training
the Ancient Greek language, but doesn't require any knowledge of
Ancient Greek.

Nick White

Feb 27, 2013, 7:01:12 AM
to tesser...@googlegroups.com
Hi Nick,

Glad you enjoyed it :)

> Did you do any quantitative evaluation of the end product OCR? I'm curious how
> Tesseract compares to hand transcription on your corpus.

No, I didn't do any real quantitative evaluation of the OCR. I'm
starting to do that now; see the recent emails in the "tesseract
testing suite" thread.
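For anyone who wants to try the kind of comparison asked about above, one common measure is the character error rate (CER): the edit distance between the OCR output and a hand transcription, divided by the length of the transcription. The Python below is a minimal sketch of that idea, not the method used in the article or in the "tesseract testing suite" thread, and the function names are hypothetical. It normalizes both strings to Unicode NFC first, on the assumption that polytonic Greek text may mix precomposed and combining accents.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: CER of OCR output against a hand transcription."""
    # Normalise so that precomposed and combining accents compare equal.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hand transcription vs. an OCR result that lost the accent: 1 edit in 5 characters.
    print(character_error_rate("λόγος", "λογος"))
```

Averaging this over a set of pages (or summing edits and reference lengths across them) gives a single figure that can be compared between trained models.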

Nick

Sven Pedersen

Apr 15, 2013, 11:08:02 AM
to tesser...@googlegroups.com
Please stop sending this message repeatedly. Someone will give you a reply when they have time to figure it out.
Thanks,
Sven


On Mon, Apr 15, 2013 at 9:45 AM, mama <mamat...@gmail.com> wrote:

Sir

I have tried to train tesseract ocr for a new language. Like Bangla script, my language has a large character set. I was able to train tesseract for the vowels and consonants, and as a beginner I edited the box file manually.

However, for a character made of a consonant with a dependent modifier on its right side, the generated box file identifies the character but also produces an extra character (2 characters for 1). Here I am attaching a few lines of the box file; the bold ones are correct. I use Tesseract on Ubuntu.

କା 376 3125 407 3162 0
ଛା 412 3125 417 3161 0
ଖା 441 3123 472 3161 0
ଶା 479 3124 484 3159 0
ଗା 508 3123 539 3160 0
ସା 546 3123 550 3158 0

So I can't proceed any further; please help me as soon as possible.

Eagerly waiting for your reply.

Thank you
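In the sample box lines above, the boxes seem to come in pairs: a wide box (about 31 pixels) followed, a few pixels to its right, by a very narrow one (4 to 5 pixels), which is plausibly the right-side modifier ା split off from its consonant. Assuming that reading is right (which lines were marked bold as correct is not visible here), one possible workaround is to merge each narrow box into its left neighbour before training. The Python below is a minimal sketch of that, using the box-file layout shown (symbol, left, bottom, right, top, page); the helper names and the pixel thresholds `max_width` and `max_gap` are hypothetical, and this is not an official Tesseract tool.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One box-file line: symbol, left, bottom, right, top, page."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def width(self) -> int:
        return self.right - self.left


def parse_box_line(line: str) -> Box:
    # Assumes the symbol itself contains no whitespace.
    symbol, left, bottom, right, top, page = line.split()
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def format_box(box: Box) -> str:
    return f"{box.symbol} {box.left} {box.bottom} {box.right} {box.top} {box.page}"


def merge_narrow_boxes(boxes, max_width=8, max_gap=10):
    """Fold each very narrow box into the box just to its left.

    max_width and max_gap are hypothetical pixel thresholds picked for the
    sample below and would need tuning for other fonts and resolutions.
    The left box keeps its symbol; the narrow box's symbol is discarded.
    """
    merged = []
    for box in boxes:
        prev = merged[-1] if merged else None
        if (prev is not None
                and box.page == prev.page
                and box.width <= max_width
                and 0 <= box.left - prev.right <= max_gap):
            # Replace the previous box with the union of both rectangles.
            merged[-1] = Box(
                prev.symbol,
                min(prev.left, box.left),
                min(prev.bottom, box.bottom),
                max(prev.right, box.right),
                max(prev.top, box.top),
                prev.page,
            )
        else:
            merged.append(box)
    return merged


SAMPLE = """\
କା 376 3125 407 3162 0
ଛା 412 3125 417 3161 0
ଖା 441 3123 472 3161 0
ଶା 479 3124 484 3159 0
ଗା 508 3123 539 3160 0
ସା 546 3123 550 3158 0"""

if __name__ == "__main__":
    boxes = [parse_box_line(line) for line in SAMPLE.splitlines()]
    for box in merge_narrow_boxes(boxes):
        print(format_box(box))
```

On the six sample lines this keeps କା, ଖା and ଗା with boxes widened to cover the narrow strokes and drops the other three lines. If the narrow boxes are in fact separate characters in your training image, this heuristic would be wrong, so check the merged boxes against the image before training.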



mamata nayak

Apr 23, 2013, 12:50:49 AM
to tesser...@googlegroups.com

Sir

I haven't received any reply yet. How was the problem described in my earlier message resolved for the Bangla language?



You received this message because you are subscribed to a topic in the Google Groups "tesseract-ocr" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/tesseract-ocr/vowksBpeazA/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to tesseract-oc...@googlegroups.com.

Sven Pedersen

Apr 23, 2013, 4:11:21 PM
to tesser...@googlegroups.com
Check the Indic language page

Sven
