ICR using Tesseract


pottmi

Sep 24, 2011, 8:24:54 PM
to tesseract-ocr
We are attempting to use Tesseract to do ICR on handwritten block
characters.

We have been training using our own training data.
We print a box around each character to encourage users to write
in a uniform manner.

For our very first tests with our small training sample we are getting
50% recognition.
We will expand our training samples and press on.

My questions for the group are:
1) Is anyone successfully doing this in production?

2) How big a training set should I expect to need, and how many
different people should contribute to it?

3) Are there any good techniques to programmatically remove the box so
that it does not interfere with the recognition? For experimentation
purposes we have been removing it by hand.

4) Does anyone have handwritten block character training data that
they can share?

5) Is attempting to do ICR with Tesseract like forcing a square peg into
a round hole, and should I just find another solution?
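
On question 3, one common approach is to treat the box as the only very long straight run of ink in each cell and erase such runs. The sketch below is only an illustration under stated assumptions: each cell has already been cropped and binarized into a list of rows (1 = ink, 0 = background), the box edges are axis-aligned, and the names `remove_box_lines` and `min_run_frac` are made up for this example. It is not part of Tesseract.

```python
def _erase_long_runs(src, dst, min_len):
    """Clear in dst every horizontal ink run of src that is at least min_len long."""
    for y, row in enumerate(src):
        x = 0
        while x < len(row):
            if not row[x]:
                x += 1
                continue
            start = x
            while x < len(row) and row[x]:
                x += 1
            if x - start >= min_len:
                for i in range(start, x):
                    dst[y][i] = 0


def remove_box_lines(img, min_run_frac=0.8):
    """Return a copy of a binary cell image with long straight lines erased.

    img is a list of equal-length rows of 0/1 values (hypothetical input
    format). A run counts as a box edge when it spans at least
    min_run_frac of the cell width (or height, for vertical runs).
    """
    if not img or not img[0]:
        return [row[:] for row in img]
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]

    # Horizontal edges: detect on the original, erase in the copy,
    # so the corners are still intact for the vertical pass.
    _erase_long_runs(img, out, max(1, int(width * min_run_frac)))

    # Vertical edges: transpose, reuse the same routine, transpose back.
    src_t = [list(col) for col in zip(*img)]
    out_t = [list(col) for col in zip(*out)]
    _erase_long_runs(src_t, out_t, max(1, int(height * min_run_frac)))
    return [list(row) for row in zip(*out_t)]
```

The threshold matters: a tall stroke such as the stem of an "I" survives only if it is shorter than `min_run_frac` of the cell, and ink that touches or crosses the box is cut where it overlaps the erased line. Skewed scans would need deskewing first.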

Sven Pedersen

Sep 25, 2011, 8:17:43 AM
to tesser...@googlegroups.com
Someone recently posted about this and actually showed some of their
work, so you could browse the archives. They said it was handwriting,
so at first we assumed it was a connected script, but it was
actually not.
--Sven



merve t

Sep 26, 2011, 9:21:08 AM
to tesser...@googlegroups.com
Hello,
I am trying to train Tesseract for handwriting (for now with non-connected letters) and am getting good results (60% success).
My main problem now:

 "rin" is recognized as "m"

I do not want to change unicharambigs, because sometimes "m" is recognized correctly.

Can I get the candidate results that Tesseract generates before it presents the final result?

If I can get all the candidates, I want to apply a linguistic algorithm for my language, Turkish, to the results.

Thanks
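
The linguistic post-processing idea can be tried even without access to Tesseract's internal candidates. The sketch below makes several assumptions: it starts from the final recognized string only, the `AMBIGUITIES` table and the tiny `LEXICON` are hypothetical stand-ins for a real confusion list and a real Turkish word list, and the function names are invented for this example. It expands each word through the known confusions and keeps only the variants found in the lexicon.

```python
# Hypothetical confusion table: recognized text -> what it may really be.
AMBIGUITIES = {"m": ["rin"], "rin": ["m"]}

# Hypothetical stand-in for a real Turkish word list.
LEXICON = {"derin", "kalem"}


def expand(word, ambigs):
    """Return every spelling reachable by applying the confusions to word."""
    if not word:
        return {""}
    results = set()
    # Option 1: keep the first character as recognized.
    for tail in expand(word[1:], ambigs):
        results.add(word[0] + tail)
    # Option 2: replace a confusable prefix with each alternative.
    for src, alternatives in ambigs.items():
        if word.startswith(src):
            for tail in expand(word[len(src):], ambigs):
                for alt in alternatives:
                    results.add(alt + tail)
    return results


def lexicon_matches(word, ambigs=AMBIGUITIES, lexicon=LEXICON):
    """Return the expanded spellings of word that are in the lexicon, sorted."""
    return sorted(c for c in expand(word, ambigs) if c in lexicon)


print(lexicon_matches("dem"))    # "m" reinterpreted as "rin"
print(lexicon_matches("kalem"))  # a genuine "m" is left alone
```

Because the original spelling is always among the expansions, a correctly recognized "m" is kept whenever the word is already in the lexicon. The expansion grows quickly with word length and table size, so a real version would want pruning or a proper language model.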



Adam

Jul 6, 2016, 4:04:32 AM
to tesseract-ocr
Hey, I have the same question.
I recently attempted the very same thing and
just wanted to know: is it really possible? And if yes, how far have you gotten?