We could first run the histogram minima algorithm. Once it gives us
a region of certainty, we can then do template matching in that
region only. Since template matching is likely to be time-intensive,
restricting it this way will reduce its running time considerably.
There are some ways to detect that the histogram minima method has
given us the wrong region. One way is to look at the height of the
region box. This height should be a constant percentage of the height
of the line (or the word or the character) itself. If we see that the
height is above (or below) this percentage by some margin, we can
then go on and run the template matching code.
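
The height check above could be sketched roughly as follows, in
Python. This is only an illustration: the function names, the expected
ratio (0.25), and the tolerance (0.10) are placeholder assumptions, not
measured values, and the two detectors are passed in as callables
because their real implementations are not shown in this thread.

```python
def region_height_is_plausible(region_height, line_height,
                               expected_ratio=0.25, tolerance=0.10):
    """Return True if the region box height is close to the expected
    fraction of the line height.

    expected_ratio and tolerance are assumed placeholder values; they
    would have to be tuned on real data.
    """
    if line_height <= 0:
        raise ValueError("line_height must be positive")
    ratio = region_height / line_height
    return abs(ratio - expected_ratio) <= tolerance


def locate_region(image, line_height, histogram_minima, template_match):
    """Run the cheap detector first and fall back to template matching
    only when the height check rejects its result.

    histogram_minima and template_match are hypothetical callables that
    take an image and return a box as (left, top, width, height).
    """
    box = histogram_minima(image)
    if region_height_is_plausible(box[3], line_height):
        return box, "histogram"
    return template_match(image), "template"
```

With this arrangement the expensive template matching only runs on
inputs where the histogram result looks suspicious.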
--
Debayan Banerjee
I agree template matching is not scalable.
What we could do is call Tesseract's own recognition functions
from its API (char* TesseractRect() from
http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h
(line 277)). We will have to integrate our solutions within
Tesseract at the end of the day anyway, so we will end up using a lot
of Tesseract functions. Most importantly, Tesseract's character
matching method is font-size independent, because it matches
size-independent characteristics of the image, and also stores
size-independent characteristics of the template during training. See
the "Feature" heading in http://tesseract-ocr.repairfaq.org/.
--
Debayan Banerjee
Everything you have described above, Tesseract already does.
http://tesseract-ocr.googlecode.com/files/TesseractOSCON.pdf
--
Debayan Banerjee
http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.com/en//research/pubs/archive/35248.pdf
is the best place to learn what Tesseract does.
Yes, our aim would be to include whatever algorithms we are developing
inside the engine. That means we can use Tesseract's baseline finding
method, character recognition engine, and dictionary, among many other
things.
--
Debayan Banerjee
Well, slide 11 does tell you that the character chopper talks to the 4
sub-systems that do the recognition.
--
Debayan Banerjee