Support for Devanagari


Debayan Banerjee

May 26, 2008, 3:18:27 PM
to tesser...@googlegroups.com
http://debayanin.googlepages.com/hackingtesseract
 
Please comment.
--
BE INTELLIGENT, USE LINUX

Ray Smith

May 29, 2008, 8:37:54 PM
to tesser...@googlegroups.com
Nice work. Thanks! We (at Google) are actually very interested in Indic OCR in general, and the latest (2.03) Tesseract includes a lot of changes that came about from work on Kannada.

My question is: what is the maximum feasible accuracy of this method? What fraction of the needed cuts does it fail to find, and what fraction of the cuts it does find are false positives? Also, does it introduce any extra ambiguity that is now unresolvable because of the lost pixels?
Thanks,
Ray.
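
For concreteness, the two fractions asked about above could be computed roughly as follows. This is only a minimal sketch, not code from Tesseract or from the linked page: it assumes each cut is represented by an x-coordinate, that a hand-labelled list of needed cuts is available, and that a found cut counts as correct when it lies within a hypothetical tolerance `tol` of a needed one. The function name and the greedy matching are illustrative choices.

```python
def cut_error_rates(needed, found, tol=0):
    """Return (miss_rate, false_positive_rate) for segmentation cuts.

    needed: x-positions of cuts that should exist (hand-labelled ground truth)
    found:  x-positions of cuts the method actually produced
    tol:    hypothetical pixel tolerance for treating two cuts as the same
    """
    unmatched_found = sorted(found)
    missed = 0
    for n in sorted(needed):
        # Greedily pair each needed cut with the closest still-unused found cut.
        best = min(unmatched_found, key=lambda f: abs(f - n), default=None)
        if best is not None and abs(best - n) <= tol:
            unmatched_found.remove(best)
        else:
            missed += 1
    # Fraction of needed cuts that were not found.
    miss_rate = missed / len(needed) if needed else 0.0
    # Fraction of found cuts that match no needed cut.
    fp_rate = len(unmatched_found) / len(found) if found else 0.0
    return miss_rate, fp_rate


if __name__ == "__main__":
    # Made-up positions, purely for illustration.
    print(cut_error_rates([10, 25, 40, 60], [11, 25, 33, 61], tol=2))
```

With the made-up positions in the example, one of the four needed cuts (at 40) is missed and one of the four found cuts (at 33) is spurious, so both fractions come out as 0.25. The third question, about ambiguity from lost pixels, is not captured by these counts and would need a comparison of recognition output before and after the cuts.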

deba...@gmail.com

May 30, 2008, 5:25:57 AM
to tesseract-ocr
Hi Mr. Ray,
I am yet to test it as comprehensively as your queries demand. Will
reply as soon as possible.


deba...@gmail.com

Jun 2, 2008, 8:51:05 PM
to tesseract-ocr


On May 30, 5:37 am, "Ray Smith" <theraysm...@gmail.com> wrote:
> Nice work. Thanks! We (at Google) are actually very interested in Indic OCR
> in general, and the latest (2.03) Tesseract includes a lot of changes that
> came about from work on Kannada.
>
> My question is: what is the maximum feasible accuracy of this method: What
> fraction of needed cuts does it not find, and what fraction of cuts that it
> finds are false positives? Another question is does it introduce any extra
> ambiguity that is now unresolvable because of the lost pixels?
> Thanks,
> Ray.
>

I tested my code using bbtesseract. The results are here:
http://debayanin.googlepages.com/hackingtesseract
The erroneous boxings mentioned there are the only two types of error I
have encountered so far.