Re: utf-8 supports unicode? That means indic too?

74yrs old

Jan 11, 2010, 12:07:31 PM

to indi...@googlegroups.com

Dear Shri Debayan Banerjee,
I read your research on the Tesseract-OCR dictionary. I would like to perform a similar
experiment for Kannada as well. Kindly let me know whether your patch below has been incorporated in
tesseractindic-2.tar.gz under downloads, so that I can send you feedback.
With warmest Regards,
-sriranga(77yrsold)

On Thu, Nov 26, 2009 at 12:43 AM, Debayan Banerjee <deba...@gmail.com> wrote:

2009/11/26 Debayan Banerjee <deba...@gmail.com>:

> Wait for my next post where I will analyse how to solve the
> Indic-dictionary bug.

In fact it was a single-line change. Here is the patch. The change is
in dict/permute.cpp

--- tesseract-2.04/dict/permute.cpp 2008-11-14 23:07:17.000000000 +0530
+++ tessmod/dict/permute.cpp 2009-11-26 00:34:50.660737699 +0530
@@ -1077,6 +1077,7 @@
     return (NULL);
   if (permute_only_top)
     return result_1;
+  any_alpha=1;
   if (any_alpha && array_count (char_choices) <= MAX_WERD_LENGTH) {
     result_2 = permute_words (char_choices, rating_limit);
     if (class_probability (result_1) < class_probability (result_2)

For non-English scripts the if condition was never satisfied, and
hence the DAWG files were not being scanned properly. Explicitly setting
any_alpha=1 just before the condition solves this problem for
the time being. There is probably a more elegant solution though.
By the way, I do not see this particular if condition anywhere in
that file in the trunk. Perhaps the developers have already fixed it
there.
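
To make the effect of the gate easier to see, here is a toy model of the control flow. It is only a sketch and not Tesseract code: is_ascii_alpha, dictionary_search_runs and the default length limit of 30 are hypothetical names and values, and the assumption that any_alpha ends up false because the alphabetic test only recognises ASCII letters is a guess made for illustration. The patch itself only shows that the condition depends on any_alpha.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: none of these names belong to Tesseract's API.

// ASSUMPTION: the alphabetic test recognises only single-byte ASCII letters.
// A UTF-8 encoded Indic character is several bytes long, so it fails here.
bool is_ascii_alpha(const std::string& unichar) {
  return unichar.size() == 1 &&
         ((unichar[0] >= 'a' && unichar[0] <= 'z') ||
          (unichar[0] >= 'A' && unichar[0] <= 'Z'));
}

// Returns true when the dictionary (DAWG) search would be attempted for a
// word given as a list of UTF-8 characters. max_word_length is a placeholder
// standing in for MAX_WERD_LENGTH; its real value is not taken from the source.
bool dictionary_search_runs(const std::vector<std::string>& word,
                            bool force_any_alpha,
                            std::size_t max_word_length = 30) {
  bool any_alpha = false;
  for (const std::string& unichar : word) {
    if (is_ascii_alpha(unichar)) any_alpha = true;
  }
  if (force_any_alpha) any_alpha = true;  // mirrors the patch's any_alpha=1
  // Same shape as the condition in the patch context.
  return any_alpha && word.size() <= max_word_length;
}
```

In this model, dictionary_search_runs({"c", "a", "t"}, false) is true, the same call with a word made of Kannada characters is false, and passing true for force_any_alpha makes the search run for any word within the length limit, which is what the one-line patch does.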

--
Regards,
Debayan Banerjee

--

You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com.
To unsubscribe from this group, send email to tesseract-oc...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

Debayan Banerjee

Jan 11, 2010, 1:39:36 PM

to indi...@googlegroups.com

On 11/01/2010, 74yrs old <withbl...@gmail.com> wrote:
> Dear Shri Debayan Banerjee,
> I read your research on the Tesseract-OCR dictionary. I would like to
> perform a similar experiment for Kannada as well. Kindly let me know
> whether your patch below has been incorporated in
> tesseractindic-2.tar.gz under downloads, so that I can send you feedback.