Re: utf-8 supports unicode? That means indic too?


74yrs old

Jan 11, 2010, 12:07:31 PM
to indi...@googlegroups.com
Dear Shri Debayan Banerjee,
I read your research on the Tesseract-OCR dictionary. I would like to perform a similar experiment for
Kannada as well. Could you kindly let me know whether the patch below has been incorporated into the
tesseractindic-2.tar.gz listed under Downloads? I will send you my feedback.

With warmest regards,
-sriranga(77yrsold)


On Thu, Nov 26, 2009 at 12:43 AM, Debayan Banerjee <deba...@gmail.com> wrote:
2009/11/26 Debayan Banerjee <deba...@gmail.com>:

> Wait for my next post where I will analyse how to solve the
> Indic-dictionary  bug.

In fact, it was a single-line change. Here is the patch. The change is
in dict/permute.cpp

--- tesseract-2.04/dict/permute.cpp     2008-11-14 23:07:17.000000000 +0530
+++ tessmod/dict/permute.cpp    2009-11-26 00:34:50.660737699 +0530
@@ -1077,6 +1077,7 @@
    return (NULL);
  if (permute_only_top)
    return result_1;
+  any_alpha=1;
  if (any_alpha && array_count (char_choices) <= MAX_WERD_LENGTH) {
    result_2 = permute_words (char_choices, rating_limit);
    if (class_probability (result_1) < class_probability (result_2)


For non-English scripts the if condition was never satisfied, and
hence the DAWG files were not being scanned properly. Explicitly
setting any_alpha=1 just above the condition solves this problem for
the time being. There is probably a more elegant solution, though.
By the way, I do not see this particular if condition anywhere in the
file in the trunk. Perhaps the developers have already fixed it
there.
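
To make the effect of the guard concrete, here is a minimal, self-contained sketch. It is not Tesseract code. It assumes, purely for illustration, that the flag comes from an ASCII-only alphabetic test; the thread does not confirm that this is how any_alpha is computed in 2.04. The names is_ascii_alpha, any_ascii_alpha, dictionary_lookup_attempted and kMaxWordLength are hypothetical, and the value 20 is a placeholder for MAX_WERD_LENGTH.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for MAX_WERD_LENGTH; the real value is not given in the thread.
constexpr std::size_t kMaxWordLength = 20;

// Assumption for illustration only: the alphabetic test recognises ASCII letters and nothing else.
bool is_ascii_alpha(unsigned char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// True if at least one byte of the word is an ASCII letter.
bool any_ascii_alpha(const std::string& word) {
  for (unsigned char c : word) {
    if (is_ascii_alpha(c)) return true;
  }
  return false;
}

// Mirrors the shape of the guard in the patch: the dictionary step runs only
// when the flag is set and the word is short enough. The length here is a
// byte count, whereas the real code counts character choices.
bool dictionary_lookup_attempted(const std::string& word, bool force_any_alpha) {
  bool any_alpha = any_ascii_alpha(word);
  if (force_any_alpha) any_alpha = true;  // what the one-line patch does
  return any_alpha && word.size() <= kMaxWordLength;
}
```

Under that assumption, a UTF-8 encoded Kannada letter such as "\xE0\xB2\x95" (U+0C95) contains no ASCII letters, so the dictionary step is skipped unless the flag is forced on, while an English word passes the guard either way.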





--
Regards,
Debayan Banerjee




Debayan Banerjee

Jan 11, 2010, 1:39:36 PM
to indi...@googlegroups.com
On 11/01/2010, 74yrs old <withbl...@gmail.com> wrote:
> Dear Shri Debayan Banerjee,
> I read your research on the Tesseract-OCR dictionary. I would like to
> perform a similar experiment for Kannada as well. Could you kindly let
> me know whether the patch below has been incorporated into the
> tesseractindic-2.tar.gz listed under Downloads? I will send you my
> feedback.

Yes. The latest tesseract-indic download supports dictionary lookup.

--
Regards,
Debayan Banerjee
