shapeclustering, do or not do?


Shane Wee

Jul 14, 2013, 10:54:31 PM
to tesser...@googlegroups.com
I am using Tesseract 3.02. I trained my data with shapeclustering included, and the result is not as good as the traineddata I got by excluding shapeclustering.
Shapeclustering seems to cause recognition errors on similarly shaped characters such as 1 and I, O and Q, 5 and S.
I am quite sure I followed the training steps correctly.
My question is whether shapeclustering is really important. If I exclude it from my training, will I miss out on anything important?
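
One way to put numbers on the comparison described above is to count how often each confusable pair is swapped in the OCR output of each traineddata. The following is only a minimal sketch, not part of Tesseract: the function name `count_confusions` and the sample strings are hypothetical, and it assumes the ground truth and the OCR output are already aligned character by character (same length).

```python
from collections import Counter

# Confusable pairs named in the post; extend the list as needed.
CONFUSABLE_PAIRS = [("1", "I"), ("O", "Q"), ("5", "S")]


def count_confusions(truth, ocr, pairs=CONFUSABLE_PAIRS):
    """Count position-wise swaps between confusable characters.

    Assumes truth and ocr are aligned character by character;
    texts of different lengths are rejected rather than aligned.
    """
    if len(truth) != len(ocr):
        raise ValueError("texts must have equal length")

    # Map both directions of a swap (1 -> I and I -> 1) to the same pair.
    lookup = {}
    for a, b in pairs:
        lookup[(a, b)] = (a, b)
        lookup[(b, a)] = (a, b)

    counts = Counter()
    for expected, actual in zip(truth, ocr):
        pair = lookup.get((expected, actual))
        if pair is not None:
            counts[pair] += 1
    return counts


# Hypothetical sample data, for illustration only.
ground_truth = "ISO 1Q5S"
ocr_output = "1SO IO55"

for pair, n in sorted(count_confusions(ground_truth, ocr_output).items()):
    print(pair, n)
```

Running this once on the output produced with shapeclustering and once on the output produced without it gives a per-pair error count for each traineddata.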

Ray Smith

Jul 15, 2013, 12:31:36 AM
to tesser...@googlegroups.com
The idea of shape clustering is that it should help to resolve exactly the errors that you observe! At the moment, though, it doesn't work too well for most languages. It currently should not be used except for the Indic languages, where it does seem to help.
Ray.



Sriranga(79yrs)

Jul 15, 2013, 3:23:47 AM
to tesser...@googlegroups.com

Tested for Kannada (one of the Indic languages): shape clustering did not help. See the attached files, (1) with shape cluster-alphabettest.rtf and (2) without shape cluster-2alphabettest.rtf. There was no improvement at all.
with shape cluster-alphabettest.rtf
without shape cluster-2alphabettest.rtf

Shane Wee

Jul 16, 2013, 3:15:12 AM
to tesser...@googlegroups.com
Thank you so much for the responses. In my case (training alphanumeric characters only), shapeclustering seems to make the results much worse than training without it. Thanks again for the confirmation.

mamat...@gmail.com

Aug 20, 2013, 11:30:44 AM
to tesser...@googlegroups.com
Sir,
I have followed the procedure to train an Indic language that is similar to Bangla, using Tesseract 3.01, but without success.
Should I use Tesseract 3.02?