Shapeclustering crashes on Linux

rkvsraman

Sep 22, 2016, 7:44:31 AM
to tesseract-ocr

Hello,

I am running the shapeclustering command and it crashes with the following message:

/usr/local/bin/shapeclustering -D /tmp/tmp.0fGj1mVg2C/hin/ -U /tmp/tmp.0fGj1mVg2C/hin/hin.unicharset -O /tmp/tmp.0fGj1mVg2C/hin/hin.mfunicharset -F /home/raman/Desktop/lang/font_properties /tmp/tmp.0fGj1mVg2C/hin/hin.Noto_Sans_Devanagari.exp0.tr
Reading /tmp/tmp.0fGj1mVg2C/hin/hin.Noto_Sans_Devanagari.exp0.tr ...
Reading spacing from /tmp/tmp.0fGj1mVg2C/hin/hin.Noto_Sans_Devanagari.exp0.fontinfo for font 4293...
shapeclustering: ../ccutil/genericvector.h:663: T& GenericVector<T>::operator[](int) const [with T = int]: Assertion `index >= 0 && index < size_used_' failed.
Aborted (core dumped)

Attached is tesstrain.log.

Any idea why?

ShreeDevi Kumar

Sep 22, 2016, 8:49:13 AM
to tesser...@googlegroups.com
Warning: properties incomplete for index 93 = प्र
Warning: properties incomplete for index 94 = क्रि
Warning: properties incomplete for index 95 = २
Warning: properties incomplete for index 96 = ५

These warnings should be eliminated, or at least reduced, if your langdata directory contains Devanagari.unicharset and Devanagari.xheights from https://github.com/tesseract-ocr/langdata
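To check whether the warnings actually go away after adding those files, one option is to count them in tesstrain.log before and after. This is a minimal bash sketch; the function names are hypothetical, and it assumes the warnings keep the exact "Warning: properties incomplete for index N = X" format quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a tesstrain.log.
# Assumes warnings look like: "Warning: properties incomplete for index 93 = X"

# Print how many "properties incomplete" warnings the log contains.
count_incomplete() {
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps the exit status 0.
  grep -c 'Warning: properties incomplete for index' "$1" || true
}

# Print the characters whose properties were reported as incomplete, one per line.
list_incomplete() {
  sed -n 's/^Warning: properties incomplete for index [0-9]* = //p' "$1"
}
```

For example, `count_incomplete tesstrain.log` gives a single number to compare between runs, and `list_incomplete tesstrain.log | sort -u` shows which characters are still affected.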

I may have run the training without shapeclustering; please see the zip file with the training log.

Also see http://stackoverflow.com/questions/34389159/tesseract-index-0-index-size-used-errorassert-failed-error

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/c1ef13db-90e8-4a13-8ade-2986e7a8df10%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

rkvsraman

Sep 22, 2016, 9:00:10 AM
to tesseract-ocr
Let me try with the Devanagari.* files.

Thanks

-Raman

rkvsraman

Sep 22, 2016, 9:54:23 AM
to tesseract-ocr
Hi,

Shapeclustering no longer crashes after I added those Devanagari files, but it has now been running for the past 45 minutes and still hasn't finished. Is that normal?

ShreeDevi Kumar

Sep 22, 2016, 11:41:30 AM
to tesser...@googlegroups.com

From the readme.md of langdata:

To re-create the training of a single language, lang, you need the following:

All the data in the lang directory.
san/*.*

The corresponding unicharset/xheights files for the script(s) used by lang.
Devanagari.*

All the remaining non-lang-specific files in the top-level directory, such as font_properties.

You also need to obtain the fonts needed to train the language. Some languages were trained with commercially available fonts, so you will need to buy them in order to reproduce the training exactly, or use substitutes.
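Before rerunning the training, a quick pre-flight check against that list can catch a missing script file early. This is a hypothetical bash sketch, not part of langdata or tesstrain.sh; it assumes a local langdata checkout laid out as the readme describes: a lang directory plus <Script>.unicharset, <Script>.xheights and font_properties at the top level. It does not check fonts.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a local langdata checkout.
# Usage: check_langdata LANGDATA_DIR LANG SCRIPT   e.g. check_langdata ~/langdata hin Devanagari
# Returns 0 if everything is present, 1 otherwise (missing paths are printed to stderr).
check_langdata() {
  local langdata="$1" lang="$2" script="$3"
  local missing=0 path
  for path in \
      "$langdata/$lang" \
      "$langdata/$script.unicharset" \
      "$langdata/$script.xheights" \
      "$langdata/font_properties"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_langdata ~/langdata hin Devanagari && echo ok` before tesstrain.sh would have flagged the missing Devanagari.* files that caused the original crash, assuming that was indeed the cause.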

ShreeDevi Kumar

Sep 22, 2016, 2:26:18 PM
to tesser...@googlegroups.com

Training can take a long time, and it can also crash again at a later stage.

You can also try training without shape clustering; in that case, I think mftraining will create a flat shape table.

You can then compare the two results.
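One way to set up that comparison is to run mftraining twice into separate output directories: one containing the shapetable written by shapeclustering and one without it. This bash sketch is hypothetical. It assumes, as suggested above, that mftraining falls back to a flat shape table when no shapetable file is present in its -D directory, and it reuses only the -F, -U, -O and -D flags from the shapeclustering command at the top of this thread. Setting DRY_RUN prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run mftraining with and without a shapeclustering shapetable.
# Usage: compare_mftraining WORK_DIR SHAPETABLE FONT_PROPERTIES UNICHARSET TR_FILE...

# Execute a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

compare_mftraining() {
  local work="$1" shapetable="$2" font_props="$3" unicharset="$4"
  shift 4   # remaining arguments are the .tr files
  local variant
  mkdir -p "$work/with_shapes" "$work/flat"
  # Only the with_shapes directory gets the clustered shape table;
  # the flat directory is left without one (assumed to trigger a flat table).
  if [ -f "$shapetable" ]; then
    cp "$shapetable" "$work/with_shapes/shapetable"
  fi
  for variant in with_shapes flat; do
    # hin.mfunicharset mirrors the -O name used in the original shapeclustering command.
    run mftraining -F "$font_props" -U "$unicharset" \
      -O "$work/$variant/hin.mfunicharset" -D "$work/$variant/" "$@"
  done
}
```

Keeping the two runs in separate directories means neither run overwrites the other's output, so the resulting traineddata files can be built and evaluated side by side.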
