Creation of encoded unicharset failed While constructing LSTM training data.


roberty...@gmail.com

Aug 10, 2017, 4:35:26 AM
to tesseract-ocr
Hello,

I'm trying to fine-tune the eng.traineddata model by following the steps at: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-%C2%B1-a-few-characters

As the tutorial shows, I am fine-tuning for ± a few characters.

But when I run the first command, which generates the new training and eval data:
training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ./tessdata --output_dir ~/tesstutorial/trainplusminus

it fails with the error: Creation of encoded unicharset failed! While constructing LSTM training data.

See the attached image for more details.

Can you help me? Thanks.


ShreeDevi Kumar

Aug 10, 2017, 12:28:24 PM
to tesser...@googlegroups.com
Seems to work fine for me.

Are you sure that the relevant files are present in the directories listed in that command?

Check the tessdata and langdata locations.

Use tessdata/best/*.traineddata as the existing models.
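
One way to run that check is a small shell helper. This is only a sketch: the function name check_training_inputs is made up, and the list of files it looks for (the language's training_text, radical-stroke.txt at the top of langdata, and the base traineddata) is an assumption about what tesstrain.sh needs when building LSTM data, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report which training inputs are present or missing.
# Usage: check_training_inputs LANGDATA_DIR TESSDATA_DIR LANG
check_training_inputs() {
  langdata_dir=$1
  tessdata_dir=$2
  lang=$3
  missing=0
  # Assumed file list (not taken from the thread); adjust to your checkout.
  for f in \
    "$langdata_dir/$lang/$lang.training_text" \
    "$langdata_dir/radical-stroke.txt" \
    "$tessdata_dir/$lang.traineddata"
  do
    if [ -f "$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      missing=1
    fi
  done
  # Non-zero exit status when at least one file is missing.
  return "$missing"
}
```

For the command above it could be called as `check_training_inputs ../langdata ./tessdata eng`; any MISSING line points at a path to fix before rerunning tesstrain.sh.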

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
