Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110


roberty...@gmail.com

Aug 4, 2017, 2:33:41 AM
to tesseract-ocr
Hello,

I used 'git pull' to update the code from https://github.com/tesseract-ocr/tesseract.git, then recompiled and reinstalled Tesseract 4.0.

But when I run the command below to fine-tune the traineddata, this error appears: "mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110"

lstmtraining --model_output ~/tesstutorial/chituned_from_chisim/chituned \
--continue_from ~/tesstutorial/chituned_from_chisim/chi_sim.lstm \
--train_listfile ~/tesstutorial/chitest/chi_sim.training_files.txt \
--eval_listfile ~/tesstutorial/chitest/chi_sim.training_files.txt \
--target_error_rate 0.01



Tesseract worked fine before I updated the code, but now it crashes with an assertion error. Why? Can you help me?

roberty...@gmail.com

Aug 4, 2017, 4:54:29 AM
to tesseract-ocr
The code seems to have changed a lot, and so have the training commands and the corresponding tutorials. For the changes, see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00.


Ava Nimaee

Aug 7, 2017, 8:45:21 AM
to tesseract-ocr
Hi, how did you solve it? I have this error too.
Please help me.

ShreeDevi Kumar

Aug 7, 2017, 1:58:05 PM
to tesser...@googlegroups.com
You also need to provide a traineddata file as input (the --traineddata option of lstmtraining).

Please review the updated training instructions in the wiki and change the training commands accordingly.
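A minimal sketch of the first post's command with the extra input added, assuming the same tutorial layout. TUT and TESSDATA are hypothetical variables, and chi_sim.traineddata stands in for whichever traineddata file chi_sim.lstm was extracted from. The script prints the command and only runs it when RUN=1 is set.

```bash
#!/usr/bin/env bash
# Sketch of the fine-tuning call from the first post, plus --traineddata.
# Assumption: TUT and TESSDATA are hypothetical; point them at your own dirs.
set -euo pipefail

TUT="${TUT:-$HOME/tesstutorial}"
TESSDATA="${TESSDATA:-$HOME/tesseract/tessdata}"

# --traineddata is the input the old command lacked: the traineddata file
# that chi_sim.lstm came from.
cmd=(lstmtraining
  --model_output "$TUT/chituned_from_chisim/chituned"
  --continue_from "$TUT/chituned_from_chisim/chi_sim.lstm"
  --traineddata "$TESSDATA/chi_sim.traineddata"
  --train_listfile "$TUT/chitest/chi_sim.training_files.txt"
  --eval_listfile "$TUT/chitest/chi_sim.training_files.txt"
  --target_error_rate 0.01)

# Show the exact command; set RUN=1 to actually start training.
printf '%q ' "${cmd[@]}"
echo
if [ "${RUN:-0}" = 1 ]; then
  "${cmd[@]}"
fi
```

Printing first makes it easy to spot a wrong path before lstmtraining ever reaches the traineddata check.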

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7c66d368-f232-4eed-abfc-3bba2418f024%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ava Nimaee

Aug 14, 2017, 2:26:03 AM
to tesseract-ocr
I have a traineddata file at this path: /home/zohreh/tesstutorial/engtrian/eng/eng.traineddata, which I created using:
training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng    --training_text training/langdata/eng/eng.training_text     --linedata_only \
  --noextract_font_properties --langdata_dir training/langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
I also followed the link you sent me.
Sorry, Shree, I have tried a lot but I couldn't solve it.


On Monday, August 7, 2017 at 10:28:05 PM UTC+4:30, shree wrote:
You also need to provide a traineddata file as input

Please review the updated training instructions in the wiki and change the training commands accordingly.
On 07-Aug-2017 6:15 PM, "Ava Nimaee" <beigy....@gmail.com> wrote:
hi how can you solve it? i have this error too.
please help me

On Friday, August 4, 2017 at 11:03:41 AM UTC+4:30, roberty...@gmail.com wrote:
Hello,

I use the 'git pull' command to update the code from the link https://github.com/tesseract-ocr/tesseract.git, and I recompile, reinstall the Tess4.0.

But when I execute the command (showed in below) to finetune the traineddata, an error appears: "mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110"

lstmtraining --model_output ~/tesstutorial/chituned_from_chisim/chituned \
--continue_from ~/tesstutorial/chituned_from_chisim/chi_sim.lstm \
--train_listfile ~/tesstutorial/chitest/chi_sim.training_files.txt \
--eval_listfile ~/tesstutorial/chitest/chi_sim.training_files.txt \
--target_error_rate 0.01



There is nothing wrong with the Tess before updating the code. But now, An assertion error crashes. Why? Can you help me?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.

roberty...@gmail.com

Aug 14, 2017, 4:30:02 AM
to tesseract-ocr
What problems did you encounter? Please give more details about them.

I later followed the new tutorial (https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact) to train, and I didn't have any problems. I hope this helps.

Ava Nimaee

Aug 15, 2017, 5:45:17 AM
to tesseract-ocr
Hi, thanks for your help.
I followed your link, but I got this error:
mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
Segmentation fault (core dumped)
I want to start training the Persian language, so I'm trying English first. I created the box files and the unicharset, and then eng.charset_size=110.txt, eng.Times_New_Roman.exp0.lstmf, eng.traineddata, eng.training_files.txt and eng.unicharset.
All of those were created with this command:
training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng    --training_text training/langdata/eng/eng.training_text     --linedata_only \
  --noextract_font_properties --langdata_dir training/langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
And now I get the error I mentioned.
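Since lstmtraining reads its training data from the paths listed in eng.training_files.txt, one quick way to rule out path problems like the one in this thread is to check that every listed .lstmf file exists. A minimal sketch, with check_listfile as a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: confirm every .lstmf file named in a training list actually exists.
# check_listfile is a hypothetical helper, not part of Tesseract.
check_listfile() {
  local list="$1" missing=0 line
  # Read line by line; the extra test also handles a last line without newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    if [ ! -f "$line" ]; then
      echo "missing: $line" >&2
      missing=$((missing + 1))
    fi
  done < "$list"
  # Succeed only if nothing was missing.
  [ "$missing" -eq 0 ]
}
```

Usage, assuming the output directory from the command above: `check_listfile ~/tesstutorial/engtrian/eng.training_files.txt`.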

roberty...@gmail.com

Aug 15, 2017, 9:20:18 PM
to tesseract-ocr
Hi, I haven't run into this error.

But you could check whether your traineddata is in the correct directory, and check your other paths as well.


Ava Nimaee

Aug 16, 2017, 3:38:34 AM
to tesseract-ocr
Thanks a lot, you're right.
The path should be complete: I used /home/zohreh/Desktop/tesseract-master/z/engtrian/eng/eng.traineddata instead of z/engtrain/eng/eng.traineddata.
It only works when I write the full path from the root.
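Relative paths are resolved against the directory lstmtraining is run from, so a wrong starting directory or a small typo in a directory name (engtrian vs. engtrain) would plausibly both end in the same assertion. A small pre-flight check, sketched here with a hypothetical helper name, turns a relative path into an absolute one and stops with a readable message if the file is missing:

```bash
#!/usr/bin/env bash
# Sketch: fail early with a clear message instead of the mgr_.Init assertion.
# resolve_existing is a hypothetical helper, not part of Tesseract.
resolve_existing() {
  local p="$1"
  if [ ! -f "$p" ]; then
    echo "not found: $p (relative to $(pwd))" >&2
    return 1
  fi
  # Absolute form: absolute directory of the file plus its base name.
  printf '%s/%s\n' "$(cd "$(dirname "$p")" && pwd)" "$(basename "$p")"
}
```

Usage: `TRAINEDDATA=$(resolve_existing z/engtrian/eng/eng.traineddata) || exit 1`, then pass `--traineddata "$TRAINEDDATA"` to lstmtraining.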

Ava Nimaee

Aug 16, 2017, 6:20:53 AM
to tesseract-ocr
Sorry, I have a question:
What is the output of this command? After running it I have a lot of files like base44.409_2195.checkpoint, but in the tutorials I saw eng.lstm,
which I don't have. Which command creates eng.lstm?

Thank you for your support.

ShreeDevi Kumar

Aug 16, 2017, 11:37:47 AM
to tesser...@googlegroups.com
Please check the updated tutorials in the wiki. There have been many changes.
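For the checkpoint question above, one way to get a finished model is lstmtraining's --stop_training mode, which converts a checkpoint into a model file. This sketch assumes the best-model checkpoints follow the base44.409_2195.checkpoint pattern (a prefix without digits, the error rate, an underscore, the iteration) and picks the lowest error rate; OUT and TRAINEDDATA are hypothetical paths. Depending on the Tesseract version, the output may be a full traineddata file, from which combine_tessdata -e can pull the .lstm component.

```bash
#!/usr/bin/env bash
# Sketch: pick the best checkpoint and convert it with --stop_training.
# Assumptions: checkpoints are named <prefix><error>_<iteration>.checkpoint
# with a digit-free prefix; OUT and TRAINEDDATA below are hypothetical.
best_checkpoint() {
  local f err
  for f in "$1"/*_*.checkpoint; do
    [ -e "$f" ] || continue            # the glob matched nothing
    err=$(basename "$f" | sed -E 's/^[^0-9]*([0-9.]+)_[0-9]+\.checkpoint$/\1/')
    printf '%s %s\n' "$err" "$f"
  done | LC_ALL=C sort -n | head -n 1 | cut -d' ' -f2-
}

OUT="${OUT:-$HOME/tesstutorial/engoutput}"
TRAINEDDATA="${TRAINEDDATA:-$HOME/tesstutorial/engtrian/eng/eng.traineddata}"

stop_cmd=(lstmtraining --stop_training
  --continue_from "$(best_checkpoint "$OUT")"
  --traineddata "$TRAINEDDATA"
  --model_output "$OUT/eng.traineddata")

# Print the conversion command for review before running it.
printf '%q ' "${stop_cmd[@]}"
echo
```

Using base_checkpoint (the most recent checkpoint) instead of the lowest-error one is an equally reasonable choice if training is still improving.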


Ava Nimaee

Aug 21, 2017, 4:13:22 AM
to tesseract-ocr
Hi Shree, thanks a lot for your attention.
I corrected all the commands and can now generate checkpoints such as base70.229_1900.checkpoint, but only files like that.
In the tutorials, though, there is eng.lstm. How can I create it? And what exactly is eng.lstm?
Also, what is lstm-punc-dawg? Is it similar to the eng.punc file that Mr. Smith put in langdata/eng?

ShreeDevi Kumar

Aug 21, 2017, 11:08:28 AM
to tesser...@googlegroups.com
training/combine_tessdata -e tessdata/best/eng.traineddata \
  ~/tesstutorial/impact_from_full/eng.lstm


ShreeDevi Kumar

Aug 21, 2017, 11:11:41 AM
to tesser...@googlegroups.com
The .lstm file is the language model. It is stored inside the traineddata file.

Dawgs are compressed files created from lists of words, punctuation or numbers.

You can use dawg2wordlist to unpack them.
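To look inside the dictionaries, the traineddata can be unpacked with combine_tessdata -u and each LSTM dawg converted back to text with dawg2wordlist. This is a sketch: dump_dawgs is a hypothetical helper, and the unpacked file names (eng.lstm-punc-dawg, eng.lstm-unicharset) are assumed to follow a <prefix><component> pattern. The resulting eng.lstm-punc.txt can then be compared with langdata/eng/eng.punc.

```bash
#!/usr/bin/env bash
# Sketch: unpack a traineddata file and turn its LSTM dawgs back into text.
# Assumption: combine_tessdata -u writes one file per component named
# <prefix><component>. dump_dawgs is a hypothetical helper.
dump_dawgs() {
  local traineddata="$1" dir="$2" lang kind dawg
  lang=$(basename "$traineddata" .traineddata)
  mkdir -p "$dir"
  combine_tessdata -u "$traineddata" "$dir/$lang."
  for kind in punc word number; do
    dawg="$dir/$lang.lstm-$kind-dawg"
    [ -f "$dawg" ] || continue   # a model need not ship every dawg
    # dawg2wordlist <unicharset> <dawg> <output word list>
    dawg2wordlist "$dir/$lang.lstm-unicharset" "$dawg" "$dir/$lang.lstm-$kind.txt"
  done
}
```

Usage: `dump_dawgs tessdata/best/eng.traineddata ~/tesstutorial/eng_unpacked`.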

Please follow the instructions on the training wiki page.

Ava Nimaee

Aug 26, 2017, 11:39:44 AM
to tesseract-ocr
Thanks a lot for your attention. I am following the instructions on the training wiki page, but they really are confusing in some places.
Thanks again.

Ava Nimaee

Aug 28, 2017, 5:31:59 AM
to tesseract-ocr
Hi Shree,
I read the instructions on the training wiki page, but I don't have eng.lstm.
None of the commands create eng.lstm. How can I create it? I even checked the langdata I downloaded from git, and it isn't there.
I have spent a lot of time on this but I don't know how to create it.
Can you tell me?



ShreeDevi Kumar

Aug 28, 2017, 7:10:24 AM
to tesser...@googlegroups.com

The following command extracts the .lstm file from the .traineddata file.

training/combine_tessdata -e tessdata/best/eng.traineddata \
  ~/tesstutorial/impact_from_full/eng.lstm
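A small wrapper around the same command, offered as a sketch: extract_lstm is a hypothetical name, while -d (list the components) and -e (extract) are combine_tessdata options. The .lstm extension of the output name is what tells combine_tessdata which component to write.

```bash
#!/usr/bin/env bash
# Sketch: list the components of a traineddata file, then extract the LSTM model.
# extract_lstm is a hypothetical wrapper, not part of Tesseract.
extract_lstm() {
  local traineddata="$1" out="$2"
  if [ ! -f "$traineddata" ]; then
    echo "no such traineddata: $traineddata" >&2
    return 1
  fi
  mkdir -p "$(dirname "$out")"
  combine_tessdata -d "$traineddata"          # show which components exist
  combine_tessdata -e "$traineddata" "$out"   # .lstm extension selects the model
}
```

Usage: `extract_lstm tessdata/best/eng.traineddata ~/tesstutorial/impact_from_full/eng.lstm`.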

ShreeDevi