mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../src/lstm/lstmtrainer.h, line 110


fady taher

Jun 18, 2019, 10:40:57 AM
to tesseract-ocr
I am trying to fine-tune Tesseract, but the lstmtraining command keeps failing with this error:

mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../src/lstm/lstmtrainer.h, line 110

My script is as follows:

cd /home/sw/repo/tesseract-ocr
  
mkdir -p ~/tesstutorial/
mkdir -p ~/tesstutorial/trainplusminus
mkdir -p ~/tesstutorial/evalplusminus


src/training/tesstrain.sh  --fontlist "Times New Roman" --lang eng --linedata_only   --noextract_font_properties --langdata_dir /home/sw/repo/langdata   --tessdata_dir /home/sw/repo/tessdata --output_dir ~/tesstutorial/trainplusminus

src/training/tesstrain.sh  --fontlist "Times New Roman" --lang eng --linedata_only   --noextract_font_properties --langdata_dir /home/sw/repo/langdata/eng   --tessdata_dir /home/sw/repo/tessdata   --output_dir ~/tesstutorial/evalplusminus


#eng.lstm file gets extracted correctly
src/training/combine_tessdata -e /home/sw/repo/tessdata/eng.traineddata   ~/tesstutorial/trainplusminus/eng.lstm

#this command fails and throws the error
src/training/lstmtraining --model_output ~/tesstutorial/trainplusminus/plusminus \
   --continue_from ~/tesstutorial/trainplusminus/eng.lstm  \
   --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata   \
   --old_traineddata /home/sw/repo/tessdata/eng.traineddata   \
   --train_listfile ~/tesstutorial/trainplusminus/eng.training_files.txt   \
   --max_iterations 400
   

src/training/lstmtraining --stop_training \
  --continue_from ~/tesstutorial/trainplusminus/plusminus_checkpoint \
  --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
  --model_output ~/tesstutorial/eng_final.traineddata
  
cp ~/tesstutorial/eng_final.traineddata /usr/share/tesseract/4/tessdata/eng.traineddata


I downloaded eng.traineddata from the "best" repo, though. Can anyone help?

Shree Devi Kumar

Jun 18, 2019, 10:46:59 AM
to tesser...@googlegroups.com
Check ~/tesstutorial/trainplusminus
Did your earlier training complete correctly? Does ~/tesstutorial/trainplusminus/eng/eng.traineddata exist?
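
The check suggested above can be sketched as a small pre-flight test to run before lstmtraining. This is only an illustration: `require_file` is a hypothetical helper name, not part of Tesseract, and the path is the one passed to `--traineddata` in the script from the first post.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Tesseract tool): succeed only if the given
# file exists and is not empty.
require_file() {
  if [ ! -s "$1" ]; then
    echo "missing or empty: $1" >&2
    return 1
  fi
}

# Path copied from the --traineddata argument in the original script.
starter="$HOME/tesstutorial/trainplusminus/eng/eng.traineddata"

if require_file "$starter"; then
  echo "starter traineddata found; lstmtraining can be started"
else
  echo "re-run tesstrain.sh and read its log before training" >&2
fi
```

If the file is missing, the tesstrain.sh log is the place to look next, since that step is the one expected to produce it.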


fady taher

Jun 18, 2019, 11:16:14 AM
to tesser...@googlegroups.com
No, this file doesn't exist yet. The directory only contains:
eng.charset_size=110.txt
eng.unicharset


Shree Devi Kumar

Jun 18, 2019, 11:18:22 AM
to tesser...@googlegroups.com
That means 

src/training/tesstrain.sh  --fontlist "Times New Roman" --lang eng --linedata_only   --noextract_font_properties --langdata_dir /home/sw/repo/langdata   --tessdata_dir /home/sw/repo/tessdata --output_dir ~/tesstutorial/trainplusminus 

did not complete correctly. 



fady taher

Jun 18, 2019, 11:21:39 AM
to tesser...@googlegroups.com
The output of

src/training/tesstrain.sh  --fontlist "Times New Roman" --lang eng --linedata_only   --noextract_font_properties --langdata_dir /home/sw/repo/langdata   --tessdata_dir /home/sw/repo/tessdata --output_dir ~/tesstutorial/trainplusminus

is

....
....
[Tue Jun 18 17:19:46 EET 2019] /usr/local/bin/combine_lang_model --input_unicharset /tmp/eng-2019-06-18.baG/eng.unicharset --script_dir /home/sw/repo/langdata --words /home/sw/repo/langdata/eng/eng.wordlist --numbers /home/sw/repo/langdata/eng/eng.numbers --puncs /home/sw/repo/langdata/eng/eng.punc --output_dir /home/sw/tesstutorial/trainplusminus --lang eng
Loaded unicharset of size 111 from file /tmp/eng-2019-06-18.baG/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Warning: properties incomplete for index 95 = ~
Config file is optional, continuing...
Failed to read data from: /home/sw/repo/langdata/eng/eng.config
Null char=2
Reducing Trie to SquishedDawg
Error during conversion of wordlists to DAWGs!!


Shree Devi Kumar

Jun 18, 2019, 11:38:28 AM
to tesser...@googlegroups.com
Have you modified any word lists, training_text, etc.?

What is your Tesseract version?

Which OS?

fady taher

Jun 18, 2019, 11:38:33 AM
to tesser...@googlegroups.com
It seems the problem was caused by copying langdata from Windows to Linux. I re-downloaded the files on Linux and it worked. I will retry.

fady taher

Jun 18, 2019, 11:42:36 AM
to tesser...@googlegroups.com
Thanks a lot, Shree, for your support.
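
The thread does not say what exactly went wrong with the langdata copied from Windows. One common cause with text files moved between the two systems is Windows-style CRLF line endings, so the sketch below assumes that and lists text files containing a carriage return. `find_crlf_files` is a hypothetical helper name, and the directory is the langdata path used in the script from the first post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of text files under a directory
# that contain a carriage return (CR) character.
# Assumption: CRLF line endings are the damage; the thread does not confirm it.
find_crlf_files() {
  cr=$(printf '\r')
  # -r recurse, -l print file names only, -I skip binary files
  grep -rlI "$cr" "$1"
}

# langdata location taken from the original script.
langdata_dir="/home/sw/repo/langdata"

if [ -d "$langdata_dir" ]; then
  if find_crlf_files "$langdata_dir"; then
    echo "the files listed above contain CR characters" >&2
  else
    echo "no CR characters found in $langdata_dir"
  fi
fi
```

If any files are listed, downloading langdata again directly on Linux, as was done in this thread, avoids the problem.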
