Assert failed:in file weightmatrix.cpp, line 244


Emiliano Isaza Villamizar

Jul 23, 2018, 4:56:45 PM
to tesseract-ocr
Hello everyone,


I'm trying to train Tesseract to improve the detection of prices such as CN¥2,400.48. I've reached a point where I keep getting this error:

total=`cat data/all-lstmf | wc -l` \
   no=`echo "$total * 0.90 / 1" | bc`; \
   head -n "$no" data/all-lstmf > "data/list.train"
total=`cat data/all-lstmf | wc -l` \
   no=`echo "($total - $total * 0.90) / 1" | bc`; \
   tail -n "+$no" data/all-lstmf > "data/list.eval"
combine_lang_model \
  --input_unicharset data/unicharset \
  --script_dir /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master \
  --words /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.wordlist \
  --numbers /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.numbers \
  --puncs /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.punc \
  --output_dir data/ \
  --lang eng
Loaded unicharset of size 113 from file data/unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.config
Null char=2
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
mkdir -p data/checkpoints
lstmtraining \
  --continue_from   /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \
  --old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \
  --traineddata data/eng/eng.traineddata \
  --model_output data/checkpoints/eng \
  --debug_interval -1 \
  --train_listfile data/list.train \
  --eval_listfile data/list.eval \
  --sequential_training \
  --max_iterations 3000
Loaded file /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 112!
Num (Extended) outputs,weights in Series:
  1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys64:64, 20736
  Lfx96:96, 61824
  Lrx96:96, 74112
  Lfx512:512, 1247232
  Fc112:112, 0
Total weights = 1404064
Previous null char=110 mapped to 111
Continuing from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm
Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf
Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/67e.lstmf
Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/75c.lstmf
Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/48b.lstmf
Iteration 0: ALIGNED TRUTH : CN¥2,400.48
Iteration 0: BEST OCR TEXT : ₩₩₩N₩₩4₩0₩0₩4₩8
File /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf page 0 :
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Makefile:111: recipe for target 'data/checkpoints/eng_checkpoint' failed
make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)

I already tried downloading the eng.traineddata from tessdata_best and substituting it in --continue_from, but I still can't get past this error. Any thoughts?
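
A side note on the split commands at the top of the log, unrelated to the assert itself: as far as I can tell, `tail -n "+$no"` prints everything from line `$no` onward, so with `no` set to roughly 10% of the total, list.eval would overlap most of list.train instead of holding only the last 10%. A minimal sketch of a non-overlapping 90/10 split follows; the function name and the file names are made up for illustration and are not part of OCR-D.

```python
def split_train_eval(lines, train_percent=90):
    """Split a list of .lstmf paths into non-overlapping train and eval lists."""
    # Integer arithmetic truncates, like `echo "$total * 0.90 / 1" | bc`.
    n_train = len(lines) * train_percent // 100
    # Everything after the training slice becomes the eval set.
    return lines[:n_train], lines[n_train:]


# Hypothetical file names, for illustration only.
all_lstmf = ["data/train/%03d.lstmf" % i for i in range(100)]
train, evaluation = split_train_eval(all_lstmf)
print(len(train), len(evaluation))
```

Slicing at a single index guarantees that no file ends up in both lists.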

Shree Devi Kumar

Jul 24, 2018, 1:05:22 AM
to tesser...@googlegroups.com
Which version of tesseract are you using?

Please post the output of

tesseract -v

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/6152d324-0713-4de6-b646-162923273b63%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Lorenzo Bolzani

Jul 24, 2018, 4:16:27 AM
to tesser...@googlegroups.com
I had this error when I was mixing "best" models with non-"best" models.

I would try to run again

combine_tessdata -e base_model/eng.traineddata base_model/eng.lstm

to generate the eng.lstm from the "_best" model (the ones from /usr/share/tessdata are not the "_best" models).

Then, if the error is still there, I would also recreate the lstmf files, just to be sure (I do not really know if it matters).


Lorenzo



Emiliano Isaza Villamizar

Jul 24, 2018, 10:07:37 AM
to tesseract-ocr
I'm using OCR-D, which uses Tesseract 4.0.0-beta.1.

Emiliano Isaza Villamizar

Jul 24, 2018, 11:16:55 AM
to tesseract-ocr
I'm using OCR-D. I compiled it again after changing the .traineddata in the original file, but it hasn't worked; I still get the same error.

Iteration 0: ALIGNED TRUTH : Zhejiang Huamei Holding Co Ltd
Iteration 0: BEST OCR TEXT : ₩Z₩h₩e₩j₩i₩a₩n₩ ₩₩u₩a₩m₩e ₩₩o₩₩d₩i₩n₩ ₩C₩o ₩L₩₩d
File /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/44c.lstmf page 0 :
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Makefile:111: recipe for target 'data/checkpoints/eng_checkpoint' failed
make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)

I ran make clean and re-ran it to regenerate the lstmf files, but got the same error.

shree

Jul 24, 2018, 12:40:34 PM
to tesseract-ocr
  --continue_from   /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \
  --old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \

Use eng.traineddata from tessdata_best

and extract the lstm file from it. 
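
For anyone scripting this, here is a rough sketch of the idea: derive both --continue_from and --old_traineddata from the same tessdata_best file, so the two cannot get out of sync. It only uses the combine_tessdata -e call and the lstmtraining flags already shown in this thread. The helper names and all paths are placeholders I made up, not anything OCR-D provides.

```python
import subprocess


def finetune_commands(best_traineddata, lstm_out, new_traineddata,
                      model_output, train_list, eval_list,
                      max_iterations=3000):
    """Build the extract and training commands from ONE tessdata_best file."""
    # Extract the LSTM component from the tessdata_best traineddata.
    extract = ["combine_tessdata", "-e", best_traineddata, lstm_out]
    # Continue from that extracted file, and pass the very same
    # traineddata as --old_traineddata.
    train = [
        "lstmtraining",
        "--continue_from", lstm_out,
        "--old_traineddata", best_traineddata,
        "--traineddata", new_traineddata,
        "--model_output", model_output,
        "--train_listfile", train_list,
        "--eval_listfile", eval_list,
        "--max_iterations", str(max_iterations),
    ]
    return extract, train


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Placeholder paths, for illustration only.
extract_cmd, train_cmd = finetune_commands(
    "tessdata_best/eng.traineddata", "tessdata_best/eng.lstm",
    "data/eng/eng.traineddata", "data/checkpoints/eng",
    "data/list.train", "data/list.eval")
print(" ".join(extract_cmd))
print(" ".join(train_cmd))
```

Calling `run_all([extract_cmd, train_cmd])` would execute both steps, assuming the Tesseract training tools are on the PATH.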

Emiliano Isaza Villamizar

Jul 24, 2018, 5:41:45 PM
to tesseract-ocr
It worked. Maybe I was using a different eng.traineddata. Thank you for your time, Shree and Lorenzo.

kind regards,
Emiliano 

On Tuesday, July 24, 2018 at 11:40:34 AM UTC-5, shree wrote:
  --continue_from   /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \
  --old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/ \

Emiliano Isaza Villamizar

Jul 24, 2018, 5:52:59 PM
to tesseract-ocr
If anyone following this thread is using OCR-D: I had to modify the .py file because I kept getting a Unicode error. Just add these lines to the file:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')
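
Note that the lines above only work on Python 2: `reload` is not a builtin on Python 3, and `sys.setdefaultencoding` does not exist there. A possible alternative, sketched here for Python 3, is to pass the encoding explicitly wherever the script reads or writes text. The helper names are hypothetical, and I am assuming the Unicode error comes from reading or writing ground-truth text files.

```python
import io
import os
import tempfile


def write_text_utf8(path, text):
    """Write text as UTF-8 regardless of the interpreter's default encoding."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def read_text_utf8(path):
    """Read a UTF-8 text file regardless of the interpreter's default encoding."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return handle.read()


# Round-trip a price string with a non-ASCII currency sign (U+00A5).
tmp_dir = tempfile.mkdtemp()
gt_path = os.path.join(tmp_dir, "sample.gt.txt")  # hypothetical file name
write_text_utf8(gt_path, u"CN\u00a52,400.48")
round_trip = read_text_utf8(gt_path)
```

Setting the encoding per file avoids changing a process-wide default, which can hide encoding bugs elsewhere in the script.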