Assert failed:in file weightmatrix.cpp, line 244

Emiliano Isaza Villamizar

Jul 23, 2018, 4:56:45 PM

to tesseract-ocr

Hello everyone,

I'm trying to train Tesseract to improve the detection of prices such as CN¥2,400.48. I got to a point where I keep getting this error:

total=`cat data/all-lstmf | wc -l` \
no=`echo "$total * 0.90 / 1" | bc`; \
head -n "$no" data/all-lstmf > "data/list.train"
total=`cat data/all-lstmf | wc -l` \
no=`echo "($total - $total * 0.90) / 1" | bc`; \
tail -n "+$no" data/all-lstmf > "data/list.eval"

combine_lang_model \
--input_unicharset data/unicharset \
--script_dir /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master \
--words /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.wordlist \
--numbers /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.numbers \
--puncs /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.punc \
--output_dir data/ \
--lang eng

Loaded unicharset of size 113 from file data/unicharset

Setting unichar properties

Other case É of é is not in unicharset

Setting script properties

Config file is optional, continuing...

Failed to read data from: /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.config

Null char=2

Reducing Trie to SquishedDawg

mkdir -p data/checkpoints

lstmtraining \
--continue_from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \
--old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \
--traineddata data/eng/eng.traineddata \
--model_output data/checkpoints/eng \
--debug_interval -1 \
--train_listfile data/list.train \
--eval_listfile data/list.eval \
--sequential_training \
--max_iterations 3000

Loaded file /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm, unpacking...

Warning: LSTMTrainer deserialized an LSTMRecognizer!

Code range changed from 111 to 112!

Num (Extended) outputs,weights in Series:

1,36,0,1:1, 0

Num (Extended) outputs,weights in Series:

C3,3:9, 0

Ft16:16, 160

Total weights = 160

[C3,3Ft16]:16, 160

Mp3,3:16, 0

Lfys64:64, 20736

Lfx96:96, 61824

Lrx96:96, 74112

Lfx512:512, 1247232

Fc112:112, 0

Total weights = 1404064

Previous null char=110 mapped to 111

Continuing from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm

Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf

Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/67e.lstmf

Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/75c.lstmf

Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/48b.lstmf

Iteration 0: ALIGNED TRUTH : CN¥2,400.48

Iteration 0: BEST OCR TEXT : ₩₩₩N₩₩4₩0₩0₩4₩8

File /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf page 0 :

!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244

Makefile:111: recipe for target 'data/checkpoints/eng_checkpoint' failed

make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)

I already tried downloading the best/tessdata eng.traineddata and substituting it in --continue_from, but I haven't been able to get past this error. Any thoughts?
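A side note on the list-splitting commands at the top of the log above, probably unrelated to the assert: `tail -n +N` prints from line N onward, not the last N lines. With `no` set to about 10% of the total, `data/list.eval` therefore overlaps most of `data/list.train`. A minimal Python sketch of a disjoint 90/10 split follows; the helper name and the sample file names are hypothetical and are not part of OCR-D.

```python
def split_train_eval(lines, train_ratio=0.90):
    """Split lines into disjoint train and eval lists (hypothetical helper)."""
    # Truncate toward zero, like the "/ 1" in the bc expressions of the log.
    cut = int(len(lines) * train_ratio)
    return lines[:cut], lines[cut:]


if __name__ == "__main__":
    # Made-up names standing in for the entries of data/all-lstmf.
    names = ["data/train/%03d.lstmf" % i for i in range(10)]
    train, evaluation = split_train_eval(names)
    print(len(train), len(evaluation))
```

Slicing at a single cut point guarantees that no file ends up in both lists, which keeps the evaluation error meaningful.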

Shree Devi Kumar

Jul 24, 2018, 1:05:22 AM

to tesser...@googlegroups.com

Which version of tesseract are you using?

Please post output of

tesseract -v

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/6152d324-0713-4de6-b646-162923273b63%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Lorenzo Bolzani

Jul 24, 2018, 4:16:27 AM

to tesser...@googlegroups.com

I had this error when I was mixing best models with non best models.

I would try to run again

combine_tessdata -e base_model/eng.traineddata base_model/eng.lstm

to generate the eng.lstm from the "_best" model (the ones from /usr/share/tessdata are not the "_best" models).

Then, if the error is still there, I would also recreate the lstmf files, just to be sure (I do not really know if it matters).

Lorenzo

Emiliano Isaza Villamizar

Jul 24, 2018, 10:07:37 AM

to tesseract-ocr

I'm using OCR-D, which uses 4.0.0-beta.1.

Emiliano Isaza Villamizar

Jul 24, 2018, 11:16:55 AM

to tesseract-ocr

I'm using OCR-D. I compiled it again after changing the .traineddata in the original file, but it hasn't worked; I still get the same error.

Iteration 0: ALIGNED TRUTH : Zhejiang Huamei Holding Co Ltd

Iteration 0: BEST OCR TEXT : ₩Z₩h₩e₩j₩i₩a₩n₩ ₩₩u₩a₩m₩e ₩₩o₩₩d₩i₩n₩ ₩C₩o ₩L₩₩d

File /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/44c.lstmf page 0 :

!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244

Makefile:111: recipe for target 'data/checkpoints/eng_checkpoint' failed

make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)

I ran make clean and re-ran it to regenerate the lstmf files, but got the same error.

shree

Jul 24, 2018, 12:40:34 PM

to tesseract-ocr

--continue_from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \
--old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \

Use eng.traineddata from tessdata_best

https://github.com/tesseract-ocr/tessdata_best

and extract the lstm file from it.
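A minimal sketch of the two steps above, assuming the tessdata_best eng.traineddata has been downloaded into a hypothetical `base_model/` directory (the name is borrowed from Lorenzo's command). The helper names are made up for illustration. The code only builds argument lists from flags already shown in this thread and does not run anything itself.

```python
def extract_lstm_cmd(traineddata, lstm_out):
    """Argument list for extracting the LSTM component from a traineddata file."""
    return ["combine_tessdata", "-e", traineddata, lstm_out]


def finetune_cmd(base_traineddata, base_lstm, new_traineddata, model_output,
                 train_list, eval_list, max_iterations=3000):
    """Argument list for lstmtraining, with both base files from the same model."""
    return [
        "lstmtraining",
        "--continue_from", base_lstm,
        "--old_traineddata", base_traineddata,
        "--traineddata", new_traineddata,
        "--model_output", model_output,
        "--train_listfile", train_list,
        "--eval_listfile", eval_list,
        "--max_iterations", str(max_iterations),
    ]


if __name__ == "__main__":
    base = "base_model/eng.traineddata"  # assumed location of the tessdata_best file
    lstm = "base_model/eng.lstm"         # to be extracted from that same file
    commands = [
        extract_lstm_cmd(base, lstm),
        finetune_cmd(base, lstm, "data/eng/eng.traineddata",
                     "data/checkpoints/eng", "data/list.train", "data/list.eval"),
    ]
    for cmd in commands:
        print(" ".join(cmd))
```

The printed lines can be pasted into a shell, or each list can be passed to `subprocess.check_call`. The point of routing both `--continue_from` and `--old_traineddata` through the same `base` file is to avoid pairing an eng.lstm from one model set with an eng.traineddata from another.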

Emiliano Isaza Villamizar

Jul 24, 2018, 5:41:45 PM

to tesseract-ocr

It worked; maybe I was using another eng.traineddata. Thank you for your time, Shree and Lorenzo.

kind regards,

Emiliano

On Tuesday, July 24, 2018 at 11:40:34 AM UTC-5, shree wrote:

--continue_from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \
--old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/ \

Emiliano Isaza Villamizar

Jul 24, 2018, 5:52:59 PM

to tesseract-ocr

If anyone following this thread is using OCR-D: I had to modify the .py file because I kept getting a Unicode error. Just add these lines to the file:

import sys

# Python 2 only: reload(sys) restores setdefaultencoding, which site.py removes at startup.
reload(sys)
sys.setdefaultencoding('utf-8')
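The `reload(sys)` / `sys.setdefaultencoding` approach exists only on Python 2 and changes the default encoding for the whole process. A narrower alternative is to open files with `io.open(..., encoding='utf-8')`, which behaves the same on Python 2 and 3. The sketch below assumes the Unicode error comes from reading or writing ground-truth text without an explicit encoding; the thread does not say which line failed. The function and file names are hypothetical.

```python
import io
import os
import tempfile


def write_text(path, text):
    """Write text as UTF-8 regardless of the interpreter's default encoding."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def read_text(path):
    """Read a UTF-8 file and return its contents as a Unicode string."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # Throwaway file in a temporary directory; the name is made up.
    path = os.path.join(tempfile.mkdtemp(), "sample.gt.txt")
    sample = u"CN\u00a52,400.48"  # \u00a5 is the yen/yuan sign from the example price
    write_text(path, sample)
    print(read_text(path) == sample)
```

Declaring the encoding at each `open` call keeps the fix local to the files that need it and avoids relying on a hook that Python 3 no longer provides.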
