Assert failed:in file weightmatrix.cpp, line 249


Kristóf Horváth

Feb 4, 2019, 7:32:26 AM
to tesseract-ocr
I'm using Cygwin (64-bit, on Windows 10) to compile Tesseract. I ran the following commands and got the following error:
kh@DSAD-6 /usr/share/tessdata
$ tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Arial" "Impact Condensed" --lang eng --linedata_only --noextract_font_properties --langdata_dir ~/langdata/ --tessdata_dir ./ --output_dir ~/tesstutorial/engtrain

=== Starting training for language 'eng'
[Mon, Feb 4, 2019 1:17:48 PM] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=Arial --outputbase=/tmp/font_tmp.bEkR4qa83g/sample_text.txt --text=/tmp/font_tmp.bEkR4qa83g/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g
Rendered page 0 to file /tmp/font_tmp.bEkR4qa83g/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using Arial
[Mon, Feb 4, 2019 1:17:51 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2019-02-04.pCA/eng.Arial.exp0 --max_pages=0 --font=Arial --text=/home/kh/langdata//eng/eng.training_text
Rendering using Impact Condensed
Rendered page 0 to file /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif
[Mon, Feb 4, 2019 1:17:52 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0 --max_pages=0 --font=Impact Condensed --text=/home/kh/langdata//eng/eng.training_text
Rendered page 1 to file /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif
Rendered page 0 to file /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif
Rendered page 1 to file /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif

=== Phase UP: Generating unicharset and unichar properties files ===
[Mon, Feb 4, 2019 1:17:55 PM] /usr/bin/unicharset_extractor --output_unicharset /tmp/eng-2019-02-04.pCA/eng.unicharset --norm_mode 1 /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.box /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.box
Extracting unicharset from box file /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.box
Extracting unicharset from box file /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.box
Other case É of é is not in unicharset
Wrote unicharset file /tmp/eng-2019-02-04.pCA/eng.unicharset
[Mon, Feb 4, 2019 1:17:55 PM] /usr/bin/set_unicharset_properties -U /tmp/eng-2019-02-04.pCA/eng.unicharset -O /tmp/eng-2019-02-04.pCA/eng.unicharset -X /tmp/eng-2019-02-04.pCA/eng.xheights --script_dir=/home/kh/langdata/
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.pCA/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Warning: properties incomplete for index 25 = ~
Writing unicharset to file /tmp/eng-2019-02-04.pCA/eng.unicharset

=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=./
[Mon, Feb 4, 2019 1:17:56 PM] /usr/local/bin/tesseract /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif /tmp/eng-2019-02-04.pCA/eng.Arial.exp0 --psm 6 lstm.train
[Mon, Feb 4, 2019 1:17:56 PM] /usr/local/bin/tesseract /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
Page 2
Page 2
Loaded 49/49 pages (1-49) of document /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.lstmf
Loaded 52/52 pages (1-52) of document /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.lstmf

=== Constructing LSTM training data ===
[Mon, Feb 4, 2019 1:17:57 PM] /usr/bin/combine_lang_model --input_unicharset /tmp/eng-2019-02-04.pCA/eng.unicharset --script_dir /home/kh/langdata/ --words /home/kh/langdata//eng/eng.wordlist --numbers /home/kh/langdata//eng/eng.numbers --puncs /home/kh/langdata//eng/eng.punc --output_dir /home/kh/tesstutorial/engtrain --lang eng
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.pCA/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: /home/kh/langdata//eng/eng.config
Null char=2
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg

=== Moving lstmf files for training data ===
Moving /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.lstmf to /home/kh/tesstutorial/engtrain
Moving /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.lstmf to /home/kh/tesstutorial/engtrain

Created starter traineddata for language 'eng'


Run lstmtraining to do the LSTM training for language 'eng'


kh@DSAD-6 /usr/share/tessdata
$ tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Impact Condensed" --lang eng --linedata_only --noextract_font_properties --langdata_dir ~/langdata/ --tessdata_dir ./ --output_dir ~/tesstutorial/engeval

=== Starting training for language 'eng'
[Mon, Feb 4, 2019 1:21:10 PM] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=Impact Condensed --outputbase=/tmp/font_tmp.e96rRhOoQ5/sample_text.txt --text=/tmp/font_tmp.e96rRhOoQ5/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.e96rRhOoQ5
Rendered page 0 to file /tmp/font_tmp.e96rRhOoQ5/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using Impact Condensed
[Mon, Feb 4, 2019 1:21:14 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.e96rRhOoQ5 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0 --max_pages=0 --font=Impact Condensed --text=/home/kh/langdata//eng/eng.training_text
Rendered page 0 to file /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif
Rendered page 1 to file /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif

=== Phase UP: Generating unicharset and unichar properties files ===
[Mon, Feb 4, 2019 1:21:16 PM] /usr/bin/unicharset_extractor --output_unicharset /tmp/eng-2019-02-04.TL6/eng.unicharset --norm_mode 1 /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.box
Extracting unicharset from box file /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.box
Other case É of é is not in unicharset
Wrote unicharset file /tmp/eng-2019-02-04.TL6/eng.unicharset
[Mon, Feb 4, 2019 1:21:17 PM] /usr/bin/set_unicharset_properties -U /tmp/eng-2019-02-04.TL6/eng.unicharset -O /tmp/eng-2019-02-04.TL6/eng.unicharset -X /tmp/eng-2019-02-04.TL6/eng.xheights --script_dir=/home/kh/langdata/
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.TL6/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Warning: properties incomplete for index 25 = ~
Writing unicharset to file /tmp/eng-2019-02-04.TL6/eng.unicharset

=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=./
[Mon, Feb 4, 2019 1:21:17 PM] /usr/local/bin/tesseract /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
Page 2
Loaded 49/49 pages (1-49) of document /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.lstmf

=== Constructing LSTM training data ===
[Mon, Feb 4, 2019 1:21:19 PM] /usr/bin/combine_lang_model --input_unicharset /tmp/eng-2019-02-04.TL6/eng.unicharset --script_dir /home/kh/langdata/ --words /home/kh/langdata//eng/eng.wordlist --numbers /home/kh/langdata//eng/eng.numbers --puncs /home/kh/langdata//eng/eng.punc --output_dir /home/kh/tesstutorial/engeval --lang eng
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.TL6/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: /home/kh/langdata//eng/eng.config
Null char=2
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg

=== Moving lstmf files for training data ===
Moving /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.lstmf to /home/kh/tesstutorial/engeval

Created starter traineddata for language 'eng'


Run lstmtraining to do the LSTM training for language 'eng'


kh@DSAD-6 /usr/share/tessdata
$ combine_tessdata -e ./eng.traineddata  ~/tesstutorial/engoutput/eng.lstm
Extracting tessdata components from ./eng.traineddata
Wrote /home/kh/tesstutorial/engoutput/eng.lstm
Version string:4.00.00alpha:eng:synth20170629
17:lstm:size=401636, offset=192
18:lstm-punc-dawg:size=4322, offset=401828
19:lstm-word-dawg:size=3694794, offset=406150
20:lstm-number-dawg:size=4738, offset=4100944
21:lstm-unicharset:size=6360, offset=4105682
22:lstm-recoder:size=1012, offset=4112042
23:version:size=30, offset=4113054

kh@DSAD-6 /usr/share/tessdata
$ lstmtraining --model_output ~/tesstutorial/engoutput/impact --continue_from ~/tesstutorial/engoutput/eng.lstm --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata --old_traineddata ./eng.traineddata --max_iterations 3600 -train_listfile ~/tesstutorial/engtrain/eng.training_files.txt
Loaded file /home/kh/tesstutorial/engoutput/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 110!
Num (Extended) outputs,weights in Series:
  1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys48:48, 12480
  Lfx96:96, 55680
  Lrx96:96, 74112
  Lfx192:192, 221952
  Fc110:110, 0
Total weights = 364384
Previous null char=110 mapped to 109
Continuing from /home/kh/tesstutorial/engoutput/eng.lstm
Loaded 72/72 pages (1-72) of document /home/kh/tesstutorial/engtrain/eng.Arial.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/kh/tesstutorial/engtrain/eng.Impact_Condensed.exp0.lstmf
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249

Shree Devi Kumar

Feb 4, 2019, 7:40:26 AM
to tesser...@googlegroups.com
kh@DSAD-6 /usr/share/tessdata
$ combine_tessdata -e ./eng.traineddata  ~/tesstutorial/engoutput/eng.lstm
Extracting tessdata components from ./eng.traineddata
Wrote /home/kh/tesstutorial/engoutput/eng.lstm

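You need the traineddata from the tessdata_best repo for training. The `!int_mode_` assertion indicates that the eng.lstm you extracted is an integer model, and integer models cannot be trained further.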
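As a rough sketch of what the corrected steps could look like: the paths and lstmtraining options below mirror the commands earlier in this thread, while the download URL, the `best` directory, and the `run` helper are assumptions made for illustration, not something confirmed here. By default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch only. BEST_URL and BEST_DIR are assumptions; adjust them to your setup.
BEST_URL="https://github.com/tesseract-ocr/tessdata_best/raw/main/eng.traineddata"
WORK="$HOME/tesstutorial"
BEST_DIR="$WORK/best"   # hypothetical location for the downloaded model

# Hypothetical helper: print each command, execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# 1. Fetch the tessdata_best model instead of using /usr/share/tessdata.
run mkdir -p "$BEST_DIR" "$WORK/engoutput"
run wget -O "$BEST_DIR/eng.traineddata" "$BEST_URL"

# 2. Extract the LSTM component from the tessdata_best traineddata.
run combine_tessdata -e "$BEST_DIR/eng.traineddata" "$WORK/engoutput/eng.lstm"

# 3. Fine-tune from the extracted model, with the same options as before.
run lstmtraining --model_output "$WORK/engoutput/impact" \
  --continue_from "$WORK/engoutput/eng.lstm" \
  --traineddata "$WORK/engtrain/eng/eng.traineddata" \
  --old_traineddata "$BEST_DIR/eng.traineddata" \
  --max_iterations 3600 \
  --train_listfile "$WORK/engtrain/eng.training_files.txt"
```

Review the printed commands first, then rerun with `DRY_RUN=0` to execute them. The only intended change from the original attempt is where eng.traineddata comes from.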




Kristóf Horváth

Feb 4, 2019, 8:03:16 AM
to tesseract-ocr
Thanks. It would be great if this were in the documentation. Don't worry, you don't have to do anything: just answer my upcoming questions and I will write it up. I will also need a review of my final draft, to make sure my wording and the facts I managed to dig up are correct.