kh@DSAD-6 /usr/share/tessdata
$ tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Arial" "Impact Condensed" --lang eng --linedata_only --noextract_font_properties --langdata_dir ~/langdata/ --tessdata_dir ./ --output_dir ~/tesstutorial/engtrain
=== Starting training for language 'eng'
[Mon, Feb 4, 2019 1:17:48 PM] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=Arial --outputbase=/tmp/font_tmp.bEkR4qa83g/sample_text.txt --text=/tmp/font_tmp.bEkR4qa83g/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g
Rendered page 0 to file /tmp/font_tmp.bEkR4qa83g/sample_text.txt.tif
=== Phase I: Generating training images ===
Rendering using Arial
[Mon, Feb 4, 2019 1:17:51 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2019-02-04.pCA/eng.Arial.exp0 --max_pages=0 --font=Arial --text=/home/kh/langdata//eng/eng.training_text
Rendering using Impact Condensed
Rendered page 0 to file /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif
[Mon, Feb 4, 2019 1:17:52 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.bEkR4qa83g --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0 --max_pages=0 --font=Impact Condensed --text=/home/kh/langdata//eng/eng.training_text
Rendered page 1 to file /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif
Rendered page 0 to file /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif
Rendered page 1 to file /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif
=== Phase UP: Generating unicharset and unichar properties files ===
[Mon, Feb 4, 2019 1:17:55 PM] /usr/bin/unicharset_extractor --output_unicharset /tmp/eng-2019-02-04.pCA/eng.unicharset --norm_mode 1 /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.box /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.box
Extracting unicharset from box file /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.box
Extracting unicharset from box file /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.box
Other case É of é is not in unicharset
Wrote unicharset file /tmp/eng-2019-02-04.pCA/eng.unicharset
[Mon, Feb 4, 2019 1:17:55 PM] /usr/bin/set_unicharset_properties -U /tmp/eng-2019-02-04.pCA/eng.unicharset -O /tmp/eng-2019-02-04.pCA/eng.unicharset -X /tmp/eng-2019-02-04.pCA/eng.xheights --script_dir=/home/kh/langdata/
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.pCA/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Warning: properties incomplete for index 25 = ~
Writing unicharset to file /tmp/eng-2019-02-04.pCA/eng.unicharset
=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=./
[Mon, Feb 4, 2019 1:17:56 PM] /usr/local/bin/tesseract /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.tif /tmp/eng-2019-02-04.pCA/eng.Arial.exp0 --psm 6 lstm.train
[Mon, Feb 4, 2019 1:17:56 PM] /usr/local/bin/tesseract /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.tif /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
Page 2
Page 2
Loaded 49/49 pages (1-49) of document /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.lstmf
Loaded 52/52 pages (1-52) of document /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.lstmf
=== Constructing LSTM training data ===
[Mon, Feb 4, 2019 1:17:57 PM] /usr/bin/combine_lang_model --input_unicharset /tmp/eng-2019-02-04.pCA/eng.unicharset --script_dir /home/kh/langdata/ --words /home/kh/langdata//eng/eng.wordlist --numbers /home/kh/langdata//eng/eng.numbers --puncs /home/kh/langdata//eng/eng.punc --output_dir /home/kh/tesstutorial/engtrain --lang eng
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.pCA/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: /home/kh/langdata//eng/eng.config
Null char=2
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
=== Moving lstmf files for training data ===
Moving /tmp/eng-2019-02-04.pCA/eng.Arial.exp0.lstmf to /home/kh/tesstutorial/engtrain
Moving /tmp/eng-2019-02-04.pCA/eng.Impact_Condensed.exp0.lstmf to /home/kh/tesstutorial/engtrain
Created starter traineddata for language 'eng'
Run lstmtraining to do the LSTM training for language 'eng'

kh@DSAD-6 /usr/share/tessdata
$ tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Impact Condensed" --lang eng --linedata_only --noextract_font_properties --langdata_dir ~/langdata/ --tessdata_dir ./ --output_dir ~/tesstutorial/engeval
=== Starting training for language 'eng'
[Mon, Feb 4, 2019 1:21:10 PM] /usr/bin/text2image --fonts_dir=/usr/share/fonts --font=Impact Condensed --outputbase=/tmp/font_tmp.e96rRhOoQ5/sample_text.txt --text=/tmp/font_tmp.e96rRhOoQ5/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.e96rRhOoQ5
Rendered page 0 to file /tmp/font_tmp.e96rRhOoQ5/sample_text.txt.tif
=== Phase I: Generating training images ===
Rendering using Impact Condensed
[Mon, Feb 4, 2019 1:21:14 PM] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.e96rRhOoQ5 --fonts_dir=/usr/share/fonts --strip_unrenderable_words --leading=32 --xsize=3600 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0 --max_pages=0 --font=Impact Condensed --text=/home/kh/langdata//eng/eng.training_text
Rendered page 0 to file /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif
Rendered page 1 to file /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif
=== Phase UP: Generating unicharset and unichar properties files ===
[Mon, Feb 4, 2019 1:21:16 PM] /usr/bin/unicharset_extractor --output_unicharset /tmp/eng-2019-02-04.TL6/eng.unicharset --norm_mode 1 /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.box
Extracting unicharset from box file /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.box
Other case É of é is not in unicharset
Wrote unicharset file /tmp/eng-2019-02-04.TL6/eng.unicharset
[Mon, Feb 4, 2019 1:21:17 PM] /usr/bin/set_unicharset_properties -U /tmp/eng-2019-02-04.TL6/eng.unicharset -O /tmp/eng-2019-02-04.TL6/eng.unicharset -X /tmp/eng-2019-02-04.TL6/eng.xheights --script_dir=/home/kh/langdata/
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.TL6/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Warning: properties incomplete for index 25 = ~
Writing unicharset to file /tmp/eng-2019-02-04.TL6/eng.unicharset
=== Phase E: Generating lstmf files ===
Using TESSDATA_PREFIX=./
[Mon, Feb 4, 2019 1:21:17 PM] /usr/local/bin/tesseract /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.tif /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0 --psm 6 lstm.train
Tesseract Open Source OCR Engine v4.0.0 with Leptonica
Page 1
Page 2
Loaded 49/49 pages (1-49) of document /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.lstmf
=== Constructing LSTM training data ===
[Mon, Feb 4, 2019 1:21:19 PM] /usr/bin/combine_lang_model --input_unicharset /tmp/eng-2019-02-04.TL6/eng.unicharset --script_dir /home/kh/langdata/ --words /home/kh/langdata//eng/eng.wordlist --numbers /home/kh/langdata//eng/eng.numbers --puncs /home/kh/langdata//eng/eng.punc --output_dir /home/kh/tesstutorial/engeval --lang eng
Loaded unicharset of size 111 from file /tmp/eng-2019-02-04.TL6/eng.unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: /home/kh/langdata//eng/eng.config
Null char=2
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
=== Moving lstmf files for training data ===
Moving /tmp/eng-2019-02-04.TL6/eng.Impact_Condensed.exp0.lstmf to /home/kh/tesstutorial/engeval
Created starter traineddata for language 'eng'
Run lstmtraining to do the LSTM training for language 'eng'

kh@DSAD-6 /usr/share/tessdata
$ combine_tessdata -e ./eng.traineddata ~/tesstutorial/engoutput/eng.lstm
Extracting tessdata components from ./eng.traineddata
Wrote /home/kh/tesstutorial/engoutput/eng.lstm
Version string:4.00.00alpha:eng:synth20170629
17:lstm:size=401636, offset=192
18:lstm-punc-dawg:size=4322, offset=401828
19:lstm-word-dawg:size=3694794, offset=406150
20:lstm-number-dawg:size=4738, offset=4100944
21:lstm-unicharset:size=6360, offset=4105682
22:lstm-recoder:size=1012, offset=4112042
23:version:size=30, offset=4113054

kh@DSAD-6 /usr/share/tessdata
$ lstmtraining --model_output ~/tesstutorial/engoutput/impact --continue_from ~/tesstutorial/engoutput/eng.lstm --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata --old_traineddata ./eng.traineddata --max_iterations 3600 -train_listfile ~/tesstutorial/engtrain/eng.training_files.txt
Loaded file /home/kh/tesstutorial/engoutput/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 110!
Num (Extended) outputs,weights in Series:
1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys48:48, 12480
Lfx96:96, 55680
Lrx96:96, 74112
Lfx192:192, 221952
Fc110:110, 0
Total weights = 364384
Previous null char=110 mapped to 109
Continuing from /home/kh/tesstutorial/engoutput/eng.lstm
Loaded 72/72 pages (1-72) of document /home/kh/tesstutorial/engtrain/eng.Arial.exp0.lstmf
Loaded 72/72 pages (1-72) of document /home/kh/tesstutorial/engtrain/eng.Impact_Condensed.exp0.lstmf
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249
!int_mode_:Error:Assert failed:in file /cygdrive/d/cyg_pub/devel/tesseract/tesseract-ocr-4.0.0-1.x86_64/src/tesseract-4.0.0/src/lstm/weightmatrix.cpp, line 249