I'm trying to train Tesseract to improve the detection of prices such as CN¥2,400.48. I've gotten to a point where I keep getting this error:
total=`cat data/all-lstmf | wc -l` \
no=`echo "$total * 0.90 / 1" | bc`; \
head -n "$no" data/all-lstmf > "data/list.train"
total=`cat data/all-lstmf | wc -l` \
no=`echo "($total - $total * 0.90) / 1" | bc`; \
tail -n "+$no" data/all-lstmf > "data/list.eval"
combine_lang_model \
--input_unicharset data/unicharset \
--script_dir /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master \
--words /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.wordlist \
--numbers /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.numbers \
--puncs /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.punc \
--output_dir data/ \
--lang eng
Loaded unicharset of size 113 from file data/unicharset
Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/langdata-master/eng/eng.config
Null char=2
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
mkdir -p data/checkpoints
lstmtraining \
--continue_from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm \
--old_traineddata /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.traineddata \
--traineddata data/eng/eng.traineddata \
--model_output data/checkpoints/eng \
--debug_interval -1 \
--train_listfile data/list.train \
--eval_listfile data/list.eval \
--sequential_training \
--max_iterations 3000
Loaded file /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 112!
Num (Extended) outputs,weights in Series:
1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lfys64:64, 20736
Lfx96:96, 61824
Lrx96:96, 74112
Lfx512:512, 1247232
Fc112:112, 0
Total weights = 1404064
Previous null char=110 mapped to 111
Continuing from /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/tessdata/eng.lstm
Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf
Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/67e.lstmf
Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/75c.lstmf
Loaded 1/1 pages (1-1) of document /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/48b.lstmf
Iteration 0: ALIGNED TRUTH : CN¥2,400.48
Iteration 0: BEST OCR TEXT : ₩₩₩N₩₩4₩0₩0₩4₩8
File /home/tulipan1637/Documents/Emiliano/OCR/OCRtraining/ocrd-train/data/train/72b.lstmf page 0 :
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
Makefile:111: recipe for target 'data/checkpoints/eng_checkpoint' failed
make: *** [data/checkpoints/eng_checkpoint] Segmentation fault (core dumped)
I already tried downloading the best/tessdata eng.traineddata and substituting it in --continue_from, but I still haven't been able to get past this error. Any thoughts?