Hello everyone,
I want to recognize the characters in a table; you can find a sample table in the attachment. It contains only 15 different characters: "0123456789-.LQX".
So I think fine-tuning is a good approach for this recognition task.
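As a sanity check on the character set, a small sketch like the one below would list any character in the training text (part.txt in the steps that follow) that falls outside those 15. The function name unexpected_chars is my own and not part of Tesseract, and the sketch assumes GNU coreutils and a text file with one byte per character.

```shell
# Hypothetical helper (not a Tesseract tool): print each distinct character
# read from stdin that is neither one of the 15 expected characters
# nor whitespace (space, tab, newline).
unexpected_chars() {
  # tr deletes every allowed character; the trailing '-' is a literal hyphen.
  # fold puts each remaining character on its own line; sort -u deduplicates.
  tr -d '0123456789.LQX \n\t-' | fold -w1 | sort -u
}
```

Running `unexpected_chars < ../training_data/part.txt` should print nothing if the training text really contains only the expected characters.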
Here are my steps:
1. src/training/tesstrain.sh --fonts_dir /usr/share/fonts --training_text ../training_data/part.txt \
--langdata_dir ../langdata --tessdata_dir ./tessdata --lang eng --linedata_only --noextract_font_properties --output_dir ~/tesstutorial/engtest
part.txt is also in the attachment.
2. mkdir -p ~/tesstutorial/engtuned_from_eng
3. lstmtraining --model_output ~/tesstutorial/engtuned_from_eng/engtuned --continue_from ~/tesstutorial/engtuned_from_eng/eng.lstm \
--traineddata ../tessdata/eng.traineddata --train_listfile ~/tesstutorial/engtest/eng.training_files.txt --max_iterations 400
4. combine_tessdata -o ./tessdata/eng_new.traineddata \
~/tesstutorial/engtuned_from_eng/eng.lstm \
~/tesstutorial/engtest/eng.lstm-number-dawg \
~/tesstutorial/engtest/eng.lstm-punc-dawg \
~/tesstutorial/engtest/eng.lstm-word-dawg
But when I execute the third step, I get an error:
Continuing from /home/yixin/tesstutorial/engtuned_from_eng/eng.lstm
Loaded 298/298 pages (1-298) of document /home/yixin/tesstutorial/engtest/eng.Arial_Bold.exp0.lstmf
Loaded 297/297 pages (1-297) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Medium.exp0.lstmf
Loaded 294/294 pages (1-294) of document /home/yixin/tesstutorial/engtest/eng.Arial.exp0.lstmf
Loaded 293/293 pages (1-293) of document /home/yixin/tesstutorial/engtest/eng.Courier_New_Bold.exp0.lstmf
Loaded 302/302 pages (1-302) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Bold_Italic.exp0.lstmf
Loaded 301/301 pages (1-301) of document /home/yixin/tesstutorial/engtest/eng.Arial_Italic.exp0.lstmf
Loaded 301/301 pages (1-301) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Bold.exp0.lstmf
Loaded 302/302 pages (1-302) of document /home/yixin/tesstutorial/engtest/eng.Century_Schoolbook_L_Italic.exp0.lstmf
Loaded 302/302 pages (1-302) of document /home/yixin/tesstutorial/engtest/eng.Arial_Bold_Italic.exp0.lstmf
Loaded 296/296 pages (1-296) of document /home/yixin/tesstutorial/engtest/eng.Courier_New_Bold_Italic.exp0.lstmf
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 249
Segmentation fault (core dumped)
This is the related code (weightmatrix.cpp, lines 248-256):
248 void WeightMatrix::MatrixDotVector(const int8_t* u, double* v) const {
249   assert(int_mode_);
250   if (IntSimdMatrix::intSimdMatrix) {
251     IntSimdMatrix::intSimdMatrix->matrixDotVectorFunction(
252         wi_.dim1(), wi_.dim2(), &shaped_w_[0], &scales_[0], u, v);
253   } else {
254     IntSimdMatrix::MatrixDotVector(wi_, scales_, u, v);
255   }
256 }
I am new to LSTM training. Is my method suitable for recognizing only 15 different characters, or is there a better way to approach this problem? And how can I fix the assert error?
Thank you in advance.
Sorry for my poor English.