Hello, everyone:
I want to recognize the characters in a table (see the attached file). Previously I only needed to recognize English letters, and the results were quite good. Now I want to recognize
English letters plus Chinese characters, so I retrained the model. Here are my commands:
1) src/training/tesstrain.sh --fonts_dir /usr/share/fonts --training_text ../training_data/chi_sim_tuned.txt \
--langdata_dir ../langdata --tessdata_dir ./tessdata --lang chi_sim --linedata_only --noextract_font_properties --exposures "0" \
--fontlist "AR PL UKai CN" \
"AR PL UKai HK" \
"AR PL UKai TW" \
"AR PL UKai TW MBE" \
"AR PL UMing CN Light" \
"AR PL UMing HK Light" \
"AR PL UMing TW Light" \
"AR PL UMing TW MBE Light" \
"NSimSun" \
"Noto Sans CJK JP" \
"Noto Sans CJK JP Bold" \
"Noto Sans CJK JP Heavy" \
"Noto Sans CJK JP Light" \
"Noto Sans CJK JP Medium" \
"Noto Sans CJK JP Semi-Light" \
"Noto Sans CJK JP Ultra-Light" \
"Noto Sans CJK KR" \
"Noto Sans CJK KR Bold" \
"Noto Sans CJK KR Heavy" \
"Noto Sans CJK KR Light" \
"Noto Sans CJK KR Medium" \
"Noto Sans CJK KR Semi-Light" \
"Noto Sans CJK KR Ultra-Light" \
"Noto Sans CJK SC" \
"Noto Sans CJK SC Bold" \
"Noto Sans CJK SC Heavy" \
"Noto Sans CJK SC Light" \
"Noto Sans CJK SC Medium" \
"Noto Sans CJK SC Semi-Light" \
"Noto Sans CJK SC Ultra-Light" \
"Noto Sans CJK TC" \
"Noto Sans CJK TC Bold" \
"Noto Sans CJK TC Heavy" \
"Noto Sans CJK TC Light" \
"Noto Sans CJK TC Medium" \
"Noto Sans CJK TC Semi-Light" \
"Noto Sans CJK TC Ultra-Light" \
"Noto Sans Mono CJK JP" \
"Noto Sans Mono CJK JP Bold" \
"Noto Sans Mono CJK KR" \
"Noto Sans Mono CJK KR Bold" \
"Noto Sans Mono CJK SC" \
"Noto Sans Mono CJK SC Bold" \
"Noto Sans Mono CJK TC" \
"Noto Sans Mono CJK TC Bold" \
"SimSun" \
"WenQuanYi Zen Hei Medium" \
"WenQuanYi Zen Hei Mono Medium" \
--output_dir ~/tesstutorial/chi_sim_train
2) mkdir -p ~/tesstutorial/chi_sim_tuned_from_chi_sim
3) combine_tessdata -e ../tessdata_best/chi_sim.traineddata ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm
4) lstmtraining --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned \
--continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
--traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
--old_traineddata ../tessdata_best/chi_sim.traineddata \
--train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
--max_iterations 10000
5) lstmtraining --stop_training \
--continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned_checkpoint \
--traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
--model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata
The result is not good. Strangest of all, the output contains some Chinese characters that do not exist in the training_text file, which I really cannot understand.
Can someone help me? Thanks a lot.
The training_text file and the result are also in the attachment.
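In case it helps anyone reproduce the symptom, the unexpected characters can be listed with something like the following. This is only a minimal Python sketch: the function name and the sample strings are made up for illustration and are not part of Tesseract, and in practice the two strings would be read from the training_text file and the OCR output as UTF-8.

```python
def chars_missing_from_training(training_text: str, ocr_output: str) -> list[str]:
    """Return the sorted non-whitespace characters that occur in the OCR
    output but never occur in the training text."""
    # Collect the distinct characters on each side, ignoring whitespace.
    trained = {ch for ch in training_text if not ch.isspace()}
    produced = {ch for ch in ocr_output if not ch.isspace()}
    # Anything produced but never trained on is an "unexpected" character.
    return sorted(produced - trained)


if __name__ == "__main__":
    # Hypothetical sample strings standing in for the real files.
    training_sample = "abc 中文"
    output_sample = "a 中国 b"
    print(chars_missing_from_training(training_sample, output_sample))
```

An empty list would mean every recognized character also occurs in the training text; any entries are the characters I am asking about.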
Sorry for my poor English.