Hello everyone,
I have been working on training eng + chi_sim for several days, but one urgent issue is confusing me. I asked this question before but did not get a helpful reply, so I am asking again. Sorry for disturbing you.
My steps are as follows:
src/training/tesstrain.sh --fonts_dir /usr/share/fonts --training_text ../training_data/chi_sim_tuned.txt \
--langdata_dir ../langdata --tessdata_dir ./tessdata --lang chi_sim --linedata_only --noextract_font_properties --exposures "0" \
--workspace_dir ./share/workspace/tmp \
--save_box_tiff \
--fontlist "NSimSun" \
"Times New Roman" \
"Arial Unicode MS" \
"SimSun" \
"Merchant Copy" \
"Merchant Copy Doublesize" \
"Noto Sans CJK SC" \
"Noto Sans Mono CJK SC" \
--output_dir ~/tesstutorial/chi_sim_train \
--overwrite
mkdir -p ~/tesstutorial/chi_sim_tuned_from_chi_sim
combine_tessdata -e ../tessdata_best/chi_sim.traineddata ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm
lstmtraining --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned \
--continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim.lstm \
--traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata \
--old_traineddata ../tessdata_best/chi_sim.traineddata \
--train_listfile ~/tesstutorial/chi_sim_train/chi_sim.training_files.txt \
--max_iterations 3000
lstmtraining --stop_training --continue_from ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned_checkpoint \
--traineddata ~/tesstutorial/chi_sim_train/chi_sim/chi_sim.traineddata --model_output ~/tesstutorial/chi_sim_tuned_from_chi_sim/chi_sim_tuned.traineddata
The training text file is attached.
What confuses me is that the result contains some characters that are not in the training text file. (Only chi_sim characters have this problem; eng is fine.)
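To pin down exactly which characters are affected, a comparison along the following lines could be run. This is only a minimal sketch: the function names, the UTF8_LOCALE variable and the result file name are placeholders of mine, and it assumes that both files are UTF-8 and that a locale named C.UTF-8 is installed.

```shell
#!/bin/sh
# Sketch: list characters that occur in the OCR result but not in the training text.
# Assumption: a UTF-8 locale is needed so that grep splits on characters, not bytes.
UTF8_LOCALE=${UTF8_LOCALE:-C.UTF-8}

# Print each distinct non-whitespace character of a file, one per line.
unique_chars() {
  LC_ALL="$UTF8_LOCALE" grep -o . "$1" \
    | LC_ALL="$UTF8_LOCALE" grep -v '[[:space:]]' \
    | LC_ALL=C sort -u
}

# Print the characters found in file $2 (OCR result) but not in file $1 (training text).
unexpected_chars() {
  train_chars=$(mktemp)
  result_chars=$(mktemp)
  unique_chars "$1" > "$train_chars"
  unique_chars "$2" > "$result_chars"
  # comm -13 keeps only the lines unique to the second (sorted) input.
  LC_ALL=C comm -13 "$train_chars" "$result_chars"
  rm -f "$train_chars" "$result_chars"
}

# Example call (result.txt is a placeholder for the OCR output file):
#   unexpected_chars ../training_data/chi_sim_tuned.txt result.txt
```

An empty output would mean every character in the result also appears in the training text.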
Can anyone help me? I have also uploaded the image and the result file.
Thanks in advance.