Dear all,
I followed the manuals in the wiki, but I still get an error at the last step.
I am working on macOS 10.15.6 Catalina with:
Tesseract 4.1.1
lstmtraining 4.1.1
Here is my process:
# Create the training data: Vietnamese (vie), Times New Roman font only
PANGOCAIRO_BACKEND=fc \
~/tesseract/src/training/tesstrain.sh \
--fonts_dir /Library/Fonts \
--lang vie \
--linedata_only \
--noextract_font_properties \
--exposures "0" \
--langdata_dir ~/tesstutorial/langdata \
--tessdata_dir ~/tesstutorial/tesseract/tessdata \
--fontlist "Times New Roman" \
--training_text ~/tesstutorial/langdata/vie/vie.training_text \
--output_dir ~/tesstutorial/vietrain
In the directory ~/tesstutorial/langdata I put the best vie.traineddata, together with vie.punc, vie.wordlist and vie.number (I am not sure whether these are necessary).
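To check whether the language files are where tesstrain.sh can see them, a small helper like the one below may be useful. It is only a sketch: it assumes the layout implied by the --training_text path above, i.e. `<langdata_dir>/vie/vie.<ext>`, and it assumes the file names vie.training_text, vie.wordlist, vie.punc and vie.numbers (note the plural, which is an assumption worth checking against tesstrain.sh). The function name `check_langdata` is hypothetical.

```shell
# Hypothetical helper: report which language files exist for a language code.
# ASSUMPTION: tesstrain.sh looks for <langdata_dir>/<lang>/<lang>.<ext> with the
# extensions listed below; verify this against your copy of tesstrain.sh.
check_langdata() {
  local dir="$1" lang="$2" ext missing=0
  for ext in training_text wordlist punc numbers; do
    if [ -f "$dir/$lang/$lang.$ext" ]; then
      echo "found:   $dir/$lang/$lang.$ext"
    else
      echo "missing: $dir/$lang/$lang.$ext"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero if any expected file is absent.
  [ "$missing" -eq 0 ]
}
```

For example: `check_langdata ~/tesstutorial/langdata vie`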
# Create the evaluation data: Vietnamese (vie), Times New Roman font only, using a different training text
# (the langdata dir also contains the best traineddata from Sep 2017)
PANGOCAIRO_BACKEND=fc \
~/tesseract/src/training/tesstrain.sh \
--fonts_dir /Library/Fonts \
--lang vie \
--linedata_only \
--noextract_font_properties \
--exposures "0" \
--langdata_dir ~/tesstutorial/langdata \
--tessdata_dir ~/tesstutorial/tesseract/tessdata \
--fontlist "Times New Roman" \
--training_text ~/tesstutorial/langdata/vie_eval/vie.training_text \
--output_dir ~/tesstutorial/vieeval
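Before running lstmtraining, it may be worth confirming that both list files point at .lstmf files that really exist. The sketch below assumes that each non-empty line of vie.training_files.txt is the path of one .lstmf file, as tesstrain.sh writes it; `check_lstmf_list` is a hypothetical helper name.

```shell
# Hypothetical helper: every non-empty line of a *.training_files.txt list
# should be the path of an existing .lstmf file.
check_lstmf_list() {
  local list="$1" total=0 missing=0 path
  if [ ! -s "$list" ]; then
    echo "list file missing or empty: $list" >&2
    return 1
  fi
  # "|| [ -n "$path" ]" also handles a last line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    if [ -z "$path" ]; then
      continue
    fi
    total=$((total + 1))
    if [ ! -f "$path" ]; then
      echo "missing: $path" >&2
      missing=$((missing + 1))
    fi
  done < "$list"
  echo "$total entries, $missing missing"
  # Non-zero exit status if any listed file is absent.
  [ "$missing" -eq 0 ]
}
```

For example, run it on both `~/tesstutorial/vietrain/vie.training_files.txt` and `~/tesstutorial/vieeval/vie.training_files.txt`.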
# Then I continue training using lstmtraining
lstmtraining \
--debug_interval 100 \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
--model_output ~/tesstutorial/vieoutput/base \
--learning_rate 20e-4 \
--train_listfile ~/tesstutorial/vietrain/vie.training_files.txt \
--eval_listfile ~/tesstutorial/vieeval/vie.training_files.txt \
--max_iterations 100000 &>~/tesstutorial/vieoutput/basetrain.log
So far there are no errors, and several base...checkpoint files have been generated.
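To see which checkpoints have been written and how far training has progressed, something like the following may help. It assumes that the checkpoint names start with the --model_output prefix (here ~/tesstutorial/vieoutput/base) and end in "checkpoint", and that progress lines in the log begin with "At iteration"; both are assumptions about lstmtraining's output, and `training_status` is a hypothetical name.

```shell
# Hypothetical helper: list checkpoints for a model_output prefix and show the
# latest progress line from the training log.
# ASSUMPTION: progress lines in the log start with "At iteration".
training_status() {
  local prefix="$1" log="$2"
  # Checkpoint files written for this prefix, newest first.
  ls -1t "$prefix"*checkpoint 2>/dev/null
  # Most recent progress line in the training log.
  grep 'At iteration' "$log" | tail -n 1
}
```

For example: `training_status ~/tesstutorial/vieoutput/base ~/tesstutorial/vieoutput/basetrain.log`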
# Last step: combine the output
Do I have to provide the best traineddata so that the final output traineddata has all the required components?
I get the error: Must provide a --traineddata see training wiki
Here is what I tried:
lstmtraining --stop_training \
--continue_from ~/tesstutorial/vieoutput/base_checkpoint \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--model_output ~/tesstutorial/vieoutput/vie.traineddata
(here --traineddata is the file produced in the first step)
or
lstmtraining --stop_training \
--continue_from ~/tesstutorial/vieoutput/base_checkpoint \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--old_traineddata ~/tesstutorial/langdata/vie.traineddata \
--model_output ~/tesstutorial/vieoutput/vie.traineddata
(here --traineddata is the file produced in the first step, and --old_traineddata is the best traineddata from Sep 2017)
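One way to make this step harder to break when pasting it into a terminal is to build the argument list one flag at a time instead of relying on trailing backslashes, and to check that the input files exist before lstmtraining runs. A minimal sketch for bash, zsh or dash; the function name `stop_training` and the `LSTMTRAINING` override variable are hypothetical, and the flags are the ones from the first command above.

```shell
# Hypothetical wrapper around the --stop_training step.
# LSTMTRAINING (hypothetical) lets you point at a specific binary; it defaults
# to "lstmtraining" on the PATH.
stop_training() {
  local checkpoint="$1" traineddata="$2" output="$3" f
  # Fail early if either input file is missing.
  for f in "$checkpoint" "$traineddata"; do
    if [ ! -f "$f" ]; then
      echo "not found: $f" >&2
      return 1
    fi
  done
  # Build the arguments one flag per line, with no backslash continuations.
  set -- --stop_training
  set -- "$@" --continue_from "$checkpoint"
  set -- "$@" --traineddata "$traineddata"
  set -- "$@" --model_output "$output"
  "${LSTMTRAINING:-lstmtraining}" "$@"
}
```

For example: `stop_training ~/tesstutorial/vieoutput/base_checkpoint ~/tesstutorial/vietrain/vie/vie.traineddata ~/tesstutorial/vieoutput/vie.traineddata`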
I have read the wiki carefully, but I could not find a solution.
Could anyone point out what is wrong with my process?
Is there anything missing?
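On the question of which components end up in the final file: combine_tessdata -d lists the contents of a traineddata file, so the output can be compared side by side with the best vie.traineddata. A sketch; `list_components` and the `COMBINE_TESSDATA` override variable are hypothetical.

```shell
# Hypothetical helper: print the component listing of each traineddata file
# using "combine_tessdata -d". COMBINE_TESSDATA (hypothetical) can point at a
# specific binary; it defaults to "combine_tessdata" on the PATH.
list_components() {
  local f
  for f in "$@"; do
    echo "== $f"
    "${COMBINE_TESSDATA:-combine_tessdata}" -d "$f" || return 1
  done
}
```

For example: `list_components ~/tesstutorial/vieoutput/vie.traineddata ~/tesstutorial/langdata/vie.traineddata`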
Many thanks,
TuPM