Help: Unable to Combine the Output Files - Training from Scratch

minh...@gmail.com

Aug 18, 2020, 12:13:10 AM

to tesseract-ocr

Dear all,

I followed the manuals in the wiki, but I still get an error at the last step.

I am working on macOS 10.15.6 Catalina

Tesseract 4.1.1

Lstmtraining 4.1.1

Here is my process:

# Create training data, Vietnamese language, Times New Roman font only

PANGOCAIRO_BACKEND=fc \
~/tesseract/src/training/tesstrain.sh \
--fonts_dir /Library/Fonts \
--lang vie \
--linedata_only \
--noextract_font_properties \
--exposures "0" \
--langdata_dir ~/tesstutorial/langdata \
--tessdata_dir ~/tesstutorial/tesseract/tessdata \
--fontlist "Times New Roman" \
--training_text ~/tesstutorial/langdata/vie/vie.training_text \
--output_dir ~/tesstutorial/vietrain

In the dir ~/tesstutorial/langdata I put the best vie.traineddata, along with vie.punc, vie.wordlist and vie.number (I don't know whether these are necessary).
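A quick way to confirm that those files really are in the directory is a small existence check. This is only a sketch: `missing_files` is a hypothetical helper name, and the file names are the ones listed above, not a statement of what tesstrain.sh requires.

```shell
# Hypothetical helper: report which of the given files are missing from a directory.
missing_files() {
  dir=$1
  shift
  for name in "$@"; do
    # Print a line only for files that do not exist.
    [ -e "$dir/$name" ] || echo "missing: $dir/$name"
  done
}

# File names taken from the post; adjust to the actual layout.
missing_files "$HOME/tesstutorial/langdata" \
  vie.traineddata vie.punc vie.wordlist vie.number
```

No output means every listed file was found.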

# Create evaluation data, Vietnamese language, Times New Roman font only

Using a different training text; the langdata dir holds the best traineddata (Sep 2017):

PANGOCAIRO_BACKEND=fc \
~/tesseract/src/training/tesstrain.sh \
--fonts_dir /Library/Fonts \
--lang vie \
--linedata_only \
--noextract_font_properties \
--exposures "0" \
--langdata_dir ~/tesstutorial/langdata \
--tessdata_dir ~/tesstutorial/tesseract/tessdata \
--fontlist "Times New Roman" \
--training_text ~/tesstutorial/langdata/vie_eval/vie.training_text \
--output_dir ~/tesstutorial/vieeval

# Then I continue training using lstmtraining

lstmtraining \
--debug_interval 100 \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
--model_output ~/tesstutorial/vieoutput/base \
--learning_rate 20e-4 \
--train_listfile ~/tesstutorial/vietrain/vie.training_files.txt \
--eval_listfile ~/tesstutorial/vieeval/vie.training_files.txt \
--max_iterations 100000 &>~/tesstutorial/vieoutput/basetrain.log

So far there are no errors, and several base...checkpoint files have been generated.
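To see which checkpoint is the newest one, something like the following could be used. It is a minimal sketch that assumes the checkpoint files sit directly in the output directory and have names ending in "checkpoint"; `latest_checkpoint` is a made-up helper name.

```shell
# Hypothetical helper: print the most recently modified checkpoint in a directory.
latest_checkpoint() {
  dir=$1
  # ls -t sorts newest first; errors are silenced when nothing matches.
  ls -t "$dir"/*checkpoint 2>/dev/null | head -n 1
}

latest_checkpoint "$HOME/tesstutorial/vieoutput"
```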

# Last step, combine output

Do I have to provide the best traineddata so that the final output traineddata has all the required components?

I get the error "Must provide a --traineddata see training wiki".

Here is what I tried.

First, with --traineddata set to the file produced at the first step:

lstmtraining --stop_training \
--continue_from ~/tesstutorial/vieoutput/base_checkpoint \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--model_output ~/tesstutorial/vieoutput/vie.traineddata

or

with the same --traineddata (produced at the first step) plus --old_traineddata pointing at the best traineddata from Sep 2017 in the langdata dir:

lstmtraining --stop_training \
--continue_from ~/tesstutorial/vieoutput/base_checkpoint \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--old_traineddata ~/tesstutorial/langdata/vie.traineddata \
--model_output ~/tesstutorial/vieoutput/vie.traineddata
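One thing worth ruling out with multi-line commands like these: any text typed after a trailing backslash (even a single space) stops the shell from treating it as a line continuation, so the flags on the following lines never reach the program. Whether that is what happens here is not established. The sketch below avoids continuations inside the command by building the argument list one flag at a time, which also leaves room for notes. `stop_training` and `DRY_RUN` are hypothetical names; the lstmtraining flags are the ones used above.

```shell
# Hypothetical wrapper around the lstmtraining call from the post.
# With DRY_RUN=1 it only prints the command instead of running it.
stop_training() {
  checkpoint=$1
  traineddata=$2
  output=$3
  # Build the argument list one flag per line; comments are safe here
  # because no backslash continuation is involved.
  set -- --stop_training
  set -- "$@" --continue_from "$checkpoint"
  set -- "$@" --traineddata "$traineddata"   # produced at the first step
  set -- "$@" --model_output "$output"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "lstmtraining $*"
  else
    lstmtraining "$@"
  fi
}

# Dry run: shows exactly which arguments would be passed.
DRY_RUN=1 stop_training \
  "$HOME/tesstutorial/vieoutput/base_checkpoint" \
  "$HOME/tesstutorial/vietrain/vie/vie.traineddata" \
  "$HOME/tesstutorial/vieoutput/vie.traineddata"
```

If the dry run prints all three paths after their flags, the same call without DRY_RUN=1 passes them to lstmtraining unchanged.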

I read the wiki carefully, but could not find a solution there.

Could anyone please point out what is wrong with my process?

Is there anything missing?

Many thanks,

TuPM