Help: Unable to Combine the Output Files - Training from Scratch

minh...@gmail.com

Aug 18, 2020, 12:13:10 AM
to tesseract-ocr
Dear all, 

I followed the instructions in the wiki, but I still get an error at the last step.
I am working on macOS 10.15.6 Catalina with:
Tesseract 4.1.1
lstmtraining 4.1.1

Here is my process:

# Create training data for Vietnamese (vie), Times New Roman font only

PANGOCAIRO_BACKEND=fc \
~/tesseract/src/training/tesstrain.sh \
--fonts_dir /Library/Fonts \
--lang vie \
--linedata_only \
--noextract_font_properties \
--exposures "0" \
--langdata_dir ~/tesstutorial/langdata \
--tessdata_dir ~/tesstutorial/tesseract/tessdata \
--fontlist "Times New Roman" \
--training_text ~/tesstutorial/langdata/vie/vie.training_text \
--output_dir ~/tesstutorial/vietrain

In ~/tesstutorial/langdata I put the best vie.traineddata, plus vie.punc, vie.wordlist and vie.number (I don't know whether these are necessary).
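
The output of this step can be sanity-checked before training starts. The following is only a minimal sketch: `check_tesstrain_output` is a hypothetical helper (not part of Tesseract), and it assumes that the output directory contains `<lang>.training_files.txt` and `<lang>/<lang>.traineddata`, as the later lstmtraining command expects, and that each line of the list file names one .lstmf file.

```shell
# Sketch: check the files that the lstmtraining step reads later.
# check_tesstrain_output is a hypothetical helper, not part of Tesseract.
check_tesstrain_output() {
  out_dir="$1"
  lang="$2"
  list="$out_dir/$lang.training_files.txt"
  starter="$out_dir/$lang/$lang.traineddata"
  missing=0

  # The starter traineddata is passed to lstmtraining via --traineddata.
  if [ ! -s "$starter" ]; then
    echo "missing or empty: $starter"
    missing=1
  fi

  if [ -s "$list" ]; then
    # Assumption: each non-empty line of the list names one .lstmf file.
    while IFS= read -r lstmf || [ -n "$lstmf" ]; do
      [ -n "$lstmf" ] || continue
      if [ ! -f "$lstmf" ]; then
        echo "listed but not found: $lstmf"
        missing=1
      fi
    done < "$list"
  else
    echo "missing or empty: $list"
    missing=1
  fi

  # 0 when everything was found, 1 otherwise.
  return "$missing"
}
```

With the paths from this post the call would be `check_tesstrain_output ~/tesstutorial/vietrain vie`; it prints nothing when all files are in place.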

# Create evaluation data for Vietnamese (vie), Times New Roman font only,
# using a different training text

# ~/tesstutorial/langdata is the directory that holds the best traineddata (Sep 2017)
PANGOCAIRO_BACKEND=fc \
~/tesseract/src/training/tesstrain.sh \
--fonts_dir /Library/Fonts \
--lang vie \
--linedata_only \
--noextract_font_properties \
--exposures "0" \
--langdata_dir ~/tesstutorial/langdata \
--tessdata_dir ~/tesstutorial/tesseract/tessdata \
--fontlist "Times New Roman" \
--training_text ~/tesstutorial/langdata/vie_eval/vie.training_text \
--output_dir ~/tesstutorial/vieeval

# Next, I run the training with lstmtraining

lstmtraining \
--debug_interval 100 \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
--model_output ~/tesstutorial/vieoutput/base \
--learning_rate 20e-4 \
--train_listfile ~/tesstutorial/vietrain/vie.training_files.txt \
--eval_listfile ~/tesstutorial/vieeval/vie.training_files.txt \
--max_iterations 100000 &>~/tesstutorial/vieoutput/basetrain.log

So far there are no errors, and several base...checkpoint files have been generated.
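
To see which checkpoints are available for the final step, the files under the --model_output prefix can be listed. This is a rough sketch; `list_checkpoints` is a hypothetical helper, and the glob only assumes that every checkpoint file name starts with the prefix and ends in "checkpoint".

```shell
# Sketch: list checkpoint files written under a --model_output prefix,
# newest first. list_checkpoints is a hypothetical helper name.
list_checkpoints() {
  prefix="$1"
  # The glob matches "<prefix>_checkpoint" as well as any intermediate
  # "<prefix>...checkpoint" files; errors are hidden when nothing matches.
  ls -1t "$prefix"*checkpoint 2>/dev/null
}
```

For the run above the call would be `list_checkpoints ~/tesstutorial/vieoutput/base`. An empty result means that no checkpoint has been written under that prefix.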

# Last step, combine output

Do I have to provide the best traineddata so that the final output traineddata has all the required components?

I get this error: Must provide a --traineddata see training wiki

Here is what I tried:

# --traineddata is the file produced at the first step
lstmtraining --stop_training \
--continue_from ~/tesstutorial/vieoutput/base_checkpoint \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--model_output ~/tesstutorial/vieoutput/vie.traineddata

or 

# --traineddata is the file produced at the first step;
# --old_traineddata is the best traineddata (Sep 2017)
lstmtraining --stop_training \
--continue_from ~/tesstutorial/vieoutput/base_checkpoint \
--traineddata ~/tesstutorial/vietrain/vie/vie.traineddata \
--old_traineddata ~/tesstutorial/langdata/vie.traineddata \
--model_output ~/tesstutorial/vieoutput/vie.traineddata
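
One way to rule out a mistyped path or a broken line continuation in this step is to wrap the call in a small function that checks its inputs first. The sketch below is not from the wiki: `stop_training` and the `LSTMTRAINING` override variable are hypothetical names, and only the four lstmtraining flags already used above are passed.

```shell
# Sketch: combine a checkpoint and the starter traineddata into the final
# model, checking the input paths first. stop_training is a hypothetical
# helper, not part of Tesseract.
stop_training() {
  checkpoint="$1"   # e.g. ~/tesstutorial/vieoutput/base_checkpoint
  starter="$2"      # the traineddata produced at the first step
  output="$3"       # where the final traineddata should be written

  # A mistyped or accidentally split path shows up here as a missing file.
  for f in "$checkpoint" "$starter"; do
    if [ ! -f "$f" ]; then
      echo "not found: $f" >&2
      return 1
    fi
  done

  # LSTMTRAINING is an assumed override for dry runs; by default the
  # lstmtraining binary on PATH is used.
  "${LSTMTRAINING:-lstmtraining}" --stop_training \
    --continue_from "$checkpoint" \
    --traineddata "$starter" \
    --model_output "$output"
}
```

Running it as `LSTMTRAINING=echo stop_training <checkpoint> <traineddata> <output>` prints the argument list instead of starting lstmtraining, which makes it easy to confirm that --traineddata really reaches the program as a single, intact path.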

I have read the wiki carefully, but I could not find a solution there.
Could anyone please point out what is wrong with my process?
Is there anything missing?

Many thanks,

TuPM 