Generate LSTM training files from existing TIF and box files for Tesseract 5 on macOS


minh...@gmail.com

Aug 9, 2020, 5:35:57 AM
to tesseract-ocr
Dear friends,

I want to train the Tesseract LSTM engine on some scanned documents.
Since the scans are of poor quality, I created the corresponding box files with jTessBoxEditor. The boxes and characters were not recognized well, so I had to correct them manually.
After a few days of work, I now have 3 files:
vie.timesnewromani.exp99.tif
vie.timesnewromani.exp99.box

Now I need to convert them into LSTM training files, so I have modified tesstrain.sh as follows:

mkdir -p ${TRAINING_DIR}
tlog "\n=== Starting training for language '${LANG_CODE}'"

cp  ~/tesstutorial/langdata/${LANG_CODE}/*.box ${TRAINING_DIR}
cp  ~/tesstutorial/langdata/${LANG_CODE}/*.tif ${TRAINING_DIR}

source "$(dirname $0)/language-specific.sh"
set_lang_specific_parameters ${LANG_CODE}
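
A plain `cp` with a glob gives little feedback when the pattern matches nothing or when a box file is missing. The following is a minimal sketch of a more defensive copy step. The function name `copy_training_pairs` is hypothetical and not part of tesstrain.sh; it assumes each image and its box file share the same base name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tesstrain.sh): copy every .tif that has a
# matching .box from a source directory into a destination directory.
copy_training_pairs() {
    local src="$1" dst="$2" tif base copied=0
    mkdir -p "$dst" || return 1
    for tif in "$src"/*.tif; do
        [ -e "$tif" ] || continue            # the glob matched nothing
        base="${tif%.tif}"
        if [ -f "$base.box" ]; then
            cp "$tif" "$base.box" "$dst"/ || return 1
            copied=$((copied + 1))
        else
            echo "warning: no box file for $tif" >&2
        fi
    done
    # Report how many pairs were copied; fail if there were none.
    echo "copied $copied pair(s) to $dst"
    [ "$copied" -gt 0 ]
}
```

It could be called as `copy_training_pairs ~/tesstutorial/langdata/${LANG_CODE} "${TRAINING_DIR}"` in place of the two `cp` lines, so that an empty source folder or a missing box file shows up in the log instead of passing silently.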

I copied all three files to langdata/vie/, but they do not seem to have been copied to the temporary training folder.
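
An alternative that bypasses the copy step altogether is to let `tesseract` build the training file directly from the image and box pair using its `lstm.train` config. The sketch below assumes `tesseract` is on the PATH, that the box file sits next to the image with the same base name, and that page segmentation mode 6 suits the scans (the `--psm 6` value is an assumption). The function name `make_lstmf` is hypothetical. Whether boxes exported by jTessBoxEditor are accepted as-is by LSTM training has not been verified here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: produce <base>.lstmf from <base>.tif and <base>.box.
make_lstmf() {
    local tif="$1" base
    base="${tif%.tif}"
    # Fail early with a clear message if an input is missing.
    [ -f "$tif" ] || { echo "missing image: $tif" >&2; return 1; }
    [ -f "$base.box" ] || { echo "missing box file: $base.box" >&2; return 1; }
    command -v tesseract >/dev/null 2>&1 || {
        echo "tesseract not found on PATH" >&2
        return 1
    }
    # With the lstm.train config, tesseract reads the box file next to the
    # image and writes the training data to "$base.lstmf".
    tesseract "$tif" "$base" --psm 6 lstm.train
}
```

For the files above, the call would be `make_lstmf vie.timesnewromani.exp99.tif`, run from the folder that holds both the image and the box file.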

Any advice would be much appreciated.

Many thanks,

TuPM



