Generating LSTM training files from existing TIF and box files for Tesseract 5 on macOS


minh...@gmail.com

Aug 9, 2020, 5:35:57 AM

to tesseract-ocr

Dear friends,

I want to train the Tesseract LSTM engine on some scanned documents.

Since the scans are of poor quality, I tried to create the corresponding box files with jTessBoxEditor. The boxes and characters were not recognized well and had to be corrected manually.

After a few days of work, I now have three files:

vie.timesnewromani.exp99.tif,

vie.timesnewromani.exp99.box

vie.timesnewromani.exp99.tr

Now I need to convert them into LSTM training files. To do that, I modified tesstrain.sh as follows:

mkdir -p ${TRAINING_DIR}
tlog "\n=== Starting training for language '${LANG_CODE}'"
cp ~/tesstutorial/langdata/${LANG_CODE}/*.box ${TRAINING_DIR}
cp ~/tesstutorial/langdata/${LANG_CODE}/*.tif ${TRAINING_DIR}
source "$(dirname $0)/language-specific.sh"
set_lang_specific_parameters ${LANG_CODE}

I copied all three files to langdata/vie/, but it seems that they were not copied to the temporary training folder.
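To narrow this down, the copy step could be checked in isolation. The following is only a minimal sketch: `copy_training_inputs` is a hypothetical helper and not part of tesstrain.sh, and the two temporary directories merely stand in for langdata/vie and `${TRAINING_DIR}`. It fails loudly when no .box or .tif file matched, which would distinguish a wrong source path from a later step clearing the training folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of tesstrain.sh): copy .box/.tif files from a
# source directory to a destination and report how many actually arrived.
copy_training_inputs() {
  local src_dir="$1" dst_dir="$2" ext f count=0
  mkdir -p "${dst_dir}" || return 1
  for ext in box tif; do
    for f in "${src_dir}"/*."${ext}"; do
      # An unmatched glob stays literal, so skip anything that is not a file.
      [ -f "${f}" ] || continue
      cp "${f}" "${dst_dir}/" || return 1
      count=$((count + 1))
    done
  done
  echo "copied ${count} file(s) to ${dst_dir}"
  # Non-zero exit status when nothing was copied.
  [ "${count}" -gt 0 ]
}

# Demo: throwaway directories standing in for langdata/vie and TRAINING_DIR.
demo_src="$(mktemp -d)"
demo_dst="$(mktemp -d)"
touch "${demo_src}/vie.timesnewromani.exp99.tif" \
      "${demo_src}/vie.timesnewromani.exp99.box"
copy_training_inputs "${demo_src}" "${demo_dst}"
ls "${demo_dst}"
```

Pointing the same helper at the real source and training directories would show whether the glob matches anything at all on this machine.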

Any advice would be appreciated.

Many thanks,

TuPM
