Dear friends,
I want to train Tesseract's LSTM engine on some scanned documents.
Since the scans are of poor quality, I created the corresponding box files with jTessBoxEditor. The boxes and characters were not recognized well, so I had to correct them manually.
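Because the boxes were corrected by hand, a quick sanity check of each box file can catch typos before training. Below is a minimal sketch. It assumes the per-character Tesseract box layout `<symbol> <left> <bottom> <right> <top> <page>` (not the `WordStr` line format). `validate_box` is a hypothetical helper and is not part of Tesseract or jTessBoxEditor. The helper reads the five numbers from the end of each line, so a space used as the symbol does not shift the fields.

```shell
# validate_box FILE
# Hypothetical helper (not part of Tesseract): checks that every line of a
# per-character box file ends with five integers (left bottom right top page)
# and that left <= right and bottom <= top. The symbol is ignored because it
# may itself be a space. Prints "FILE:LINE: message" for each bad line and
# returns 1 if any problem was found.
validate_box() {
  local file="$1"
  awk -v f="$file" '
    NF < 5 { printf "%s:%d: too few fields\n", f, NR; bad = 1; next }
    {
      # The last five fields must all be integers.
      for (i = NF - 4; i <= NF; i++) {
        if ($i !~ /^-?[0-9]+$/) {
          printf "%s:%d: non-integer field %s\n", f, NR, $i; bad = 1; next
        }
      }
      if ($(NF-4) > $(NF-2)) { printf "%s:%d: left > right\n", f, NR; bad = 1 }
      if ($(NF-3) > $(NF-1)) { printf "%s:%d: bottom > top\n", f, NR; bad = 1 }
    }
    END { exit bad }
  ' "$file"
}
```

For example, `validate_box vie.timesnewromani.exp99.box` prints nothing and returns 0 if every line looks well-formed.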
After a few days of work, I now have 3 files:
vie.timesnewromani.exp99.tif,
vie.timesnewromani.exp99.box
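The files follow Tesseract's `[lang].[fontname].exp[num]` naming convention. Tesseract locates a box file by the image's base name, so each `.tif` needs a `.box` with exactly the same base name. The following is a small sketch for checking a directory for unmatched files. `check_pairs` is a hypothetical helper name.

```shell
# check_pairs DIR
# Hypothetical helper: reports every .tif without a .box of the same base name
# (and vice versa). Returns 1 if anything is unmatched.
check_pairs() {
  local dir="$1" f status=0
  for f in "$dir"/*.tif; do
    [ -e "$f" ] || continue              # the glob matched nothing
    if [ ! -e "${f%.tif}.box" ]; then
      echo "missing box for: $f"
      status=1
    fi
  done
  for f in "$dir"/*.box; do
    [ -e "$f" ] || continue
    if [ ! -e "${f%.box}.tif" ]; then
      echo "missing tif for: $f"
      status=1
    fi
  done
  return "$status"
}
```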
Now I need to convert them into LSTM training data, so I modified tesstrain.sh as follows:
mkdir -p ${TRAINING_DIR}
tlog "\n=== Starting training for language '${LANG_CODE}'"
# Copy the hand-corrected .box/.tif pairs from langdata into the training directory
cp ~/tesstutorial/langdata/${LANG_CODE}/*.box ${TRAINING_DIR}
cp ~/tesstutorial/langdata/${LANG_CODE}/*.tif ${TRAINING_DIR}
source "$(dirname $0)/language-specific.sh"
set_lang_specific_parameters ${LANG_CODE}
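By default, bash leaves a pattern such as `*.box` unexpanded when it matches nothing. This can happen if the files are in a different langdata directory than `~/tesstutorial/langdata`. In that case, `cp` receives the literal pattern and fails with "No such file or directory", and that message is easy to miss in a long training log. Below is a sketch of a copy step that reports what it copied and returns an error when the source directory has no matching files. `copy_training_pairs` is a hypothetical helper, not part of tesstrain.sh.

```shell
# copy_training_pairs SRC_DIR DST_DIR
# Hypothetical helper: copies every *.tif and *.box from SRC_DIR into DST_DIR.
# Returns 1 with a message on stderr if SRC_DIR contains no such files,
# instead of passing an unexpanded pattern like ".../*.box" to cp.
copy_training_pairs() {
  local src="$1" dst="$2" count=0 f
  mkdir -p "$dst" || return 1
  for f in "$src"/*.tif "$src"/*.box; do
    if [ -e "$f" ]; then                 # skip patterns that matched nothing
      cp -- "$f" "$dst"/ || return 1
      count=$((count + 1))
    fi
  done
  if [ "$count" -eq 0 ]; then
    echo "copy_training_pairs: no .tif/.box files in '$src'" >&2
    return 1
  fi
  echo "copied $count file(s) from '$src' to '$dst'"
}
```

In the script, this could replace the two `cp` lines, for example `copy_training_pairs "$HOME/tesstutorial/langdata/${LANG_CODE}" "${TRAINING_DIR}" || exit 1`. Use `$HOME` here rather than `~`, because a tilde inside double quotes is not expanded. If the script was run with `--langdata_dir` pointing somewhere else, pass that directory instead.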
I copied all three files to langdata/vie/, but it seems that they were not copied to the temporary training folder:
Please give me some advice.
Many thanks,
TuPM