Train Just a Few Layers

Simon

Jan 9, 2024, 3:37:34 PM

to tesseract-ocr

Hello everybody,

I am currently trying to train just a few layers of the eng_best.traineddata file. I have already created 30,000 .box, .gt.txt and .tif files for training, tailored to my problem.

While following the instructions for training Tesseract 4 (https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#training-just-a-few-layers), I ran into the following problems and questions:

1. I have to create lstmf files in order to execute

training/lstmtraining --debug_interval 100 \
  --continue_from ~/tesstutorial/eng_from_chi/eng.lstm \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --append_index 5 --net_spec '[Lfx256 O1c111]' \
  --model_output ~/tesstutorial/eng_from_chi/base \
  --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 3000 &>~/tesstutorial/eng_from_chi/basetrain.log

But how exactly do I create these lstmf files manually? I read that they are created by tesstrain.sh, but I can't find a proper description of how.

I need the lstmf files for the --train_listfile and --eval_listfile parameters.

Is it also necessary to create an extra unicharset file for that, as in the workflow of the tesstrain (https://github.com/tesseract-ocr/tesstrain) repository? Or could I use the tesstrain repo itself to create the lstmf files?
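For reference, the tesstrain Makefile appears to produce each .lstmf file by running tesseract with the `lstm.train` config on a .tif that has a matching .box file next to it, and the list files are plain text with one .lstmf path per line. The following is a minimal sketch under those assumptions, not a confirmed recipe. The function names `make_lstmf` and `write_lists`, the `--psm 13` choice (one text line per image), the output names `list.train` and `list.eval`, and the sorted rather than shuffled split are all illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, file names and the split strategy are assumptions.

# make_lstmf DIR
# For each DIR/NAME.tif with a matching DIR/NAME.box, run tesseract with the
# lstm.train config, which writes DIR/NAME.lstmf.
make_lstmf() {
  local dir=$1 tif base
  for tif in "$dir"/*.tif; do
    [ -e "$tif" ] || continue        # glob matched nothing
    base=${tif%.tif}
    [ -e "$base.box" ] || continue   # skip images without a box file
    # --psm 13: treat each image as a single raw text line (assumed input).
    tesseract "$tif" "$base" --psm 13 lstm.train
  done
}

# write_lists DIR EVAL_PERCENT
# Collect DIR/**/*.lstmf, sort the paths, and write the last EVAL_PERCENT
# percent to DIR/list.eval and the rest to DIR/list.train.
write_lists() {
  local dir=$1 eval_pct=$2 total n_eval n_train
  find "$dir" -name '*.lstmf' | sort > "$dir/list.all"
  total=$(wc -l < "$dir/list.all")
  n_eval=$(( total * eval_pct / 100 ))
  n_train=$(( total - n_eval ))
  head -n "$n_train" "$dir/list.all" > "$dir/list.train"
  tail -n "$n_eval" "$dir/list.all" > "$dir/list.eval"
  rm -f "$dir/list.all"
}
```

With a hypothetical directory `data/ground-truth`, running `make_lstmf data/ground-truth` followed by `write_lists data/ground-truth 10` would leave two list files that could be passed to --train_listfile and --eval_listfile. A shuffled split would probably be preferable for real training; the sorted split here only keeps the sketch deterministic.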

2. I also have to train the same symbol twice, with different meanings. It's the same sign, but in one case rotated 90 degrees counterclockwise.

As an example, assume the symbol is "⊥". When this character is recognized, I want my fully trained model to output "⊥"; when the counterclockwise-rotated symbol is recognized, I want to get the string "turned⊥" back.
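One possible way to get there, offered as an untested assumption and not as a known Tesseract feature: label the rotated glyph in the ground truth with a single, otherwise unused placeholder character, train as usual, and expand that placeholder to "turned⊥" in a post-processing step on the OCR output. The placeholder "⊣" and the function name `expand_turned` below are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the placeholder character and the function name are assumptions.

# expand_turned
# Read OCR text on stdin and replace every placeholder character with the
# multi-character string wanted in the final output.
expand_turned() {
  sed 's/⊣/turned⊥/g'
}
```

Usage could look like `tesseract image.tif stdout -l mymodel | expand_turned`, where `mymodel` stands for whatever the fine-tuned model ends up being called.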

I really would appreciate any help. I'm at a dead end and can't find any information to help me.

Thanks in advance. If you have any questions about my problem, I will provide whatever information you need.
