Hello everybody,
I am currently trying to train just a few layers of the eng_best.traineddata file. I have already created 30,000 box, gt.txt and .tif files for training, specifically for my problem.
1. I have to create lstmf files in order to run the following command:
training/lstmtraining --debug_interval 100 \
--continue_from ~/tesstutorial/eng_from_chi/eng.lstm \
--traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
--append_index 5 --net_spec '[Lfx256 O1c111]' \
--model_output ~/tesstutorial/eng_from_chi/base \
--train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
--max_iterations 3000 &>~/tesstutorial/eng_from_chi/basetrain.log
But how exactly do I create these lstmf files manually? I read that they are created with tesstrain.sh, but I can't find a proper description of how.
I need the lstmf files for the --train_listfile and --eval_listfile parameters.
Is it also necessary to create an extra unicharset file for that, as in the workflow of the tesstrain repository (https://github.com/tesseract-ocr/tesstrain)? Or could I also use the tesstrain repo to create the lstmf files?
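As far as I understand, the files passed to --train_listfile and --eval_listfile are plain text files with one lstmf path per line, so once the lstmf files exist I would build the two lists roughly as below. This is only a sketch under my own assumptions: the function name make_listfiles, the output names list.train and list.eval, and the 90/10 split are hypothetical and not taken from any documentation.

```shell
# Sketch: build train/eval list files from a directory of .lstmf files.
# Assumptions (hypothetical): function name, output file names, 90/10 split.
make_listfiles() {
  lstmf_dir=$1   # directory that already contains the .lstmf files
  out_dir=$2     # directory where the list files are written

  all="$out_dir/all-lstmf.txt"
  # One path per line, sorted so the split is reproducible.
  # Shuffle this file first if a random split is wanted.
  find "$lstmf_dir" -name '*.lstmf' | sort > "$all"

  total=$(wc -l < "$all")
  n_eval=$((total / 10))          # roughly 10% held out for evaluation
  n_train=$((total - n_eval))

  head -n "$n_train" "$all" > "$out_dir/list.train"
  tail -n "$n_eval" "$all" > "$out_dir/list.eval"
}
```

I would then call it as make_listfiles /path/to/lstmf /path/to/lists and pass list.train and list.eval to --train_listfile and --eval_listfile.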
2. I also have to train the same symbol twice, with different meanings. It is the same sign, but in one case rotated 90 degrees counterclockwise.
As an example, assume it is "⊥". When this character is recognized, I want my fully trained model to output
"⊥", but when the counterclockwise-rotated symbol is recognized, I want to get the string "turned⊥" back.
I would really appreciate any help. I'm at a dead end and can't find any information to help me.
Thanks in advance. If you have any questions about my problem, I will provide whatever information you need.