I need to retrain the default eng model so that it can also recognize new characters. I created the box files and .lstmf files, and when I run the following command:
lstmtraining \
--model_output output/eng_latin \
--continue_from "/c/Program Files/Tesseract-OCR/tessdata/eng.lstm" \
--append_index 5 \
--net_spec "[Lfx192 O1c129]" \
--traineddata "/c/Program Files/Tesseract-OCR/tessdata/eng.traineddata" \
--train_listfile training/training_files.txt \
--max_iterations 400
I get the following error:
Loaded file C:/Program Files/Tesseract-OCR/tessdata/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from C:/Program Files/Tesseract-OCR/tessdata/eng.lstm
Appending a new network to an old one!!Warning: given outputs 129 not equal to unicharset of 111.
Num outputs,weights in Series:
Lfx192:192, 221952
Fc111:111, 21423
Total weights = 243375
Built network:[1,36,0,1[C3,3Ft16]Mp3,3TxyLfys64Lfx96RxLrx96Lfx192Fc111] from request [Lfx192 O1c129]
Training parameters:
Debug interval = 0, weights = 0.1, learning rate = 0.001, momentum=0.5
null char=110
Deserialize header failed: 1.lstmf
Deserialize header failed: 2.lstmf
Deserialize header failed: 3.lstmf
Deserialize header failed: 4.lstmf
Deserialize header failed: 5.lstmf