For the language tir, which has over 350 characters, only 272 are included in the existing LSTM tir.traineddata. I have a file that includes all the missing characters, and I have a training text. I want to recreate tir.traineddata, but I could not find the exact commands and parameters that were used to build it.
Basically, I want to know how to compile
https://github.com/tesseract-ocr/langdata_lstm/tree/master/tir so that I get the same output as
https://github.com/tesseract-ocr/tessdata_best/blob/master/tir.traineddata
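One way to list exactly which characters the current model lacks is to unpack it with combine_tessdata -u tir.traineddata tir. (assumed here to write the LSTM character set to tir.lstm-unicharset) and compare that file against the training text. The Python script below is a minimal sketch under a few assumptions: the unicharset uses the usual text layout (entry count on the first line, then one entry per line beginning with the character itself), the space is stored as NULL, and the special entries Joined and |Broken|0|1 should be ignored. The script and function names are hypothetical, and it compares single code points, which suits precomposed Ethiopic syllables but not multi-code-point clusters.

```python
import sys
import unicodedata

# Assumed special entries present in every unicharset file; they are not real
# characters ("NULL" is how the space character is stored).
SPECIAL_ENTRIES = {"NULL", "Joined", "|Broken|0|1"}


def unicharset_chars(unicharset_text: str) -> set[str]:
    """Collect the code points covered by the entries of a unicharset file."""
    chars: set[str] = set()
    # Line 0 holds the entry count; every following line starts with the
    # unichar itself, followed by its properties.
    for line in unicharset_text.splitlines()[1:]:
        fields = line.split()
        if not fields or fields[0] in SPECIAL_ENTRIES:
            continue
        chars.update(fields[0])
    return chars


def missing_chars(training_text: str, unicharset_text: str) -> list[str]:
    """Return the distinct non-whitespace characters absent from the unicharset."""
    known = unicharset_chars(unicharset_text)
    seen = {c for c in training_text if not c.isspace()}
    return sorted(seen - known)


def main(text_path: str, unicharset_path: str) -> None:
    with open(text_path, encoding="utf-8") as f:
        text = f.read()
    with open(unicharset_path, encoding="utf-8") as f:
        uset = f.read()
    missing = missing_chars(text, uset)
    for ch in missing:
        # Print code point, glyph and Unicode name for each missing character.
        print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch, '?')}")
    print(f"{len(missing)} characters missing", file=sys.stderr)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: missing_chars.py TRAINING_TEXT UNICHARSET", file=sys.stderr)
```

For example (file names hypothetical): python3 missing_chars.py tir.training_text tir.lstm-unicharset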
I tried training from scratch, but the final result is not very good. For example, I used --max_iterations 50000 and --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c352]', but these parameters are copied from the eng example and may not be a good fit for 'tir'. I would appreciate it if someone could tell me which commands were used to build tir.traineddata in tessdata_best.
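On the O1c352 part of the spec: tesstrain's Makefile, as far as I can tell, does not hard-code the output size but substitutes the entry count from the first line of the generated unicharset for a c### placeholder. Assuming that convention, the Python sketch below rewrites the final O1c layer of a VGSL spec from a unicharset file. The other layer sizes are left exactly as in the eng example, since this says nothing about what suits tir. The function and script names are hypothetical.

```python
import re
import sys

# Spec copied from the eng example; only the final output layer is rewritten.
ENG_STYLE_SPEC = "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c352]"


def unicharset_size(unicharset_text: str) -> int:
    """Return the entry count stored on the first line of a unicharset file."""
    return int(unicharset_text.split("\n", 1)[0].strip())


def with_output_size(net_spec: str, num_classes: int) -> str:
    """Replace N in a trailing O1c<N> (or O1c### placeholder) output layer."""
    new_spec, count = re.subn(
        r"O1c(?:\d+|#+)\]$", f"O1c{num_classes}]", net_spec.strip()
    )
    if count != 1:
        raise ValueError(f"no trailing O1c layer in {net_spec!r}")
    return new_spec


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8") as f:
            size = unicharset_size(f.read())
        print(with_output_size(ENG_STYLE_SPEC, size))
    else:
        print("usage: net_spec_for.py UNICHARSET", file=sys.stderr)
```

Deriving the number from the unicharset keeps the spec in step with the character set as characters are added, instead of relying on a value copied from another language.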
I know I could fine-tune or just add the missing characters instead of building from scratch, but I have more things to modify (such as adding a wordlist, more fonts, and other improvements) that will improve the quality of 'tir' a lot. This language is not that big, so it should not be as big a task as rebuilding 'eng'.
Thanks,
Biniam