building tir.traineddata from scratch


Biniam

Aug 4, 2020, 11:19:47 PM
to tesseract-ocr
The language tir has over 350 characters, but only 272 are included in the existing LSTM tir.traineddata. I have a file that contains all the missing characters, and I have a training text. I want to recreate tir.traineddata, but I could not find the exact commands and parameters that were used to build it.
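A quick way to confirm which characters are missing is to compare the training text against the model's unicharset. The sketch below is only illustrative: it assumes the unicharset has been extracted to a text file (for example with `combine_tessdata -u`), that its first line is the entry count, and that each following line starts with the symbol followed by space-separated properties. The function names are hypothetical, and multi-codepoint symbols are not handled.

```python
def unicharset_symbols(lines):
    """Collect the symbols from unicharset lines.

    Assumption: the first non-empty line is the entry count and every
    later line begins with the symbol, followed by its properties.
    """
    rows = [ln for ln in lines if ln.strip()]
    return {ln.split()[0] for ln in rows[1:]}


def missing_chars(text, symbols):
    """Return the non-whitespace characters of `text` not found in `symbols`."""
    used = {ch for ch in text if not ch.isspace()}
    return sorted(used - symbols)


# Tiny made-up example; real input would be read from the extracted
# unicharset file and from the training text.
sample_unicharset = ["3", "NULL 0 ...", "ሀ 1 ...", "ለ 1 ..."]
sample_text = "ሀለ ሐ"
print(missing_chars(sample_text, unicharset_symbols(sample_unicharset)))
```

Running the same check against the full training text would list every character that the existing model cannot output.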

Basically, how do I compile https://github.com/tesseract-ocr/langdata_lstm/tree/master/tir so that I get the same output as https://github.com/tesseract-ocr/tessdata_best/blob/master/tir.traineddata?


But the final result of my own attempt is not that good. For example, I used --max_iterations 50000 and --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c352]', but these parameters are copied from the eng example and may not be a good fit for 'tir'. I would appreciate it if someone could tell me which commands were used to build the tir.traineddata in tessdata_best.

I know I could fine-tune or add the missing characters instead of building from scratch, but I have more things to modify (adding a wordlist, fonts, and other improvements) that should improve the quality of 'tir' a lot. This language is not that big, so it should not be as big a task as rebuilding 'eng'.

Thanks,
Biniam


Shree Devi Kumar

Aug 5, 2020, 12:10:10 AM
to tesseract-ocr


Version string:4.00.00alpha:tir:synth20170629
LSTM training info:Network str:[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx128O1c1], flags=41, iteration=10498000, sample_iteration=10498000, null_char=267, learning_rate=0.001, momentum=0.5, adam_beta=0.999
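To make the stored training info easier to compare with the parameters from the question, it can be split into the network spec and the numeric settings. This is a minimal sketch that assumes the info text has exactly the shape quoted above; `parse_training_info` is a hypothetical helper, not part of Tesseract.

```python
import re

# Training info as quoted above from the existing tir.traineddata.
INFO = (
    "Network str:[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx128O1c1], flags=41, "
    "iteration=10498000, sample_iteration=10498000, null_char=267, "
    "learning_rate=0.001, momentum=0.5, adam_beta=0.999"
)


def parse_training_info(info):
    """Return (net_spec, params) from an 'LSTM training info' string."""
    # The network spec is the bracketed text right after "Network str:".
    spec = re.search(r"Network str:(\[.*?\])", info).group(1)
    # Every remaining setting has the form name=number.
    params = dict(re.findall(r"(\w+)=([\d.]+)", info))
    return spec, params


spec, params = parse_training_info(INFO)
print(spec)
for name, value in params.items():
    print(f"{name} = {value}")

# Spec from the question with spaces removed, for a like-for-like comparison.
asked = "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c352]".replace(" ", "")
print("same spec as in the question:", asked == spec)
```

Read this way, the stored model ends in Lfx128 rather than the Lfx256 used in the question, and it records about 10.5 million iterations rather than 50000, so both the layer size and the training length differ from the attempt described above.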



