Fine Tuning for ± a few characters in jpn and sim_chi

Hoang Vu

Sep 2, 2017, 3:29:49 PM
to tesseract-ocr
As the wiki says:

New feature It is possible to add a few new characters to the character set and train for them by fine tuning, without a large amount of training data.


I'm trying to add some symbols to jpn.traineddata by fine tuning for a few characters, but I'm wondering how many lines of training text are appropriate for Japanese.
For English, the eng.training_text file has only 70 lines of text, but for Japanese the file has 1670 lines. I think that if I use all 1670 lines as training text, I might get overfitting.
Do I have to put all of them into the training text, or just a few lines?

Thanks for reading!
Sorry for my bad English!