Hi,
I am following the TessTutorial to train Tesseract from scratch, and I have some questions about the langdata so that I understand how the training works.
The provided training text consists of fairly random English words. My questions about the training text are:
1- Will using text from a specific domain improve Tesseract's performance on that domain? For example, training Tesseract on special names or vocabulary that is not English but uses Latin letters, digits, and special characters (a-z A-Z 0-9 plus symbols), such as pH_scale1.
2- Will generating words from random letters work as well as using English words?
The provided eng.training_text contains text such as:
"different New Articles page 23 a To Service ~~ a details DC that don't as 7 «« Date:"
What if I use something random like this:
"sqwrLwU2bo
BLiRDhvAoM
USyWtpBFi5
UwLgXyoz1e
UqiXudhrhz
dDKAdnI8Z2
YIl6T6d7m6
G2IVtTRbuu
Lh6NvWNLc3
CGD2SXOoNT"
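For reference, here is roughly how I would generate such text. This is only a sketch of the two options in the question (purely random tokens vs. domain words mixed with random tokens); it says nothing about which one trains better. The character set, the "_" in EXTRA_CHARS, the helper names, and the output filename are my own assumptions, not anything from the tutorial.

```python
import random
import string

# a-z, A-Z, 0-9 as in question 1; EXTRA_CHARS is an assumption, extend it for your domain.
ALPHABET = string.ascii_letters + string.digits
EXTRA_CHARS = "_"


def random_tokens(count, length=10, alphabet=ALPHABET, seed=None):
    """Return `count` random strings of `length` characters drawn from `alphabet`."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(count)]


def mixed_lines(vocab, n_lines, words_per_line=8, seed=None):
    """Hypothetical helper: lines that mix domain words (e.g. "pH_scale1") with random tokens."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_lines):
        words = []
        for _ in range(words_per_line):
            if rng.random() < 0.5:
                # Take a real word from the domain vocabulary.
                words.append(rng.choice(vocab))
            else:
                # Otherwise make a random token of 3 to 10 characters.
                length = rng.randint(3, 10)
                words.append("".join(rng.choice(ALPHABET + EXTRA_CHARS) for _ in range(length)))
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    # "my.training_text" is a hypothetical output name: one entry per line, plain UTF-8.
    with open("my.training_text", "w", encoding="utf-8") as f:
        f.write("\n".join(random_tokens(10, seed=42)) + "\n")
```

The first helper produces lists like the one above; the second keeps some real vocabulary in context, which is closer to what eng.training_text looks like.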
Thanks