Should foreign-language characters be included in the training data or not?

nijin...@gmail.com

Sep 20, 2020, 5:55:19 AM

to tesseract-ocr

Hello. I have a lot of news-domain data for training Tesseract on a non-English language. The news text contains many English words, and I'd like to know whether I should remove these English words from the training data or keep them to get better accuracy. (As I understand it, Tesseract already has a trained English model, so English words can be recognized at prediction time by using -l eng+(my_language).)
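To make the -l eng+(my_language) idea concrete, here is a minimal sketch of how I understand the combined-language call would be made from Python. The helper names build_tesseract_cmd and run_ocr, the file name news_page.png, and the language code my_language are placeholders I made up for illustration; only the tesseract command line itself (image, output base, -l with codes joined by "+") is the real interface, and it assumes the tesseract binary and both traineddata files are installed.

```python
import subprocess


def build_tesseract_cmd(image_path, languages, output_base="stdout"):
    """Build a tesseract CLI call that loads several language models at once.

    `languages` is a list of traineddata codes, e.g. ["eng", "my_language"];
    they are joined with "+" as expected by the -l option.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def run_ocr(image_path, languages):
    """Run tesseract and return the recognized text (hypothetical helper)."""
    cmd = build_tesseract_cmd(image_path, languages)
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing it to a .txt file.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # "my_language" stands in for the custom model's code (placeholder).
    print(build_tesseract_cmd("news_page.png", ["eng", "my_language"]))
```

Whether the custom model should also be trained on the English words, or whether relying on eng at prediction time is enough, is exactly what I am unsure about.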
