Should foreign-language characters be included in the training data or not?

nijin...@gmail.com
Sep 20, 2020, 5:55:19 AM
to tesseract-ocr
Hello. I have a large amount of news-domain data for training Tesseract on a non-English language. The news text contains many English words. Should I remove these English words from the training data, or keep them, to get better accuracy? (As I understand it, Tesseract already has a trained English model, and English words can be recognized by combining models with -l eng+(my_language).)
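
For reference, here is a minimal sketch of the combined-language invocation I mean, written as a small Python helper that only builds the tesseract command line. The helper name build_tesseract_command, the file names page.png and out, and the language code "hin" are placeholders I made up for illustration; "hin" stands in for whatever code the custom traineddata file uses.

```python
import shutil


def build_tesseract_command(image_path, output_base, languages):
    """Build the argument list for a multi-language tesseract CLI call.

    Hypothetical helper: it only assembles the command, it does not run OCR.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # The tesseract CLI accepts several models joined by '+' after -l,
    # for example: -l eng+hin
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


if __name__ == "__main__":
    # "hin" is only a placeholder for the non-English model being trained;
    # "page.png" and "out" are hypothetical input and output names.
    cmd = build_tesseract_command("page.png", "out", ["eng", "hin"])
    print(" ".join(cmd))
    if shutil.which("tesseract") is None:
        print("tesseract is not on PATH; the command above was not run")
```

With this setup, English words in a page would be handled by the existing eng model at recognition time, which is why I am unsure whether they also need to appear in my own training text.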