How to detect only letters and numbers for a foreign language in Tesseract [Russian]

Bhaarat Sharma

Jan 13, 2016, 6:53:45 AM
to tesseract-ocr

I'm using Tesseract for Russian. I've downloaded the Russian training data and put it in the tessdata folder, so I can now OCR the language. However, I would like to limit recognition to letters and numbers only, i.e. a-z and 0-9.

For English, I did the following: I created a file called digits, placed it in tessdata/configs, and gave it the contents below:

tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz0123456789
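For reference, the English config described above can also be generated with a short script, which makes the file format explicit: a plain-text file whose line is the variable name, a space, and the value. This is a minimal sketch in Python, not part of Tesseract; `write_whitelist_config` is a hypothetical helper, and the target path is an assumption you would adjust to wherever your tessdata directory lives.

```python
import string
from pathlib import Path


def write_whitelist_config(path: Path, chars: str) -> None:
    """Write a one-line Tesseract config restricting recognition to `chars`.

    Hypothetical helper: a config file is plain text, one "variable value"
    pair per line.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"tessedit_char_whitelist {chars}\n", encoding="utf-8")


# Same character set as the English config: lowercase a-z followed by 0-9.
ENGLISH_WHITELIST = string.ascii_lowercase + string.digits
```

Calling `write_whitelist_config(Path("tessdata/configs/digits"), ENGLISH_WHITELIST)` (path assumed) reproduces the file; the config is then selected by name as the last argument on the command line, e.g. `tesseract image.png output digits`.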

Question

How can I do something similar for Russian?
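One possible approach, sketched under assumptions rather than confirmed in this thread: the same `tessedit_char_whitelist` mechanism should apply to Russian if the config lists Cyrillic letters instead of Latin ones and the file is saved as UTF-8 (assuming your Tesseract build reads config values as UTF-8). The Python sketch below builds that string from Unicode ranges: lowercase а to я is U+0430 to U+044F and uppercase А to Я is U+0410 to U+042F, while ё (U+0451) and Ё (U+0401) fall outside those ranges and must be added separately. The config name `rus_alnum` and the helper names are hypothetical, and including upper case is a design choice, since the English example lists lowercase only.

```python
from pathlib import Path


def char_range(first: str, last: str) -> str:
    """Return every character from `first` to `last`, inclusive, by code point."""
    return "".join(chr(cp) for cp in range(ord(first), ord(last) + 1))


# Lowercase а..я is contiguous (U+0430..U+044F), but ё (U+0451) is not in it.
RUSSIAN_LOWER = char_range("а", "я") + "ё"
# Uppercase А..Я is contiguous (U+0410..U+042F), but Ё (U+0401) is not in it.
RUSSIAN_UPPER = char_range("А", "Я") + "Ё"
DIGITS = "0123456789"

# 33 lowercase + 33 uppercase Russian letters + 10 digits.
RUSSIAN_WHITELIST = RUSSIAN_LOWER + RUSSIAN_UPPER + DIGITS


def write_whitelist_config(path: Path, chars: str) -> None:
    """Write a one-line Tesseract config restricting recognition to `chars`."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Cyrillic needs an explicit UTF-8 encoding; the platform default may differ.
    path.write_text(f"tessedit_char_whitelist {chars}\n", encoding="utf-8")
```

After writing the file to `tessdata/configs/rus_alnum` (path assumed), it would be used together with the Russian language data, e.g. `tesseract image.png output -l rus rus_alnum`.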
