Properly identifying symbols


Felipe Vegini

Jun 10, 2020, 11:42:21 PM
to tesseract-ocr
Hello guys, I'm experimenting with pytesseract and tesserocr to read some files received in my company mailbox.

One problem I'm running into is with symbols. This particular file has some "borders" made of "*" characters,
but Tesseract recognizes them only as a sequence of "R", "K" and "E". The attached one, for example, is read as: "KRREKKKKKKK Shipping Instructions KREKKEKKKKKE".


Is there a configuration option I can set to tell Tesseract that my text may contain symbols?
Or, at the very least, to make it ignore them instead of trying to fit them to a character?
Image01.jpg
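
To make the question concrete, here is a minimal sketch of the two things I could imagine trying. It has not been confirmed to fix this file. `strip_border_runs` and `ocr_with_symbols` are hypothetical helper names, the `[KRE]{6,}` pattern is only a guess based on the output above, and although `tessedit_char_whitelist` is a real Tesseract variable, whether it makes "*" come out correctly here (or is honored at all by the engine version in use) is an assumption.

```python
import re
import string

# Heuristic: a "word" made only of K, R and E, at least 6 characters long,
# is assumed to be a misread border of "*" characters.
BORDER_RUN = re.compile(r"\b[KRE]{6,}\b")


def strip_border_runs(text):
    """Remove suspected misread borders from OCR output (post-processing)."""
    cleaned = BORDER_RUN.sub(" ", text)
    # Collapse the extra spaces left behind by the removed runs.
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()


def ocr_with_symbols(image_path):
    """Run OCR with a whitelist that includes '*' (assumption: this may help)."""
    # Imported lazily so the post-processing helper works without pytesseract.
    import pytesseract
    from PIL import Image

    allowed = string.ascii_letters + string.digits + "*"
    # --psm 6 asks Tesseract to assume a single uniform block of text.
    config = "--psm 6 -c tessedit_char_whitelist=" + allowed
    return pytesseract.image_to_string(Image.open(image_path), config=config)


if __name__ == "__main__":
    sample = "KRREKKKKKKK Shipping Instructions KREKKEKKKKKE"
    print(strip_border_runs(sample))
```

The post-processing route only hides the symptom, and it would also drop any real all-caps word made solely of K, R and E, so the length threshold may need tuning for other files.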