Properly Identifying Symbols

Felipe Vegini

Jun 10, 2020, 11:42:21 PM

to tesseract-ocr

Hello guys, I'm experimenting with pytesseract and tesserocr to read some files received in my company mailbox.

One problem I'm finding is with symbols. This particular file has some "borders" made of "*" characters.

But Tesseract recognizes them only as a sequence of "r", "k" and "e". For example, the attached image is transcribed as: "KRREKKKKKKK Shipping Instructions KREKKEKKKKKE".

Is there some configuration I can set to tell Tesseract that my text may contain symbols?

Or at least a way to ignore them instead of trying to fit them to a character?
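
To make the "ignore them" option concrete, here is a minimal post-processing sketch in plain Python. It is not a Tesseract setting: it only assumes that a misread border always comes out as one long token made solely of the letters K, R and E, as in the sample above. The function name `strip_border_noise`, the letter set and the minimum run length of 6 are illustrative assumptions.

```python
import re

# Assumption: a misread "*" border appears as a single token that contains
# only the letters K, R and E and is at least 6 characters long.
BORDER_TOKEN = re.compile(r"[KRE]{6,}", re.IGNORECASE)


def strip_border_noise(line: str) -> str:
    """Drop tokens that look like misread border runs and keep the rest."""
    kept = [tok for tok in line.split() if not BORDER_TOKEN.fullmatch(tok)]
    return " ".join(kept)


print(strip_border_noise("KRREKKKKKKK Shipping Instructions KREKKEKKKKKE"))
```

The function would be applied to each line of the OCR output. A real word of six or more letters built only from K, R and E would also be dropped, so the letter set and length threshold would need tuning against actual files.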

Image01.jpg
