Inconsistencies (sometimes) on similar characters. Is there a map for characters that are very similar?
Deborah
Jun 16, 2024, 2:41:27 AM
to tesseract-ocr
Hello, I am using Tesseract to extract data from screenshots. I've noticed that it sometimes confuses similar-looking characters, such as '0' and 'O', 'P' and 'R', or '-' and '—', in either direction. This happens within the same font, and sometimes even after preprocessing such as binarization. Is there a comprehensive map of similar-looking characters that are commonly misrecognised? I need such a map to compute an effective Levenshtein string distance, with a lower substitution cost for characters that are very similar. Thanks.
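For illustration, the weighted distance described above could look roughly like the sketch below. It is a minimal example, not anything provided by Tesseract: the `CONFUSABLE` table, its cost values, and the function names are all hypothetical, and the three pairs are simply the ones mentioned in the question. Real costs would probably need to be estimated from your own OCR output.

```python
# Sketch of a Levenshtein distance with cheaper substitutions for
# look-alike characters. The pairs and costs below are illustrative
# assumptions, not measured values and not part of Tesseract.
CONFUSABLE = {
    frozenset(("0", "O")): 0.2,
    frozenset(("P", "R")): 0.5,
    frozenset(("-", "\u2014")): 0.2,  # hyphen vs. em dash
}


def substitution_cost(a: str, b: str) -> float:
    """Return 0 for identical characters, a reduced cost for a known
    confusable pair (in either direction), and 1 otherwise."""
    if a == b:
        return 0.0
    return CONFUSABLE.get(frozenset((a, b)), 1.0)


def weighted_levenshtein(s: str, t: str) -> float:
    """Edit distance where insertions and deletions cost 1 and
    substitutions cost substitution_cost(a, b)."""
    # prev[j] holds the distance between the processed prefix of s
    # and the first j characters of t.
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                        # delete a
                curr[j - 1] + 1.0,                    # insert b
                prev[j - 1] + substitution_cost(a, b),  # substitute
            ))
        prev = curr
    return prev[-1]
```

With this table, `weighted_levenshtein("O0", "00")` scores the single 'O'/'0' swap at 0.2 instead of 1, so strings that differ only by look-alike characters rank as much closer than strings with unrelated substitutions.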
John Roxton
Jun 17, 2024, 7:53:52 AM
Hello Deborah, Hopefully this isn't off-topic, and I don't mean to derail your thread, but I wanted to chime in that I am running into very similar difficulties and considerations. I hope the added interest helps yield an effective solution.