Recognizing known text (generating searchable PDF)
Erik Jensen
Apr 1, 2015, 2:45:50 AM
to tesser...@googlegroups.com
I'm trying to generate a searchable and copyable PDF from a series of images. Using Tesseract works pretty well, but still results in a number of errors on each page. However, I already have a copy of the text that appears on each page, so all I really need is to find the location of each of the known glyphs on the page so I can put the overlay text in the correct location. Is there a way to use the known text to guide Tesseract's recognition to accomplish this?
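To make the goal concrete, here is a minimal sketch of one possible workaround, not a built-in Tesseract feature: run OCR as usual, then align the recognized words against the known text and reuse the OCR bounding boxes for the correct words. It assumes the word boxes have already been extracted from Tesseract's output (for example by parsing its hOCR output). The function name `align_known_text` and the input shapes are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher


def align_known_text(ocr_words, known_words):
    """Pair each known word with an OCR bounding box where possible.

    ocr_words:   list of (text, (left, top, right, bottom)) tuples, assumed
                 to come from an earlier OCR pass (hypothetical input shape).
    known_words: list of correct words for the same page, in reading order.

    Returns a list of (known_word, box_or_None), one entry per known word.
    """
    ocr_texts = [text.lower() for text, _ in ocr_words]
    known_lower = [word.lower() for word in known_words]

    # Align the two word sequences; autojunk is disabled so that frequent
    # words on long pages are not discarded by the heuristic.
    matcher = SequenceMatcher(a=ocr_texts, b=known_lower, autojunk=False)

    aligned = [(word, None) for word in known_words]
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        # "equal": OCR got the word right. "replace" with equal lengths:
        # OCR misread the words, but the boxes can still be paired 1:1.
        if tag == "equal" or (tag == "replace" and same_length):
            for i, j in zip(range(i1, i2), range(j1, j2)):
                aligned[j] = (known_words[j], ocr_words[i][1])
        # Other cases (insert, delete, uneven replace) leave the box as None.
    return aligned


if __name__ == "__main__":
    ocr = [("The", (0, 0, 30, 10)), ("qu1ck", (35, 0, 80, 10)),
           ("brown", (85, 0, 130, 10)), ("f0x", (135, 0, 160, 10))]
    for word, box in align_known_text(ocr, ["The", "quick", "brown", "fox"]):
        print(word, box)
```

Words that the OCR pass missed or merged come back with a box of `None`, so they would still need a fallback, such as interpolating a position from neighboring words. This only gives word-level placement, not per-glyph positions.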