Recognizing known text (generating searchable PDF)

Erik Jensen

Apr 1, 2015, 2:45:50 AM

to tesser...@googlegroups.com

I'm trying to generate a searchable and copyable PDF from a series of images. Using Tesseract works pretty well, but still results in a number of errors on each page. However, I already have a copy of the text that appears on each page, so all I really need is to find the location of each of the known glyphs on the page so I can put the overlay text in the correct location. Is there a way to use the known text to guide Tesseract's recognition to accomplish this?
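
To make the idea concrete, here is a rough sketch of the kind of alignment I have in mind, done outside Tesseract. It assumes I already have word-level bounding boxes from the OCR pass (for example parsed out of hOCR output) as a list of (text, box) pairs; the function name and data layout are made up for illustration. It uses Python's difflib to line up the OCR words with the known words and borrows the OCR box for each known word where the two sequences agree or differ one-for-one.

```python
from difflib import SequenceMatcher


def align_known_text(ocr_words, known_words):
    """Pair each known word with an OCR bounding box where possible.

    ocr_words:   list of (text, (x0, y0, x1, y1)) from the OCR pass (assumed layout)
    known_words: list of str, the correct text in reading order
    Returns a list of (known_word, box) pairs.
    """
    ocr_texts = [text.lower() for text, _ in ocr_words]
    known_lower = [word.lower() for word in known_words]
    matcher = SequenceMatcher(a=ocr_texts, b=known_lower, autojunk=False)

    placed = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        if tag == "equal" or (tag == "replace" and same_length):
            # Either the OCR word matches, or it was misread in place:
            # reuse its box for the corresponding known word.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                placed.append((known_words[j], ocr_words[i][1]))
        # Insertions, deletions and uneven replacements are skipped here;
        # those would need splitting or merging boxes.
    return placed
```

This only works at the word level and ignores cases where OCR splits or merges words, so it is a starting point rather than a solution.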

Thanks.