Recognizing known text (generating searchable PDF)

Erik Jensen

Apr 1, 2015, 2:45:50 AM
to tesser...@googlegroups.com
I'm trying to generate a searchable and copyable PDF from a series of images. Tesseract works pretty well, but it still makes a number of recognition errors on each page. However, I already have a copy of the text that appears on each page, so all I really need is the location of each of the known glyphs on the page, so that I can put the overlay text in the correct place. Is there a way to use the known text to guide Tesseract's recognition to accomplish this?
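To make the goal concrete, here is a minimal Python sketch of one possible workaround: a post-processing alignment, not a Tesseract feature. It assumes word-level bounding boxes have already been obtained from OCR (for example, parsed from Tesseract's hOCR output) and uses the standard-library difflib to line the OCR words up with the known text, so each known word inherits the box of the OCR word it matches. The names align_known_text and union_box, and the (left, top, width, height) box layout, are hypothetical choices for this illustration.

```python
from difflib import SequenceMatcher


def union_box(boxes):
    """Smallest (left, top, width, height) box covering all the given boxes."""
    left = min(b[0] for b in boxes)
    top = min(b[1] for b in boxes)
    right = max(b[0] + b[2] for b in boxes)
    bottom = max(b[1] + b[3] for b in boxes)
    return (left, top, right - left, bottom - top)


def align_known_text(ocr_words, known_words):
    """Attach OCR bounding boxes to the known (correct) words.

    ocr_words:   list of (text, (left, top, width, height)) from OCR.
    known_words: list of str, the correct text in reading order.
    Returns a list of (known_text, box) pairs for the PDF text overlay.
    """
    ocr_tokens = [text.lower() for text, _ in ocr_words]
    known_tokens = [word.lower() for word in known_words]
    matcher = SequenceMatcher(a=ocr_tokens, b=known_tokens, autojunk=False)

    placed = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        if tag == "equal" or (tag == "replace" and same_length):
            # One-to-one: a misread word keeps its box but gets the known text.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                placed.append((known_words[j], ocr_words[i][1]))
        elif tag == "replace":
            # OCR split or merged words: cover the whole run with one box.
            box = union_box([b for _, b in ocr_words[i1:i2]])
            placed.append((" ".join(known_words[j1:j2]), box))
        # "delete" is OCR noise with no known counterpart, so it is dropped.
        # "insert" is known text the OCR missed; it has no box and is skipped.
    return placed


# Hypothetical OCR result with two misread words.
ocr = [
    ("Tbe", (10, 10, 30, 12)),
    ("quick", (45, 10, 50, 12)),
    ("brovvn", (100, 10, 60, 12)),
    ("fox", (165, 10, 30, 12)),
]
overlay = align_known_text(ocr, "The quick brown fox".split())
```

This only relocates text at word granularity, and known words the OCR missed entirely get no position; per-glyph placement would need character-level boxes and a similar alignment over characters.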

Thanks.