Re: [PDFNet] Full Text Search

James Borthwick

Nov 1, 2012, 9:19:19 PM
to PDFTron PDFNet SDK on behalf of Thomas
PDFNet includes a sophisticated text extraction engine that can be used to build an index of the text found in a set of PDFs. For details, see the documentation for the TextExtractor class (http://www.pdftron.com/pdfnet/documentation.html) and the TextExtract sample project (http://www.pdftron.com/pdfnet/samplecode.html#TextExtract).

Support

Nov 1, 2012, 9:29:37 PM
to pdfne...@googlegroups.com
There is also
 
class / sample.
 
With TextExtractor you can pass extracted text to Lucene for indexing.
 
If you need to highlight text, you can index the text per page (for example, with Lucene).
Then run a quick page-specific search with TextSearch. This gives you the bounding box (bbox) position of each match, and you can also save the hit results in the XML highlight format (pdftron.PDF.Highlights.Save(...)). PDFViewCtrl can then load the selection from that file.
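
To illustrate the per-page indexing idea, here is a minimal sketch in plain Python. It does not use PDFNet or Lucene: the page text is assumed to have been extracted already (for example with TextExtractor), and a small in-memory inverted index stands in for Lucene. The names build_page_index, candidate_pages and sample_pages are hypothetical and exist only for this example. The point is just that the index narrows a query down to a few (document, page) pairs, so the page-specific search for bbox positions only has to run on those pages.

```python
import re
from collections import defaultdict


def build_page_index(pages):
    """Map each lowercase word to the set of (doc, page) pairs containing it.

    `pages` is an iterable of (doc_name, page_number, text) tuples. The text
    is assumed to have been extracted beforehand; no PDF library is used here.
    """
    index = defaultdict(set)
    for doc_name, page_number, text in pages:
        for word in re.findall(r"\w+", text.lower()):
            index[word].add((doc_name, page_number))
    return index


def candidate_pages(index, query):
    """Return sorted (doc, page) pairs that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical sample data standing in for real extracted page text.
sample_pages = [
    ("a.pdf", 1, "Full text search with PDFNet"),
    ("a.pdf", 2, "Highlighting search hits on a page"),
    ("b.pdf", 1, "Unrelated content"),
]

index = build_page_index(sample_pages)

# Only the pages returned here would need the page-specific search.
print(candidate_pages(index, "search page"))
```

In a real setup, Lucene would replace the dictionary-based index, with one indexed entry per page storing the file name and page number, and the returned pages would be passed to the page-specific search step described above.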