Re: [PDFNet] Full Text Search

James Borthwick

Nov 1, 2012, 9:19:19 PM

to PDFTron PDFNet SDK on behalf of Thomas

PDFNet includes a sophisticated text extraction engine that can be used to build an index of the text found in a set of PDFs. For details, see the documentation for the TextExtractor class (http://www.pdftron.com/pdfnet/documentation.html) and the TextExtract sample project (http://www.pdftron.com/pdfnet/samplecode.html#TextExtract).

Support

Nov 1, 2012, 9:29:37 PM

to pdfne...@googlegroups.com

There is also the TextSearch class and sample: http://www.pdftron.com/pdfnet/samplecode.html#TextSearch

With TextExtractor you can pass extracted text to Lucene for indexing.

If you need to highlight text, you can index the text page by page (for example, with Lucene).

Then run a quick page-specific search with TextSearch. This gives you the bounding box (bbox) position of each match, and you can also save the hits in the XML highlight format (pdftron.PDF.Highlights.Save(...)). PDFViewCtrl can then load the selection from that file.
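
To make the page-level indexing idea concrete, here is a minimal sketch in plain Python. It is not PDFNet or Lucene code: a small in-memory inverted index stands in for Lucene, and the sample page strings stand in for text you would get from TextExtractor one page at a time. The function names (tokenize, build_page_index, candidate_pages) and the sample data are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_page_index(pages):
    """Map each token to the set of (document, page number) pairs containing it.

    `pages` is an iterable of (doc_name, page_number, page_text) tuples.
    In a real setup the page text would come from a text extractor.
    """
    index = defaultdict(set)
    for doc_name, page_number, page_text in pages:
        for token in tokenize(page_text):
            index[token].add((doc_name, page_number))
    return index


def candidate_pages(index, query):
    """Return the sorted (document, page) pairs that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return sorted(hits)


# Hypothetical sample data standing in for text extracted page by page.
sample_pages = [
    ("a.pdf", 1, "Full text search with an index"),
    ("a.pdf", 2, "Highlighting search hits on a page"),
    ("b.pdf", 1, "Unrelated content"),
]
page_index = build_page_index(sample_pages)
print(candidate_pages(page_index, "search page"))
```

Each (document, page) pair returned is a page worth passing to a page-specific TextSearch run, so the expensive position lookup only happens on pages the index already knows contain the terms.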
