Extracting text from page

Danny Venier

Feb 2, 2015, 11:55:16 AM
to pdfnet-w...@googlegroups.com
With WebViewer, I am looking to grab a snippet of text from a page of the document. I've seen Java examples that use a TextExtractor class to set a begin point and grab a substring of text. Is there any such facility in WebViewer that provides that function?

Danny Venier

Feb 2, 2015, 1:56:34 PM
to pdfnet-w...@googlegroups.com
Update: 

I figured out a way (albeit a wasteful one) to get the text using the document object.

myDoc.LoadPageText(pageNum, function(pageText) {
...});

It grabs all the text on the page and then I can substring as desired.
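
As a rough sketch of the substring step, something like the following could work. The helper name extractSnippet and its parameters are hypothetical and not part of WebViewer; the only WebViewer call is the LoadPageText one from above, shown in a comment because its exact name and signature may differ between versions.

```javascript
// Hypothetical helper (not a WebViewer API): returns up to maxLength
// characters of pageText, starting at the first occurrence of startMarker.
// Returns null when the marker does not appear on the page.
function extractSnippet(pageText, startMarker, maxLength) {
  var begin = pageText.indexOf(startMarker);
  if (begin === -1) {
    return null;
  }
  // substring clamps the end index, so a marker near the end of the page
  // simply yields a shorter snippet.
  return pageText.substring(begin, begin + maxLength);
}

// Assumed usage inside the callback from the post above
// (myDoc and pageNum come from the viewer; 'Invoice' is a made-up marker):
// myDoc.LoadPageText(pageNum, function(pageText) {
//   var snippet = extractSnippet(pageText, 'Invoice', 40);
// });
```

Keeping the substring logic in a plain function means it can be tried out on ordinary strings without loading a document.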

Matt Parizeau

Feb 2, 2015, 7:39:31 PM
to pdfnet-w...@googlegroups.com
Hi Danny,

Using the LoadPageText function would be the way to go. Note that behind the scenes the text for the page will be cached so subsequent requests shouldn't be too wasteful.

Matt Parizeau
Software Developer
PDFTron Systems Inc.