How to match coords from highlights retrieved from a text search and WebViewer


Daniel Soriano Gil

Oct 3, 2013, 11:54:30 AM
to pdfnet-w...@googlegroups.com
We are evaluating PDFTron components and have some questions, such as this one.
We are extracting highlight coordinates from a PDF document, as in the TextSearchTest sample code, and we want to highlight the corresponding text in WebViewer, following an example provided in this forum.
If we use those coordinates directly, the highlight is not placed where expected. Which transformations do we have to apply to place the highlight in the right position?

TextSearch in Java:

Highlights hlts = result.getHighlights();
hlts.begin(doc);
while ( hlts.hasNext() )
{
    Page cur_page = doc.getPage(hlts.getCurrentPageNumber());
    double[] q = hlts.getCurrentQuads();
    int quad_count = q.length/8;
    System.out.println("The current highlight is from page: " + hlts.getCurrentPageNumber());

    for ( int i = 0; i < quad_count; ++i )
    {
        //assume each quad is an axis-aligned rectangle
        int offset = 8*i;
        double x1 = Math.min(Math.min(Math.min(q[offset+0], q[offset+2]), q[offset+4]), q[offset+6]);
        double x2 = Math.max(Math.max(Math.max(q[offset+0], q[offset+2]), q[offset+4]), q[offset+6]);
        double y1 = Math.min(Math.min(Math.min(q[offset+1], q[offset+3]), q[offset+5]), q[offset+7]);
        double y2 = Math.max(Math.max(Math.max(q[offset+1], q[offset+3]), q[offset+5]), q[offset+7]);
        System.out.println("Quads: x1:"+x1+", y1:"+y1+", x2:"+x2+", y2:"+y2);
    }

    hlts.next();
}


Highlight code added in ReaderControl.js:


mySelectText:function(coords){

    var topLeft = { x: coords.x1 , y: coords.y1, pageIndex: coords.pageNumber};
    var bottomRight = { x: coords.x2, y: coords.y2, pageIndex: coords.pageNumber};

    var annot = new Annotations.TextHighlightAnnotation();
    annot.SetPageNumber(currentPageNumber);
    annot.FillColor = new Annotations.Color(0, 255, 255);

    var textHighlightTool = new this.docViewer.ToolModes.TextHighlightCreate(this.docViewer);
    textHighlightTool.annotation = annot;
    textHighlightTool.pageCoordinates[0] = topLeft;
    textHighlightTool.pageCoordinates[1] = bottomRight;
    textHighlightTool.select(topLeft, bottomRight);
},

Matt Parizeau

Oct 3, 2013, 3:29:31 PM
Hi,

One thing to note about the sample code at https://groups.google.com/forum/#!topic/pdfnet-webviewer/4XvtnIemEws is that it selects the text by adding highlight annotations. This means that annotations need to be enabled in WebViewer (add the enableAnnotations: true option), or else the annotations won't be shown. One thing I noticed is missing from your JavaScript code is the line am.AddAnnotation(annot); right before creating the highlight tool. If this isn't there, the highlight annotations won't be shown.

One other thing to be careful about is that the pageIndex property of topLeft and bottomRight is the zero-indexed value whereas in annot.SetPageNumber this is the one-indexed value!  So if the page you want to highlight is the first page then pageIndex should be zero and the annotation page number should be 1.
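
To make the indexing difference concrete, here is a minimal sketch. The helper name toViewerPoints and the shape of the hit object are made up for illustration and are not part of WebViewer; it assumes hit.pageNumber is the one-indexed page number reported by the server-side search.

```javascript
// Hypothetical helper (not a WebViewer API): builds the two points used for
// selection and keeps the one-indexed page number for the annotation.
// Assumption: hit.pageNumber is one-indexed, as passed to doc.getPage() on the server.
function toViewerPoints(hit) {
    var pageIndex = hit.pageNumber - 1; // zero-indexed, used in topLeft/bottomRight
    return {
        annotationPageNumber: hit.pageNumber, // one-indexed, for annot.SetPageNumber
        topLeft: { x: hit.x1, y: hit.y1, pageIndex: pageIndex },
        bottomRight: { x: hit.x2, y: hit.y2, pageIndex: pageIndex }
    };
}
```

For a hit on the first page, topLeft.pageIndex would be 0 while annotationPageNumber stays 1.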

Now if you wanted to use the text selection tool instead of using highlight annotations then you could have the code:
var textSelectTool = new docViewer.ToolModes.TextSelect(docViewer);
textSelectTool.select(topLeft, bottomRight);

The thing to note about this is that there can only be one text selection at a time as it's simulating a person highlighting some text on the page.  So for example if you make one selection and then start to make another selection it will automatically clear the previous selection.

Matt Parizeau
Software Developer
PDFTron Systems Inc.

Daniel Soriano Gil

Oct 3, 2013, 4:14:55 PM
We have taken the points you mention into account. We also noticed that calling am.AddAnnotation was not required, because the text is highlighted without it. We have also highlighted text with that method, showing 4 highlights at the same time.
What I would like to do is highlight all the terms found in a page. The terms come from our content search engine; we then search inside the PDF page objects and obtain the word coordinates in order to highlight the terms that were found.
How can we match the word coordinates taken from page objects searched on the server with the coordinates used for text selection in WebViewer?

Matt Parizeau

Oct 3, 2013, 6:50:45 PM
PDF and XOD do not have the same coordinate system.
XOD is a subset of XPS, and thus follows the XPS coordinate system.

The good news is that in general you can convert between the two coordinate systems easily.

PDF: 72 DPI, (0,0) is bottom left of the page
XPS: 96 DPI, (0,0) is top left of the page

e.g.
var factor = (96 / 72);
var xodX = factor * pdfX;
var xodY = this.docViewer.GetDocument().GetPageInfo(pageIndex).height - (factor * pdfY);

This will work in the majority of cases, but if, for example, the PDF page is rotated, I believe you'll have to use information from the PDF's page matrix. The ability to get this is currently not exposed on the WebViewer side (though it may be in the future), but you may be able to use the getDefaultPageMatrix function on a page object to get it on the server side and do your calculations there instead.
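
Applied to a whole rectangle (x1, y1, x2, y2 as printed by the server-side search code), the conversion above could look like the sketch below. The function name pdfRectToXod is hypothetical, and the page height is passed in as a parameter; in the viewer it would come from GetPageInfo(pageIndex).height. It assumes an unrotated page whose PDF origin is at the bottom left corner of the visible page. Note that because the y axis is flipped, the larger PDF y value becomes the top of the rectangle.

```javascript
var PDF_TO_XOD = 96 / 72; // XPS/XOD units per PDF point

// Hypothetical helper (not a WebViewer API).
// pdfRect: { x1, y1, x2, y2 } in PDF coordinates, origin at bottom left.
// pageHeightXod: page height in XOD units.
// pageIndex: zero-indexed page number.
// Assumption: the page is not rotated; otherwise the page matrix is needed.
function pdfRectToXod(pdfRect, pageHeightXod, pageIndex) {
    var left = PDF_TO_XOD * Math.min(pdfRect.x1, pdfRect.x2);
    var right = PDF_TO_XOD * Math.max(pdfRect.x1, pdfRect.x2);
    // Flip the y axis: the highest PDF y is closest to the top of the page.
    var top = pageHeightXod - PDF_TO_XOD * Math.max(pdfRect.y1, pdfRect.y2);
    var bottom = pageHeightXod - PDF_TO_XOD * Math.min(pdfRect.y1, pdfRect.y2);
    return {
        topLeft: { x: left, y: top, pageIndex: pageIndex },
        bottomRight: { x: right, y: bottom, pageIndex: pageIndex }
    };
}
```

The returned topLeft and bottomRight have the same shape as the points passed to select(topLeft, bottomRight) earlier in this thread.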

Matt Parizeau
Software Developer
PDFTron Systems Inc.

Daniel Soriano Gil

Oct 4, 2013, 12:49:08 AM
Thanks, going to try.