FPDFLink_GetAnnotRect() - rectangle is sometimes off

Andreas Falkenhahn

Jul 14, 2019, 11:16:48 AM
to pdfium
I use FPDFLink_GetAnnotRect() to get the bounding rectangle of a link in a PDF page. For most PDFs this works just fine. For some PDFs, however, the rectangles returned by FPDFLink_GetAnnotRect() are quite a bit off. Here is a screenshot:

http://www.falkenhahn.com/tmp/screenshot_pdf.png

The red boxes mark the rectangles returned by FPDFLink_GetAnnotRect(). As you can see, they are shifted towards the top-right quite a bit. What could be the reason for this?

--
Best regards,
Andreas Falkenhahn mailto:and...@falkenhahn.com

Manoj Biswas

Jul 15, 2019, 3:07:22 AM
to pdfium
This is most likely due to a bug in the authoring tool that generated the PDF. FPDFLink_GetAnnotRect() just returns values from /Rect entry in the link object definition. Below is an example of how a link annotation object is defined.

18 0 obj <<
  /Type /Annot
  /Subtype /Link
  /BS <<
    /W 0
  >>
  /Rect [83 440 178 453]
  /QuadPoints [83 453 178 453 83 440 178 440]
  /A <<
    /Type /Action
    /URI (https://cs.chromium.org/chromium/src/third_party/pdfium/public/fpdf_text.h)
    /S /URI
  >>
  /F 4
>>
endobj


FPDFLink_GetAnnotRect() constructs a rect from the /Rect entry above. Please note that the PDF authoring tool needs to ensure that these numbers represent the intended hit-test area.

This example also shows that /QuadPoints is another set of 8 numbers. The authoring tool needs to ensure these points are consistent with /Rect. An incorrectly authored PDF may very well contain inconsistencies between the two.
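As a rough illustration of the consistency point above, here is a minimal C sketch that compares the bounding box of a /QuadPoints array with /Rect. It does not use PDFium. The names PdfRect, quad_points_bounds and quad_points_match_rect are made up for this example, and treating "consistent" as "same bounding box within a tolerance" is an assumption, not something the PDF or PDFium enforces in this form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Rectangle in PDF page units, in /Rect order: [left bottom right top].
 * Hypothetical helper type, not a PDFium type. */
typedef struct {
    double left, bottom, right, top;
} PdfRect;

/* Bounding box of quad_count quadrilaterals, each stored as 8 numbers
 * (x1 y1 x2 y2 x3 y3 x4 y4), as in a /QuadPoints array. */
static PdfRect quad_points_bounds(const double *qp, size_t quad_count)
{
    assert(qp != NULL && quad_count > 0);
    PdfRect r = { qp[0], qp[1], qp[0], qp[1] };
    for (size_t i = 0; i < quad_count * 4; ++i) {
        double x = qp[2 * i];
        double y = qp[2 * i + 1];
        if (x < r.left)   r.left = x;
        if (x > r.right)  r.right = x;
        if (y < r.bottom) r.bottom = y;
        if (y > r.top)    r.top = y;
    }
    return r;
}

/* True if |a - b| <= tol. */
static bool nearly_equal(double a, double b, double tol)
{
    double d = a - b;
    return d <= tol && d >= -tol;
}

/* Assumed notion of consistency: the quad bounding box equals /Rect. */
static bool quad_points_match_rect(PdfRect rect, const double *qp,
                                   size_t quad_count, double tol)
{
    PdfRect b = quad_points_bounds(qp, quad_count);
    return nearly_equal(b.left, rect.left, tol) &&
           nearly_equal(b.bottom, rect.bottom, tol) &&
           nearly_equal(b.right, rect.right, tol) &&
           nearly_equal(b.top, rect.top, tol);
}
```

For the object above, /Rect [83 440 178 453] and the single quad in /QuadPoints cover the same area, so the check passes.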

Andreas Falkenhahn

Jul 15, 2019, 6:47:25 AM
to pdf...@googlegroups.com
On 15.07.2019 at 09:07 'Manoj Biswas' via pdfium wrote:

> This is most likely due to a bug in the authoring tool that generated the PDF. FPDFLink_GetAnnotRect() just returns values from /Rect entry in the link object definition. Below is an example of how a link annotation object is defined.

It's not a bug in the PDF authoring tool because the links are alright when viewing the PDF with Acrobat Reader or Chrome.

I've now examined the PDF source code and indeed, as you said, what I get from FPDFLink_GetAnnotRect() is exactly what is defined in the source. This is the source code for the first link in this screenshot: http://www.falkenhahn.com/tmp/screenshot_pdf.png (the red box around the text "{30}")

224 0 obj
<<
/A 225 0 R
/BS << /S /S /Type /Border /W 0 >>
/Border [ 0 0 0 ]
/H /I
/Rect [ 495.041 321.892 581.105 382.066 ]
/Subtype /Link
/Type /Annot
>>
endobj

And the values I get from FPDFLink_GetAnnotRect() are: 495.041015625 321.89199829102 581.10498046875 382.06600952148 --> so this matches!

BUT, since the link is NOT off when viewing the PDF with Acrobat Reader or Chrome there must be something else. Could it be that the PDF uses some transformation matrix or some other settings that shift coordinates and that I have to apply to the values I get from FPDFLink_GetAnnotRect() in order to get the correct position for the link bounding box?

Gourab Kundu

Jul 15, 2019, 12:53:17 PM
to pdfium
There could be a difference because the values in /Rect are in the PDF coordinate system, which is transformed to screen coordinates when the page is rendered.
You can look at the code in PDFiumPage::PageToScreen(...), which calls FPDF_PageToDevice() when calculating the link rects in view for hit testing.

Another thing that could apply a transform is an XObject. XObjects can contain essentially any graphics object and can specify their own transformation matrix, which is applied to the objects they contain when rendered.
This could be transforming the position of the links while the link rect values you get are the same as those in the dictionary, since the values inside it are always relative to the parent container, which could be an XObject in this case.

Andreas Falkenhahn

Jul 15, 2019, 4:17:20 PM
to pdf...@googlegroups.com
On 15.07.2019 at 15:39 'Gourab Kundu' via pdfium wrote:

> There could be difference since the values mentioned in the /Rect are in PDF co-ordinate system which get transformed to the screen co-ordinates when rendered.

But then the values should be off for all PDFs, shouldn't they? But that's not the case. For most PDFs the values returned by FPDFLink_GetAnnotRect() are perfectly fine. It's just in this single PDF that they are off, but it's correct in Chrome and Acrobat Reader so I must be doing something wrong.

> One of the other things which could apply a transform is XObjects.

The page in question indeed starts with this code:

/Type /Page
/CropBox [ 29.736 29.736 624.547 871.162 ]
/MediaBox [ 0 0 654.283 900.898 ]
/Rotate 0
/Resources << /ExtGState << /GS0 2724 0 R /GS1 203 0 R /GS2 2774 0 R >>
/XObject << /X40 209 0 R /X41 210 0 R /X42 212 0 R /X43 2232 0 R >> /Font <<
/T1_0 2196 0 R /TT0 2166 0 R /TT1 2168 0 R /TT2 213 0 R /TT3 2193 0 R /TT4
2170 0 R /T1_1 2189 0 R /TT5 2174 0 R /C2_0 217 0 R /TT6 2176 0 R /T1_2
2178 0 R /TT7 2199 0 R /T1_3 2203 0 R /TT8 2182 0 R >> /ColorSpace << /CS0
2834 0 R >> >>
/Contents 223 0 R
/Annots [ 224 0 R 226 0 R 228 0 R 230 0 R 232 0 R 234 0 R 236 0 R ]
/ArtBox [ 29.736 29.736 624.547 871.162 ]
/BleedBox [ 29.736 29.736 624.547 871.162 ]
/Group 238 0 R
/StructParents 4
/TrimBox [ 29.736 29.736 624.547 871.162 ]
/Parent 2666 0 R

As you can see, there is some sort of XObject definition here. Could this be the reason for the behaviour I'm seeing?

Manoj Biswas

Jul 16, 2019, 5:58:54 AM
to pdfium
In this case, the XObject seems to be unrelated to these links. These links (/Annots [ 224 0 R..) are parented to the page object. I noticed that there's a CropBox specified in this PDF (/CropBox [ 29.736 29.736 624.547 871.162 ]). FPDF_PageToDevice() eventually calls CPDF_Page::GetDisplayMatrix(), which pre-multiplies CPDF_Page::m_PageMatrix. The translation components (e and f) of this member variable are initialized in CPDF_Page::UpdateDimensions() from the top and left of the page's CropBox (PDF uses an up-right coordinate system). This seems to be the reason for the offset (29.736, 29.736) that you are observing.
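To make the size of that offset concrete, here is a small C sketch that shifts a /Rect by the crop box origin. It is plain arithmetic, not PDFium code. PdfRect and rect_relative_to_crop_box are names invented for this example, and it assumes /Rotate 0 and that the offset is the crop box's lower-left corner, which matches the (29.736, 29.736) offset for this page.

```c
#include <assert.h>

/* Rectangle in PDF page units: [left bottom right top].
 * Hypothetical helper type, not a PDFium type. */
typedef struct {
    double left, bottom, right, top;
} PdfRect;

/* Express rect relative to the crop box origin (assumed to be the crop
 * box's lower-left corner; page rotation is assumed to be 0). */
static PdfRect rect_relative_to_crop_box(PdfRect rect, PdfRect crop)
{
    assert(crop.right > crop.left && crop.top > crop.bottom);
    PdfRect out = {
        rect.left   - crop.left,
        rect.bottom - crop.bottom,
        rect.right  - crop.left,
        rect.top    - crop.bottom
    };
    return out;
}
```

With /Rect [ 495.041 321.892 581.105 382.066 ] and /CropBox [ 29.736 29.736 624.547 871.162 ] from this thread, the shifted rectangle is roughly [ 465.305 292.156 551.369 352.330 ], i.e. moved towards the bottom-left, which is the opposite of the shift visible in the screenshot.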

Andreas Falkenhahn

Jul 16, 2019, 6:24:03 AM
to pdf...@googlegroups.com
Thanks, that makes sense. So does this mean that I have to use FPDFPage_GetCropBox() to get the page's crop box and then translate the coordinates I get from FPDFLink_GetAnnotRect() by those crop box offsets?

Manoj Biswas

Jul 16, 2019, 6:49:13 AM
to pdfium
My recommendation would be to use FPDF_PageToDevice() to 'normalize' the link's bounding box as opposed to just applying the crop-box offset. This API takes into account page rotation, MediaBox, CropBox and so on. This way, you are more likely to be in sync with Chromium's behavior.
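For intuition only, here is a simplified C model of what such a page-to-device mapping does for an unrotated page. It is not PDFium's implementation and it calls no PDFium functions. PdfBox and model_page_to_device are hypothetical names, rotation is ignored, and rounding to the nearest pixel is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Visible page area in PDF page units: [left bottom right top].
 * Hypothetical helper type standing in for the page's crop box. */
typedef struct {
    double left, bottom, right, top;
} PdfBox;

/* Round to the nearest integer, halves away from zero. */
static int round_to_int(double v)
{
    return (int)(v >= 0.0 ? v + 0.5 : v - 0.5);
}

/* Simplified model (rotate == 0 only): map a page point to device pixels
 * for a page whose crop box is drawn into the area starting at
 * (start_x, start_y) with size size_x by size_y. Returns false on bad input. */
static bool model_page_to_device(PdfBox crop,
                                 int start_x, int start_y,
                                 int size_x, int size_y,
                                 double page_x, double page_y,
                                 int *device_x, int *device_y)
{
    if (device_x == NULL || device_y == NULL)
        return false;
    double width = crop.right - crop.left;
    double height = crop.top - crop.bottom;
    if (width <= 0.0 || height <= 0.0)
        return false;
    /* X grows to the right in both systems; shift by the crop box's left edge. */
    *device_x = round_to_int(start_x + (page_x - crop.left) * size_x / width);
    /* Page Y grows upwards, device Y grows downwards, so measure from the top. */
    *device_y = round_to_int(start_y + (crop.top - page_y) * size_y / height);
    return true;
}
```

To convert a link rectangle with a mapping like this, convert two opposite corners, for example (left, top) and (right, bottom), and check the return value of each call before using the results.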

Andreas Falkenhahn

Jul 16, 2019, 8:32:04 AM
to pdf...@googlegroups.com
Thanks, but unfortunately, it doesn't work. All that calling FPDF_PageToDevice() on the coordinates I get from FPDFLink_GetAnnotRect() does is that it rounds the coordinates I pass in so what I'm getting back from FPDF_PageToDevice() is essentially the same as I passed in. Looking at the PDFium source code I can also see that FPDF_PageToDevice() doesn't seem to take the crop box into account at all or am I missing something here?

Asger Jørgensen

Jul 16, 2019, 12:06:56 PM
to pdfium
Hi Andreas


On Tuesday, July 16, 2019 at 14:32:04 UTC+2, Andreas Falkenhahn wrote:

> Thanks, but unfortunately, it doesn't work. All that calling FPDF_PageToDevice() on the coordinates I get from FPDFLink_GetAnnotRect() does is that it rounds the coordinates I pass in so what I'm getting back from FPDF_PageToDevice() is essentially the same as I passed in. Looking at the PDFium source code I can also see that FPDF_PageToDevice() doesn't seem to take the crop box into account at all or am I missing something here?

That depends on what information you give the function.

Especially start_x, start_y, size_x and size_y, but I guess you already know that, judging by the rest of this discussion. In case you haven't looked at it lately, the documentation for the function is below.
To my newbie ears it sounds like maybe you could set start_x and start_y to the coordinates of the crop box.

On Windows this function also turns everything upside down, as Windows has pixel 0 in the upper-left corner of the window canvas.

Best regards
Asger


FPDF_PageToDevice(TPdfPagePtr, int start_x, int start_y, int size_x, int size_y, int rotate, double page_x, double page_y, int &device_x, int &device_y);

// Function: FPDF_PageToDevice
//          Convert the page coordinates of a point to screen coordinates.
// Parameters:
//          page        -   Handle to the page. Returned by FPDF_LoadPage.
//          start_x     -   Left pixel position of the display area in
//                          device coordinates.
//          start_y     -   Top pixel position of the display area in device
//                          coordinates.
//          size_x      -   Horizontal size (in pixels) for displaying the page.
//          size_y      -   Vertical size (in pixels) for displaying the page.
//          rotate      -   Page orientation:
//                            0 (normal)
//                            1 (rotated 90 degrees clockwise)
//                            2 (rotated 180 degrees)
//                            3 (rotated 90 degrees counter-clockwise)
//          page_x      -   X value in page coordinates.
//          page_y      -   Y value in page coordinate.
//          device_x    -   A pointer to an integer receiving the result X
//                          value in device coordinates.
//          device_y    -   A pointer to an integer receiving the result Y
//                          value in device coordinates.
// Return value:
//          Returns true if the conversion succeeds, and |device_x| and
//          |device_y| successfully receives the converted coordinates.
//
// Comments:
//          The page coordinate system has its origin at the left-bottom corner
//          of the page, with the X-axis on the bottom going to the right, and
//          the Y-axis on the left side going up.
//
//          NOTE: this coordinate system can be altered when you zoom, scroll,
//          or rotate a page, however, a point on the page should always have
//          the same coordinate values in the page coordinate system.
//
//          The device coordinate system is device dependent. For screen device,
//          its origin is at the left-top corner of the window. However this
//          origin can be altered by the Windows coordinate transformation
//          utilities.
//
//          You must make sure the start_x, start_y, size_x, size_y
//          and rotate parameters have exactly same values as you used in
//          the FPDF_RenderPage() function call.

Manoj Biswas

Jul 17, 2019, 8:43:44 AM
to pdfium
The following comment in TransformPDFPageForPrinting is also worth noting:
  // Reset the media box and crop box. When the page has crop box and media box,
  // the plugin will display the crop box contents and not the entire media box.
...
I played around a little with the /CropBox and /MediaBox entries in the attached simple PDF. I have also attached the .in file in case you want to experiment (please refer to https://pdfium.googlesource.com/pdfium/+/master/README.md#files on how to generate a PDF from a .in file). I think this demystifies the offset you are seeing. Most likely, you're using /MediaBox for the page bounds in your rendering code, as opposed to Chromium's use of /CropBox.

Thanks,
Manoj
link_crop_box.pdf
link_crop_box.in

Andreas Falkenhahn

Jul 18, 2019, 3:56:53 PM
to pdf...@googlegroups.com
Thanks a lot for the example! I finally figured it out and, to my embarrassment, I have to admit that your previous suggestion of using FPDF_PageToDevice() works perfectly. The reason why I said it didn't work was simply because I was passing an invalid pointer to it :( After fixing that code I can now use FPDF_PageToDevice() to get the correct coordinates very conveniently. So the problem is solved now, thanks a lot for your help!

Lei Zhang

Jul 18, 2019, 9:33:00 PM
to Andreas Falkenhahn, pdfium
BTW, if you passed an invalid pointer into FPDF_PageToDevice(), then it should have immediately returned false if the invalid pointer was a nullptr. In which case, please remember to check the return result from FPDF_PageToDevice().
