Public pdfium API to assert contents of PDF files?


Miklos Vajna

Mar 18, 2017, 11:30:39 AM
to pdf...@googlegroups.com, tse...@chromium.org, the...@chromium.org
Hi,

[ Tom / Lei, I'm just copying you as you are mentioned in the last
commit to public/fpdf_edit.h. ]

LibreOffice currently uses my simple PDF tokenizer when it comes to
automated testing of our PDF export code. It looks like this:

https://github.com/LibreOffice/core/blob/master/vcl/qa/cppunit/pdfexport/pdfexport.cxx

(That pdfio::PDFDocument is our tokenizer.)

If possible, I would like to get rid of our own PDF tokenizer (in the
long term) and use pdfium instead. Here is a pdfium port of a testcase:

https://github.com/vmiklos/vmexam/blob/master/pdfium/test.cxx

As far as I can see, I can't assert things like "the first object in the
XObject dictionary of the first page has a 'Ref' key" using the public
API. However, it looks reasonably straightforward to do this using the
internal API.

Needless to say, we want to use only the public API in LibreOffice. So
my questions:

1) In the above example, am I right that the current public API does not
provide such info?

2) Would it be welcome if I start pushing patches for review that add
public API to make this possible?

I guess the first step would be to expose CPDF_Object as an opaque type
and a function that wraps CPDF_Page::GetPageAttr().
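One way to picture the "opaque type plus wrapper function" idea is a C-style handle over an internal C++ class. This is only a minimal sketch: `FPDF_OBJECT`, `FPDFObject_HasKey()` and `InternalDictionary` are hypothetical names, not PDFium API, and the dictionary stores plain strings for brevity instead of real PDF objects.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical internal C++ class standing in for a dictionary object such
// as CPDF_Dictionary; it is NOT PDFium's real class.
class InternalDictionary {
 public:
  void SetFor(const std::string& key, const std::string& value) {
    entries_[key] = value;
  }
  bool KeyExist(const std::string& key) const {
    return entries_.find(key) != entries_.end();
  }

 private:
  std::map<std::string, std::string> entries_;
};

// Public, C-compatible surface. The struct is declared but never defined,
// so embedders can pass the handle around but cannot depend on its layout.
extern "C" {
typedef struct fpdf_object_t__* FPDF_OBJECT;              // hypothetical handle
int FPDFObject_HasKey(FPDF_OBJECT obj, const char* key);  // hypothetical
}

// Conversions between the opaque handle and the internal type would stay
// private to the library.
inline InternalDictionary* FromHandle(FPDF_OBJECT obj) {
  return reinterpret_cast<InternalDictionary*>(obj);
}
inline FPDF_OBJECT ToHandle(InternalDictionary* dict) {
  return reinterpret_cast<FPDF_OBJECT>(dict);
}

// Returns 1 if the dictionary behind the handle has the given key, else 0.
int FPDFObject_HasKey(FPDF_OBJECT obj, const char* key) {
  if (obj == nullptr || key == nullptr) return 0;
  return FromHandle(obj)->KeyExist(key) ? 1 : 0;
}
```

A test could then assert the presence of a 'Ref' key through the handle alone; the flip side is that every such wrapper becomes part of the public contract.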

I'm interested in contributing such patches, but I thought perhaps it's
a good idea to first ask here on this list if the idea in general is
welcome before I start working on this.

Thanks,

Miklos

Miklos Vajna

Mar 24, 2017, 4:07:29 AM
to pdf...@googlegroups.com, tse...@chromium.org, the...@chromium.org, dsin...@chromium.org
Hi,

On Sat, Mar 18, 2017 at 04:30:34PM +0100, Miklos Vajna <vmi...@vmiklos.hu> wrote:
> I'm interested in contributing such patches, but I thought perhaps it's
> a good idea to first ask here on this list if the idea in general is
> welcome before I start working on this.

Any feedback on this, please? :-)

Thanks,

Miklos

Dan Sinclair

Mar 27, 2017, 9:12:24 AM
to Thomas Sepez, Miklos Vajna, pdf...@googlegroups.com, Tom Sepez, Lei Zhang, Dan Sinclair
I'd have to agree with Tom that we don't want to expose the lower
level API. But, we do expose things like FPDFPage_CountObject and
FPDFPage_GetObject. I'm not really sure how XObject is stored
internally but could those methods be extended so we can return the
needed objects? Do you just need to know that it's in the object or do
you need to know that it's also a ref and other things?

Another option would be, instead of comparing the internals, can you
hash the mem document and compare that to a known good hash?
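A minimal sketch of the hash comparison, assuming the exported PDF is available as a byte buffer in memory. FNV-1a is an arbitrary choice of simple non-cryptographic hash here, and `Fnv1a64()` and `MatchesKnownGood()` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a over the raw bytes of the document.
std::uint64_t Fnv1a64(const std::string& bytes) {
  std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
  for (unsigned char c : bytes) {
    hash ^= c;
    hash *= 0x100000001b3ULL;  // FNV-1a 64-bit prime
  }
  return hash;
}

// True if the in-memory document matches the recorded known-good hash.
bool MatchesKnownGood(const std::string& pdf_bytes, std::uint64_t expected) {
  return Fnv1a64(pdf_bytes) == expected;
}
```

This only works if the export is byte-for-byte deterministic; anything like a creation date or a generated file identifier would have to be fixed or stripped before hashing.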

dan


On Fri, Mar 24, 2017 at 12:48 PM, Thomas Sepez <tse...@google.com> wrote:
> HI,
>
> I see dsinclair is CC'd on this who is the person in charge of the long-term
> direction.
>
> We're delighted that you'd like to use PDFium for your purposes. However ...
>
> 1. The public API does not expose these details, since they are at a level
> below which most "ordinary" embedders of PDFium would work.
>
> 2. I don't think the API exposed in public/ is the right place to add the
> entrypoints you propose. Wrapping all that code with C (not C++) style
> wrappers seems awkward, and we want to avoid making a commitment to support
> any particular internal details of these objects.
>
> For your purposes, you may be stuck with the internal API.

Miklos Vajna

Apr 1, 2017, 4:31:05 PM
to Dan Sinclair, Thomas Sepez, pdf...@googlegroups.com, Tom Sepez, Lei Zhang
Hi Dan,

On Mon, Mar 27, 2017 at 09:11:43AM -0400, Dan Sinclair <dsin...@chromium.org> wrote:
> I'd have to agree with Tom that we don't want to expose the lower
> level API. But, we do expose things like FPDFPage_CountObject and
> FPDFPage_GetObject. I'm not really sure how XObject is stored
> internally but could those methods be extended so we can return the
> needed objects? Do you just need to know that it's in the object or do
> you need to know that it's also a ref and other things?

For that particular testcase the point was to know if the file uses the
"reference XObject" markup from the PDF spec. But I see the value of not
exposing such low-level details in the public API.

Here is another example: the testTdf105461() at
<https://github.com/vmiklos/vmexam/blob/master/pdfium/test.cxx#L61>
almost manages to get away with the current public API. (Its goal is to
find out the number of yellow-filled paths on a given page.) Two missing
bits:

1) The type of the page object is not exposed. I think an
FPDFPageObject_GetType() could be added -- interestingly, the
FPDF_PAGEOBJ_* defines in public/fpdf_edit.h already expose the possible
types.

2) The fill color of a page object is not exposed. I think an
FPDFPath_GetFillColor() could be added for this, as a counterpart to the
existing FPDFPath_SetFillColor().
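With those two getters, the per-page check in testTdf105461() could reduce to a loop like the one below. Because the getters are only proposed here, this sketch uses a hypothetical `PageObjectInfo` record in place of a page-object handle; a real test would iterate with the existing FPDFPage_CountObject()/FPDFPage_GetObject() and fill the fields from the proposed FPDFPageObject_GetType() and FPDFPath_GetFillColor(). The value 2 for FPDF_PAGEOBJ_PATH and treating "yellow" as exactly RGB (255, 255, 0) are assumptions to check against public/fpdf_edit.h and the test document.

```cpp
#include <cassert>
#include <vector>

// Assumed to mirror FPDF_PAGEOBJ_PATH from public/fpdf_edit.h.
constexpr int kPageObjPath = 2;

// Hypothetical stand-in for what the proposed getters would report.
struct PageObjectInfo {
  int type;                // as FPDFPageObject_GetType() would return
  unsigned r, g, b, a;     // fill color as FPDFPath_GetFillColor() would report
};

// Counts path objects whose fill color is pure yellow (alpha is ignored).
int CountYellowFilledPaths(const std::vector<PageObjectInfo>& objects) {
  int count = 0;
  for (const PageObjectInfo& obj : objects) {
    if (obj.type != kPageObjPath) continue;  // skip text, images, forms, ...
    if (obj.r == 255 && obj.g == 255 && obj.b == 0) ++count;
  }
  return count;
}
```

The test would then assert on the count, which points directly at the failing property when it changes.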

Are patches to add these two functions welcome? If so, I'm happy to try
to work on this.

> Another option would be, instead of comparing the internals, can you
> hash the mem document and compare that to a known good hash?

Yes, that could work -- but when the hash doesn't match, it's not easy
to find out what broke; so if possible I would like to avoid that.

Thanks,

Miklos

Miklos Vajna

Apr 2, 2017, 7:45:32 AM
to Thomas Sepez, pdf...@googlegroups.com, Lei Zhang, Tom Sepez, Dan Sinclair
Hi,

On Sat, Apr 01, 2017 at 03:49:57PM -0700, Thomas Sepez <tse...@google.com> wrote:
> I think the two proposals seem reasonable; as you note, they complement
> existing APIs.

Thanks -- the first one is uploaded at
<https://pdfium-review.googlesource.com/c/3570/>. I thought I was familiar
with Gerrit, but I don't really see how I can add you as a reviewer
there. :-) Add reviewer -> typing tsepez -> Send doesn't do anything
visible.

> If you haven't contributed code before to chromium/PDFium, there's an
> agreement you'll need to complete before we can land your patches.
> dsinclair@ would have details.

That's already sorted out; I submitted some small fixes earlier.

Regards,

Miklos

Dan Sinclair

Apr 3, 2017, 9:33:24 AM
to Miklos Vajna, Thomas Sepez, pdfium, Lei Zhang, Tom Sepez, Dan Sinclair
Awesome, thanks for the patch. For future reference, if you hit 'a' on
the pdfium-reviews site you can then add reviewers which will email us
that there is something to review.

dan

Miklos Vajna

Apr 4, 2017, 12:11:26 PM
to Dan Sinclair, Thomas Sepez, pdfium, Lei Zhang, Tom Sepez
Hi Dan,

On Mon, Apr 03, 2017 at 09:32:42AM -0400, Dan Sinclair <dsin...@chromium.org> wrote:
> Awesome, thanks for the patch. For future reference, if you hit 'a' on
> the pdfium-reviews site you can then add reviewers which will email us
> that there is something to review.

Right, that's what I expected -- but PolyGerrit simply ignores my
request to add somebody as a reviewer; if I go back to the old UI, then I
get this error message:

"dsinclair <dsin...@chromium.org> does not identify a registered user
or group"

(I tried various combinations, but this is what autocomplete suggests.)
I wonder if the Gerrit instance is simply misconfigured and
non-committers are only allowed to comment, but not to add
reviewers. :-)

In any case, the second patch is up for review at
<https://pdfium-review.googlesource.com/3690>, so far without reviewers.

Thanks,

Miklos

Dan Sinclair

Apr 4, 2017, 12:40:01 PM
to Miklos Vajna, Dan Sinclair, Thomas Sepez, pdfium, Lei Zhang, Tom Sepez, Aaron Gable
+agable for the issue with adding reviewers.

dan

Aaron Gable

Apr 4, 2017, 12:56:44 PM
to Dan Sinclair, Miklos Vajna, Thomas Sepez, pdfium, Lei Zhang, Tom Sepez
That's super weird. Anyone should be able to add anyone to the reviewers list, and it's clear that Dan was able to add himself.

If you can create a new CL and reproduce this behavior (Dan, please don't add yourself on that one), then I'll take a look there. If you can't reproduce on a different CL I'm not sure what I can do.

Miklos Vajna

Apr 4, 2017, 4:30:38 PM
to Aaron Gable, Dan Sinclair, Thomas Sepez, pdfium, Lei Zhang, Tom Sepez
Hi Aaron,

On Tue, Apr 04, 2017 at 04:56:32PM +0000, Aaron Gable <aga...@google.com> wrote:
> That's super weird. Anyone should be able to add anyone to the reviewers
> list, and it's clear that Dan was able to add himself.

Right, though it seems I can't do the same. (Note that I don't have a
gmail.com address but a G Suite one.)

> If you can create a new CL and reproduce this behavior (Dan, please don't
> add yourself on that one), then I'll take a look there. If you can't
> reproduce on a different CL I'm not sure what I can do.

https://pdfium-review.googlesource.com/c/3730/ is a test change I just
created. Just to be sure, I did the following:

1) git cl upload, it gives this warning:

[W2017-04-04 22:22:09,832 6626 140656247183104 gerrit_util.py] Note: "pdfium-...@googlegroups.com" not added as a cc

But apart from that the change is created.

2) When I click the "Add reviewer" link (tried both Chromium and
Firefox), and I type "vmi...@vmiklos.hu" in the reviewers field and add
some comment, then press the Send button -> the comment is posted, but
I'm not added to the reviewers list.

No error message, the reviewer list is just not updated.

3) When disabling PolyGerrit, I appear as a reviewer. I can even remove
myself.

4) When I try to add a real reviewer like "dsin...@chromium.org", I
get this error:

"dsin...@chromium.org does not identify a registered user or group"

If you need any more info, please let me know. :-)

Thanks,

Miklos

Aaron Gable

Apr 4, 2017, 5:14:20 PM
to Miklos Vajna, Dan Sinclair, Thomas Sepez, pdfium, Lei Zhang, Tom Sepez
On Tue, Apr 4, 2017 at 1:30 PM Miklos Vajna <vmi...@vmiklos.hu> wrote:
> Hi Aaron,
>
> On Tue, Apr 04, 2017 at 04:56:32PM +0000, Aaron Gable <aga...@google.com> wrote:
> > That's super weird. Anyone should be able to add anyone to the reviewers
> > list, and it's clear that Dan was able to add himself.
>
> Right, though it seems I can't do the same. (Note that I don't have a
> gmail.com address but a G Suite one.)
>
> > If you can create a new CL and reproduce this behavior (Dan, please don't
> > add yourself on that one), then I'll take a look there. If you can't
> > reproduce on a different CL I'm not sure what I can do.
>
> https://pdfium-review.googlesource.com/c/3730/ is a test change I just
> created. Just to be sure, I did the following:
>
> 1) git cl upload, it gives this warning:
>
> [W2017-04-04 22:22:09,832 6626 140656247183104 gerrit_util.py] Note: "pdfium-...@googlegroups.com" not added as a cc

Yep, this is expected, and is tracked here. It should be resolved very soon.

> But apart from that the change is created.
>
> 2) When I click the "Add reviewer" link (tried both Chromium and
> Firefox), and I type "vmi...@vmiklos.hu" in the reviewers field and add
> some comment, then press the Send button -> the comment is posted, but
> I'm not added to the reviewers list.
>
> No error message, the reviewer list is just not updated.

This is almost expected. You can't add yourself as a reviewer because you're already the owner of the change. The correct behavior is that it should give you some sort of error message and refuse to send the message. Filed a bug to track that here.

> 3) When disabling PolyGerrit, I appear as a reviewer. I can even remove
> myself.

Interesting, I wasn't able to reproduce that behavior while filing the bug above. After I re-added you as a reviewer, I switched to the old UI and you didn't show up in the reviewers field. If you can reproduce this behavior again, please file a bug for it and reference the one I filed above.

> 4) When I try to add a real reviewer like "dsin...@chromium.org", I
> get this error:
>
> "dsin...@chromium.org does not identify a registered user or group"

Does this happen for any other accounts that you try to add? Does this happen if you try to add dsin...@google.com? When does this message show up: as soon as you finish typing the address, or when you click Send? Does this happen if you type out the whole address, or only when you use autocomplete?

Miklos Vajna

Apr 5, 2017, 3:51:27 AM
to Aaron Gable, Dan Sinclair, Thomas Sepez, pdfium, Lei Zhang, Tom Sepez
Hi Aaron,

On Tue, Apr 04, 2017 at 09:14:07PM +0000, Aaron Gable <aga...@google.com> wrote:
> > 3) When disabling PolyGerrit, I appear as a reviewer. I can even remove
> > myself.
> >
>
> Interesting, I wasn't able to reproduce that behavior while filing the bug
> above. After I re-added you as a reviewer, I switched to the old UI and you
> didn't show up in the reviewers field. If you can reproduce this behavior
> again, please file a bug
> <https://bugs.chromium.org/p/gerrit/issues/entry?template=PolyGerrit%20Issue>
> for it and reference the one I filed above.

I would say (unless you suggest otherwise) let's ignore this self-review
case for now: the real problem is that I can't add the real reviewers,
combined with the fact that, as far as I understand, a change without
reviewers is assumed to be WIP in this project.

> > 4) When I try to add a real reviewer like "dsin...@chromium.org", I
> > get this error:
> >
> > "dsin...@chromium.org does not identify a registered user or group"
> >
>
> Does this happen for any other accounts that you try to add? Does this
> happen if you try to add dsin...@google.com?

dsinclair was already a reviewer there, so I tried to remove him. The
error I get from PolyGerrit is:

"Server error: Not found: 5965"

The old UI gives this error:

"The page you requested was not found, or you do not have permission to
view this page."

This is a bit unexpected, since I'm the owner of this change.

> When does this message show
> up: as soon as you finish typing the address, or when you click Send?

There is no error with PolyGerrit. With the old UI there is no Send
button. When I click the "Add reviewer" button, fill in
"n...@chromium.org" in the reviewer input field, then click the "Add"
button, I get this error message:

"n...@chromium.org does not identify a registered user or group"

Same error message when I try to add "dsin...@google.com".

> Does this happen if you type out the whole address, or only when you
> use autocomplete?

It happens in both cases: when I type just the email address, and when I
remove it and choose a value from the autocomplete list.

Thanks,

Miklos

Aaron Gable

Apr 6, 2017, 12:46:59 PM
to Miklos Vajna, Dan Sinclair, Thomas Sepez, pdfium, Lei Zhang, Tom Sepez
Well I'm officially stumped. Let's continue all investigation in the bug.