Thinking about the PDFium API


Dan Sinclair

Mar 12, 2018, 3:44:10 PM
to pdfium
Hi all,

I've been putting some thought into the PDFium public/ API and, before wandering down any one path, I'd like to hear how folks outside of Chrome are integrating with PDFium.

In Chrome we basically end up with a C++ wrapper on top of the PDFium C API. The wrapper is a nicer interface to PDFium for our purposes. Are other groups integrating PDFium using the C API directly, building a C++ layer on top, or doing something else? If we had a C++ API instead of a C API, would that be an issue? (The core of PDFium will always be C++; I'm just talking about the external interface in public/.)

One other large change I'd like to see is using UTF-8 strings instead of UTF16-LE. My understanding is that UTF16-LE is easier to work with on Windows because it matches the native format. Are there other benefits to keeping it instead of using UTF-8?

Thanks,
dan


Andreas Falkenhahn

Mar 12, 2018, 4:08:26 PM
to Dan Sinclair, pdfium
Hi Dan,

I'm using PDFium from a project written in C so I'd be sad to see the C API being replaced by a C++ API. Maybe a compatibility layer could be created to keep the C API in place.

As for UTF-8, I'm all for it: it's much more convenient for my app, which uses UTF-8 internally.

Best

Andreas
--
Best regards,
Andreas Falkenhahn mailto:and...@falkenhahn.com

Krzysztof Kowalczyk

Mar 13, 2018, 12:13:07 AM
to pdfium
Yes to UTF-8, no to C++ only.

Most platforms do UTF-8 but not UTF16-LE, so having UTF-8 in the API makes it easier to use on macOS/Linux, and it's trivial to convert UTF16-LE -> UTF-8 on Windows because the OS already provides an API for it. That's easier than doing UTF-8 -> UTF16-LE on macOS or Linux. But it would work either way with little effort.
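
To make the conversion concrete, here is a minimal, portable sketch of UTF16-LE -> UTF-8 in C++. The function name `Utf16LeToUtf8` is made up for illustration and is not part of PDFium; on Windows one would more likely call the OS routine (e.g. `WideCharToMultiByte`) instead. Replacing unpaired surrogates with U+FFFD is an assumed policy, one reasonable choice among several.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a PDFium function): converts UTF-16LE bytes to UTF-8.
// Unpaired surrogates become U+FFFD; a trailing odd byte is ignored.
std::string Utf16LeToUtf8(const unsigned char* bytes, std::size_t byte_len) {
  assert(bytes != nullptr || byte_len == 0);
  const std::size_t units = byte_len / 2;
  // Reads the i-th 16-bit code unit, low byte first (little-endian).
  auto unit_at = [bytes](std::size_t i) -> std::uint32_t {
    return static_cast<std::uint32_t>(bytes[2 * i]) |
           (static_cast<std::uint32_t>(bytes[2 * i + 1]) << 8);
  };
  std::string out;
  for (std::size_t i = 0; i < units; ++i) {
    std::uint32_t cp = unit_at(i);
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: valid only when followed by a low surrogate.
      const std::uint32_t lo = (i + 1 < units) ? unit_at(i + 1) : 0;
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        ++i;  // the low surrogate has been consumed as well
      } else {
        cp = 0xFFFD;
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // low surrogate with no preceding high surrogate
    }
    // Encode the code point as 1 to 4 UTF-8 bytes.
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

A caller would pass the raw byte buffer returned by a UTF16-LE API together with its length in bytes; the going-the-other-way conversion is symmetric and about the same size.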

Replacing the C API with a C++-only API makes it much harder to use the .dll/.so from other languages like C#, Go, Rust, Python, JavaScript, etc.

Writing bindings for C is relatively well supported by any language.

Writing bindings for C++ is a black art and at times impossible (so people resort to wrapping C++ classes in a C API, see e.g. https://github.com/google/skia/blob/master/experimental/c-api-example/c.md).

Removing C API would be a major issue for anyone trying to use PDFium from a language other than C++.

Adding C++ API wouldn't hurt, just don't take away C API.
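
The approach of wrapping C++ classes in a C API can be sketched roughly as follows. Everything here (`Document`, `demo_document_*`) is a hypothetical stand-in invented for illustration, not PDFium or Skia code. The C layer exposes only an opaque handle and plain functions, and it catches exceptions so that none cross the C boundary.

```cpp
#include <cassert>

// Hypothetical C++ implementation class (stands in for library internals).
class Document {
 public:
  explicit Document(int page_count) : page_count_(page_count) {
    assert(page_count >= 0);
  }
  int PageCount() const { return page_count_; }

 private:
  int page_count_;
};

// The C-facing surface: an opaque handle type plus free functions.
// In a real project these declarations would live in a header usable from C.
extern "C" {
typedef struct demo_document demo_document;
demo_document* demo_document_create(int page_count);
int demo_document_page_count(const demo_document* doc);
void demo_document_destroy(demo_document* doc);
}

// The handle's layout is known only to the implementation.
struct demo_document {
  explicit demo_document(int page_count) : impl(page_count) {}
  Document impl;
};

extern "C" demo_document* demo_document_create(int page_count) {
  if (page_count < 0) return nullptr;  // report bad input as a null handle
  try {
    return new demo_document(page_count);
  } catch (...) {
    return nullptr;  // never let a C++ exception escape into C callers
  }
}

extern "C" int demo_document_page_count(const demo_document* doc) {
  return doc ? doc->impl.PageCount() : -1;  // -1 signals an invalid handle
}

extern "C" void demo_document_destroy(demo_document* doc) {
  delete doc;  // deleting a null pointer is a no-op
}
```

Because callers only ever see a pointer to an incomplete type and functions with C linkage, a binding generator for C#, Go, Rust or Python needs nothing beyond its ordinary C support.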

Regards,

-- kjk

rsippl

Mar 13, 2018, 3:53:13 AM
to pdfium
Hi Dan,

I'm working on a project written in C++ (and Qt). Using the PDFium C API directly is a bit cumbersome, especially when called from modern C++. I ended up using smart pointers with the FPDF_Close... functions as custom deleters, e.g. for a page:

auto page = std::shared_ptr<void>(FPDF_LoadPage(doc.get(), pageIndex), FPDF_ClosePage);

On top of that, I have an OO wrapper, e.g. a class "Page" with methods for accessing / creating annotations, page objects etc. 

I believe a similar wrapper would complement the public C API in a very useful way, but it shouldn't replace the C API, for reasons already mentioned by Krzysztof.

Regarding UTF-8, in my early PDFium days I was surprised that it used UTF16-LE, but as I'm using Qt with its versatile QString, that's currently no issue for me.

Dan Sinclair

Mar 13, 2018, 10:23:33 AM
to Ralf S., pdfium
On Tue, Mar 13, 2018 at 3:53 AM rsippl <ralf....@gmail.com> wrote:
I'm working on a project written in C++ (and Qt). Using the PDFium C API directly is a bit cumbersome, especially when called from modern C++. I ended up using smart pointers with the FPDF_Close... functions as custom deleters, e.g. for a page:

auto page = std::shared_ptr<void>(FPDF_LoadPage(doc.get(), pageIndex), FPDF_ClosePage);


Take a look in the public/cpp folder; we have a bunch of these set up already, although for unique_ptr, not shared_ptr. If you include the fpdf_deleter.h header you can then do:

  std::unique_ptr<void, FPDFTextPageDeleter> textpage(FPDFText_LoadPage(page));
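
A deleter of the kind described above might look roughly like this. The names (`DemoTextPage`, `Demo_LoadTextPage`, `Demo_CloseTextPage`, `DemoTextPageDeleter`) are stand-ins so the sketch compiles without PDFium; they are not the contents of the actual header.

```cpp
#include <cassert>
#include <memory>

// --- Stand-ins for a C-style API (hypothetical, not PDFium's functions) ---
struct DemoTextPage {
  int page_index;
};

// Counts live handles so the effect of the deleter can be observed.
static int g_open_text_pages = 0;

DemoTextPage* Demo_LoadTextPage(int page_index) {
  if (page_index < 0) return nullptr;  // loading can fail
  ++g_open_text_pages;
  return new DemoTextPage{page_index};
}

void Demo_CloseTextPage(DemoTextPage* page) {
  assert(page != nullptr);
  --g_open_text_pages;
  delete page;
}

// --- The part of interest: a stateless deleter functor ---
struct DemoTextPageDeleter {
  void operator()(DemoTextPage* page) const { Demo_CloseTextPage(page); }
};

// Convenience alias so call sites do not repeat the deleter type.
using ScopedDemoTextPage = std::unique_ptr<DemoTextPage, DemoTextPageDeleter>;
```

With the alias in place, `ScopedDemoTextPage page(Demo_LoadTextPage(0));` closes the handle automatically at scope exit, and `unique_ptr` never invokes the deleter on a null handle. Compared with `shared_ptr` plus a function pointer, a stateless functor typically adds no storage overhead beyond the raw pointer.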

dan
 

Ryan Wiley

Mar 13, 2018, 3:37:57 PM
to pdfium
Dan,

I currently compile pdfium as a shared library and call the C APIs directly from C#. In the future, I would like to add my own C++ layer in-between and just use pdfium as a static library.

Thanks,

Ryan Wiley

Martin Sandsmark

Mar 14, 2018, 4:43:51 AM
to Dan Sinclair, pdfium
Hi!


On 12 March 2018 at 20:43, Dan Sinclair <dsin...@chromium.org> wrote:
> I've been putting some thought into the PDFium public/ API and, before
> wandering down any one path, I'd like to hear how folks outside of Chrome
> are integrating with PDFium.

In addition to what people have mentioned in this thread, there are
some other open source projects I'm aware of:
https://github.com/KDE/okular/tree/pdfium_generator/generators/pdfium
https://github.com/paulovap/qtpdfium
http://code.qt.io/cgit/qt-labs/qtpdf.git/


> In Chrome we, basically, end up with a C++ wrapper on top of the PDFium C
> API. The wrapper is a nicer interface to PDFium for our purposes. Are other
> groups integrating PDFium using the C API directly, building a C++ layer on
> top or, doing something else? If we had a C++ API instead of a C API would
> that be an issue? (The core PDFium will always be C++, I'm just talking
> about the external interface in public/)

All the projects I'm aware of that use pdfium are in c++ (including
our applications), and there's a lot of duplication of effort in
wrapping the C API, so IMHO this makes a lot of sense.

It also just feels wrong to have c++ on both ends (internally and in
the application using pdfium), but being limited to a C API in
between. :-)

As for bindings for other languages; there's a ton of c++ libraries
out there, so I assume there's some solution for those languages to
have bindings for c++ APIs.


> One other large change I'd like to see is using UTF-8 strings instead of
> UTF16-LE. My understanding is that UTF16-LE is easier to work with on
> Windows due to it matching the native format, are there other benefits on
> keeping it instead of using UTF-8?

I'm not familiar enough with other frameworks, but at least Qt also
uses UTF16 internally for performance reasons (the string handling is
also one of the few parts of Qt that has hand-written assembly for
this reason). So at least when integrating with a Qt application there
would be a slight performance decrease, but I doubt it matters for 99%
of the cases.

What does pdfium use internally, and what does chromium use internally?


--
Martin Sandsmark
Chief Technical Officer
+47 980 33 988

https://remarkable.com
Pilestredet 75c, 0354 Oslo - Norway

rsippl

Mar 16, 2018, 8:04:36 AM
to pdfium
On Wednesday, March 14, 2018 at 9:43:51 AM UTC+1, Martin Sandsmark wrote:
It also just feels wrong to have c++ on both ends (internally and in
the application using pdfium), but being limited to a C API in
between. :-)

In fact, having C between a C++ library and C++ client code isn't that uncommon. With a C++ API, it's much easier to break binary compatibility than with a C one, so dropping in a replacement .so/.dll won't always work.

What I'd like to have is a higher level, object oriented C++ wrapper in a static library I can link my C++ code against. Under the hood, that library talks to a shared library containing PDFium via the (current) public C API.
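
The layering described above can be sketched as follows, assuming a hypothetical C API (`demo_page_*`) with a stub implementation standing in for the shared library. None of these names come from PDFium. The `Page` class is the part a C++ client would link against; it owns the C handle and turns a failed load into an exception.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>
#include <utility>

// Middle layer: a narrow C API (hypothetical stand-in for the public C API).
extern "C" {
typedef struct demo_page demo_page;  // opaque to clients
demo_page* demo_page_load(int index);
double demo_page_width(const demo_page* page);
void demo_page_close(demo_page* page);
}

// Bottom layer: stub implementation. In the described setup this would live
// inside the shared library rather than next to the wrapper.
struct demo_page {
  int index;
  double width;
};
extern "C" demo_page* demo_page_load(int index) {
  if (index < 0) return nullptr;
  return new (std::nothrow) demo_page{index, 612.0};
}
extern "C" double demo_page_width(const demo_page* page) {
  return page ? page->width : 0.0;
}
extern "C" void demo_page_close(demo_page* page) { delete page; }

// Top layer: move-only RAII wrapper that talks only to the C API.
class Page {
 public:
  explicit Page(int index) : handle_(demo_page_load(index)) {
    if (!handle_) throw std::runtime_error("failed to load page");
  }
  ~Page() {
    if (handle_) demo_page_close(handle_);
  }

  Page(Page&& other) noexcept : handle_(other.handle_) {
    other.handle_ = nullptr;  // the moved-from object no longer owns the handle
  }
  Page& operator=(Page&& other) noexcept {
    std::swap(handle_, other.handle_);  // old handle is closed by other's dtor
    return *this;
  }
  Page(const Page&) = delete;
  Page& operator=(const Page&) = delete;

  double Width() const {
    assert(handle_ != nullptr);
    return demo_page_width(handle_);
  }

 private:
  demo_page* handle_;
};
```

Since the wrapper depends only on the C declarations, it could be shipped as a header or a small static library while the implementation behind the C functions is swapped out independently.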

Miklos Vajna

Mar 18, 2018, 8:59:11 AM
to pdfium
Hi,

On Fri, Mar 16, 2018 at 05:04:36AM -0700, rsippl <ralf....@gmail.com> wrote:
> In fact, having C between a C++ library and C++ client code isn't that
> uncommon.

It's so common that it even has a name. :-) See e.g.
<https://www.slideshare.net/StefanusDuToit/cpp-con-2014-hourglass-interfaces-for-c-apis>.

Regards,

Miklos

Lei Zhang

Mar 19, 2018, 6:38:05 PM
to Dan Sinclair, pdfium
[Taking off my PDFium developer hat, and putting on a Chromium/Chrome
PDF Viewer developer hat]

From the Chromium side, my preference would be to have a C++ API. If a
C++ PDFium API existed, we would just use that. PDFium is closely
integrated into Chromium, so there's no separate DLL / stable ABI
concerns. Chromium (and other projects) can then stop writing wrappers
around the C API. Since Chromium just wants to use the C++ API,
there's no strong preference for keeping/removing the C API.

One thing I'm wondering - if PDFium ends up in a state where there
exists both a C and a C++ API, would it be easier to:
1) Write the C++ API as a wrapper around the C API.
2) Write the C API as a wrapper around the C++ API.
3) Have both APIs talk directly to the C++ implementation.
> --
> You received this message because you are subscribed to the Google Groups
> "pdfium" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pdfium+un...@googlegroups.com.
> To post to this group, send email to pdf...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pdfium/CAMYH%3DOhVJ3NXgmVqqS7bFzBhppTEt%2BHBC94-bTmJMOy%3D7UGkWA%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

Dan Sinclair

Mar 20, 2018, 9:16:29 AM
to Lei Zhang, Dan Sinclair, pdfium
So, from the sound of things, going with both C and C++ APIs is what we're looking at. It looks like if we go with C99 we'd get uint32_t and friends. Is there any benefit to going with C11 over C99? The C++ side will be C++11.

I'd say we do c++ -> c -> c++. That way, we can write our tests against the C++ API and we'll get testing of the C API as we call through.

I'm going to start thinking about what the API should look like and will try to write something up before committing any code.

dan

Martin Sandsmark

Mar 21, 2018, 8:41:59 AM
to rsippl, pdfium
Hi!


On 16 March 2018 at 13:04, rsippl <ralf....@gmail.com> wrote:
> On Wednesday, March 14, 2018 at 9:43:51 AM UTC+1, Martin Sandsmark wrote:
>> It also just feels wrong to have c++ on both ends (internally and in
>> the application using pdfium), but being limited to a C API in
>> between. :-)
> In fact, having C between a C++ library and C++ client code isn't that
> uncommon.

Just because something is common doesn't mean it's good or the best
approach. :-)

FWIW, I can't really think of any large C++ libraries maintaining C
APIs. Off the top of my head, dlib uses pybind11 for the Python
bindings, Qt has bindings for a ton of languages without any kind of C
API or C wrappers, same with KDE frameworks, etc.


> With a C++ API, it's much easier to break binary compatibility
> than with a C one, so dropping in a replacement .so/.dll won't always work.

pdfium doesn't even have any API stability promises, so ABI
compatibility is pretty moot. And in addition pdfium is built as a
static library, so that point is moot as well.


> What I'd like to have is a higher level, object oriented C++ wrapper in a
> static library I can link my C++ code against. Under the hood, that library
> talks to a shared library containing PDFium via the (current) public C API.

pdfium is already a static library.


So in conclusion I don't see the point of a C API at all; it just
means more overhead in pdfium development. It'll also be easier for the
community to contribute if these kinds of wrappers/bindings are
developed externally.

Sam Chan

Mar 8, 2019, 9:11:05 PM
to pdfium

Hi Dan,

I tried to use PDFium from C++ and ran into a lot of linker errors in Visual Studio 2017.
I built it with gn, following the recommendations on this site: https://github.com/tct2k/wxPDFView
Please suggest how to link it with the cpp mt sample code.

Severity    Code    Description    Project    File    Line    Suppression State
Error    LNK2001    unresolved external symbol __imp__wassert    sample1    C:\workspace\PDFIUM-ALL\sample1\pdfwindow.lib(PWL_ListBox.obj)    1   
Error    LNK2001    unresolved external symbol __imp__wassert    sample1    C:\workspace\PDFIUM-ALL\sample1\fdrm.lib(fx_crypt_sha.obj)    1   
Error    LNK2001    unresolved external symbol __imp__wassert    sample1    C:\workspace\PDFIUM-ALL\sample1\fdrm.lib(fx_crypt_aes.obj)    1   
Error    LNK2001    unresolved external symbol __imp__wassert    sample1    C:\workspace\PDFIUM-ALL\sample1\fx_libopenjpeg.lib(t2.obj)    1   
Error    LNK2001    unresolved external symbol __imp__wassert    sample1    C:\workspace\PDFIUM-ALL\sample1\fx_libopenjpeg.lib(raw.obj)    1   
Error    LNK2001    unresolved external symbol __imp__wassert    sample1    C:\workspace\PDFIUM-ALL\sample1\pdfium.lib(cfx_systemhandler.obj)    1   
Error    LNK2001    unresolved external symbol __imp__wassert    sample1    C:\workspace\PDFIUM-ALL\sample1\pdfwindow.lib(PWL_ComboBox.obj)    1   







Dan Sinclair

Mar 11, 2019, 10:08:34 AM
to Sam Chan, pdfium
Hi Sam,

I don't know what the 'cpp mt sample code' you're referring to is. The wxPDFView library isn't associated with PDFium other than using it as a component, so if the sample is part of their code you'll need to file an issue with them. It looks like your linker didn't find assert for some reason.

dan



Sam Chan

Mar 11, 2019, 2:32:43 PM
to pdfium
Hi Dan,


mkdir pdfium
cd pdfium
set DEPOT_TOOLS_WIN_TOOLCHAIN=0
gclient config --unmanaged --spec "solutions=[{'name':'pdfium','url':'https://pdfium.googlesource.com/pdfium.git@chromium/2953','deps_file':'DEPS','managed':False,'custom_deps':{},'safesync_url':''}]"
gclient sync
cd pdfium
set PDFIUM_DIR=%cd%
gn gen out/Debug --args="pdf_enable_xfa=false pdf_enable_v8=true pdf_is_standalone=true is_component_build=true target_cpu=\"x86\" is_debug=true"
ninja -C out/Debug pdfium

The script above seems to be the only set of instructions I could find; it comes from the https://github.com/TcT2k/wxPDFView website, and with it I was able to build pdfium.
I was then trying to build sample code in Visual Studio to test PDFium. Do you have a good reference for building PDFium with Visual Studio 2017?

Thanks,
Sam

Dan Sinclair

Mar 11, 2019, 2:50:18 PM
to Sam Chan, pdfium
I've never tried to build PDFium with Visual Studio, so I don't know of any offhand. Others in the group may be able to help you.

dan



ZHUO QL (KDr2)

Mar 11, 2019, 11:56:04 PM
to Sam Chan, pdfium
Hi Sam,

Also, you can download a pre-built tarball from http://cxan.kdr2.com/pdfium/windows-x64/.

I use that pre-built lib with Qt on Windows, and it works fine.

Greetings.

ZHUO QL (KDr2, http://kdr2.com)


