PDFium thread safety


None

Jun 13, 2016, 8:16:07 PM
to pdfium
From what I have gathered, PDFium is currently not thread safe.  Are there any efforts underway to make it thread safe?  If not, would the PDFium project be open to outside contributions toward that goal, most likely using C++11 thread_local and per-thread initialization instead of global initialization?  (I don't see anything in the Google C++ style guide on <thread> or thread_local.)

Looking forward to any comments/thoughts.

Thanks!

Lei Zhang

Jun 13, 2016, 8:20:13 PM
to None, pdfium
That's correct. PDFium is not thread safe. Why do you want to make it
thread safe though? If you are trying to render multiple documents,
you can just run multiple PDFium processes instead. That's what we do
for PDFium's corpus tests.
> --
> You received this message because you are subscribed to the Google Groups
> "pdfium" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pdfium+un...@googlegroups.com.
> To post to this group, send email to pdf...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pdfium/5b03b91f-fc97-46d1-a0cf-4212b4dc02a5%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

None

Jun 13, 2016, 8:23:59 PM
to pdfium, andr...@live.com
Hi Lei,

We are using PDFium in a server-side application, and out-of-process rendering has more overhead and higher resource usage because it requires interprocess communication.  Is there any reason not to make PDFium thread safe, or is there just not much interest?  We would be interested in contributing code to make PDFium thread safe if the PDFium project is open to that (even if it's a separate build option or such).

Thanks!

Lei Zhang

Jun 14, 2016, 4:44:35 PM
to None, pdfium
It's additional work and complexity. A few things off the top of my
head - not only does PDFium itself have to be thread-safe, all the
libraries it uses must be as well, or at least be used in a safe
manner. Then you have issues like what if multiple threads touch the
same document and one of them is modifying it. The list goes on.

It is a lot of work and the active PDFium developers would rather
spend time on fundamentals like code health, test coverage, and
(security) bugs. We'd also like to get XFA out the door. So
multi-threading is not exactly at the top of the list of concerns,
especially when parallelism can be achieved via multiple processes.
Not to mention separate processes have their own advantages like better
stability and isolation.
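
A minimal sketch of the multi-process approach on a POSIX system, showing the isolation point: a crash in one worker does not take down the others. ProcessDocument and RunInSeparateProcesses are hypothetical names, and the "crash" path is only a stand-in for a fatal bug; a real worker would call into PDFium where the placeholder is.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical per-document job; a real worker would call into PDFium here.
// Returns true on success. The path "crash" simulates a fatal bug.
bool ProcessDocument(const std::string& path) {
  if (path == "crash") std::abort();
  return !path.empty();
}

// Forks one child per document and returns how many children exited cleanly.
int RunInSeparateProcesses(const std::vector<std::string>& paths) {
  std::vector<pid_t> children;
  for (const std::string& path : paths) {
    pid_t pid = fork();
    if (pid == 0) {
      // Child: do the work, then exit immediately with a status code.
      _exit(ProcessDocument(path) ? 0 : 1);
    }
    if (pid > 0) children.push_back(pid);  // Parent: remember the child.
  }
  int succeeded = 0;
  for (pid_t pid : children) {
    int status = 0;
    // A child killed by a signal (the simulated crash) fails WIFEXITED.
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
        WEXITSTATUS(status) == 0) {
      ++succeeded;
    }
  }
  return succeeded;
}
```

Calling RunInSeparateProcesses with three paths, one of them "crash", reports two successes: the aborted child is reaped by the parent and the other two finish normally.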

None

Jun 15, 2016, 2:19:34 AM
to pdfium, andr...@live.com
Thanks for your thoughts on this.

Our use case is primarily thread safety when different threads are operating on completely different documents.  We don't need thread safety when modifying the same document, and generally there is no expectation that modifying an object is thread safe.  For example, the convention in the C++ standard library is that const functions are thread safe, as are functions operating on different objects, but non-const functions are not guaranteed to be thread safe (with a few exceptions).
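
As a small illustration of that convention (a sketch only; SumShared and SumFromTwoThreads are made-up names), two threads can read the same object through const access and modify their own private objects without any locking:

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Read-only access through a const reference.
long SumShared(const std::vector<int>& shared) {
  return std::accumulate(shared.begin(), shared.end(), 0L);
}

// Both threads read `shared` concurrently and each mutates only its own
// vector, so no synchronization is needed beyond the final joins.
long SumFromTwoThreads(const std::vector<int>& shared) {
  long first = 0;
  long second = 0;
  std::thread t1([&shared, &first] {
    std::vector<int> mine(shared);  // this thread's own object
    mine.push_back(1);              // non-const call on a private object
    first = SumShared(shared) + mine.back();
  });
  std::thread t2([&shared, &second] {
    std::vector<int> mine(shared);
    mine.push_back(2);
    second = SumShared(shared) + mine.back();
  });
  t1.join();
  t2.join();
  return first + second;
}
```

What would not be safe under the same convention is one thread calling push_back on `shared` while the other reads it.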

In terms of making PDFium thread safe when operating on different documents, it seems that making the global/static variables (for example, CPDF_ModuleMgr) thread_local, and requiring that each thread initialize the PDFium library, would solve the problem.  I can see that making PDFium thread safe when operating on the same document (even just the const functions) might introduce unnecessary complexity due to possible caching, etc., but I'm hoping that making PDFium thread safe when operating on different documents isn't as difficult.

We're comfortable with making these changes ourselves, but we wanted to reduce the future maintenance cost and thought that the PDFium community would be interested (searching through the forums, it seems other people have asked about thread safety).  As for third party libraries, most that I have encountered are thread safe when operating on different objects.  Of course, if you don't think that this will be useful to the PDFium community or don't think that the maintenance cost is worth it, then I suppose we'll have to take care of that on our side.  Let us know what your thoughts are on this.
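
A rough sketch of the thread_local idea, not PDFium's actual code: ModuleManager, InitForThisThread, DestroyForThisThread and OpenDocument are all hypothetical stand-ins for a library singleton, its init/teardown calls, and an entry point that touches the former global.

```cpp
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical stand-in for a library-wide singleton; not PDFium's real class.
class ModuleManager {
 public:
  int documents_opened = 0;
};

// One instance per thread instead of one per process.
thread_local std::unique_ptr<ModuleManager> g_manager;

// Hypothetical per-thread init/teardown, replacing a single global init call.
void InitForThisThread() { g_manager.reset(new ModuleManager()); }
void DestroyForThisThread() { g_manager.reset(); }

// Hypothetical library entry point that touches the formerly global state.
void OpenDocument() {
  assert(g_manager && "InitForThisThread() must be called on this thread");
  ++g_manager->documents_opened;
}

// Runs `count` opens on a fresh thread and returns that thread's counter.
int OpenOnWorker(int count) {
  int result = 0;
  std::thread worker([&] {
    InitForThisThread();
    for (int i = 0; i < count; ++i) OpenDocument();
    result = g_manager->documents_opened;
    DestroyForThisThread();
  });
  worker.join();
  return result;
}
```

Each worker sees only its own counter, and a thread that never called the init function (here, the calling thread) still has a null g_manager, which is why every thread would need its own initialization.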

Thanks!

Lei Zhang

Jun 15, 2016, 8:01:29 PM
to None, pdfium
If we establish thread safety to mean no two threads access the same
document at the same time, then it's more feasible. If it mostly means
getting rid of non-const globals or protecting / having per-thread
copies of them, I think that's doable.

Have you actually checked to make sure all the third party libraries
are thread safe? If they are not, and they need to be made
thread-safe, that is outside of PDFium's control. We would highly
prefer not to maintain a (heavily) modified version of a third party
library.

None

Jul 7, 2016, 12:18:49 AM
to pdfium
Hi Lei,

I looked into this and this is what I found about the thread safety of the third party libraries...

icu - thread safe except for "writes" to the same object (writes = calls to non-const member functions) http://userguide.icu-project.org/design#TOC-ICU-Threading-Model-and-Open-and-Close-Model
v8 - thread safe for different v8::Isolate instances (see Jochen's response in https://groups.google.com/forum/#!topic/v8-users/oN_3tVBd3H4)
bigint - no documentation on website but will investigate the code...
fx_agg - no documentation either...
fx_freetype - yes https://www.freetype.org/freetype2/docs/ft2faq.html (search for "thread-safe")
fx_lcms2 - not sure, but it appears to be thread-safe on a different context.  Will investigate further.
fx_libopenjpeg - not sure as well.
fx_lpng - yes http://libpng.org/pub/png/libpng-manual.txt (search for "thread safe")
fx_zlib - yes on different streams (http://www.zlib.net/zlib_faq.html item 21)
jpeg - not sure

I'm still looking into it but I think that even if some libraries are not thread safe we can synchronize calls to those libraries with little cost.
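
A sketch of what synchronizing calls to such a library could look like, assuming a single process-wide lock is acceptable. LegacyDecode is a made-up stand-in for a routine with hidden global state, and DecodeSerialized and DecodeFromThreads are hypothetical wrappers, not functions from any of the libraries listed above.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a third-party routine that mutates hidden
// global state and is therefore unsafe to call concurrently.
int g_legacy_calls = 0;
int LegacyDecode(int input) {
  ++g_legacy_calls;  // unsynchronized global write inside the "library"
  return input * 2;
}

// Wrapper: one process-wide lock serializes every call into the library.
int DecodeSerialized(int input) {
  static std::mutex lock;
  std::lock_guard<std::mutex> guard(lock);
  return LegacyDecode(input);
}

// Calls the wrapper from several threads and returns the total call count.
int DecodeFromThreads(int thread_count, int calls_per_thread) {
  g_legacy_calls = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < thread_count; ++t) {
    threads.emplace_back([calls_per_thread] {
      for (int i = 0; i < calls_per_thread; ++i) DecodeSerialized(i);
    });
  }
  for (std::thread& th : threads) th.join();
  return g_legacy_calls;
}
```

The lock makes the calls safe but also serializes them, so the actual cost depends on how much time each thread spends inside the guarded library.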

Thanks!

None

Jul 7, 2016, 12:46:09 AM
to pdfium
Also I looked through bigint and there are no global variables or static variables...

icu - thread safe except for "writes" to the same object (writes = calls to non-const member functions) http://userguide.icu-project.org/design#TOC-ICU-Threading-Model-and-Open-and-Close-Model
v8 - thread safe for different v8::Isolate instances (see Jochen's response in https://groups.google.com/forum/#!topic/v8-users/oN_3tVBd3H4)
bigint - no documentation, but appears to be thread safe by inspection of the code
fx_agg - no documentation but will take a look...

fx_freetype - yes https://www.freetype.org/freetype2/docs/ft2faq.html (search for "thread-safe")
fx_lcms2 - not sure, but it appears to be thread-safe on a different context.  Will investigate further.
fx_libopenjpeg - not sure as well.
fx_lpng - yes http://libpng.org/pub/png/libpng-manual.txt (search for "thread safe")
fx_zlib - yes on different streams (http://www.zlib.net/zlib_faq.html item 21)
jpeg - not sure

Thanks!

Lei Zhang

Jul 28, 2016, 4:55:03 PM
to None, pdfium

David Wilson

Nov 28, 2016, 6:19:48 AM
to pdfium, andr...@live.com
Hi,

I'm trying to extract the text from several PDF documents and would like to do it using multiple threads in a VB.Net program. You mentioned starting multiple processes. How would I go about doing this?  I can convert from C# if that's easier to reply with.

Many many thanks.

Wei Li

Nov 28, 2016, 1:41:26 PM
to David Wilson, pdfium, andr...@live.com
Hi, David:

I believe what Lei meant was that you could use the PDFium library in multiple processes. In other words, if you already have a program, you run multiple instances of it.



ral...@webdox.cl

Jul 25, 2018, 8:46:10 PM
to pdfium
Hi OP,

How did you solve your use case?

I concur that the date of posting and the publishing of a blog post at some big "Documents" company matches with the use case you described, but I'm just guessing.
We solved the same use case by running independent processes and running our workloads with big usage quotas to avoid contention from our container infrastructure. Works pretty well so far.

David Tejnora

Mar 9, 2021, 1:42:43 PM
to pdfium
Hello,
Maybe this will help someone. We are using PDFium only for reading/iterating documents at the page object level. I checked all the libraries needed for reading and I think they are all thread safe. The main problem is fonts: PDFium modifies the FreeType font face and caches it, so I have to rewrite this part. Some types/patterns (Observable) and caches are not thread safe either. I didn't find any other problems.

The other main problems which we had to solve:
1. Only the RGB colorspace is supported; we need to support all colorspaces
2. Expanding/parsing 'form' page objects on each page was a performance problem when the software supports overlays
3. Performance

But despite these problems, it is a very nice library.
David