Batch processing slows down PDF generation

Paul Waring

Dec 21, 2009, 8:33:17 AM
to dom...@googlegroups.com
I'm currently using dompdf (0.5.1) to generate PDF files on a website,
and because it takes several seconds to create each one I'm running a
cron job which calls a function with the following code for each PDF:

$dompdf = new DOMPDF();
$dompdf->load_html($html);
$dompdf->set_paper('a4', 'landscape');
$dompdf->render();
file_put_contents($filename, $dompdf->output());

Each PDF has three pages with the following content:

Page 1: Two column table with around 10 rows.
Page 2: Eight column table with 1-20 rows.
Page 3: Two separate tables, one with 10 columns and six rows and the
other with three columns and six rows.

The HTML is valid and the tables are very basic - there's no complicated
CSS involved (I can't post the exact HTML as the information in it is
confidential).

Each PDF by itself takes 7-10 seconds to generate, but if I run three in
a row the whole process takes over a minute, and two PDFs take around
30-40 seconds. Is there any reason for this being the case? There are no
images in the PDF, so I don't think it's the problem mentioned in Issue 17.

Thanks

Paul

--
Paul Waring
http://www.pwaring.com

Paul Waring

Dec 22, 2009, 4:41:19 AM
to dom...@googlegroups.com
Paul Waring wrote:
> Each PDF by itself takes 7-10 seconds to generate, but if I run three in
> a row the whole process takes over a minute, and two PDFs take around
> 30-40 seconds. Is there any reason for this being the case? There are no
> images in the PDF, so I don't think it's the problem mentioned in Issue 17.

In case anyone else comes across the same problem, taking the body of
the function and placing it in a separate PHP file, then calling exec()
instead of the function reduces the time from >60s to 30-40s. Using the
latest SVN revision instead of 0.5.1 also brings the 30-40s down to
20-25s. My best guess is that dompdf is using resources which aren't
freed up when the object falls out of scope, but I don't know if that's
a PHP or dompdf problem.
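
The exec() approach described above could look roughly like the sketch below. It is not the code from the original post: the helper name render_in_subprocess(), its argument order, and the default 'php' binary name are assumptions made for illustration. The worker script is assumed to be a small PHP file that reads an HTML file path and an output path from $argv and runs the five dompdf calls shown in the first message.

```php
<?php
// Hypothetical helper: run one PDF job in its own PHP process so that any
// memory the job used is returned to the system when that process exits.
// $phpBinary defaults to 'php', which assumes the CLI binary is on the PATH.
function render_in_subprocess($workerScript, $htmlFile, $pdfFile, $phpBinary = 'php')
{
    // Quote every argument so paths with spaces or shell characters are safe.
    $command = escapeshellarg($phpBinary) . ' '
             . escapeshellarg($workerScript) . ' '
             . escapeshellarg($htmlFile) . ' '
             . escapeshellarg($pdfFile);

    $output = array();
    $exitCode = 0;
    exec($command, $output, $exitCode);

    // By convention an exit code of 0 means the worker finished normally.
    return $exitCode === 0;
}
```

The cron job would then call this helper once per PDF instead of calling the rendering function directly, and could log or retry any job for which it returns false.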

BrianS

Dec 22, 2009, 2:21:36 PM
to dompdf

Are you destroying the DOMPDF object with unset() after rendering your
PDF and putting the contents to a file? I'm using DOMPDF in a batch-
style process where I'm rendering one page of a document at a time.
This was necessary because trying to render the entire document at
once was taking too long and using too much memory. Switching to one
page at a time has significantly improved performance, but not until I
started destroying the DOMPDF object. Prior to that I was having all
sorts of problems.

Even so, I wouldn't be surprised if DOMPDF wasn't freeing up all the
memory it was using. We'll look into this at some point. In the
meantime it's good to remember that PHP's effectiveness with garbage
collection has improved over time, so the later the version you are
using the better in that regard.
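
A minimal sketch of the unset() suggestion above, using the same dompdf calls as the first message. The function name render_batch() and the $makeRenderer factory are hypothetical; the factory is there only so the sketch does not hard-code how the DOMPDF object is created. Whether unset() actually releases memory depends on whether dompdf's internal objects still reference one another, which this thread does not settle.

```php
<?php
// Render each job with a fresh renderer and release it before the next one.
// $jobs maps output filename => HTML string.
// $makeRenderer is a hypothetical factory callable; with dompdf 0.5.x it
// would return a new DOMPDF instance.
function render_batch(array $jobs, $makeRenderer)
{
    $written = array();
    foreach ($jobs as $filename => $html) {
        $dompdf = call_user_func($makeRenderer);
        $dompdf->load_html($html);
        $dompdf->set_paper('a4', 'landscape');
        $dompdf->render();
        file_put_contents($filename, $dompdf->output());

        // Drop our reference before the next iteration creates a new
        // renderer, so this loop never holds two renderers at once.
        unset($dompdf);
        $written[] = $filename;
    }
    return $written;
}
```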

Paul Waring

Dec 22, 2009, 5:17:49 PM
to dom...@googlegroups.com
BrianS wrote:
> On Dec 22, 4:41 am, Paul Waring <p...@xk7.net> wrote:
>> Paul Waring wrote:
>>> Each PDF by itself takes 7-10 seconds to generate, but if I run three in
>>> a row the whole process takes over a minute, and two PDFs take around
>>> 30-40 seconds. Is there any reason for this being the case? There are no
>>> images in the PDF, so I don't think it's the problem mentioned in Issue 17.
>> In case anyone else comes across the same problem, taking the body of
>> the function and placing it in a separate PHP file, then calling exec()
>> instead of the function reduces the time from >60s to 30-40s. Using the
>> latest SVN revision instead of 0.5.1 also brings the 30-40s down to
>> 20-25s. My best guess is that dompdf is using resources which aren't
>> freed up when the object falls out of scope, but I don't know if that's
>> a PHP or dompdf problem.
>>
>> Paul
>>
>> --
>> Paul Waring
>> http://www.pwaring.com
>
> Are you destroying the DOMPDF object with unset() after rendering your
> PDF and putting the contents to a file?

No, I was just letting it fall out of scope and assumed (wrongly it
seems!) that PHP would do its own garbage collection. I'll see if I can
knock up a test with unset() tomorrow in case that speeds things up, and
also look at the one page at a time idea, as that might help.

Cheers

Paul Waring

Dec 23, 2009, 4:12:14 AM
to dom...@googlegroups.com

After a bit of testing (times are averages from a few runs):

Without unset: takes 70s to create 6 PDF files, and only manages 3
before bailing out.

With unset: takes 80s to create 6 PDF files, and only manages 3 before
bailing out.

With exec: takes 35s to create 6 PDF files, and manages to create all 6
successfully.

So, either I'm doing something wrong (always a possibility!) or
somewhere PHP is failing to free up lots of memory. Whatever the case
is, turning the function into a separate script and running it using
exec() halves the time taken and the task is completed in full.
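
The thread does not say how these timings were taken or why the run stopped after three files. One way to gather comparable numbers is sketched below; the helper name measure_run() is made up for this example. Logging peak memory next to elapsed time may help tell a memory limit from a time limit.

```php
<?php
// Time a single step and report the peak memory of the current process.
// $task is any callable, for example a function that renders one PDF.
function measure_run($label, $task)
{
    $start = microtime(true);
    call_user_func($task);
    $elapsed = microtime(true) - $start;

    // memory_get_peak_usage(true) reports memory allocated from the system,
    // in bytes, for this PHP process only.
    return sprintf(
        '%s: %.2fs, peak memory %.1f MB',
        $label,
        $elapsed,
        memory_get_peak_usage(true) / 1048576
    );
}
```

Note that the memory figure covers only the calling process, so for the exec() variant it would not include memory used by the child script.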

BrianS

Dec 23, 2009, 12:54:30 PM
to dompdf

I would by no means claim to be an expert on PHP's internal workings.
It's something I'll have to study more when we work on a performance-
oriented release. DOMPDF doesn't currently do anything to release
memory when it's done working. I've been thinking that perhaps we
could decrease the memory load by cleaning up things in an object
destructor. But again, this will take some studying on my part.

> After a bit of testing (times are averages from a few runs):
>
> Without unset: takes 70s to create 6 PDF files, and only manages 3
> before bailing out.
>
> With unset: takes 80s to create 6 PDF files, and only manages 3 before
> bailing out.

I guess the extra time when using unset() shouldn't be too much of a
surprise since PHP would have to take a moment to clean up the
variable. I did notice significant improvement in speed in my case,
but I was creating a fairly complex document with over a dozen pages,
more than half of which had tables. Perhaps your document isn't
complex enough to see the same kind of improvement.

> With exec: takes 35s to create 6 PDF files, and manages to create all 6
> successfully.

I'm not really sure why you would see a significant difference between
a function within the script and rendering using an external script. I
would actually expect it to take more time since you have the extra
overhead of starting a new instance of PHP. One possible explanation
could be that PHP isn't cleaning up resource usage until the script as
a whole completes, even when using unset(). I've seen this mentioned
in comments in the PHP documentation, so perhaps running an external
script is actually less resource intensive than having multiple
instances of DOMPDF floating around unused in memory.

In my instance I'm also creating the documents offline. I'm not using
a cron job, but a PHP "fork" as described at
<http://www.welldonesoft.com/technology/articles/php/forking/>. This made it
possible for me to implement user-initiated document creation. The
documents are created one at a time to decrease system load. You can
find a version of the code I use at
<http://eclecticgeek.com/code/dompdf.txt>.

Paul Waring

Jan 4, 2010, 10:48:04 AM
to dom...@googlegroups.com
BrianS wrote:
> I'm not really sure why you would see a significant difference between
> a function within the script and rendering using an external script. I
> would actually expect it to take more time since you have the extra
> overhead of starting a new instance of PHP. One possible explanation
> could be that PHP isn't cleaning up resource usage until the script as
> a whole completes, even when using unset(). I've seen this mentioned
> in comments in the PHP documentation, so perhaps running an external
> script is actually less resource intensive than having multiple
> instances of DOMPDF floating around unused in memory.

It's actually the case that unset() doesn't free up the memory, it just
marks the object as no longer in use. I don't think there's any specific
point for when PHP runs the garbage collection, except obviously at the
end of the script everything should be freed up. You can force the
garbage collector to run at any given point, but that's not available
until 5.3.x and the functions are not documented yet. Until then,
executing a separate script seems to do the job and produce a
significant performance improvement, both in terms of execution time and
memory use (in this particular case anyway).
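
For reference, the PHP 5.3 functions mentioned above are gc_enable(), gc_disable(), gc_enabled() and gc_collect_cycles(). The sketch below shows the case they target: objects that reference each other, which reference counting alone cannot free. The Node class and both function names are invented for the example, and nothing here confirms that dompdf's memory growth comes from cycles of this kind.

```php
<?php
// Two objects that point at each other form a reference cycle, so their
// reference counts never drop to zero when the variables go away.
class Node
{
    public $other = null;
}

function make_cycle()
{
    $a = new Node();
    $b = new Node();
    $a->other = $b;
    $b->other = $a;
    // $a and $b go out of scope here, but the cycle keeps both objects alive.
}

function collect_cycles_if_available()
{
    // gc_collect_cycles() exists from PHP 5.3 onward; on older versions
    // there is nothing to call, so report -1.
    if (!function_exists('gc_collect_cycles')) {
        return -1;
    }
    gc_enable();
    // Returns the number of items the cycle collector freed.
    return gc_collect_cycles();
}
```

In a batch loop, a call such as collect_cycles_if_available() after each unset() would be one thing to try on 5.3 or later before falling back to a separate process.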

BrianS

Jan 4, 2010, 2:55:32 PM
to dompdf

On Jan 4, 10:48 am, Paul Waring <p...@xk7.net> wrote:
> BrianS wrote:
> > I'm not really sure why you would see a significant difference between
> > a function within the script and rendering using an external script. I
> > would actually expect it to take more time since you have the extra
> > overhead of starting a new instance of PHP. One possible explanation
> > could be that PHP isn't cleaning up resource usage until the script as
> > a whole completes, even when using unset(). I've seen this mentioned
> > in comments in the PHP documentation, so perhaps running an external
> > script is actually less resource intensive than having multiple
> > instances of DOMPDF floating around unused in memory.
>
> It's actually the case that unset() doesn't free up the memory, it just
> marks the object as no longer in use. I don't think there's any specific
> point for when PHP runs the garbage collection, except obviously at the
> end of the script everything should be freed up.

This is the impression I was getting based on what I've read so far
about PHP's garbage collection.

> You can force the garbage collector to run at any given point, but that's
> not available until 5.3.x and the functions are not documented yet. Until
> then, executing a separate script seems to do the job and produce a
> significant performance improvement, both in terms of execution time and
> memory use (in this particular case anyway).

Since the functionality isn't available until 5.3.x we can't rely on
forced garbage collection. Your point in using an externally executed
script to perform the PDF rendering is well made.
-b
