[reportlab-users] Reportlab performance


Leszek Syroka

May 12, 2010, 9:20:08 AM
to reportl...@lists2.reportlab.com
Hello,

My name is Leszek Syroka. I am a CERN employee developing Indico, a web
application that uses your library to create PDFs.

Right now our main concern is the performance of 'platypus'. Creating a
document similar to the attached one, but containing 1032 pages, about
1 200 000 characters and no images, takes about 12 minutes. Is that a
normal time for creating such a document?

Moreover, I found out that, to insert the table of contents, the document
is built in three iterations: the first creates the document with a blank
table of contents, the second inserts and adjusts the dynamically created
one, and the third fits a table of contents that spans more than one
page. Is it possible to do this only once?

I would also like to know how much quicker the pdfgen library is compared
with platypus, which is a bit too slow for our application.

Best regards
Leszek Syroka
sampleDoc.pdf

Andy Robinson

May 12, 2010, 9:45:55 AM
to reportlab-users
On 12 May 2010 14:20, Leszek Syroka <leszek.ma...@cern.ch> wrote:
> Hello,
>
> Right now our main concern is performance of 'platypus'. Creating a
> document, similar to attached one, but containing 1032 pages, about 1 200
> 000 characters and no images takes about 12 minutes. Is it a normal time of
> creating such document?

No, this is not normal. We construct the entire document in memory,
but yours is simple text.

I emailed separately about the possibility of some commercial support
but whichever way you want to do it, it would be useful if you could
post some code here to show how you build up the Platypus document.

Are you using a big table which spans many pages? This could cause
problems and, looking at your document, it should not be necessary.


> Moreover I found out that to insert the table of contents document is
> created in three iterations. First time to create a document with blank
> table of contents, second one to put there and adjust dynamically created
> one and third time to fit the table of contents containing more than one
> page. Is it possible to make this operation only once?

The TableOfContents widget in Platypus does it 2x, or 3x as you say.
If you can inspect your content 'up front' and work out how many
sections there are and their titles, there will probably be a way to
'preload' the table of contents with this to eliminate at least one
pass.

There is also another technique we use in our commercial package using
delayed 'Form XObjects'. We inspect the content up-front and make a
table containing all the entries, which goes into the story. In the
right column we insert a 'Form XObject reference' (canvas.doForm(...))
saying 'draw the form "chapterXX" here', even though it isn't defined
yet. Then, on the way through the document, the forms get defined.
So single passes are possible. However, we have not packaged this
up to make it easy to use in the open source package.

> I would also like to know how much quicker is using pdfgen library in favor
> of platypus, which for our application is a bit to slow.


If you need to 'move down the page' and draw paragraphs, you will need
Platypus, or something like it. However we have to eliminate any
backtracking, memory wastage and multiple passes first.


Best Regards,


--
Andy Robinson
CEO/Chief Architect
ReportLab Europe Ltd.
Media House, 3 Palmerston Road, Wimbledon, London SW19 1PG, UK
Tel +44-20-8545-1570
_______________________________________________
reportlab-users mailing list
reportl...@lists2.reportlab.com
http://two.pairlist.net/mailman/listinfo/reportlab-users

Henning von Bargen

May 12, 2010, 9:55:32 AM
to reportl...@lists2.reportlab.com
Leszek Syroka wrote:

> Right now our main concern is performance of 'platypus'. Creating a
> document, similar to attached one, but containing 1032 pages, about
> 1 200 000 characters and no images takes about 12 minutes.
> Is it a normal time of creating such document?

12 minutes seems slow indeed. I would expect 1 or 2 minutes.
Are you sure that this is caused by ReportLab?
Could as well be a slow SQL query or whatever...
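
One way to check where the time goes is to profile the whole export and compare cumulative time per function. The following is a minimal sketch using the standard-library cProfile module. `fetch_rows`, `build_pdf` and `export` are hypothetical stand-ins for the data query and the document build; they are not Indico or ReportLab functions.

```python
import cProfile
import io
import pstats
import time


def fetch_rows():
    """Hypothetical stand-in for the data query (for example, SQL)."""
    time.sleep(0.05)
    return ["row %d" % i for i in range(1000)]


def build_pdf(rows):
    """Hypothetical stand-in for the document build step."""
    return sum(len(r) for r in rows)


def export():
    """Fetch the data, then build the document from it."""
    return build_pdf(fetch_rows())


def profile_export():
    """Run export() under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    export()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(15)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_export())
```

If the query dominates the cumulative column, the PDF library is not the bottleneck; if the build step dominates, the time really is spent in document generation.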

> Moreover I found out that to insert the table of contents document
> is created in three iterations. First time to create a document with
> blank table of contents, second one to put there and adjust
> dynamically created one and third time to fit the table of contents
> containing more than one page.
> Is it possible to make this operation only once?
...

Yes, it should be possible to just do that in one or at most 2 passes.

The idea is that you tell ReportLab in advance that you expect
the TOC needs N pages (you can compute that in your special case,
since you need one TOC row per article).

You could perhaps insert a custom flowable after the TOC
that splits until page N is reached.
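
The page count N mentioned above can be estimated with a ceiling division. This is a rough sketch, assuming every TOC entry fits on a single line and every TOC page holds the same number of rows. The function names and the figures in the usage note are illustrative assumptions, not values taken from ReportLab or from the document under discussion.

```python
def rows_per_page(frame_height, leading):
    """Number of single-line TOC rows that fit in a frame (assumed uniform)."""
    if frame_height <= 0 or leading <= 0:
        raise ValueError("frame_height and leading must be positive")
    return int(frame_height // leading)


def toc_pages(num_entries, rows):
    """Pages needed for num_entries rows at `rows` rows per page."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    if num_entries < 0:
        raise ValueError("num_entries must not be negative")
    # Ceiling division without floating point.
    return -(-num_entries // rows)
```

For example, with a 700-point frame and 14-point leading, `rows_per_page(700, 14)` gives 50 rows, so 500 one-line entries would need `toc_pages(500, 50)` = 10 pages. Entries that wrap onto a second line would make the real count higher.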

HTH
Henning

Leszek Syroka

May 12, 2010, 10:36:22 AM
to reportlab-users
On 5/12/2010 3:45 PM, Andy Robinson wrote:
> On 12 May 2010 14:20, Leszek Syroka<leszek.ma...@cern.ch> wrote:
>
>> Hello,
>>
>> Right now our main concern is performance of 'platypus'. Creating a
>> document, similar to attached one, but containing 1032 pages, about 1 200
>> 000 characters and no images takes about 12 minutes. Is it a normal time of
>> creating such document?
>>
> No, this is not normal. We construct the entire document in memory,
> but yours is simple text.
>
To be clear, I didn't mean the file I attached, but a document similar
to it that is much bigger. I also tested the Wikipedia plugin that uses
your library: creating a file with almost the same number of characters
(with a more sophisticated design and images, of course, but on a more
powerful machine than my development PC) took about 7.5 minutes.
Profiling shows that the great majority of the time is spent inside the
ReportLab library, in the doctemplate->multibuild method.
> I emailed separately about the possibility of some commercial support
> but whichever way you want to do it, it would be useful is you could
> post some code here to show how you build up the Platypus document.
>
I can't give you an answer about commercial support right now, because
it has to be discussed with my section leader.
> Are you using a big table which spans many pages? This could cause
> problems and, looking at your document, it should not be necessary.
>
The only flowables I'm using are Paragraph, Spacer, PageBreak and a single
TableOfContents.
>
>
>> Moreover I found out that to insert the table of contents document is
>> created in three iterations. First time to create a document with blank
>> table of contents, second one to put there and adjust dynamically created
>> one and third time to fit the table of contents containing more than one
>> page. Is it possible to make this operation only once?
>>
> The TableOfContents widget in Platypus does it 2x, or 3x as you say.
> If you can inspect your content 'up front' and work out how many
> sections there are and their titles, there will probably be a way to
> 'preload' the table of contents with this to eliminate at least one
> pass.
>
>
> There is also another technique we use in our commercial package using
> delayed 'Form XObjects'. We inspect the content up-front and make a
> table containing all the entries, which goes into the story. In the
> right column we insert a 'Form XObject reference' (canvas.doForm(...))
> saying 'draw the form "chapterXX" here', even though it isn't defined
> yet. Then, on the way through the document, the forms get defined.
> So single passes are possible. However, we have not packaged this
> up to make it easy to use in the open source package
>
>
>> I would also like to know how much quicker is using pdfgen library in favor
>> of platypus, which for our application is a bit to slow.
>>
>
> If you need to 'move down the page' and draw paragraphs, you will need
> Platypus, or something like it. However we have to eliminate any
> backtracking, memory wastage and multiple passes first.
>
>
> Best Regards,
>
>
>
Thanks for the solutions.
Leszek

Andy Robinson

May 12, 2010, 10:51:54 AM
to reportlab-users
On 12 May 2010 14:55, Henning von Bargen <H.von...@t-p.com> wrote:
> The idea is that you tell ReportLab in advance that you expect
> the TOC needs N pages (you can compute that in your special case,
> since you need one TOC row per article).


I have done some experiments and am checking in a small update now...

(a) platypus/doctemplate.py: multibuild returns the number of passes,
rather than None.

(b) tests/test_platypus_toc.py now has a test demonstrating
'preloading' the table of contents, and asserting that it cuts the
number of passes from 3 to 2 for a document with a TOC longer than one
page.


If you 'preload' the TOC with the correct section headings, you can get
down from 3 to 2 passes.
1 pass is not easy. I don't think it can be done with this
TableOfContents widget. It would need lazy drawing of forms.
Here's how to 'preload' it...

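from reportlab.platypus import tableofcontents

# tocLevelStyle (a ParagraphStyle) and chapters (an int) are assumed to be
# defined elsewhere
toc = tableofcontents.TableOfContents()
toc.levelStyles = [tocLevelStyle]  # need at least one style
tocEntries = []
for i in range(chapters):
    # add tuple of (level, text, pageNum=0, key=None)
    tocEntries.append((0, 'This is chapter %d' % i, 0, None))
toc.addEntries(tocEntries)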


Leszek, the time will definitely be spent inside multibuild, because
that's the outer loop!
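
The effect of preloading on the number of passes can be illustrated with a toy model. This is a simplified sketch, not ReportLab's code: it assumes a fixed number of TOC rows per page, a fixed length per chapter, and that the layout is repeated until the TOC entries recorded in one pass match the entries used to draw the TOC in that pass. `count_passes` and its parameters are hypothetical names.

```python
def count_passes(chapter_lengths, rows_per_page, preload=False):
    """Toy model: repeat the layout until the TOC entries stop changing."""
    titles = ["Chapter %d" % i for i in range(len(chapter_lengths))]
    # Entries are (title, page) pairs. Preloading supplies the titles
    # with a placeholder page number of 0.
    known = [(title, 0) for title in titles] if preload else []
    passes = 0
    while True:
        passes += 1
        # The TOC takes at least one page, even while it is still blank.
        toc_pages = max(1, -(-len(known) // rows_per_page))
        page = toc_pages + 1
        seen = []
        for title, length in zip(titles, chapter_lengths):
            seen.append((title, page))
            page += length
        if seen == known:
            return passes
        known = seen


if __name__ == "__main__":
    chapters = [2] * 100  # 100 chapters of 2 pages each (illustrative)
    print(count_passes(chapters, 40))
    print(count_passes(chapters, 40, preload=True))
```

In this model, a TOC longer than one page needs three passes without preloading, because the chapters shift once the TOC grows to its real length, and two passes with preloading, because the TOC already has its final length in the first pass and only the page numbers change.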


Best Regards,

--
Andy Robinson
CEO/Chief Architect
ReportLab Europe Ltd.
Media House, 3 Palmerston Road, Wimbledon, London SW19 1PG, UK
Tel +44-20-8545-1570