[reportlab-users] Blank space being added to beginning of PDF when using TableOfContents


Rollins, Robert B.

17/08/2022, 21:22:08
to reportl...@lists2.reportlab.com

I’ve found that attempting to include a TableOfContents in the ~700-page PDF I’m generating causes a 3/4-page chunk of blank space to be added to the beginning of the cover page, pushing most of its content down onto a second page. This increases the length of my document by 1 page and plays absolute havoc with the TableOfContents itself.

 

In fact, this completely breaks PDF generation, because it forces all the TOCEntries to change page number on every other pass of multiBuild(), since the blank space doesn’t get added on odd-numbered passes (and HOO boy was it an adventure to figure THAT detail out!). Since TableOfContents.isSatisfied() never returns True, due to even-numbered passes having different page numbers from odd-numbered passes, multiBuild() just loops infinitely until it hits maxPasses, and then kills itself.
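To make the failure mode concrete, here is a toy model of the convergence check. It is NOT ReportLab's implementation; it only assumes what is described above: a pass counts as "satisfied" when the TOC entries it records equal those from the previous pass (roughly what TableOfContents.isSatisfied() checks), and the build gives up after max_passes. The function name and arguments are hypothetical.

```python
# Toy model of multiBuild()'s convergence check; NOT ReportLab's actual code.
def simulate_multibuild(headings, shift_on_even_passes, max_passes=10):
    """Return the pass on which the TOC entries stopped changing.

    headings: list of (title, page) pairs as laid out without any blank space.
    shift_on_even_passes: if True, every even-numbered pass pushes all content
    one page later, mimicking the blank space described in this thread.
    """
    previous_entries = None
    for pass_number in range(1, max_passes + 1):
        shift = 1 if shift_on_even_passes and pass_number % 2 == 0 else 0
        entries = [(title, page + shift) for title, page in headings]
        # Satisfied only when this pass reproduces the previous pass exactly.
        if entries == previous_entries:
            return pass_number
        previous_entries = entries
    raise IndexError(f"Index entries not resolved after {max_passes} passes")
```

With stable layout the entries match on pass 2; when the page numbers flip every other pass, no two consecutive passes ever agree, so the loop always ends in the IndexError.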

 

I only discovered that the blank space was being added at all because some combinations of styles and text content make the TOC itself change length enough between passes to make the page numbers coincidentally remain the same on each pass, and the PDF generation successfully finishes. But then I’ve still got a huge blank spot at the front of the PDF.

 

Does anyone know what could possibly be causing this large chunk of blank space to be added to my first page? It doesn’t appear to be due to some spurious extra Flowable being added to the beginning of my story on even-numbered passes, so I’m completely baffled as to what might be happening.

Robin Becker

18/08/2022, 03:57:07
to reportlab-users, Rollins, Robert B.
Hi Robert,

Sorry this is causing you problems.

Does this occur when you reduce the actual content to say 3 pages? Does the size of extra space vary with the number of
pages?

If a small sample shows the issue then perhaps you could post the code here with content replaced by Lorem Ipsum text.
There are different ways to produce TOC, but if we can see the overall code structure it might be easy to find any problem.


--
Robin Becker
_______________________________________________
reportlab-users mailing list
reportl...@lists2.reportlab.com
https://pairlist2.pair.net/mailman/listinfo/reportlab-users

Robert Rollins

18/08/2022, 21:20:23
to reportlab-users
Well, my first attempt to reply seems to have gotten lost in moderation limbo, so I'm going to try again directly from the Google Groups interface.

I tried cutting all the actual content out of the PDF, reducing it to just the headings that I build the ToC from. This cut it down to ~25 pages, and I tried building it with the default TableOfContents class again (I found a hacky workaround for this issue last night by subclassing TableOfContents, but it’s REALLY hacky, and it stopped working today while I was implementing a multi-column ToC). Unfortunately, with the default TableOfContents, I went right back to getting the infinite loop problem in multiBuild(), which ends with this:


  File "/catalog/catalog/core/jobs/pdf_generator.py", line 794, in render_full_catalog
    self.doc.multiBuild(self.flowables)
  File "/catalog-ve/lib64/python3.8/site-packages/reportlab/platypus/doctemplate.py", line 1182, in multiBuild
    raise IndexError("Index entries not resolved after %d passes" % maxPasses)
IndexError: Index entries not resolved after 10 passes

Using a debugger to manually kill the looping and force it to finish the build gave me this PDF: https://drive.google.com/file/d/1usFYgNHNHhNUD9w3VGz-iwi43qMQybad/view?usp=sharing 

The huge blank spot at the beginning is there, and is the exact same size as the full 700-page PDF. The page numbers in the ToC are off by one as well (something I fixed manually in my hack).

The code even just to build this tiny version of the PDF is… a lot. I’ll include the sections that I think are probably relevant.


class FullCatalogDocTemplate(SingleSectionDocTemplate):
    ... snip ...

    def afterFlowable(self, flowable):
        """
        Only the full-Catalog PDF includes a table of contents, so we only define this TOCEntry functionality in
        FullCatalogDocTemplate.
        """
        super().afterFlowable(flowable)

        if isinstance(flowable, Paragraph):
            heading_text = flowable.getPlainText()
            if flowable.style is SECTION_TITLE_P:
                counter = self.seq.nextf('Sections')
                # Add a bookmark to the current page, which acts as the destination for clicking on this TOC entry.
                key = f'Sections-{counter}'
                self.canv.bookmarkPage(key)
                # Format the Section Titles as e.g. "1. GENERAL INFORMATION".
                toc_text = f'{counter}. {heading_text.upper()}'
                self.notify('TOCEntry', (0, toc_text, self.page, key))
            elif flowable.style is FIRST_ORDER_HEAD:
                counter = self.seq.nextf('1stOrderHeads')
                # Add a bookmark to the current page, which acts as the destination for clicking on this TOC entry.
                key = f'1stOrderHeads-{counter}'
                self.canv.bookmarkPage(key)
                # Add a level-1 TOCEntry for this Section.
                self.notify('TOCEntry', (1, heading_text, self.page, key))


class PDFGenerator:
    ... snip ...

    def render_full_catalog(self):
        # Populate self.flowables.
        self._build_cover_page()
        self._build_table_of_contents()
        for ndx, section_page in enumerate(self.catalog_edition_page.get_children().specific(), start=1):
            ... snipped code that gathers Pages to be converted into PDF content. This is what adds the headings that get put into the TOC ...

        # Create the buffer into which the PDF will be written.
        buffer = BytesIO()
        # Build the PDF document.
        self.doc = FullCatalogDocTemplate(
            buffer,
            pagesize=(PAGE_WIDTH, PAGE_HEIGHT),
            title='Caltech Catalog'
        )
        self.doc.multiBuild(self.flowables)

        # Return the raw data of the PDF, ready to be written to a file.
        buffer.seek(0)
        return buffer

    def _build_cover_page(self):
        base = ParagraphStyle(
            'Base Cover Page Style',
            fontName='Helvetica Neue',
            fontSize=8,
            leading=10
        )
        # TODO: Pull the year from the `catalog_label` field on self.catalog_edition_page, once Katarina adds that.
        self.flowables.append(Paragraph(
            '2022-2023',
            ParagraphStyle('', parent=base, fontSize=13, leading=18, fontName='Helvetica Neue-Light')
        ))
        self.flowables.append(Spacer(1, 10))
        self.flowables.append(Paragraph(
            'Caltech Catalog',
            ParagraphStyle('', parent=base, fontSize=35, leading=40, fontName='Helvetica Neue-CondensedBold')
        ))
        ... snip ...

        # Force a page break at the end of the Cover Page.
        self.flowables.append(PageBreak())

    def _build_table_of_contents(self):
        # Based on 2nd Order Head.
        section_title_toc = ParagraphStyle(
            'Section Title for TOC',
            fontName='Helvetica Neue-Bold',
            fontSize=9,
            leading=20,
            firstLineIndent=0,
            leftIndent=0,
            spaceBefore=10,
        )
        # Based on No-indent Standard Paragraph.
        first_order_head_toc = ParagraphStyle(
            'First Order Head TOC',
            fontName='Helvetica Neue',
            fontSize=8,
            leading=8,
            firstLineIndent=0,
            leftIndent=0
        )
        toc = TableOfContents(
            dotsMinLevel=-1,
            levelStyles=[section_title_toc, first_order_head_toc]
        )
        self.flowables.append(toc)
        self.flowables.append(PageBreak())

 


Robin Becker

19/08/2022, 06:43:42
to Rollins, Robert B., reportlab-users
Hi Robert,

you may get more information by trying to reduce the problem size; even 25 pages is too much and your code is not
runnable by anyone else. Trying to simulate what's happening by inspection is probably not going to be possible.

Since it seems that you are getting extra space at the start of the story, it might be an idea to try and see what is
actually present at the story start; if you can see that it's obviously wrong, then you should be able to determine
where it gets added. Again, it would be a good idea to reduce the number of TOC entries to see if the extra space is
proportional to some count of entries.
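One hedged way to "see what is actually present at the story start" is to ask each of the first few flowables how much height it wants. This sketch relies only on ReportLab flowables implementing wrap(availWidth, availHeight), which returns the (width, height) they intend to occupy; the helper name is hypothetical. Note that wrapping a TableOfContents before the build may not be meaningful, and a PageBreak deliberately reports (almost) the whole available height.

```python
def describe_story_start(story, avail_width, avail_height, count=5):
    """List (index, class name, wrapped height) for the first `count` flowables.

    Each flowable's wrap(availWidth, availHeight) is called to get the
    (width, height) it wants; only the height is reported.
    """
    rows = []
    for index, flowable in enumerate(story[:count]):
        _width, height = flowable.wrap(avail_width, avail_height)
        rows.append((index, type(flowable).__name__, height))
    return rows
```

Once the doc template exists, something like `for row in describe_story_start(self.flowables, self.doc.width, self.doc.height): print(row)` would show whether anything unexpectedly tall sits in front of the cover page text.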

The story is just a list of flowables, so it can be a list subclass that is used to inspect/instrument inserts, appends,
etc.
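A minimal sketch of such an instrumented story, assuming the story is assembled with append(), insert(), extend() and +=. The class name is hypothetical. As far as I know, multiBuild() builds from a plain-list copy of the story on each pass, so this records what your own code puts into the story, not what happens inside the build.

```python
class LoggingStory(list):
    """A story (list of flowables) that records every append/insert/extend.

    Each log entry is (operation, index, flowable class name), so you can see
    what ended up at index 0 and in what order things were added.
    """

    def __init__(self, iterable=()):
        super().__init__(iterable)
        self.log = []

    def append(self, flowable):
        self.log.append(("append", len(self), type(flowable).__name__))
        super().append(flowable)

    def insert(self, index, flowable):
        self.log.append(("insert", index, type(flowable).__name__))
        super().insert(index, flowable)

    def extend(self, flowables):
        # Route through append() so each added flowable is logged individually.
        for flowable in flowables:
            self.append(flowable)

    def __iadd__(self, flowables):
        # The built-in list += does not call an overridden extend(), so do it here.
        self.extend(flowables)
        return self
```

Setting `self.flowables = LoggingStory()` wherever the generator initialises its story (that code is snipped above) and printing `self.flowables.log` before calling multiBuild() would show every insertion point.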

None of the tests in reportlab/tests seem to have a cover page, so perhaps there's a bug; it would be interesting to know
if you looked at one of those as a model. I do know that one of the problems with dynamic TOCs is that they do occupy
space and obviously those pages have to be accounted for. The reportlab tests seem to try several different strategies
for the dynamic case.


Robert Rollins

19/08/2022, 14:58:30
to reportlab-users
I figured it out! Turns out the ToC was sort of a red herring; the real issue was that I was using the PageTemplate system wrong. The complete lack of docs on how to use NextPageTemplate frustrated me, so I decided to just manually set BaseDocTemplate.pageTemplate in the beforePage hook, depending on which page number was being rendered. This turned out to be a bad idea, because it somehow caused Page 1 to use some phantom template that I didn't define, which must have had a really short frame for some reason, and that is what was causing the extra blank space at the top.

I discovered this because I noticed the "showBoundary" setting on Frame objects, and thought "Hmmm, maybe if I enable that for all my PageTemplates' frames, I'll be able to tell why there's so much blank space at the top of Page 1." And the result of that was that every page got a box drawn around its frames EXCEPT Page 1! 
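The same diagnostic can be applied to every registered frame in one go. This is a sketch with a hypothetical helper name; it assumes a BaseDocTemplate subclass whose templates are added before the build (doc.pageTemplates, each PageTemplate's frames list, and Frame's showBoundary attribute). BaseDocTemplate also accepts a showBoundary argument that, as far as I know, does the same for all frames; for SimpleDocTemplate, which appears to create its templates inside build(), that argument is the easier route.

```python
def show_all_frame_boundaries(doc):
    """Turn on showBoundary for every frame of every registered PageTemplate.

    Returns the number of frames changed. A page that still comes out without
    a boundary box was probably laid out in a frame that is not registered in
    doc.pageTemplates.
    """
    changed = 0
    for template in doc.pageTemplates:
        for frame in template.frames:
            frame.showBoundary = 1
            changed += 1
    return changed
```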

This utterly unexpected behavior convinced me to just bite the bullet and read the ReportLab code to figure out the *right* way to use NextPageTemplate. Once I had figured that out, removed my hack, and started using NextPageTemplate properly, the extra space went away and Page 1 correctly got the box drawn around it. So I turned off showBoundary, and now my PDF generates correctly. Hurray!

Might I suggest that proper docs be written for NextPageTemplate? It's *kind of important*.