Blake, we agree totally that it SHOULD be fixed, we just haven't been
able to do it in about 12 years. If you can crack it in 2-4 hours,
even after a week of familiarisation with the code, we would be
delighted to hire you to do it and you could pretty much name your
hourly rate ;-)
The ReportLab library does no backtracking. It can't say "that did
not work, I'll back up and try again". This was a design decision to
get decent speed out of a large Python object model. Splitting works
by returning two table objects, one to go on this page and one for the
next, with suitably modified styles for the "next page" table. It
also has to handle horizontal and vertical column spanning, the fact
that a table cell can contain anything (including more tables, with
padding around them), some cells having defined width and some being
flexible. If you split one cell you then have to figure out, for
each cell alongside it (some of which may span rows) how much of their
content must go on the first page or second page.
We looked at the algorithms used to do table layout in browsers and
they are not simple (and require C++-like speeds and multiple tries at
things), and they don't even handle splitting.
Now it may well be that a split could be possible in the special case
that there was no spanning alongside.
As for a better workaround: If your table content is just text, and
you know the width, you can work out the size of a paragraph quite
easily and accurately by calling wrap() and passing in the width and a
larger-than-needed height; it will reply with the width and height it
actually needs. So you could pre-scan all your content and break
into smaller chunks before building the table. Ugly, but could be
quite accurate and would not complicate rendering.
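
A rough sketch of that pre-scan, in case it helps. The helper names
(paragraph_height, chunk_rows) and the 100000-point "big enough" height
are made up for illustration and are not part of ReportLab; only
Paragraph.wrap() and getSampleStyleSheet() are real library calls. It
assumes one text cell per row and ignores cell padding, so treat the
numbers as approximate.

```python
def paragraph_height(text, width, style=None):
    """Height ReportLab reports for `text` wrapped to `width` points."""
    # Imported here so chunk_rows() below can be used without ReportLab.
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import Paragraph

    style = style or getSampleStyleSheet()["BodyText"]
    # Offer far more height than needed; wrap() replies with the
    # (width, height) the paragraph actually uses.
    _, height = Paragraph(text, style).wrap(width, 100000)
    return height


def chunk_rows(row_heights, page_height):
    """Group consecutive row indices so each group fits in page_height.

    A row taller than page_height still gets a chunk of its own; this
    sketch does not try to split inside a row.
    """
    chunks, current, used = [], [], 0.0
    for index, height in enumerate(row_heights):
        # Start a new chunk when this row would overflow the current one.
        if current and used + height > page_height:
            chunks.append(current)
            current, used = [], 0.0
        current.append(index)
        used += height
    if current:
        chunks.append(current)
    return chunks
```

You would call paragraph_height() on each cell's text with that column's
width, take the tallest cell in each row (plus whatever padding your
table style adds) as the row height, and then build one small table per
chunk returned by chunk_rows().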
- Andy