Patch to support tables with oversize cells

85 views
Skip to first unread message

Lennart Regebro

unread,
Jan 5, 2022, 9:45:21 AMJan 5
to reportlab-users
I have implemented a new feature of tables, that enables tables which
have cells that are too tall to fit on one page to still be rendered to PDF. I'm unsure on the best way to submit it, it seems this group doesn't support attachments.

So the diff is simply pasted at the end of the description. If you want it any other way, just say the word!

//Lennart Regebro, Shoobx


== splitInRow patch ==

Currently, if a table is too tall to fit on one page and has the attribute
splitByRow set to 1 (which is the default), the table will be split into two
tables and the rows that do fit on the current page will be rendered on the
current page. The remaining rows will be printed on the second page. If the
remaining rows do not fit on that page, the table will be split again in the
same manner until the end of the table.

However, should one cell be too large for one page, the table can not be split,
and the rendering to PDF will fail. This will also happen if the splitting is
disabled by setting the splitByRow attribute to 0, and the table as a whole is
too large for a single page.

This patch implements functionality that will split a row that is taller
then a page into two rows, enabling the printing of tables with very tall rows.

It does this by adding a new attribute, splitInRow, which also defaults to 1.
This will, if and only if, a table fails to render in the allocated space,
split a row into two rows, also splitting all the content inside those rows at
the appropriate point.


=== How to use it ===

The _splitRows() method of tables now takes a new argument, doInRowSplit,
which defaults to 0. The split() method of the table will now do the following:

* If both .splitByRow and .splitInRow is 0, it will return [], ie fail.

* If .splitByRow is 1, it will call split() with doInRowSplit=False, to attempt
  a split by whole rows. But if .splitByRow is 0 is will call it with
  doInRowSplit=True to attempt to split it inside a  row.
 
* If that fails, it will fall back to doing the opposite.

* If that also fails, the method will fail (ie return []).

This results in the following behavior:

* Setting both attributes to 0 will never split the table

* Setting .splitByRow to 1 and .splitInRow to 0 will, when a split is necessary,
  only split between rows (this is the current behavior).
 
* Setting .splitByRow to 0 and .splitInRow to 1 will always split inside rows,
  unless the split is too close to the start or end of a row that it can't
  split those cells, in which case it splits between those rows.

* Setting both .splitByRow and .splitInRow to 1 will first attempt to split the
  table between rows, and only split a row if it is necessary to fit the table
  on a page. This is the new default behavior.


=== Implementation details ===

Table._splitRow() will as before find the place to split the table as
variable n. But when doInRowSplit is set to one, instead of n being the
first row after the split, n will be the row that is split.

This necessitated a small change in the logic dealing with repeatRows, as
we otherwise might split in a repeat row.

If a row should be split, each cell is checked for the minimum and maximum
split points, which typically is one line of text, as we can't split inside
a line of text. The split point is then selected to be as high as possible
inside that.

After that, each cell is split into two at the height given by a new method,
Table._splitCell().

This split looks at the valign set in the cell style and calculates where
inside the cell value the split will be, and splits at a row or flowable value
before that.

When the valign of the cell is "MIDDLE", the margins of the new split
cells are adjusted to try to keep the contents in the "middle" of a merged
cell. If the cell contents were actually split, that's easy, the first one
is simply set to valign to the bottom, and the second to the top. But if the
value was not split, the margins of that cell is adjusted to try to get
the value to remain close to the original position.

After this, a new table T is created from the new data, and the adjusted
styles, line commands, etc. This is then used in the rest of the method,
which will split the new table into two tables, in the normal fashion.


==== Other changes ====

To get CellStyle.copy() to work, self.name is passed in to the copy, and
attributes that starts with underscore are skipped.

In Table._splitRows there was a section that would adjust the line commands
list to adjust for the new split. This section did need a few extra lines
for handling the splitting of a row, and it also needed to be called in
two different places, so it was moved to its own method, Table._splitLineCmds().

Table._listCellGeom() now supports cells that have plain text content.
This could be used to simplify code in some places where it's used.

Tests were added in the test_table_layout.py module.


Diff:

diff --git a/src/reportlab/platypus/tables.py b/src/reportlab/platypus/tables.py
index 070c3fa..04717e8 100755
--- a/src/reportlab/platypus/tables.py
+++ b/src/reportlab/platypus/tables.py
@@ -60,8 +60,9 @@ class CellStyle(PropertySet):
             parent.copy(self)
     def copy(self, result=None):
         if result is None:
-            result = CellStyle()
+            result = CellStyle(self.name)
         for name in dir(self):
+            if name.startswith('_'): continue
             setattr(result, name, getattr(self, name))
         return result
 
@@ -251,7 +252,7 @@ RoundingRectLine = namedtuple('RoundingRectLine','xs ys xe ye weight color cap d
 
 class Table(Flowable):
     def __init__(self, data, colWidths=None, rowHeights=None, style=None,
-                repeatRows=0, repeatCols=0, splitByRow=1, emptyTableAction=None, ident=None,
+                repeatRows=0, repeatCols=0, splitByRow=1, splitInRow=1, emptyTableAction=None, ident=None,
                 hAlign=None,vAlign=None, normalizedData=0, cellStyles=None, rowSplitRange=None,
                 spaceBefore=None,spaceAfter=None, longTableOptimize=None, minRowHeights=None,
                 cornerRadii=__UNSET__, #or [topLeft, topRight, bottomLeft bottomRight]
@@ -336,6 +337,7 @@ class Table(Flowable):
         self.repeatRows = repeatRows
         self.repeatCols = repeatCols
         self.splitByRow = splitByRow
+        self.splitInRow = splitInRow
 
         if style:
             self.setStyle(style)
@@ -464,6 +466,10 @@ class Table(Flowable):
         w = 0
         canv = getattr(self,'canv',None)
         sb0 = None
+        if isinstance(V, str):
+            vw = self._elementWidth(V, s)
+            vh = len(V.split('\n'))*s.fontsize*1.2
+            return max(w, vw), vh
         for v in V:
             vw, vh = v.wrapOn(canv, aW, aH)
             sb = v.getSpaceBefore()
@@ -1356,46 +1362,100 @@ class Table(Flowable):
             if er>=n: er -= n
             self._addCommand((c[0],)+((sc, sr), (ec, er))+tuple(c[3:]))
 
-    def _splitRows(self,availHeight):
-        n=self._getFirstPossibleSplitRowPosition(availHeight)
-        repeatRows = self.repeatRows
-        if n<= (repeatRows if isinstance(repeatRows,int) else (max(repeatRows)+1)): return []
-        lim = len(self._rowHeights)
-        if n==lim: return [self]
+    def _splitCell(self, value, style, oldHeight, newHeight, width):
+        # Content height of the new top row
+        height0 = newHeight - style.topPadding
+        # Content height of the new bottom row
+        height1 = oldHeight - (style.topPadding + newHeight)
 
-        lo = self._rowSplitRange
-        if lo:
-            lo, hi = lo
-            if lo<0: lo += lim
-            if hi<0: hi += lim
-            if n>hi:
-                return self._splitRows(availHeight - sum(self._rowHeights[hi:n]))
-            elif n<lo:
+        if isinstance(value, (tuple, list)):
+            newCellContent = []
+            postponedContent = []
+            split = False
+            cellHeight = self._listCellGeom(value, width, style)[1]
+
+            if style.valign == "MIDDLE":
+                usedHeight = (oldHeight - cellHeight) / 2
+            else:
+                usedHeight = 0
+
+            for flowable in value:
+                if split:
+                    if flowable.height <= height1:
+                        postponedContent.append(flowable)
+                        # Shrink the available height:
+                        height1 -= flowable.height
+                    else:
+                        # The content doesn't fit after the split:
+                        return []
+                elif usedHeight + flowable.height <= height0:
+                    newCellContent.append(flowable)
+                    usedHeight += flowable.height
+                else:
+                    # This is where we need to split
+                    splits = flowable.split(width, height0-usedHeight)
+                    if splits:
+                        newCellContent.append(splits[0])
+                        postponedContent.append(splits[1])
+                    else:
+                        # We couldn't split this flowable at the desired
+                        # point. If we already has added previous paragraphs
+                        # to the content, just add everything after the split.
+                        # Also try adding it after the split if valign isn't TOP
+                        if newCellContent or style.valign != "TOP":
+                            if flowable.height <= height1:
+                                postponedContent.append(flowable)
+                                # Shrink the available height:
+                                height1 -= flowable.height
+                            else:
+                                # The content doesn't fit after the split:
+                                return []
+                        else:
+                            # We could not split this, so we fail:
+                            return []
+
+                    split = True
+
+            return (tuple(newCellContent), tuple(postponedContent))
+
+        elif isinstance(value, str):
+            rows = value.split("\n")
+            lineHeight = 1.2 * style.fontsize
+            contentHeight = (style.leading or lineHeight) * len(rows)
+            if style.valign == "TOP" and contentHeight <= height0:
+                # This fits in the first cell, all is good
+                return (value, '')
+            elif style.valign == "BOTTOM" and contentHeight <= height1:
+                # This fits in the second cell, all is good
+                return ('', value)
+            elif style.valign == "MIDDLE":
+                # Put it in the largest cell:
+                if height1 > height0:
+                    return ('', value)
+                else:
+                    return (value, '')
+
+            elif len(rows) < 2:
+                # It doesn't fit, and there's nothing to split: Fail
                 return []
+            # We need to split this, and there are multiple lines, so we can
+            if style.valign == "TOP":
+                splitPoint = height0 // lineHeight
+            elif style.valign == "BOTTOM":
+                splitPoint = len(rows) - (height1 // lineHeight)
+            else:  # MID
+                splitPoint = (height0 - height1 + contentHeight) // (2 * lineHeight)
 
-        repeatCols = self.repeatCols
-        splitByRow = self.splitByRow
-        data = self._cellvalues
+            splitPoint = int(splitPoint)
+            return ('\n'.join(rows[:splitPoint]), '\n'.join(rows[splitPoint:]))
 
-        #we're going to split into two superRows
-        ident = self.ident
-        if ident: ident = IdentStr(ident)
-        lto = self._longTableOptimize
-        if lto:
-            splitH = self._rowHeights
-        else:
-            splitH = self._argH
-        cornerRadii = getattr(self,'_cornerRadii',None)
-        R0 = self.__class__( data[:n], colWidths=self._colWidths, rowHeights=splitH[:n],
-                repeatRows=repeatRows, repeatCols=repeatCols,
-                splitByRow=splitByRow, normalizedData=1, cellStyles=self._cellStyles[:n],
-                ident=ident,
-                spaceBefore=getattr(self,'spaceBefore',None),
-                longTableOptimize=lto,
-                cornerRadii=cornerRadii[:2] if cornerRadii else None)
+        # No content
+        return ('', '')
 
+    def _splitLineCmds(self, n, doInRowSplit=0):
         nrows = self._nrows
         ncols = self._ncols
+
         #copy the commands
         A = []
         # hack up the line commands
@@ -1414,16 +1474,21 @@ class Table(Flowable):
             if er < 0: er += nrows
 
             if op in ('BOX','OUTLINE','GRID'):
-                if sr<n and er>=n:
+                if (sr<n and er>=n) or (doInRowSplit and sr==n):
                     # we have to split the BOX
                     A.append(('LINEABOVE',(sc,sr), (ec,sr), weight, color, cap, dash, join, count, space))
                     A.append(('LINEBEFORE',(sc,sr), (sc,er), weight, color, cap, dash, join, count, space))
                     A.append(('LINEAFTER',(ec,sr), (ec,er), weight, color, cap, dash, join, count, space))
                     A.append(('LINEBELOW',(sc,er), (ec,er), weight, color, cap, dash, join, count, space))
                     if op=='GRID':
-                        A.append(('LINEBELOW',(sc,n-1), (ec,n-1), weight, color, cap, dash, join, count, space))
-                        A.append(('LINEABOVE',(sc,n), (ec,n), weight, color, cap, dash, join, count, space))
-                        A.append(('INNERGRID',(sc,sr), (ec,er), weight, color, cap, dash, join, count, space))
+                        if doInRowSplit:
+                            A.append(('INNERGRID',(sc,sr), (ec,n), weight, color, cap, dash, join, count, space))
+                            A.append(('INNERGRID',(sc,n+1), (ec,er), weight, color, cap, dash, join, count, space))
+                            #A.append(('LINEBELOW',(sc,n), (ec,n), weight, color, cap, dash, join, count, space))
+                        else:
+                            A.append(('LINEBELOW',(sc,n-1), (ec,n-1), weight, color, cap, dash, join, count, space))
+                            A.append(('LINEABOVE',(sc,n), (ec,n), weight, color, cap, dash, join, count, space))
+                            A.append(('INNERGRID',(sc,sr), (ec,er), weight, color, cap, dash, join, count, space))
                 else:
                     A.append((op,(sc,sr), (ec,er), weight, color, cap, dash, join, count, space))
             elif op == 'INNERGRID':
@@ -1442,11 +1507,180 @@ class Table(Flowable):
             else:
                 A.append((op,(sc,sr), (ec,er), weight, color, cap, dash, join, count, space))
 
+        return A
+
+    def _splitRows(self,availHeight,doInRowSplit=0):
+        # Get the split position. if we split between rows (doInRowSplit=0),
+        # then n will be the first row after the split. If we split a row,
+        # then n is the row we split in two.
+        n=self._getFirstPossibleSplitRowPosition(availHeight)
+
+        # We can't split before or in the repeatRows/headers
+        repeatRows = self.repeatRows
+        maxrepeat = repeatRows if isinstance(repeatRows,int) else max(repeatRows)+1
+        if doInRowSplit and n<maxrepeat or not doInRowSplit and n<=maxrepeat:
+            return []
+
+        # If the whole table fits, return it
+        lim = len(self._rowHeights)
+        if n==lim: return [self]
+
+        lo = self._rowSplitRange
+        if lo:
+            lo, hi = lo
+            if lo<0: lo += lim
+            if hi<0: hi += lim
+            if n>hi:
+                return self._splitRows(availHeight - sum(self._rowHeights[hi:n]), doInRowSplit=doInRowSplit)
+            elif n<lo:
+                return []
+
+        repeatCols = self.repeatCols
+        splitByRow = self.splitByRow
+        splitInRow = self.splitInRow
+        data = self._cellvalues
+
+        if not doInRowSplit:
+            T = self
+        else:
+            # We are splitting the n row into two, if possible.
+            # We can't split if the available height is less than the minimum set:
+            if self._minRowHeights and availHeight < self._minRowHeights[n]:
+                return []
+
+            usedHeights = sum(self._rowHeights[:n])
+
+            cellvalues = self._cellvalues[n]
+            cellStyles = self._cellStyles[n]
+            cellWidths = self._colWidths
+            curRowHeight = self._rowHeights[n]
+
+            # First find the min/max split point
+            minSplit = 0  # Counted from top
+            maxSplit = 0  # Counted from bottom
+            maxHeight = 0
+
+            for (value, style, width) in zip(cellvalues, cellStyles, cellWidths):
+
+                if isinstance(value, (tuple, list)):
+                    # A sequence of flowables:
+                    w, height = self._listCellGeom(value, width, style)
+                    height += style.topPadding + style.bottomPadding
+                    if height > maxHeight:
+                        maxHeight = height
+                elif isinstance(value, str):
+                    rows = value.split("\n")
+                    lineHeight = 1.2 * style.fontsize
+                    height = lineHeight * len(rows) + style.topPadding + style.bottomPadding
+
+                    # Make sure we don't try to split in the middle of the first or last line
+                    minSplit = max(minSplit, lineHeight + style.topPadding)
+                    maxSplit = max(maxSplit, lineHeight + style.bottomPadding)
+
+                    if height > maxHeight:
+                        maxHeight = height
+
+            if minSplit + maxSplit > curRowHeight:
+                return []
+            if minSplit > (availHeight - usedHeights):  # Fail
+                return []
+
+            # This is where we split the row:
+            splitPoint = min(availHeight - usedHeights, maxHeight - maxSplit)
+
+            R0 = []  # Top half of the row
+            R0Height = 0  # Minimum height
+            R1 = []  # Bottom half of the row
+            R1Height = 0  # Minimum height
+            R1Styles = []
+            for (value, style, width) in zip(cellvalues, cellStyles, cellWidths):
+                v = self._splitCell(value, style, curRowHeight, splitPoint, width)
+                if not v:
+                    # Splitting the table failed
+                    return []
+
+                newStyle = style.copy()
+                if style.valign == "MIDDLE":
+                    # Adjust margins
+                    if v[0] and v[1]:
+                        # We split the content, so fix up the valign:
+                        style.valign = "BOTTOM"
+                        newStyle.valign = "TOP"
+                    else:
+                        # Adjust the margins to push it towards the true middle
+                        h = self._listCellGeom(v[0] or v[1], width, style)[1]
+                        margin = (curRowHeight - h) / 2
+                        if v[0]:
+                            style.topPadding += margin
+                        elif v[1]:
+                            newStyle.bottomPadding += margin
+                R0.append(v[0])
+                R1.append(v[1])
+                h0 = self._listCellGeom(v[0], width, style)[1] + style.topPadding + style.bottomPadding
+                R0Height = max(R0Height, h0)
+                h1 = self._listCellGeom(v[1], width, style)[1] + style.topPadding + style.bottomPadding
+                R1Height = max(R1Height, h1)
+                R1Styles.append(newStyle)
+
+            # Make a new table with the row split into two:
+            usedHeight = min(splitPoint, R0Height)
+            newRowHeight = max(R1Height, self._rowHeights[n] - usedHeight)
+            newRowHeights = self._rowHeights[:]
+            newRowHeights.insert(n + 1, newRowHeight)
+            newRowHeights[n] = usedHeight
+            newCellStyles = self._cellStyles[:]
+            newCellStyles.insert(n + 1, R1Styles)
+
+            data = data[:n] + [R0] + [R1] + data[n+1:]
+
+            T = self.__class__( data, colWidths=self._colWidths,
+                rowHeights=newRowHeights, repeatRows=self.repeatRows,
+                repeatCols=self.repeatCols, splitByRow=self.splitByRow,
+                splitInRow=self.splitInRow, normalizedData=1,
+                cellStyles=newCellStyles, ident=self.ident,
+                spaceBefore=getattr(self,'spaceBefore',None),
+                longTableOptimize=self._longTableOptimize,
+                cornerRadii=getattr(self,'_cornerRadii',None))
+
+            T._linecmds = self._linecmds
+            T._linecmds = T._splitLineCmds(n, doInRowSplit=1)
+            n = n + 1
+
+        #we're going to split into two superRows
+        ident = self.ident
+        if ident: ident = IdentStr(ident)
+        lto = T._longTableOptimize
+        if lto:
+            splitH = T._rowHeights
+        else:
+            splitH = T._argH
+
+        if doInRowSplit:
+            spaceAfter = None
+        else:
+            spaceAfter = availHeight - sum(splitH[:n])
+
+
+        cornerRadii = getattr(self,'_cornerRadii',None)
+        R0 = self.__class__( data[:n], colWidths=T._colWidths, rowHeights=splitH[:n],
+                repeatRows=repeatRows, repeatCols=repeatCols, splitByRow=splitByRow,
+                splitInRow=splitInRow, normalizedData=1, cellStyles=T._cellStyles[:n],
+                ident=ident,
+                spaceBefore=getattr(self,'spaceBefore',None),
+                spaceAfter=spaceAfter,
+                longTableOptimize=lto,
+                cornerRadii=cornerRadii[:2] if cornerRadii else None)
+
+        nrows = T._nrows
+        ncols = T._ncols
+
+        A = T._splitLineCmds(n, doInRowSplit=doInRowSplit)
+
         R0._cr_0(n,A,nrows)
-        R0._cr_0(n,self._bkgrndcmds,nrows,_srflMode=True)
-        R0._cr_0(n,self._spanCmds,nrows)
-        R0._cr_0(n,self._nosplitCmds,nrows)
-        for c in self._srflcmds:
+        R0._cr_0(n,T._bkgrndcmds,nrows,_srflMode=True)
+        R0._cr_0(n,T._spanCmds,nrows)
+        R0._cr_0(n,T._nosplitCmds,nrows)
+        for c in T._srflcmds:
             R0._addCommand(c)
             if c[1][1]!='splitlast': continue
             (sc,sr), (ec,er) = c[1:3]
@@ -1457,50 +1691,50 @@ class Table(Flowable):
             if isinstance(repeatRows,int):
                 iRows = data[:repeatRows]
                 iRowH = splitH[:repeatRows]
-                iCS = self._cellStyles[:repeatRows]
+                iCS = T._cellStyles[:repeatRows]
                 repeatRows = list(range(repeatRows))
             else:
                 #we have a list of repeated rows eg (1,3)
                 repeatRows = list(sorted(repeatRows))
                 iRows = [data[i] for i in repeatRows]
                 iRowH = [splitH[i] for i in repeatRows]
-                iCS = [self._cellStyles[i] for i in repeatRows]
-            R1 = self.__class__(iRows+data[n:],colWidths=self._colWidths,
+                iCS = [T._cellStyles[i] for i in repeatRows]
+            R1 = self.__class__(iRows+data[n:],colWidths=T._colWidths,
                     rowHeights=iRowH+splitH[n:],
                     repeatRows=len(repeatRows), repeatCols=repeatCols,
                     splitByRow=splitByRow, normalizedData=1,
-                    cellStyles=iCS+self._cellStyles[n:],
+                    cellStyles=iCS+T._cellStyles[n:],
                     ident=ident,
                     spaceAfter=getattr(self,'spaceAfter',None),
                     longTableOptimize=lto,
                     cornerRadii = cornerRadii,
                     )
             R1._cr_1_1(n,nrows,repeatRows,A) #linecommands
-            R1._cr_1_1(n,nrows,repeatRows,self._bkgrndcmds,_srflMode=True)
-            R1._cr_1_1(n,nrows,repeatRows,self._spanCmds)
-            R1._cr_1_1(n,nrows,repeatRows,self._nosplitCmds)
+            R1._cr_1_1(n,nrows,repeatRows,T._bkgrndcmds,_srflMode=True)
+            R1._cr_1_1(n,nrows,repeatRows,T._spanCmds)
+            R1._cr_1_1(n,nrows,repeatRows,T._nosplitCmds)
         else:
             #R1 = slelf.__class__(data[n:], self._argW, self._argH[n:],
-            R1 = self.__class__(data[n:], colWidths=self._colWidths, rowHeights=splitH[n:],
+            R1 = self.__class__(data[n:], colWidths=T._colWidths, rowHeights=splitH[n:],
                     repeatRows=repeatRows, repeatCols=repeatCols,
-                    splitByRow=splitByRow, normalizedData=1, cellStyles=self._cellStyles[n:],
+                    splitByRow=splitByRow, normalizedData=1, cellStyles=T._cellStyles[n:],
                     ident=ident,
                     spaceAfter=getattr(self,'spaceAfter',None),
                     longTableOptimize=lto,
                     cornerRadii = ([0,0] + cornerRadii[2:]) if cornerRadii else None,
                     )
             R1._cr_1_0(n,A)
-            R1._cr_1_0(n,self._bkgrndcmds,_srflMode=True)
-            R1._cr_1_0(n,self._spanCmds)
-            R1._cr_1_0(n,self._nosplitCmds)
-        for c in self._srflcmds:
+            R1._cr_1_0(n,T._bkgrndcmds,_srflMode=True)
+            R1._cr_1_0(n,T._spanCmds)
+            R1._cr_1_0(n,T._nosplitCmds)
+        for c in T._srflcmds:
             R1._addCommand(c)
             if c[1][1]!='splitfirst': continue
             (sc,sr), (ec,er) = c[1:3]
             R1._addCommand((c[0],)+((sc, 0), (ec, 0))+tuple(c[3:]))
 
-        R0.hAlign = R1.hAlign = self.hAlign
-        R0.vAlign = R1.vAlign = self.vAlign
+        R0.hAlign = R1.hAlign = T.hAlign
+        R0.vAlign = R1.vAlign = T.vAlign
         self.onSplit(R0)
         self.onSplit(R1)
         return [R0,R1]
@@ -1521,6 +1755,7 @@ class Table(Flowable):
     _getRowImpossible=staticmethod(_getRowImpossible)
 
     def _getFirstPossibleSplitRowPosition(self,availHeight):
+        # Returns the LAST possible split row position
         impossible={}
         if self._spanCmds:
             self._getRowImpossible(impossible,self._rowSpanCells,self._spanRanges)
@@ -1540,11 +1775,21 @@ class Table(Flowable):
 
     def split(self, availWidth, availHeight):
         self._calc(availWidth, availHeight)
-        if self.splitByRow:
+        if self.splitByRow or self.splitInRow:
             if not rl_config.allowTableBoundsErrors and self._width>availWidth: return []
-            return self._splitRows(availHeight)
-        else:
-            raise NotImplementedError
+
+            # If self.splitByRow is true, first try with doInRowSplit as false.
+            # Otherwise, first try with doInRowSplit as true
+            result = self._splitRows(availHeight, doInRowSplit=not self.splitByRow)
+            if result:
+                # That worked, return that:
+                return result
+
+            # The first attempt did NOT succeed, now try with the flag flipped.
+            return self._splitRows(availHeight, doInRowSplit=self.splitByRow)
+
+        # We can't split this table in any way, raise an error:
+        return []
 
     def _makeRoundedCornersClip(self, FUZZ=rl_config._FUZZ):
         self._roundingRectDef = None
diff --git a/tests/test_table_layout.py b/tests/test_table_layout.py
index edfe436..b019f1f 100644
--- a/tests/test_table_layout.py
+++ b/tests/test_table_layout.py
@@ -403,6 +403,120 @@ class TableTestCase(unittest.TestCase):
         lst.append(Paragraph("""This code does not yet handle spans well.""",
         styNormal))
 
+        lst.append(PageBreak())
+
+        lst.append(Paragraph("""Oversized cells""", styleSheet['Heading1']))
+
+        lst.append(Paragraph("""In some cases cells with flowables can end up
+        being larger than a page. In that case, we need to split the page.
+        The splitInRow attribute allows that, it's by default 0.""",
+        styNormal))
+
+        lst.append(Paragraph("""Here is a table, with splitByRow=1 and
+        splitInRow=0. It splits between the two rows.""",
+        styNormal))
+
+        ministy = TableStyle([
+            ('GRID', (0,0), (-1,-1), 1.0, colors.black),
+            ('VALIGN', (0,1), (1,1), 'BOTTOM'),
+            ('VALIGN', (1,1), (2,1), 'MIDDLE'),
+            ('VALIGN', (2,1), (3,1), 'TOP'),
+            ('VALIGN', (3,1), (4,1), 'BOTTOM'),
+            ('VALIGN', (4,1), (5,1), 'MIDDLE'),
+            ('VALIGN', (5,1), (6,1), 'TOP'),
+            ])
+        cell1 = [Paragraph(
+            """This is a very tall cell to make a tall row for splitting.""",
+            styNormal)]
+        cell2 = [Paragraph("This cell has two flowables.", styNormal),
+            Paragraph("And valign=MIDDLE.", styNormal)]
+        cell3 = [Paragraph("Paragraph with valign=TOP", styNormal)]
+
+        tableData = [
+            ['Row 1', 'Two rows:\nSo there', 'is a', 'place', 'to split', 'the table'],
+            [cell1, cell2, cell3, 'valign=BOTTOM', 'valign=MIDDLE', 'valign=TOP']
+        ]
+
+        # This is the table with splitByRow, which splits between row 1 & 2:
+        t = Table(tableData,
+                  colWidths=(50, 70, 70, 90, 90, 70),
+                  rowHeights=None,
+                  style=ministy,
+                  splitByRow=1,
+                  splitInRow=0)
+        parts = t.split(451, 55)
+        lst.append(parts[0])
+        lst.append(Paragraph("", styNormal))
+        lst.append(parts[1])
+
+        lst.append(Paragraph("""Here is the same table, with splitByRow=0 and
+        splitInRow=1. It splits inside a row.""",
+        styNormal))
+
+        # This is the table with splitInRow, which splits in row 2:
+        t = Table(tableData,
+                  colWidths=(50, 70, 70, 90, 90, 70),
+                  rowHeights=None,
+                  style=ministy,
+                  splitByRow=0,
+                  splitInRow=1)
+
+        parts = t.split(451, 57)
+        lst.append(parts[0])
+        lst.append(Paragraph("", styNormal))
+        lst.append(parts[1])
+
+        lst.append(Paragraph("""Here is the same table, with splitByRow=1 and
+        splitInRow=1. It splits between the rows, if possible.""",
+        styNormal))
+
+        # This is the table with both splits, which splits in row 2:
+        t = Table(tableData,
+                  colWidths=(50, 70, 70, 90, 90, 70),
+                  rowHeights=None,
+                  style=ministy,
+                  splitByRow=1,
+                  splitInRow=1)
+
+        parts = t.split(451, 57)
+        lst.append(parts[0])
+        #lst.append(Paragraph("", styNormal))
+        lst.append(parts[1])
+
+        lst.append(Paragraph("""But if we constrict the space to less than the first row,
+        it splits that row.""",
+        styNormal))
+
+        # This is the table with both splits and no space, which splits in row 1:
+        t = Table(tableData,
+                  colWidths=(50, 70, 70, 90, 90, 70),
+                  rowHeights=None,
+                  style=ministy,
+                  splitByRow=1,
+                  splitInRow=1)
+
+        parts = t.split(451, 15)
+        lst.append(parts[0])
+        lst.append(Paragraph("", styNormal))
+        lst.append(parts[1])
+
+        # Split it at a point in row 2, where the split fails
+        lst.append(Paragraph("""When splitByRow is 0 and splitInRow is 1, we should
+        still allow fallback to splitting between rows""",
+        styNormal))
+
+        t = Table(tableData,
+                  colWidths=(50, 70, 70, 90, 90, 70),
+                  rowHeights=None,
+                  style=ministy,
+                  splitByRow=0,
+                  splitInRow=1)
+        parts = t.split(451, 55)
+        lst.append(parts[0])
+        lst.append(Paragraph("", styNormal))
+        lst.append(parts[1])
+
+
         SimpleDocTemplate(outputfile('test_table_layout.pdf'), showBoundary=1).build(lst)
 
 def makeSuite():

Lennart Regebro via reportlab-users

unread,
Jan 11, 2022, 2:55:20 PMJan 11
to reportlab-users, Lennart Regebro
I have implemented a new feature of tables, that enables tables which
have cells that are too tall to fit on one page to still be rendered to PDF,
which is a behavior that would make things a lot easier for us, as we
sometimes have tables that have long texts in them.

See description below.

//Lennart Regebro, Shoobx

== splitInRow patch ==

Currently, if a table is too tall to fit on one page and has the attribute
splitByRow set to 1 (which is the default), the table will be split into two
tables and the rows that do fit on the current page will be rendered on the
current page. The remaining rows will be printed on the second page. If the
remaining rows do not fit on that page, the table will be split again in the
same manner until the end of the table.

However, should one cell be too large for one page, the table can not be split,
and the rendering to PDF will fail. This will also happen if the splitting is
disabled by setting the splitByRow attribute to 0, and the table as a whole is
too large for a single page.

This patch implements functionality that will split a row that is taller
than a page into two rows, enabling the rendering of tables with very tall rows.


It does this by adding a new attribute, splitInRow, which also defaults to 1.
This will if, and only if, a table fails to render in the allocated space,
splitInRow.diff

Robin Becker

unread,
Jan 13, 2022, 5:33:34 AMJan 13
to reportlab-users
Hi Lennart,

I looked at this problem recently and concluded I didn't have enough stamina to work through all the problems. If you
have then I congratulate you. I will try and get some tests done over the next few days or weeks.

On 11/01/2022 19:55, Lennart Regebro via reportlab-users wrote:
> I have implemented a new feature of tables, that enables tables which
> have cells that are too tall to fit on one page to still be rendered to
> PDF,
> which is a behavior that would make things a lot easier for us, as we
> sometimes have tables that have long texts in them.
>
........--
Robin Becker
_______________________________________________
reportlab-users mailing list
reportl...@lists2.reportlab.com
https://pairlist2.pair.net/mailman/listinfo/reportlab-users

Lennart Regebro via reportlab-users

unread,
Jan 13, 2022, 8:24:40 AMJan 13
to Robin Becker, Lennart Regebro, reportlab-users
On Thu, Jan 13, 2022 at 11:33 AM Robin Becker <ro...@reportlab.com> wrote:
Hi Lennart,

I looked at this problem recently and concluded I didn't have enough stamina to work through all the problems. If you
have then I congratulate you.

It did take a long time. I *hope* I found and fixed all the problems.

Robin Becker

unread,
Jan 28, 2022, 7:44:21 AMJan 28
to Lennart Regebro, reportlab-users
Hi Lennart,

I tried applying this patch and running through our usual comparison tests.

I see immediately that a downstream failure test fails (ie a pdf was produced when it should have errored out). We have
tests in rml which can use a fairly complex backtrack to determine exactly which but of xml produced the error.

Looked in the patch and see that it defaults to splitInRow=1. I changed that to 0 and expected it would stop any change
in behaviour. However, I am still seeing a success when I expect a failure.

I made a zip containing the differences I see in before and after pdf for the reportlab tests; see

https://www.reportlab.com/ftp/lennart-oversize-cells-diffs-20220128-0.zip

for the page level differences. These were made with the default for splitInRow=0.

The rml file that should fail is

#######################################################
<?xml version="1.0" encoding="utf-8" standalone="no" ?>
<!DOCTYPE document SYSTEM "rml_1_0.dtd">
<!--expect:rowHeights='150cm'> (Table) @ 12:49 - 14:15-->
<document filename="test_054_blockTable_run_error.pdf">
<template pageSize="(595, 842)" leftMargin="72" showBoundary="1">
<pageTemplate id="main">
<frame id="second" x1="35" y1="45" width="525" height="590"/>
</pageTemplate>
</template>
<stylesheet/>
<story>
<blockTable colWidths="3cm" rowHeights="150cm">
<tr><td/></tr>
</blockTable>
</story>
</document>
#######################################################

which is a row of height 150cm inside frame of height 590pts. It should fail with a layout error. I can probably
recreate in simple reportlab, but as we have other unexpected differences in simple reportlab I guess those need
correction first.

Lennart Regebro via reportlab-users

unread,
Feb 1, 2022, 11:34:57 AMFeb 1
to Robin Becker, Lennart Regebro, reportlab-users
OK, I got a small bit wrong in the logic of when to split in each way, fixed now, new patch attached.

I guess rml should support a splitInRow attribute on tables as well, but I can't figure out how that works. The only mention I can find of splitByRow in rlextra is in the rml.dtd. If I add splitInRow there, the parser doesn't complain, but it doesn't get passed to the table...
splitInRow.diff

Robin Becker

unread,
Feb 2, 2022, 8:40:59 AMFeb 2
to Lennart Regebro, reportlab-users
On 01/02/2022 16:34, Lennart Regebro wrote:
> OK, I got a small bit wrong in the logic of when to split in each way,
> fixed now, new patch attached.
>
> I guess rml should support a splitInRow attribute on tables as well, but I
> can't figure out how that works. The only mention I can find of splitByRow
> in rlextra is in the rml.dtd. If I add splitInRow there, the parser doesn't
> complain, but it doesn't get passed to the table...
>
> On Fri, Jan 28, 2022 at 1:44 PM Robin Becker <ro...@reportlab.com> wrote:
>
.......
I think I can do the rml part if reportlab is working. I guess the splitInRow value should be passed into the underlying
table. I'll take a look at the new patch in a couple of days.

Lennart Regebro via reportlab-users

unread,
Feb 2, 2022, 9:05:34 AMFeb 2
to Robin Becker, Lennart Regebro, reportlab-users
Sounds good!

Robin Becker

unread,
Feb 7, 2022, 5:10:54 AMFeb 7
to Lennart Regebro, For users of Reportlab open source software
Hi Lennart,

I still get a small number of visual differences and behaviour when I change the splitInRow to 0 which should be the
default (ie no behaviour change).

I will try and breakout the simplest problem I see and create a small example.

On 01/02/2022 16:34, Lennart Regebro wrote:
> OK, I got a small bit wrong in the logic of when to split in each way,
> fixed now, new patch attached.
>
> I guess rml should support a splitInRow attribute on tables as well, but I
> can't figure out how that works. The only mention I can find of splitByRow
> in rlextra is in the rml.dtd. If I add splitInRow there, the parser doesn't
> complain, but it doesn't get passed to the table...
.............

Robin Becker

unread,
Feb 17, 2022, 6:34:06 AMFeb 17
to Lennart Regebro, reportlab-users
On 01/02/2022 16:34, Lennart Regebro wrote:
> OK, I got a small bit wrong in the logic of when to split in each way,
> fixed now, new patch attached.
>
........

Hi Lennart, I finally got around to making a small example that illustrates the splitInrow=0 differences.

You can see the patched (with splitInrow=0) and unpatched results here

https://www.reportlab.com/ftp/lennart/

the generating script is lennart-example.py and the two pdfs illustrate behaviour unpatched and patched with
splitInRow=0 as an argument.

I changed the your patch so by default it starts with splitInRow = 0. I don't think we should change behaviour at all
unless it's done explicitly.

To my eye it looks as though the patch has changed the behaviour regarding space around the split table.

Lennart Regebro via reportlab-users

unread,
Feb 17, 2022, 6:55:56 AMFeb 17
to Robin Becker, Lennart Regebro, reportlab-users
Yeah, definitely looks like somethings up with the spacing there. I should be able to take a look at this this week! Thanks!

Lennart Regebro via reportlab-users

unread,
Feb 21, 2022, 10:15:11 AMFeb 21
to Robin Becker, Lennart Regebro, reportlab-users
OK, I solved that problem, indeed, space got incorrectly added after the split.

I also changed the default to not split in rows, as you preferred.
splitInRow-v3.diff

Robin Becker

unread,
Feb 22, 2022, 4:26:36 AMFeb 22
to Lennart Regebro, reportlab-users
On 21/02/2022 15:14, Lennart Regebro wrote:
> OK, I solved that problem, indeed, space got incorrectly added after the
> split.
>
> I also changed the default to not split in rows, as you preferred.
>
.......

thanks Lennart I will fling it through some testing

Robin Becker

unread,
Feb 23, 2022, 5:51:17 AMFeb 23
to Lennart Regebro, reportlab-users
Hi Lennart,

thanks again for the effort there's good and bad news.

Good
1) the default setting splitInRow=0 doesn't seem to make any changes to the many rendered PDF's we compare for
unexpected changes.

2) the simple splits seem to work OK with either simple string or flowable list cell content.

Bad
1) the split return wrongly adjusts the styling of the second part of the split. You can see this
in the example here https://www.reportlab.com/ftp/lennart/lennart-example2-1-1-36.pdf
which was made using https://www.reportlab.com/ftp/lennart/lennart-example2.py
ie python lennart-example2.py --splitInRow=1 --split=36
We have quite a bit of code that adjusts part 2 style definitions on the assumption that
the split happens at a row boundary. I see that you tried to adjust eg
if (sr<n and er>=n) or (doInRowSplit and sr==n):
but the example shows it didn't work/is not applied for some cases.

2) The cell split seems to have no lower limit. In most cases we would not want to leave a very small part of a table

at the end of a page so presumable we would only want to split a row if it exceeds a specific height. That would
normally be the height of a frame or page. My guess is we could adjust this patch to use the value of splitInRow as
a suitable lower limit on row height for splitting. I think there are really two kinds of issue here ie when a row
should split and where in the row should the split happen.

Untested
1) I guess this has not been tested at all with spanned rows/columns, but I haven't tried as yet.

2) I haven't tried very long cells that would require two or more splits.

3) I haven't checked to see if vertical alignment makes any real difference in split cells, but I think your
added test does show that it seems to do the right thing.

If we can fix these it will be a big step forward even if the spanned cases don't work.


On 21/02/2022 15:14, Lennart Regebro wrote:
> OK, I solved that problem, indeed, space got incorrectly added after the
> split.
>
> I also changed the default to not split in rows, as you preferred.
>
.......

Lennart Regebro via reportlab-users

unread,
Mar 7, 2022, 11:13:45 AMMar 7
to Robin Becker, Lennart Regebro, reportlab-users
On Wed, Feb 23, 2022 at 11:51 AM Robin Becker <ro...@reportlab.com> wrote:
Bad
  1) the split return wrongly adjusts the styling of the second part of the split.

Yes, I had some insights in the handling of commands when looking at this, and rewrote that bit to be much nicer, and also now handle the _srflcmds, which I had completely missed before.
I added tests to exercise this (and the earlier split tests) into the new test_table_inrowsplit.py, where I have also added your edge cases. I added the current output of that file.
 
  2) The cell split seems to have no lower limit. In most cases we would not want to leave a very small part of a table

     at the end of a page so presumable we would only want to split a row if it exceeds a specific height. That would
     normally be the height of a frame or page. My guess is we could adjust this patch to use the value of splitInRow as
     a suitable lower limit on row height for splitting.

So, the minimal "split height" will be one row of text for text cells, and whatever the flowable wants for flowables. We can use splitInRow for that as well. I now implemented it slightly differently, the splitInRow value will be used for a minimum "end of table" height, so that we don't get just one row of a table on a separate page. I think that's called an "orphan" when it comes to paragraphs, right? I don't know if you think that makes sense, I can change it to just be a general minimum of a cell.
 
Untested
   1) I guess this has not been tested at all with spanned rows/columns, but I haven't tried as yet.

I added a test for this.

   2) I haven't tried very long cells that would require two or more splits.

Added tests for this.

   3) I haven't checked to see if vertical alignment makes any real difference in split cells, but I think your
      added test does show that it seems to do the right thing.

Yeah, it won't be pixel-perfect, I think, but at least something in the vicinity.

// Lennart
splitInRow-v4.diff
test_table_inrowsplit.pdf

Robin Becker

unread,
Mar 11, 2022, 7:33:38 AMMar 11
to Lennart Regebro, reportlab-users
Hi Lennart,

tried the latest patch and it seems better for backgrounds etc etc. See the attached test script which I used to see
what was happening visually some examples are seen with

python lennart-example.py --split=36 --splitByRow=1 --splitInRow=1 --pdfn=sss.pdf
python lennart-example.py --split=58 --splitByRow=1 --splitInRow=1 --pdfn=sss.pdf
python lennart-example.py --split=58 --splitByRow=0 --splitInRow=1 --pdfn=sss.pdf

I haven't run this version entirely though the tests yet, but I will try over the weekend.

As for small tables:
widow is end of something alone at the start of a page. A widow has a past, but no future.
orphan is start of something alone at the end of a page. An orphan has no past, but has a future.
....
thanks again for the effort
--
Robin Becker
lennart-example.py
sss.pdf

Lennart Regebro via reportlab-users

unread,
Apr 12, 2022, 6:13:16 AMApr 12
to Robin Becker, Lennart Regebro, reportlab-users
Hi! Have you been able to look more at this?

I'm feeling pretty good about the patch now, with added tests and better command handling.

splitInRow-v5.diff
test_table_inrowsplit.pdf

Robin Becker

unread,
Apr 19, 2022, 5:45:31 AMApr 19
to Lennart Regebro, reportlab-users
Hi Lennart,

I put your data for the spanning case into my test program which is attached; I can use the following commands to see
the behavior


python lennart-example.py --splitByRow=0 --splitInRow=1 --split=55 --pdfn=ttt.pdf --useData=1

this works and splits the column spanned row 3 (row=2 in python). The cell is split OK, but it gets styles from the
splitfirst, splitlast styles. I'm not sure that's right for a split in the row when the split row is neither a first row
or last. I don't really know what the correct styling should be, but I think the style that would have applied in the
non-splitting case should be used.

If I use

python lennart-example.py --splitByRow=0 --splitInRow=1 --split=22 --pdfn=ttt.pdf --useData=1

then the split fails because it crosses the spanned row cell

On 12/04/2022 11:12, Lennart Regebro wrote:
> Hi! Have you been able to look more at this?
>
> I'm feeling pretty good about the patch now, with added tests and better
> command handling.
>
>
.......


--
Robin Becker
lennart-example.py

Lennart Regebro via reportlab-users

unread,
Apr 19, 2022, 8:37:18 AMApr 19
to Robin Becker, Lennart Regebro, reportlab-users
On Tue, Apr 19, 2022 at 11:45 AM Robin Becker <ro...@reportlab.com> wrote:
python lennart-example.py --splitByRow=0 --splitInRow=1 --split=55 --pdfn=ttt.pdf --useData=1

this works and splits the column spanned row 3 (row=2 in python). The cell is split OK, but it gets styles from the
splitfirst, splitlast styles. I'm not sure that's right for a split in the row when the split row is neither a first row
or last.

Well, it becomes a last and a first row... So that behaviour was simply what happened, and is the same as if it was splitByRow, so it made sense to me to just keep it. But certainly it's possible to change so that the lastrow and firstrow commands aren't applied in this case. I don't really know what the usecase is for such styles, it seemed to me that they were there to show that this was not the top or bottom of the table, but continuations, and then it makes sense in this case as well, in my opinion. But maybe the usecase is different?
 
If I use

python lennart-example.py --splitByRow=0 --splitInRow=1 --split=22 --pdfn=ttt.pdf --useData=1

then the split fails because it crosses the spanned row cell

Hmm, yes, it tries to split in the second row, fails because it's only one line high, so it can't be split, and then falls back to splitByRow, which it doesn't because of the span. So that's a specific edge case which I could try to support if you want? But it's a mix of inside and inbetween splits, so I would have to add special handling just for that. I think I would have to, instead of failing in this case, check if there is any spans, and in that case split only those, and then do a byRowSplit. It's doable, but possibly opens up another can of worms.

//Lennart
 

Robin Becker

unread,
Apr 20, 2022, 4:14:49 AMApr 20
to Lennart Regebro, reportlab-users
Hi Lennart,

I take your point on the splitfirst/splitlast styling; I cannot actually remember the need for those styling indicators,
so perhaps they should be left as is.

I certainly wouldn't want to add splitfirstrow/splitlastrow as overrides.

On the span splitting I suppose the only way to proceed is to make the content more splittable in some way. I tried
changing the split position up and down in row 2 (python row 1), but could not get a split even with multiple rows in
all the cells see eg the attached
.......
--
Robin Becker
ttt.pdf

Robin Becker

unread,
Apr 20, 2022, 5:04:16 AMApr 20
to Lennart Regebro, reportlab-users
Hi Lennart,

I did some investigation of the problems I saw in that last example (see attached new version of the script useData==2).

It seems that the real problem is that because of the spanned cell in column 0 row 0 that in _splitRows @ 1565

n=self._getFirstPossibleSplitRowPosition(availHeight)

returns a value of 0 for the first possible split row position.

Then at line 1633 we get

n=0 usedHeights=0 cellvalues=['A\nB\nC\nD', 'BBBBB', 'C', 'D', 'E']
curRowHeight=18 minSplit=15.0 maxSplit=15.0
minSplit + maxSplit > curRowHeight=True
minSplit > (availHeight - usedHeights)=False

I assume that because of the spanned rows we are just using the first (n=0) row to work out the minSplit/maxSplit and
then the failure follows from the first test.

1) It's not obvious what n should be in the splitInRow row span case.
2) It seems the splitInRow case requires us to consider far more than the height of one row.
--
Robin Becker
lennart-example.py

Lennart Regebro via reportlab-users

unread,
Apr 20, 2022, 7:39:02 AMApr 20
to Robin Becker, Lennart Regebro, reportlab-users
Right, the current code first tries to split in the non-spanned rows, and fails (because they are only one line high and can't be split), and then it tries to splitByRow, but then the spanned row prevents that. In this case, since we have a span we could split that spanned row, and then do a splitByRow.

I'll look into that, maybe it will be easy.

Lennart Regebro via reportlab-users

unread,
May 13, 2022, 8:13:12 AMMay 13
to Robin Becker, Lennart Regebro, reportlab-users
Hi, sorry for the delay in this, I was on vacation.

Splitting spanned cells turns out to be not easy at all. Well, the splitting was no problem, it's dealing with the styles.

I believe I have found an edge case in Reportlab which I'm not sure if it's intended or not, and fixing it will require quite a major refactoring, so I would like your opinion on this. Basically, you can color a spanned cell only partially in current Reportlab, but supporting splitting such a cell leads to some weirdness.

If you set the background color only on the starting cell for a spanned cell, then that whole cell will get that background color. But if you set a background color on only some out of the rows, only that part of the cell will get colored. If you then split that cell, then what parts of that split cell are colored can change,

Fixing this would require calculating the colors or all cells before a split, and then, when the table is drawn, not doing any such "recalculation". But the first question is: Is this really supported? Is that how it is supposed to work? That you can set partial background colors on spanned cells?

Attaching PDF that demonstrates this. I accidentally overwrote the code used to generate it, though. *facepalm*

//Lennart

test_edgecase_1.pdf

Robin Becker

unread,
May 14, 2022, 3:49:16 AMMay 14
to For users of Reportlab open source software
On 13/05/2022 13:12, Lennart Regebro wrote:
> Hi, sorry for the delay in this, I was on vacation.
>
> Splitting spanned cells turns out to be not easy at all. Well, the
> splitting was no problem, it's dealing with the styles.
>
> I believe I have found an edge case in Reportlab which I'm not sure if it's
> intended or not, and fixing it will require quite a major refactoring, so I
> would like your opinion on this. Basically, you can color a spanned cell
> only partially in current Reportlab, but supporting splitting such a cell
> leads to some weirdness.
>
> If you set the background color only on the starting cell for a spanned
> cell, then that whole cell will get that background color. But if you set a
> background color on only some out of the rows, only that part of the cell
> will get colored. If you then split that cell, then what parts of that
> split cell are colored can change,
>
> Fixing this would require calculating the colors or all cells before a
> split, and then, when the table is drawn, not doing any such
> "recalculation". But the first question is: Is this really supported? Is
> that how it is supposed to work? That you can set partial background colors
> on spanned cells?
>
> Attaching PDF that demonstrates this. I accidentally overwrote the code
> used to generate it, though. *facepalm*
>
> //Lennart
>
.........
Hi Lennart, hope you enjoyed the vacation. Looking at your output I believe you have in fact found a bug. I had little
to do with the original spanned columns/rows implementation, but I know it's a source of many bugs.

I don't believe the intent was ever to allow partial styling and I haven't seen this before. It seems to me that in a
span range we should consider the spanned cells to be absent. That would mean only the start cell of a span would
contain content and or styling. If I can work up your example I may see how to avoid the partial styling. If your
example actually shows a row span being split then we're almost there. Even without the partial styling fix I would
always say the the intent is that the styles should cover the cells and the partial case is a bug.

If we are able to avoid putting in content for the 'absent' spanned cells we should be able to ignore their styles.

Robin Becker

unread,
May 14, 2022, 3:56:10 AMMay 14
to Lennart Regebro, reportlab-users
On 13/05/2022 13:12, Lennart Regebro wrote:
> Hi, sorry for the delay in this, I was on vacation.
>
> Splitting spanned cells turns out to be not easy at all. Well, the
> splitting was no problem, it's dealing with the styles.
>
> I believe I have found an edge case in Reportlab which I'm not sure if it's
> intended or not, and fixing it will require quite a major refactoring, so I
> would like your opinion on this. Basically, you can color a spanned cell
> only partially in current Reportlab, but supporting splitting such a cell
> leads to some weirdness.
>
for what it's worth I don't think we need to 'fix' the styling edge case as it stands; if people really want to colour
part of a span range then let them, but we can caution that if the cell needs splitting that won't always work
correctly. The cell split is already a rare case.

> If you set the background color only on the starting cell for a spanned
> cell, then that whole cell will get that background color. But if you set a
> background color on only some out of the rows, only that part of the cell
> will get colored. If you then split that cell, then what parts of that
> split cell are colored can change,
>
> Fixing this would require calculating the colors or all cells before a
> split, and then, when the table is drawn, not doing any such
> "recalculation". But the first question is: Is this really supported? Is
> that how it is supposed to work? That you can set partial background colors
> on spanned cells?
>
> Attaching PDF that demonstrates this. I accidentally overwrote the code
> used to generate it, though. *facepalm*
>
> //Lennart
..........

Lennart Regebro via reportlab-users

unread,
May 16, 2022, 7:34:20 AMMay 16
to reportlab-users, Lennart Regebro
I'm glad to hear I don't need to support this case. Yes, the span splitting seems to work now, but the handling of styling isn't 100% yet.

I redid and attached the test I sent on Friday, if you want to look into that behavior, but yes, I agree it might not really need fixing, I can't imagine it's a common usecase.

//Lennart
test_table_bug.pdf
test_table_bug.py

Robin Becker

unread,
May 16, 2022, 11:44:01 AMMay 16
to reportlab-users
On 16/05/2022 12:34, Lennart Regebro via reportlab-users wrote:
> I'm glad to hear I don't need to support this case. Yes, the span splitting
> seems to work now, but the handling of styling isn't 100% yet.
>
> I redid and attached the test I sent on Friday, if you want to look into
> that behavior, but yes, I agree it might not really need fixing, I can't
> imagine it's a common usecase.
>
> //Lennart
>
.........
Hi Lennart,

not sure exactly what's wrong, but I don't seem to have a split in the spanned cell. Is there another version of the patch?

Lennart Regebro via reportlab-users

unread,
May 23, 2022, 6:18:02 AMMay 23
to Robin Becker, Lennart Regebro, reportlab-users
OK, that was a lot of work, but now I can't find more edge cases...

v6 of the patch attached
splitInRow-v6.diff
test_table_inrowsplit.pdf

Robin Becker

unread,
May 24, 2022, 5:28:30 AMMay 24
to Lennart Regebro, reportlab-users
Hi Lennart,

thanks for the new patch. I assume everyone will recognize you as genius if it all works :)

I will try and test over the upcoming weekend.

Again many thanks for the effort


On 23/05/2022 11:17, Lennart Regebro wrote:
> OK, that was a lot of work, but now I can't find more edge cases...
>
> v6 of the patch attached
>
> On Sat, May 14, 2022 at 9:56 AM Robin Becker <ro...@reportlab.com> wrote:
>

Lennart Regebro via reportlab-users

unread,
May 24, 2022, 1:53:56 PMMay 24
to Robin Becker, Lennart Regebro, reportlab-users
On Tue, May 24, 2022 at 11:28 AM Robin Becker <ro...@reportlab.com> wrote:
Hi Lennart,

thanks for the new patch. I assume everyone will recognize you as genius if it all works :)

If that was true I would not have to make spreadsheets to figure out the logic when I can't get all the cases to work at the same time. :-)

image.png

Andy Robinson

unread,
May 24, 2022, 3:14:27 PMMay 24
to reportlab-users, Robin Becker
Lennart, these are impressive - can we include them in the Release Notice in due course?

When you guys are happy, this warrants a major release number and a lot of celebration...thanks for all of your efforts...

- Andy

Lennart Regebro

unread,
May 25, 2022, 3:04:14 AMMay 25
to reportl...@googlegroups.com, reportlab-users, Robin Becker
On Tue, May 24, 2022 at 9:14 PM Andy Robinson <an...@reportlab.com> wrote:
Lennart, these are impressive - can we include them in the Release Notice in due course?

Sure, no problem. I didn't expect them to make sense to anyone but me, though. :-)

Robin Becker

unread,
May 26, 2022, 7:14:45 AMMay 26
to Lennart Regebro, reportlab-users
Hi Lennart, I guess genius status is deferred a bit. Whenn I apply the patch I find that we are getting an error related
to deepcopy of the cell values in the script docs/genAll.py which creates reportlab-userguide.pdf

> (.py310) robin@minikat:~/devel/reportlab/REPOS/reportlab/docs
> $ python genAll.py
> "/home/robin/devel/reportlab/.py310/bin/python" genuserguide.py
> Built story contains 1842 flowables...
> .........
> File "/home/robin/devel/reportlab/reportlab/platypus/tables.py", line 2032, in split
> result = self._splitRows(availHeight, doInRowSplit=not self.splitByRow)
> File "/home/robin/devel/reportlab/reportlab/platypus/tables.py", line 1589, in _splitRows
> data = deepcopy(self._cellvalues)
> .......> File "/home/robin/devel/reportlab/.py310/lib/python3.10/copy.py", line 161, in deepcopy
> rv = reductor(4)
> TypeError: cannot pickle '_io.BufferedReader' object



I did a quick debug and find that the problem is caused by one of the cell values being/containing an image eg


>> /home/robin/devel/reportlab/.py310/lib/python3.10/copy.py(165)deepcopy()
> -> raise
> (Pdb) p reductor
> <built-in method __reduce_ex__ of _io.BufferedReader object at 0x7fb33f846350>
> (Pdb) p x
> <_io.BufferedReader name='../images/replogo.gif'>

the code that fails is I think a bit like this

I = Image('../images/replogo.gif')
P0 = Paragraph(.......
P = Paragraph(......
data= [['A', 'B', 'C', P0, 'D'],
['00', '01', '02', [I,P], '04'],
['10', '11', '12', [P,I], '14'],
['20', '21', '22', '23', '24'],
['30', '31', '32', '33', '34']]
t=Table(data,style=[......

I suppose we have either have to make ImageReader support a deepcopy or defer / change the deepcopy. In this case I
don't think the splitInRow is in operation so previously we didn't create a deepcopy. Deferring to the special
splitInRow case should save us time in the normal case, but obviously we would like to support the special cases.

First off it seems overkill to deepcopy the whole data. Previously the rows went either into one table or its successor
no copying needed. Now I assume the problem is that we are splitting cells and want to have two separate versions of the
same cells.

It seems reasonable to me that the cells / styles that get split must be created in two parts one for the top part of
the table and the second for the lower table. Since these must be new I don't see where the copy comes in.

I will have to think about the ImageReader problem as it's the most likely problem causer; in addition there are far too
many options which attempt to save storage reduce open files etc etc etc. I think years ago we worried more about
storage so defrred reading the whole image into memory. The deferral caused problems with too many open files etc etc.
It all needs cleaning up.

Of course the real issue here is that we don't require any of reportlab content to be immutable. A lot of things are
made harder because of that.

Lennart Regebro via reportlab-users

unread,
May 26, 2022, 8:11:40 AMMay 26
to Robin Becker, Lennart Regebro, reportlab-users
Uh, yeah, I suddenly realized that the problem I solved with copying the table must be solved later down in the standard split code, so that must be entirely unnecessary.

I'll fix that.

Robin Becker

unread,
May 26, 2022, 8:28:21 AMMay 26
to Lennart Regebro, reportlab-users
Hi again Lennart,

I made some progress cleaning up the ImageReader class and now I can generate the userguide. However, that allows
another case to fail which is in

> python test_platypus_accum.py
> .E
> ======================================================================
> ERROR: test2 (__main__.TablesTestCase
> ......
> File "/home/robin/devel/reportlab/reportlab/platypus/tables.py", line 2032, in split
> result = self._splitRows(availHeight, doInRowSplit=not self.splitByRow)
> File "/home/robin/devel/reportlab/reportlab/platypus/tables.py", line 1589, in _splitRows
> data = deepcopy(self._cellvalues)
> ......
> File "/home/robin/devel/reportlab/.py310/lib/python3.10/copyreg.py", line 101, in __newobj__
> return cls.__new__(cls, *args)
> TypeError: onDrawStr.__new__() missing 2 required positional arguments: 'onDraw' and 'label'

not sure I understand this error; something to do with PageAccumulator goes wrong.

On 26/05/2022 13:11, Lennart Regebro wrote:
> Uh, yeah, I suddenly realized that the problem I solved with copying the
> table must be solved later down in the standard split code, so that must be
> entirely unnecessary.
>
> I'll fix that.
>
.........--

Lennart Regebro via reportlab-users

unread,
May 26, 2022, 9:07:32 AMMay 26
to Robin Becker, Lennart Regebro, reportlab-users
OK, here is v7 that doesn't do a deepcopy.
splitInRow-v7.diff
test_table_inrowsplit.pdf

Robin Becker

unread,
May 27, 2022, 7:01:56 AMMay 27
to Lennart Regebro, reportlab-users
On 26/05/2022 14:07, Lennart Regebro wrote:
> OK, here is v7 that doesn't do a deepcopy.
>
> On Thu, May 26, 2022 at 1:14 PM Robin Becker <ro...@reportlab.com> wrote:
>
........
Thanks Lennart,

I'll try again later today.

Robin Becker

unread,
May 27, 2022, 11:53:44 AMMay 27
to Lennart Regebro, reportlab-users
Hi Lennart,

I ran the patch through our test render mechanism; it generates all the pdfs and then uses a jpeg conversion to allow
automatic comparison. Ideally we should have no differences in most of the pdfs because inRowSplit=0 by default.

Unfortunately we see these differences which are not expected

> reportlab-userguide-page0061
> reportlab-userguide-page0062
> reportlab-userguide-page0063
> reportlab-userguide-page0064
> reportlab-userguide-page0078
> reportlab-userguide-page0079
> reportlab-userguide-page0087
> reportlab-userguide-page0088
> reportlab-userguide-page0089
> reportlab-userguide-page0106
> reportlab-userguide-page0107
> reportlab-userguide-page0108
> reportlab-userguide-page0109
> reportlab-userguide-page0110
> reportlab-userguide-page0112
> reportlab-userguide-page0113
> reportlab-userguide-page0116
> reportlab-userguide-page0117
> test_platypus_tables_2-page0006
> test_platypus_tables_2-page0007

Even though the image differences are found I cannot see differences easily by eye. However, I think the problem is that
some of the lines are either the wrong width or slightly differently positioned/styled.

I placed the pdfs here https://www.reportlab.com/ftp/lennart/ from scanning through the image differences using my image
blinker I see obvious change at reportlab-userguide-page0063/reportlab-userguide-page0087 (horizontal line at the page
bottom) and reportlab-userguide-page0107 (horizontal line at the page top). I see something similar for
test_platypus_tables_2-page0006 & test_platypus_tables_2-page0007 (several horizontals).

Robin Becker

unread,
May 29, 2022, 5:43:11 AMMay 29
to Lennart Regebro, reportlab-users
On 26/05/2022 14:07, Lennart Regebro wrote:
> OK, here is v7 that doesn't do a deepcopy.
>
........

Hi Lennart,

I did some more analysis on the minute differences I am seeing. I produced a failure test here


https://www.reportlab.com/ftp/lennart/ttablelines.py

this produces a simple table with various styles and then checks to see how it can be split.

The test goes

1) t=Table(....)
2) lst_add(t) #add to the story
3) tc = copy.deepcopy(t) if os.environ.get('DEEPCOPY','0')=='1' else t
4) for s in tc.split(4*inch,30): lst_add(s) #add the tow split bits

As a check I monkeypatch the canvas used for build so all the line commands get printed

I then find differences between the lines drawn when tc is t ie no deepcopy of t is done.
If a deepcopy is done then I see no differences between the standard and patched lines.

As a check I also tried comparing the output lines between standard no copy and standard deepcopy cases, but there were
no differences in the lines drawn for those cases.

I think the implication is that for this simple split case the patched code is making some change to the original table.

Since we are assumed to be making two completely fresh tables from the original and the rows are disjoint we ought not
to make any change to the structure of the original.

At present I don't know if the lines that are added/changed etc belong to the first table, the two split ones or all three.

I can probably make up some mechanism to print out a message when the tables are drawn so we know when each starts in
the list of lines. I think a monkepatch on the Table.draw will allow us to do that.