Apostrophe causes new column in PYFPDF

Richard Warg

Oct 6, 2014, 6:10:25 PM
to web...@googlegroups.com
I am using the sample report from https://code.google.com/p/pyfpdf/wiki/Web2Py#Sample_Report.

It works as expected until I have an apostrophe or ampersand in the string.

ORIGINAL
    # create several rows:
    rows = []
    for i in range(1000):
        col = i % 2 and "#F0F0F0" or "#FFFFFF"
        rows.append(TR(TD("Row %s" % i),
                       TD("Something", _align="center"),
                       TD("%s" % i, _align="right"),
                       _bgcolor=col))

Yields a table similar to this:

Row 1      Something       1
Row 2      Something       2
Row 3      Something       3
          ...

When I add an apostrophe (') to the string, the columns break at the apostrophe:
CHANGED
    # create several rows:
    rows = []
    for i in range(1000):
        col = i % 2 and "#F0F0F0" or "#FFFFFF"
        rows.append(TR(TD("Row's %s" % i),
                       TD("Something", _align="center"),
                       TD("%s" % i, _align="right"),
                       _bgcolor=col))

Yields a table similar to this:

Row        s 1             Something       
Row        s 2             Something       
Row        s 3             Something       
          ...

I looked for a setting to disable the 'magic quotes' but found nothing. The same behavior results if I insert an ampersand (&). The text that I am creating contains many words like it's, who's, something's, someone's, and so on.

Dave S

Oct 8, 2014, 7:45:30 PM
to web...@googlegroups.com

Hi, Richard ...


On Monday, October 6, 2014 3:10:25 PM UTC-7, Richard Warg wrote:
I am using the sample report from https://code.google.com/p/pyfpdf/wiki/Web2Py#Sample_Report.

It works as expected until I have an apostrophe or ampersand in the string.

[...]
 
        rows.append(TR(TD("Row's %s" %i),
                       TD("Something", _align="center"),
                       TD("%s" % i, _align="right"),
                       _bgcolor=col)) 

In your example, you are entering the string manually, and I think you're running into Python string behavior. I think you need to escape the apostrophe (\' or ''') when you do it this way. But how are you getting the "real" text contents that you want to display? Are you reading it from a file?

/dps

Massimo Di Pierro

Oct 9, 2014, 12:31:02 AM
to web...@googlegroups.com
I do not think that is the problem. A single quote between double quotes is allowed.
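A quick check in plain Python 3 supports Massimo's point: inside a double-quoted literal the apostrophe is an ordinary character, and the result equals the single-quoted literal written with a backslash escape. This is a standalone illustration, not code from the thread.

```python
# An apostrophe inside a double-quoted literal needs no escaping.
double_quoted = "Row's %s" % 1
# The same text as a single-quoted literal needs a backslash escape.
single_quoted = 'Row\'s %s' % 1

print(double_quoted)                   # Row's 1
print(double_quoted == single_quoted)  # True
```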

Leonel Câmara

Oct 9, 2014, 7:47:17 AM
to web...@googlegroups.com
Is the file this code is in saved as UTF-8? I don't think it is. If you replace "Row's %s" % i with (u"Row's %s" % i).encode('utf-8'), does it work? Better yet, just make sure to save the file with UTF-8 encoding and change nothing.

If it still doesn't work, then I'm out of ideas and you can just escape the apostrophe using '
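As the thread later shows, encoding was not the culprit here. Assuming the text uses the plain ASCII apostrophe (U+0027) rather than a typographic one, that character is the single byte 0x27 in ASCII, Latin-1 and UTF-8 alike, so re-encoding the literal cannot change it. A minimal Python 3 check:

```python
text = "Row's %s" % 1
encodings = ("ascii", "latin-1", "utf-8")

# The plain ASCII apostrophe (U+0027) is the single byte 0x27 in each of
# these encodings, so the encoded bytes come out identical.
encoded = {enc: text.encode(enc) for enc in encodings}
for enc, data in encoded.items():
    print(enc, data, data[3] == 0x27)
```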

Richard Warg

Oct 10, 2014, 9:00:49 AM
to web...@googlegroups.com
Here are some additional observations:
- It only fails in tables, either inside a <td></td> or a TD().
- The actual application uses text from a database column.
- RTF and CSV output conversions work as expected with the same data. I suspect an issue in the PDF table conversion code.

Massimo Di Pierro

Oct 10, 2014, 4:17:42 PM
to web...@googlegroups.com
I am pretty sure this is a pyfpdf issue and you should report it to the maintainers.

Leonel Câmara

Oct 11, 2014, 5:42:54 AM
to web...@googlegroups.com
OK, I have tested this, and I have to apologize for dismissing it as an encoding problem. The bug is definitely there, but it's weirder than it looks.

I made the test with:

controller:
def test():
    response.view = 'generic.pdf'
    return {}

view test.html:
{{=TABLE(TR(TD("Row's %d" % 1, _width='30%'), TD("Something", _align="center", _width='30%'), TD("%d" % 1, _width='40%')))}}


And the problem was there. The apostrophe had also been converted to &#x27;, so the cell contained Row&#x27;s 1. This is the web2py helpers doing it, by the way.
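For reference, Python 3's standard library produces the same entity: html.escape, whose quote argument defaults to True, turns the apostrophe into &#x27;. This is only an analogy for what the helper emits, not web2py's own escaping code.

```python
import html

cell_text = "Row's 1"
# quote=True is the default, so ' is escaped to &#x27; (and " to &quot;).
escaped = html.escape(cell_text)
print(escaped)  # Row&#x27;s 1

# Unescaping once restores the original text.
print(html.unescape(escaped) == cell_text)  # True
```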

If you change test.html to this:

<table>
    <tr>
        <td width="30%">Row's 1</td>
        <td width="30%" align="center">Something</td>
        <td width="40%">1</td>
    </tr>
</table>

Then the problem isn't there. This shows the problem is caused by the escaping done by the web2py helpers; the text needs to be unescaped before it goes to the PDF.


While trying to find the problem, I found yet another one: things are being escaped twice. This is what fpdf's HTMLMixin was getting:

<table><tr><td width="30%">Row&amp;#x27;s 1</td><td align="center" width="30%">Something</td><td width="40%">1</td></tr>
</table>

Did anyone notice the &amp;? The &#x27; was escaped again to &amp;#x27;. WTF? This seems like a bug in web2py's helpers, since they shouldn't escape things that are already escaped.
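A doubled entity like &amp;#x27; is exactly what escaping already-escaped text produces: the & that starts the entity gets escaped again. Here html.escape stands in for whichever escaper runs the second time (an assumption for illustration only):

```python
import html

once = html.escape("Row's 1")  # Row&#x27;s 1
twice = html.escape(once)      # Row&amp;#x27;s 1 (the entity's & is escaped again)
print(once)
print(twice)

# A single unescape only removes one layer, leaving the inner entity behind.
print(html.unescape(twice))    # Row&#x27;s 1
```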

Ignoring that bug, there's a simple fix to be made in gluon/contrib/fpdf/html.py: make HTMLMixin unescape whatever it gets, over and over, until nothing is left to unescape.

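class HTMLMixin(object):
    def write_html(self, text, image_map=None):
        "Parse HTML and convert it to PDF"
        h2p = HTML2FPDF(self, image_map)
        # Unescape repeatedly until the text stops changing, so that
        # double-escaped entities such as &amp;#x27; end up as plain characters.
        unescaped = h2p.unescape(text)
        while unescaped != text:
            text = unescaped
            unescaped = h2p.unescape(text)
        h2p.feed(text)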


This finally solved the problem for me.
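The same fixed-point loop can be written outside fpdf with Python 3's html.unescape; HTMLParser.unescape, which the patch calls, was deprecated in Python 3.4 and removed in 3.9. A sketch, where unescape_fully is a hypothetical name:

```python
import html


def unescape_fully(text):
    """Apply html.unescape until the text stops changing (hypothetical helper).

    Mirrors the loop in the patched write_html: every layer of escaping,
    including double escaping such as &amp;#x27;, is removed.
    """
    unescaped = html.unescape(text)
    while unescaped != text:
        text = unescaped
        unescaped = html.unescape(text)
    return text


print(unescape_fully("Row&amp;#x27;s 1"))  # Row's 1
```

Note that the loop also turns escaped &lt;, &gt; and &amp; in cell text back into raw <, > and &, which is relevant to Richard's follow-up case later in the thread.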

Richard Warg

Oct 14, 2014, 1:14:42 AM
to web...@googlegroups.com

Thanks, Professor. I'll contact them soon. Good luck on the promotion. I know how difficult a hurdle that is. You merit the post for many reasons: passion, dedication, technical insight and responsiveness.

Massimo Di Pierro

Oct 14, 2014, 6:16:35 PM
to web...@googlegroups.com
I do not think web2py ever escapes twice. The problem has to do with the input expected by pyfpdf. Can you please open a ticket with your solution? I will test it ASAP.

Leonel Câmara

Oct 15, 2014, 2:05:38 PM
to
I have found the problem. It's not the HTML helpers that are escaping twice.

The problem is that gluon.contrib.generics.pyfpdf_from_html calls sanitize on the input.

>>> P("Row's").xml()
'<p>Row&#x27;s</p>'
>>> from gluon.sanitizer import sanitize
>>> sanitize('<p>Row&#x27;s</p>')
'<p>Row&amp;#x27;s</p>'
>>> sanitize('<p>Row&#x27;s</p>', escape=False)
'<p>Row&amp;#x27;s</p>'


This is where things are being escaped twice. So this seems like a bug in gluon.sanitizer.XssCleaner, which shouldn't escape entities that are already escaped.
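One way an escaper could avoid re-escaping is to escape only ampersands that do not already begin a character or entity reference. The regex helper here is a hypothetical sketch of that idea, not the fix that was actually submitted for gluon.sanitizer:

```python
import re

# Matches an & that is NOT the start of a reference like &#x27;, &#39; or &amp;
_BARE_AMPERSAND = re.compile(r"&(?!#?\w+;)")


def escape_bare_ampersands(text):
    """Escape stray ampersands while leaving existing entities untouched (hypothetical)."""
    return _BARE_AMPERSAND.sub("&amp;", text)


print(escape_bare_ampersands("<p>Row&#x27;s</p>"))  # <p>Row&#x27;s</p>
print(escape_bare_ampersands("b & c"))              # b &amp; c
```

The trade-off is that anything shaped like &name; is trusted as an entity, even if it is not a known one.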

Even after this bug is corrected, we would still have to unescape the input, at least in gluon.contrib.generics.pyfpdf_from_html, before calling pdf.write_html, since the HTMLParser in fpdf isn't doing it.
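A plausible mechanism for the column break, assuming fpdf's HTML2FPDF draws a separate table cell for each handle_data callback (not verified in this thread): a parser that does not decode character references hands the text to handle_data in pieces around the entity, and drops the entity itself unless handle_charref is overridden. Python 2's HTMLParser always worked that way, reporting references through handle_charref and handle_entityref; Python 3's html.parser makes it switchable with convert_charrefs. ChunkCollector and data_chunks are just a test harness.

```python
from html.parser import HTMLParser


class ChunkCollector(HTMLParser):
    """Records each piece of text the parser hands to handle_data."""

    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def data_chunks(markup, convert_charrefs):
    parser = ChunkCollector(convert_charrefs)
    parser.feed(markup)
    parser.close()
    return parser.chunks


cell = "<td>Row&#x27;s 1</td>"
# Without conversion, &#x27; goes to handle_charref (a no-op here), so the
# text arrives in two pieces and the apostrophe is lost.
print(data_chunks(cell, convert_charrefs=False))  # ['Row', 's 1']
# With conversion, the reference is decoded and the text arrives whole.
print(data_chunks(cell, convert_charrefs=True))   # ["Row's 1"]
```

The split output lines up with the "Row        s 1" table in the original report, apostrophe missing included.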

Massimo Di Pierro

Oct 17, 2014, 12:03:31 AM
to web...@googlegroups.com
Can you send a patch to Mariano (author of pyfpdf) and to me?

Leonel Câmara

Oct 17, 2014, 8:51:45 AM
to web...@googlegroups.com
Sure, I'll try to do it.

Leonel Câmara

Oct 18, 2014, 7:42:40 AM
to
Fix for web2py's side of the problem submitted:

This fixes web2py's escaping of '&#x27;'.

On to fpdf unquoting stuff.

[edit] Fix for fpdf

Richard Warg

Oct 18, 2014, 12:06:56 PM
to web...@googlegroups.com

I haven't had time to dig into this myself, but the solution needs to also handle the following case:

{{mytext=" a is < than b & c is > b"}}
<tr><td> {{=mytext}}</td><td>This goes in column 2 </td></tr>

The '&', '<', and '>' should not start a new column.
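This case shows the catch with unescaping the whole HTML string before parsing: once &lt; and &gt; become < and > again, they can no longer be told apart from markup. Letting the parser decode references per text node avoids that. A Python 3 sketch, assuming the template escapes mytext once, as html.escape does; unescape_fully and TextCollector are hypothetical helpers:

```python
import html
from html.parser import HTMLParser


def unescape_fully(text):
    # Hypothetical helper: unescape until nothing changes.
    unescaped = html.unescape(text)
    while unescaped != text:
        text = unescaped
        unescaped = html.unescape(text)
    return text


class TextCollector(HTMLParser):
    """Collects text nodes, decoding character references per node."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Assumed template output for {{=mytext}}: &, < and > escaped exactly once.
mytext = " a is < than b & c is > b"
row = "<td>%s</td><td>This goes in column 2 </td>" % html.escape(mytext)

# Unescaping the whole row first puts raw markup characters back into the text.
print(unescape_fully(row))

# Letting the parser decode references keeps each cell's text in one piece.
parser = TextCollector()
parser.feed(row)
parser.close()
print(parser.chunks)  # [' a is < than b & c is > b', 'This goes in column 2 ']
```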

Richard Warg

On Oct 17, 2014 5:51 AM, "Leonel Câmara" <leonel...@gmail.com> wrote:
Sure, I'll try to do it.
