Solved -- Problems with special characters and pyfpdf


Bernardo

Sep 25, 2010, 7:35:30 AM
to web2py-users
Hi all,

When using pyfpdf, which comes with the web2py framework, there are some
issues with special characters such as accented characters (á, é,
í, ...). After some research, I found out that pyfpdf only understands
'iso-8859-1', while web2py gives it strings in 'utf-8'.

So, as a solution, in your Python code you just have to convert the
string before passing it to pyfpdf, like this:


txt = 'Hélló wórld'
utxt = unicode('txt', 'utf-8')
stxt = utxt.encode('iso-8859-1')
pdf.cell(50,20, stxt, 0, 2, 'L')


If anyone has any doubts, just ask. I hope this can help someone...

kind regards,
Bernardo

Christopher Steel

Apr 28, 2011, 11:08:27 PM
to web2py-users

This solution works well. You will need to make a minor correction and
remove the single quotes around 'txt' in the second line. The edited
version looks like this:

txt = 'Hélló wórld'
utxt = unicode(txt, 'utf-8')
stxt = utxt.encode('iso-8859-1')
pdf.cell(50,20, stxt, 0, 2, 'L')
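
On Python 3 the `unicode()` builtin no longer exists, so the snippet above needs a small adaptation. The following is a minimal sketch of the same conversion, assuming the text arrives either as UTF-8 bytes or as a native string; the helper name `to_latin1` is hypothetical and not part of pyfpdf or web2py.

```python
def to_latin1(text):
    """Return `text` as ISO-8859-1 bytes.

    Accepts UTF-8 encoded bytes or an already-decoded string.
    """
    if isinstance(text, bytes):
        # Decode first, mirroring unicode(txt, 'utf-8') from the post above.
        text = text.decode('utf-8')
    # Raises UnicodeEncodeError for characters outside ISO-8859-1.
    return text.encode('iso-8859-1')


stxt = to_latin1('Hélló wórld')
print(stxt)
```

Whether the PDF library then expects bytes or a string depends on its version, so it is worth checking before passing the result on.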


Thanks for the hint Bernardo!

C.



Alexandre Andrade

Apr 28, 2011, 11:35:40 PM
to web...@googlegroups.com
The same can be converted to a function, to make it easier:

def lt(str):
    return unicode(str,'utf-8').encode('iso-8859-1')


so just 

pdf.cell(50,20,lt('Helló Wórld'), 0,2,'L') 

--
Kind regards


Alexandre Andrade
Hipercenter.com Classificados Gratuitos

Christopher Steel

Apr 30, 2011, 12:19:36 AM
to web2py-users
Thank you Alexandre, now I am feeling exceptionally lazy as well as
being highly satisfied with the end results and am therefore impelled
to add something (anything!) as well.

If you are as lazy as I am, oops, I mean, if you want to keep all of your
PDF-related imports and code in a single file (controller) but want to
"hide" the function, you can start your conversion function name with
an underscore.

In addition, the following "laziness enhanced" version of Alexandre's
function also uses an "explicit" version of 'cell' for folks who are
too lazy to memorize this http://code.google.com/p/pyfpdf/wiki/Cell ,
I mean, umm, for folks who would like to see an example using more
explicit cell arguments...

def _i2u(str):
    '''
    convert a utf-8 string to iso-8859-1
    '''
    return unicode(str, 'utf-8').encode('iso-8859-1')

pdf.cell(w=97.5, h=9, txt=_i2u('Montréal 2011'), border='', ln=1, align='R', fill=0, link='')


; )

Chris


Jurgis Pralgauskis

Apr 26, 2013, 3:20:30 PM
to
Hi, 

But this doesn't seem to work for all Unicode characters,
for example "Ąžuolas":

u"Ąžuolas".encode('iso-8859-1')   gives error :/
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

I also posted on hosting forum looking for TTF solution
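
The error comes from ISO-8859-1 itself: it only covers code points 0 to 255, and `Ą` (U+0104) and `ž` (U+017E) lie outside that range. Below is a small sketch for checking a string before sending it down the latin-1 path (Python 3 syntax; the function name `latin1_unencodable` is made up for illustration).

```python
def latin1_unencodable(text):
    """List the characters in `text` that ISO-8859-1 cannot represent."""
    # ISO-8859-1 maps exactly the code points 0..255.
    return [ch for ch in text if ord(ch) > 255]


print(latin1_unencodable("Hélló wórld"))
print(latin1_unencodable("Ąžuolas"))
```

An empty list means the string is safe for the conversion trick from the earlier posts; anything else needs a Unicode-capable font.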

Jurgis Pralgauskis

Apr 26, 2013, 4:17:41 PM
to web...@googlegroups.com
OK, SOLVED the TTF issue following the Unicode example at http://code.google.com/p/pyfpdf/wiki/Unicode
I just needed to create the directory gluon > contrib > fpdf > font
and place the needed TTF files inside it :)
Then pdf.write(8, u"Ąžuolas") works fine.

BUT how do I make it work with write_html(...)?

pdf.write_html( str(P( u"Ąžuolas" )) )  #  produces "Ąžuolas"

pdf.write_html( u'Ąžuolas'.encode('utf8') )  # also  "Ąžuolas"  

pdf.write_html( u'Ąžuolas' )   gives error

File "/home/jurgis/web2py/applications/apskaitele/controllers/default.py", line 59, in pdftest
pdf.write_html( u'Ąžuolas' )
File "/home/jurgis/web2py/gluon/contrib/fpdf/html.py", line 397, in write_html
h2p.feed(text)
File "/usr/local/lib/python2.7/HTMLParser.py", line 114, in feed
self.goahead(0)
File "/usr/local/lib/python2.7/HTMLParser.py", line 152, in goahead
if i < j: self.handle_data(rawdata[i:j])
File "/home/jurgis/web2py/gluon/contrib/fpdf/html.py", line 122, in handle_data
self.pdf.write(self.h,txt)
File "/home/jurgis/web2py/gluon/contrib/fpdf/fpdf.py", line 822, in write
txt = self.normalize_text(txt)
File "/home/jurgis/web2py/gluon/contrib/fpdf/fpdf.py", line 1012, in normalize_text
txt = txt.encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

pdf.write_html( u'Ąžuolas'.decode('utf8') )



Jonathan Lundell

Apr 26, 2013, 5:34:59 PM
to web...@googlegroups.com
On 26 Apr 2013, at 1:17 PM, Jurgis Pralgauskis <jurgis.pr...@gmail.com> wrote:
> ok, SOLVED ttf issue for unicode example http://code.google.com/p/pyfpdf/wiki/Unicode
> just needed to create directory gluon > contrib > fpdf > font
> and place needed ttf files inside it :)
> then pdf.write(8, u"Ąžuolas") works fine

The fpdf logic uses utf8 for fonts it sees as UTF-based, otherwise latin-1. It looks to me as though either it isn't recognizing your fonts as UTF, or there's some overlooked case that it's making a mistake with. Have a look at FPDF.set_font:

        self.unifontsubset = (self.fonts[fontkey]['type'] == 'TTF')

...and make sure it's getting set.
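
The rule described here can be pictured with a few lines of plain Python. This is only an illustration of the behaviour as described in this thread, not the actual pyfpdf source; the function `encode_for_font`, its `font_type` argument, and the `'core'` value used for a standard font are all assumptions.

```python
def encode_for_font(text, font_type):
    """Pick the text encoding the way the thread describes fpdf doing it."""
    # Mirrors: self.unifontsubset = (self.fonts[fontkey]['type'] == 'TTF')
    unifontsubset = (font_type == 'TTF')
    codec = 'utf-8' if unifontsubset else 'latin-1'
    return text.encode(codec)


print(encode_for_font("Ąžuolas", 'TTF'))       # Unicode font: succeeds
try:
    encode_for_font("Ąžuolas", 'core')         # standard font: latin-1 only
except UnicodeEncodeError as exc:
    print("standard font failed:", exc.reason)
```

If the flag is never set for the font in use, every string takes the latin-1 branch, which matches the traceback posted earlier.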

Mariano Reingart

Apr 26, 2013, 10:42:00 PM
to web...@googlegroups.com
Yes, as Jonathan says, FPDF (and the PDF standard, BTW) only supports
latin1 characters for the standard fonts.

If you need utf8 characters, you need to embed a T

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

Mariano Reingart

Apr 26, 2013, 10:45:18 PM
to web...@googlegroups.com
You need to embed a UTF8 TTF font, for example:

# Add a DejaVu Unicode font (uses UTF-8)
# Supports more than 200 languages. For a coverage status see:
# http://dejavu.svn.sourceforge.net/viewvc/dejavu/trunk/dejavu-fonts/langcover.txt
pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)

(sorry, the previous message was sent incomplete)

I'll try to enhance the docs about this; thanks for reporting the issue.

Best regards

Mariano Reingart

Apr 26, 2013, 11:24:11 PM
to web...@googlegroups.com
Sorry, I misread the email.

Unicode fonts were not supported in html2pdf.

I've made a change to allow them, please update html.py:

https://pyfpdf.googlecode.com/hg/fpdf/html.py

Then you need to load a TTF Unicode font and pass its name in the <font>
face attribute:

pdf=MyFPDF()
# add utf8 font
pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)
# first page:
pdf.add_page()
pdf.write_html(u"<font face='DejaVu'>Ąžuolas</font>")

For more info and complete code, see:

https://code.google.com/p/pyfpdf/wiki/Web2Py

Let me know if that works so I can update the docs and web2py contrib version

Jurgis Pralgauskis

Apr 27, 2013, 6:30:18 AM
to web...@googlegroups.com
I see 2 lines were changed, the main

- if 'face' in attrs and attrs['face'].lower() in self.font_list:
+   if 'face' in attrs:

but I still get

pdf.write_html(u"<font face='DejaVu'>Ąžuolas</font>")
File "/home/jurgis/web2py/gluon/contrib/fpdf/html.py", line 397, in write_html
File "/usr/local/lib/python2.7/HTMLParser.py", line 114, in feed
self.goahead(0)
File "/usr/local/lib/python2.7/HTMLParser.py", line 152, in goahead
if i < j: self.handle_data(rawdata[i:j])
File "/home/jurgis/web2py/gluon/contrib/fpdf/html.py", line 122, in handle_data
File "/home/jurgis/web2py/gluon/contrib/fpdf/fpdf.py", line 822, in write
txt = self.normalize_text(txt)
File "/home/jurgis/web2py/gluon/contrib/fpdf/fpdf.py", line 1012, in normalize_text
txt = txt.encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)


and if I
pdf.write_html(u"<font face='DejaVu'>Ąžuolas</font>".encode('utf8'))
I get Ä„Å¾uolas
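
That garbled result is what UTF-8 bytes look like when each byte is rendered as a separate single-byte character. Here is a quick way to reproduce it outside of pyfpdf, as a Python 3 sketch; decoding with Windows-1252 is an assumption, chosen because it matches the kind of output shown.

```python
text = "Ąžuolas"
utf8_bytes = text.encode("utf-8")        # Ą -> C4 84, ž -> C5 BE
# Interpret every byte as its own character, as a single-byte font would.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)
# Reversing the two steps recovers the original string.
roundtrip = mojibake.encode("cp1252").decode("utf-8")
print(roundtrip)
```

Seeing two odd characters per accented letter is therefore a hint that the text reached the renderer as UTF-8 bytes while a non-Unicode font path was in use.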




--
Jurgis Pralgauskis
tel: 8-616 77613;
Don't worry, be happy and make things better ;)
http://galvosukykla.lt

Mariano Reingart

Apr 27, 2013, 2:06:18 PM
to web...@googlegroups.com
Did you add the TTF unicode font with add_font?
Can you post a complete example (ie a script.py just with the code to
test), so I can reproduce it easily.

Best regards,

Ovidio Marinho

Apr 27, 2013, 5:34:55 PM
to web...@googlegroups.com

      


Ovidio Marinho Falcao Neto
Web Developer
ovid...@gmail.com
83 8826 9088 - Oi
83 9336 3782 - Claro
Brasil

Mariano Reingart

Apr 27, 2013, 10:47:18 PM
to web...@googlegroups.com
Yes, sure, using jasper reports and java :-)

Of course pyfpdf is not the silver bullet, but if users help to
improve it with bug reports, test cases & tentative features or even
ideas, surely it could be more powerful.
That's the way open source works, and maybe we can reach even a
simpler and more elegant solution at the end.

Jurgis Pralgauskis

Apr 28, 2013, 5:18:27 AM
to web...@googlegroups.com
SOLVED: the problem was that I needed to restart web2py for the changed html.py to take effect ;)

One more issue: after write_html(...), it "forgets" the previously set font (this should at least be mentioned in the docs :)


Jurgis Pralgauskis

Apr 28, 2013, 6:40:07 AM
to web...@googlegroups.com
By the way, would it be possible to ship at least one TTF font with web2py
and have normalize_text, when it notices Unicode text,
automatically add (and set) a default TTF font (if none is set) so that it renders correctly?

Mariano Reingart

Apr 30, 2013, 5:22:26 PM
to web...@googlegroups.com
Which font do you want to include?

The font pack is 15MB, so I don't know if it could be included with web2py.
Also, the problem is that no single font is complete (you need several fonts
to cover Western / Eastern languages):

https://pyfpdf.googlecode.com/files/fpdf_unicode_font_pack.zip

BTW, thanks for your comments. I gave you contributor access, so you
can change the docs directly in the wiki if you like:

https://code.google.com/p/pyfpdf/w/list

If you have any patch, I'll also be happy to review and include it ;-)

Jurgis Pralgauskis

Apr 30, 2013, 5:41:22 PM
to web...@googlegroups.com
> Which font do you want to include?

DejaVu - I guess it covers Western languages (it could be included without bold/italics to save space).
Another one could be for Eastern characters (but I don't know anything about them...)

> I gave you contributor access

Thanks  :)

Martin Weissenboeck

May 2, 2013, 2:34:07 AM
to web...@googlegroups.com
Hi,
I have tried again to generate a PDF file from an HTML file with some Unicode characters.
Here is my test program. It's a simplified version; the original program has a lot of additional test lines.

def pp():       
    from gluon.contrib.pyfpdf import FPDF, HTMLMixin
   
    class MyFPDF(FPDF, HTMLMixin):
        def header(self): pass           
        def footer(self): pass  

    # create a small table with some data:
    rows = [THEAD(TR(TH("Key",_width="70%"), TH("Value",_width="30%"))),
            TBODY(TR(TD("Hello"),TD("60")),
                  TR(TD("World äöü éè €"),TD("40")))]
    table = TABLE(*rows, _border="0", _align="center", _width="50%")
   
    pdf=MyFPDF()

    pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf',  uni=True)   
    pdf.add_page()
    pdf.set_font('DejaVu','',10)     # set font method 1
    # table =TAG.font(table, _face="DejaVu")  # set font method 2

    html = str(XML(table, sanitize=False))
    pdf.write_html(html)
    response.headers['Content-Type'] = "application/pdf" 
    return pdf.output(dest='S')


I am sure that the font file is loaded, but it seems that the font is not used.
I have tried two methods to change the font, but the results are the same.
The output doesn't look sans-serif, and every Unicode character is printed as its individual UTF-8 bytes. Maybe it's only a small error, but I could not find it.


Regards, Martin

Martin Weissenboeck

May 2, 2013, 8:00:26 AM
to web...@googlegroups.com
Some hours  later...
Now I have tried to use the Arial-font:

pdf.set_font('Arial','',10)

There is always the same font - set_font seems to do nothing.

 



Jurgis Pralgauskis

May 2, 2013, 8:53:44 AM
to web...@googlegroups.com
try
pdf.write_html("<font face='DejaVu'>%s</font>" % html )

Jurgis Pralgauskis

May 2, 2013, 8:58:10 AM
to web...@googlegroups.com
By the way, I'm not sure whether you need to write
u"World äöü éè €"
or just "World äöü éè €"

Both seem to work.

Martin Weissenboeck

May 2, 2013, 11:39:57 AM
to web...@googlegroups.com
Thank you, I have tried u"World äöü éè €" and
"World äöü éè €"

and both


    table =TAG.font(table, _face="DejaVu")   
    html = str(XML(table, sanitize=False))
    pdf.write_html(html)

and


    html = str(XML(table, sanitize=False))
    html="<font face='DejaVu'>%s</font>" % html
    pdf.write_html(html)

Both produce the same HTML string.

No success: the PDF file remains unchanged.
It looks like Times Roman, not sans-serif.

Has anybody tried my whole example with success?

Regards, Martin
  



Mariano Reingart

May 2, 2013, 2:44:56 PM
to web...@googlegroups.com
Could you send me a complete example?
Are you using the updated version of pyfpdf?

Martin Weissenboeck

May 2, 2013, 3:13:15 PM
to web...@googlegroups.com
This is my shortest example:

def p2():       
    from gluon.contrib.pyfpdf import FPDF, HTMLMixin
   
    class MyFPDF(FPDF, HTMLMixin): pass
   
    pdf=MyFPDF()
    pdf.set_font('Arial','',18)
    pdf.add_page()
    pdf.write_html(str(XML(CAT(B('hello'), I(' world')),
       sanitize=False)))

    response.headers['Content-Type'] = "application/pdf" 
    return pdf.output(dest='S')


It does not show Arial, but Times.

Versions: I have checked gluon/contrib/pyfpdf once again:
fpdf: Version 1.7.1
html.py: today's version
I loaded all the other files again a few minutes ago and tried once more: always the same result.

Regards, Martin





--
Mit freundlichen Grüßen / With kind regards
Martin Weissenböck
Gregor-Mendel-Str. 37, 1190 Wien
Austria / European Union
Tel  +43 1 31400 00
Fax  +43 1 31400 700

Mariano Reingart

May 2, 2013, 4:24:26 PM
to web...@googlegroups.com
Your code is incorrect, you need to use <FONT> tag to correctly change the font in the PDF generation.
For unicode, you'll need to load the TTF font with add_font.

Please see the standalone example, you can adapt it to run in web2py:


Attached is the output file.

Also, remember that if you're updating python modules in web2py, you will need to restart the webserver.

Let me know if that works,

Best regards,

html_unicode.pdf

Martin Weissenboeck

May 3, 2013, 6:38:04 AM
to web...@googlegroups.com
Thank you. I had a second example program using "<font>" and it did not work either.
The answer: <font> must not be the outermost tag:

"<p><font>.....</font></p>"                works,
"<span><font> ... </font></span>"  works too, but not:
"<font>...</font>"

Maybe this should be mentioned somewhere.
I'll try an enhanced version of "write_html" and send it during the next days.
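
In the meantime, the workaround can be wrapped in a tiny helper. This is a sketch only: `wrap_in_font` is a hypothetical name, and it simply builds the <p><font>...</font></p> nesting reported to work above (Python 3 syntax).

```python
from html import escape


def wrap_in_font(html_text, face):
    """Wrap an HTML fragment so that <font> is not the outermost tag."""
    # Escape the face name so it is safe inside the quoted attribute.
    safe_face = escape(face, quote=True)
    return "<p><font face='%s'>%s</font></p>" % (safe_face, html_text)


wrapped = wrap_in_font("<b>hello</b> world", "DejaVu")
print(wrapped)
```

The returned string would then be handed to pdf.write_html(...) in place of the bare fragment.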

I have played around a little and everything looks fine: another color (please look at issue 59), umlauts, the € symbol and so on.
Now I have found another problem with the Zapfdingbats font: it seems that every character has a width of 0 pixels.

I have installed zapfdingbats.ttf again, but the PDF file did not change.
Any ideas?

Martin Weissenboeck

May 3, 2013, 3:26:48 PM
to web...@googlegroups.com
Now I have a proposal for an enhanced write_html.

def write_html(self, text, image_map=None, font=None,
    size=None, color=None, newline=False):

font is a string with the name of a built-in font or a font added with add_font.
size is the size in Points (pt)
color is a string like "#ff0000"  (this is red)
newline: if it is True, a <br /> is appended at the end of the text string
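
A color string in this form has to be split into its red, green and blue parts at some point. How the attached html.py does this is not shown in this thread; one straightforward way, as a sketch with a hypothetical helper name:

```python
def hex_to_rgb(color):
    """Convert a '#rrggbb' string to an (r, g, b) tuple of ints 0..255."""
    digits = color.lstrip('#')
    if len(digits) != 6:
        raise ValueError("expected a color like '#ff0000', got %r" % color)
    # Each pair of hex digits is one channel.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#ff0000"))
```

The three integers are in the form that an RGB text-color setter would typically expect.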

You can use html-entities like &alpha; or &auml; or &#65; or &#x42; or &#X43;
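
For reference, Python's standard library can already resolve all of the entity forms listed above. Whether the attached html.py relies on it is not known from this thread; the sketch below just shows what each form decodes to (Python 3, where the function lives in the `html` module).

```python
from html import unescape

samples = ["&alpha;", "&auml;", "&#65;", "&#x42;", "&#X43;", "&lt;"]
for entity in samples:
    # Named, decimal and hexadecimal references are all handled.
    print(entity, "->", unescape(entity))

decoded = unescape("abc &alpha;&auml;&lt;")
print(decoded)
```

Note that the decoded characters still need a font that contains them, so α only renders with a Unicode TTF font such as DejaVu.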

The whole file is attached to this email.

Example (published by Mariano Reingart):

    pdf.write_html("<B>hello</B> <I>world äöü &auml;</I>", font="Arial", newline=True)
    pdf.write_html("<B>hello</B> <I>world</I>", font='Times', size=20,
        color="#ff0000", newline=True)
    pdf.write_html("<B>hello</B> <I>world</I>", font="Courier", newline=True)
    pdf.write_html("hello world", font="Zapfdingbats", newline=True)
  
    # greek
    pdf.write_html("Γειά σου κόσμος", font="DejaVu", newline=True)
    # russian
    pdf.write_html("Здравствулте мир", font="DejaVu", newline=True)
    # unicode and entities
    pdf.write_html("abc äöü € éè &alpha;&auml;&lt;", font="DejaVu", newline=True)


Please look at my last message - there is a screenshot of the pdf file. (Yes, there is a problem with Zapfdingbats!)
Hope it could help.

Regards, Martin




html.py