Solved -- Problems with special characters and pyfpdf

Bernardo

Sep 25, 2010, 7:35:30 AM

to web2py-users

Hi all,

When using pyfpdf, which comes with the web2py framework, there are some
issues with special characters such as accented characters (á, é,
í, ...). After some research, I found out that pyfpdf only understands
'iso-8859-1', while web2py gives it strings in 'utf-8' format.

So, as a solution, in your Python code you just have to convert the
string before passing it to pyfpdf, like this:

txt = 'Hélló wórld'
utxt = unicode('txt', 'utf-8')
stxt = utxt.encode('iso-8859-1')
pdf.cell(50,20, stxt, 0, 2, 'L')

If anyone has any doubts, just ask. I hope this can help someone...

kind regards,
Bernardo

Christopher Steel

Apr 28, 2011, 11:08:27 PM

to web2py-users

This solution works well. You will need to make a minor correction and
remove the single quotes around 'txt' in the second line. The edited
version looks like this:

txt = 'Hélló wórld'
utxt = unicode(txt, 'utf-8')
stxt = utxt.encode('iso-8859-1')
pdf.cell(50,20, stxt, 0, 2, 'L')

Thanks for the hint Bernardo!

C.
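
For reference, the same conversion can be sketched on raw bytes, which behaves identically on Python 2 and Python 3. This is only an illustration: the helper name utf8_to_latin1 is made up here, and the pdf.cell call is left as a comment because it assumes an existing pyfpdf object named pdf.

```python
# -*- coding: utf-8 -*-

def utf8_to_latin1(data):
    """Re-encode UTF-8 bytes as ISO-8859-1 bytes (hypothetical helper name)."""
    # Decode to a Unicode string first, then encode with the target codec.
    return data.decode('utf-8').encode('iso-8859-1')

utf8_bytes = u'Hélló wórld'.encode('utf-8')   # UTF-8 input, as described above
latin1_bytes = utf8_to_latin1(utf8_bytes)     # one byte per character

# With a pyfpdf object named pdf (assumed), the result would be passed on:
# pdf.cell(50, 20, latin1_bytes, 0, 2, 'L')
```

The accented letters take two bytes each in UTF-8 but only one in ISO-8859-1, which is why the unconverted string shows up as extra characters in the PDF.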

Alexandre Andrade

Apr 28, 2011, 11:35:40 PM

to web...@googlegroups.com

The same can be converted to a function, to make it easier:

def lt(s):
    # Decode the UTF-8 byte string, then re-encode it as ISO-8859-1 for pyfpdf.
    return unicode(s, 'utf-8').encode('iso-8859-1')

so just

pdf.cell(50,20,lt('Helló Wórld'), 0,2,'L')


--
Kind regards

Alexandre Andrade
Hipercenter.com Classificados Gratuitos

Christopher Steel

Apr 30, 2011, 12:19:36 AM

to web2py-users

Thank you Alexandre, now I am feeling exceptionally lazy as well as
being highly satisfied with the end results and am therefore impelled
to add something (anything!) as well.

If you are as lazy as I am, oops, I mean, if you want to keep all of your
pdf-related imports and code in a single file (controller) but want to
"hide" the function, you can start your conversion function name with
an underscore.

In addition, the following "laziness enhanced" version of Alexandre's
function also uses an "explicit" version of 'cell' for folks who are
too lazy to memorize this http://code.google.com/p/pyfpdf/wiki/Cell ,
I mean, umm, for folks who would like to see an example using a more
explicit cell definition...

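def _i2u(s):
    '''
    Convert a UTF-8 byte string to ISO-8859-1.
    '''
    return unicode(s, 'utf-8').encode('iso-8859-1')

# Keyword arguments spell out what each positional cell() argument means.
pdf.cell(w=97.5, h=9, txt=_i2u('Montréal 2011'), border='', ln=1,
         align='R', fill=0, link='')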

; )

Chris

Jurgis Pralgauskis

Apr 26, 2013, 3:20:30 PM

Hi,

but this does not seem to work for all Unicode characters,

for example if I have "Ąžuolas":

u"Ąžuolas".encode('iso-8859-1') gives an error :/

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

I also posted on the hosting forum, looking for a TTF solution:

https://www.pythonanywhere.com/forums/topic/602/#id_post_4362
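
The error arises because ISO-8859-1 only covers code points 0 to 255, while Ą (U+0104) and ž (U+017E) lie outside that range. A small check like the following can tell in advance whether a string is safe for the conversion approach; the function name fits_latin1 is made up for this sketch.

```python
# -*- coding: utf-8 -*-

def fits_latin1(text):
    """Return True if every character of text can be encoded as ISO-8859-1."""
    try:
        text.encode('iso-8859-1')
        return True
    except UnicodeEncodeError:
        return False

ok = fits_latin1(u'Hélló wórld')   # accented Western European letters
bad = fits_latin1(u'Ąžuolas')      # Lithuanian letters above U+00FF

# A lossy fallback: characters that cannot be encoded are replaced by '?'.
lossy = u'Ąžuolas'.encode('iso-8859-1', 'replace')
```

The 'replace' fallback avoids the exception but loses the letters, so it is only a stopgap compared with a Unicode-capable font.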

Jurgis Pralgauskis

Apr 26, 2013, 4:17:41 PM

to web...@googlegroups.com

OK, SOLVED the TTF issue using the Unicode example at http://code.google.com/p/pyfpdf/wiki/Unicode

I just needed to create the directory gluon > contrib > fpdf > font

and place the needed TTF files inside it :)

Then pdf.write(8, u"Ąžuolas") works fine.

BUT - how to make it work with write_html(...)?

pdf.write_html( str(P( u"Ąžuolas" )) ) # produces "Ä„Å¾uolas"

pdf.write_html( u'Ąžuolas'.encode('utf8') ) # also "Ä„Å¾uolas"

pdf.write_html( u'Ąžuolas' ) gives an error:

  File "/home/jurgis/web2py/applications/apskaitele/controllers/default.py", line 59, in pdftest
    pdf.write_html( u'Ąžuolas' )
  File "/home/jurgis/web2py/gluon/contrib/fpdf/html.py", line 397, in write_html
    h2p.feed(text)
  File "/usr/local/lib/python2.7/HTMLParser.py", line 114, in feed
    self.goahead(0)
  File "/usr/local/lib/python2.7/HTMLParser.py", line 152, in goahead
    if i < j: self.handle_data(rawdata[i:j])
  File "/home/jurgis/web2py/gluon/contrib/fpdf/html.py", line 122, in handle_data
    self.pdf.write(self.h,txt)
  File "/home/jurgis/web2py/gluon/contrib/fpdf/fpdf.py", line 822, in write
    txt = self.normalize_text(txt)
  File "/home/jurgis/web2py/gluon/contrib/fpdf/fpdf.py", line 1012, in normalize_text
    txt = txt.encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

pdf.write_html( u'Ąžuolas'.decode('utf8') )
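
The garbled output "Ä„Å¾uolas" is what one would expect if each UTF-8 byte is rendered as a separate single-byte character. This can be reproduced outside pyfpdf. The cp1252 codec is used here as an assumption, since it is close to the encoding normally applied to the standard PDF fonts:

```python
# -*- coding: utf-8 -*-

text = u'Ąžuolas'
utf8_bytes = text.encode('utf-8')      # Ą -> C4 84, ž -> C5 BE

# Interpret every byte as one character (assumed: Windows-1252 mapping).
garbled = utf8_bytes.decode('cp1252')
```

So passing UTF-8 encoded bytes does not fail, but each two-byte letter turns into two unrelated glyphs.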

Jurgis Pralgauskis

Apr 26, 2013, 4:25:03 PM

to web...@googlegroups.com

also posted issue on pyfpdf site http://code.google.com/p/pyfpdf/issues/detail?id=54&thanks=54&ts=1367007819

Jonathan Lundell

Apr 26, 2013, 5:34:59 PM

to web...@googlegroups.com

On 26 Apr 2013, at 1:17 PM, Jurgis Pralgauskis <jurgis.pr...@gmail.com> wrote:

> ok, SOLVED ttf issue for unicode example http://code.google.com/p/pyfpdf/wiki/Unicode
> just needed to create directory gluon > contrib > fpdf > font
> and place needed ttf files inside it :)
> then pdf.write(8, u"Ąžuolas") works fine

The fpdf logic uses utf8 for fonts it sees as UTF-based, otherwise latin-1. It looks to me as though either it isn't recognizing your fonts as UTF, or there's some overlooked case that it's making a mistake with. Have a look at FPDF.set_font:

self.unifontsubset = (self.fonts[fontkey]['type'] == 'TTF')

...and make sure it's getting set.
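
As a rough illustration of that rule (not the actual pyfpdf source), the choice of encoding could be sketched like this. The function name and the uses_ttf_font flag are stand-ins invented for this example; the flag plays the role of the unifontsubset attribute mentioned above.

```python
# -*- coding: utf-8 -*-

def normalize_text_sketch(txt, uses_ttf_font):
    """Pick the text form depending on the kind of font in use (sketch only)."""
    if uses_ttf_font:
        # Unicode (TTF) font: work with text, decoding UTF-8 bytes if needed.
        return txt.decode('utf-8') if isinstance(txt, bytes) else txt
    # Core font: only Latin-1 can be represented; any other character
    # raises UnicodeEncodeError, like the error reported in this thread.
    return txt if isinstance(txt, bytes) else txt.encode('latin-1')
```

If the flag is not set after set_font, Unicode text falls into the Latin-1 branch and fails, which matches the traceback ending in normalize_text.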

Mariano Reingart

Apr 26, 2013, 10:42:00 PM

to web...@googlegroups.com

Yes, as Jonathan says, FPDF (and the PDF standard, BTW) only supports
latin1 characters for the standard fonts.

If you need utf8 characters, you need to embed a T

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

Mariano Reingart

Apr 26, 2013, 10:45:18 PM

to web...@googlegroups.com

You need to embed a UTF8 TTF font, for example:

# Add a DejaVu Unicode font (uses UTF-8)
# Supports more than 200 languages. For a coverage status see:
# http://dejavu.svn.sourceforge.net/viewvc/dejavu/trunk/dejavu-fonts/langcover.txt
pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)

(sorry, the previous message was sent incomplete)

I'll try to enhance the docs about this, thanks for reporting the issue

Best regards

Mariano Reingart

Apr 26, 2013, 11:24:11 PM

to web...@googlegroups.com

Sorry, I misread the email.

Unicode fonts were not supported in html2pdf.

I've made a change to allow them, please update html.py:

https://pyfpdf.googlecode.com/hg/fpdf/html.py

Then you need to load a TTF Unicode font and pass it in the
face attribute of a <font> tag:

pdf=MyFPDF()
# add utf8 font
pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)
# first page:
pdf.add_page()
pdf.write_html(u"<font face='DejaVu'>Ąžuolas</font>")

For more info and complete code, see:

https://code.google.com/p/pyfpdf/wiki/Web2Py

Let me know if that works so I can update the docs and web2py contrib version

Best regards
Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

Jurgis Pralgauskis

Apr 27, 2013, 6:30:18 AM

to web...@googlegroups.com

I see that two lines were changed, the main one being:

- if 'face' in attrs and attrs['face'].lower() in self.font_list:
+ if 'face' in attrs:

but I still get


  File "/home/jurgis/web2py/applications/apskaitele/controllers/default.py", line 61, in pdftest
    pdf.write_html(u"<font face='DejaVu'>Ąžuolas</font>")
  File "/home/jurgis/web2py/gluon/contrib/fpdf/html.py", line 397, in write_html
  File "/usr/local/lib/python2.7/HTMLParser.py", line 114, in feed
    self.goahead(0)
  File "/usr/local/lib/python2.7/HTMLParser.py", line 152, in goahead
    if i < j: self.handle_data(rawdata[i:j])
  File "/home/jurgis/web2py/gluon/contrib/fpdf/html.py", line 122, in handle_data
  File "/home/jurgis/web2py/gluon/contrib/fpdf/fpdf.py", line 822, in write
    txt = self.normalize_text(txt)
  File "/home/jurgis/web2py/gluon/contrib/fpdf/fpdf.py", line 1012, in normalize_text
    txt = txt.encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)


        
and if I use

pdf.write_html(u"<font face='DejaVu'>Ąžuolas</font>".encode('utf8'))

I get Ä„Å¾uolas


--
Jurgis Pralgauskis
tel: 8-616 77613;
Don't worry, be happy and make things better ;)
http://galvosukykla.lt

Mariano Reingart

Apr 27, 2013, 2:06:18 PM

to web...@googlegroups.com

Did you add the TTF Unicode font with add_font?
Can you post a complete example (i.e. a script.py with just the code to
test), so I can reproduce it easily?

Best regards,

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

Ovidio Marinho

Apr 27, 2013, 5:34:55 PM

to web...@googlegroups.com

This all ends with this tool made by Lucas Davila:

https://github.com/simpleservices/app_report-python/wiki/Using-the-AppReport-client-on-Web2py-Apps

Ovidio Marinho Falcao Neto

Web Developer
ovid...@gmail.com

83 8826 9088 - Oi

83 9336 3782 - Claro
Brasil


Mariano Reingart

Apr 27, 2013, 10:47:18 PM

to web...@googlegroups.com

Yes, sure, using jasper reports and java :-)

Of course pyfpdf is not the silver bullet, but if users help to
improve it with bug reports, test cases & tentative features or even
ideas, surely it could be more powerful.
That's the way open source works, and maybe we can reach even a
simpler and more elegant solution at the end.

Best regards,

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

Jurgis Pralgauskis

Apr 28, 2013, 5:18:27 AM

to web...@googlegroups.com

SOLVED - the problem was that I needed to reload web2py for the changed html.py to take effect ;)

One more issue:

after write_html(..) it "forgets" the previously set font (this should at least be mentioned in the docs :)

https://code.google.com/p/pyfpdf/issues/detail?id=54#c2

Jurgis Pralgauskis

Apr 28, 2013, 6:40:07 AM

to web...@googlegroups.com

By the way, would it be possible to ship at least one TTF font with web2py,

and in normalize_text, when it notices Unicode text,

automatically add (and set) a default TTF font (if none is set) so that it renders OK?

Mariano Reingart

Apr 30, 2013, 5:22:26 PM

to web...@googlegroups.com

Which font do you want to include?

The font pack is 15MB, I don't know if it could be included with web2py.
Also, the problem is that no single font is complete (you need several fonts
to cover Western / Eastern languages)

https://pyfpdf.googlecode.com/files/fpdf_unicode_font_pack.zip

BTW, thanks for your comments, I gave you contributor access, so you
can change the docs directly in the wiki if you like:

https://code.google.com/p/pyfpdf/w/list

If you have any patch, also I'll be happy to review and include it ;-)

Best regards,

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

Jurgis Pralgauskis

Apr 30, 2013, 5:41:22 PM

to web...@googlegroups.com

> Which font do you want to include?

DejaVu - I guess it covers Western languages (it could be included without bold/italics to save space).

Another one could be for Eastern characters (but I don't know anything about them...)

> I gave you contributor access

Thanks :)

Martin Weissenboeck

May 2, 2013, 2:34:07 AM

to web...@googlegroups.com

Hi,

I have tried again to generate a PDF file from an HTML file with some Unicode characters.

Here is my test program. It's a simplified version; the original program has a lot of additional test lines.

def pp():
    from gluon.contrib.pyfpdf import FPDF, HTMLMixin

    class MyFPDF(FPDF, HTMLMixin):
        def header(self): pass
        def footer(self): pass

    # create a small table with some data:
    rows = [THEAD(TR(TH("Key",_width="70%"), TH("Value",_width="30%"))),
            TBODY(TR(TD("Hello"),TD("60")),
                  TR(TD("World äöü éè €"),TD("40")))]
    table = TABLE(*rows, _border="0", _align="center", _width="50%")

    pdf=MyFPDF()
    pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)
    pdf.add_page()
    pdf.set_font('DejaVu','',10)     # set font method 1
    # table =TAG.font(table, _face="DejaVu") # set font method 2
    html = str(XML(table, sanitize=False))
    pdf.write_html(html)
    response.headers['Content-Type'] = "application/pdf"
    return pdf.output(dest='S')

I am sure that the font file is loaded, but it seems that the font is not used.
I have tried two methods to change the font, but the results are the same.
The output doesn't look sans-serif, and every Unicode character is printed as its individual UTF-8 bytes. Maybe it's only a small error, but I could not find it.

Regards, Martin

[Inline image 3]

image.png

Martin Weissenboeck

May 2, 2013, 8:00:26 AM

to web...@googlegroups.com

Some hours later...

Now I have tried to use the Arial-font:

pdf.set_font('Arial','',10)

There is always the same font - set_font seems to do nothing.


Jurgis Pralgauskis

May 2, 2013, 8:53:44 AM

to web...@googlegroups.com

try
pdf.write_html("%s" % html )

Jurgis Pralgauskis

May 2, 2013, 8:58:10 AM

to web...@googlegroups.com

By the way, I'm not sure whether there is a need to write
u"World äöü éè €"

or just "World äöü éè €".

Both seem to work.

Martin Weissenboeck

May 2, 2013, 11:39:57 AM

to web...@googlegroups.com

Thank you, I have tried both u"World äöü éè €" and "World äöü éè €",

and both

table = TAG.font(table, _face="DejaVu")
html = str(XML(table, sanitize=False))
pdf.write_html(html)

and

html = str(XML(table, sanitize=False))
html = "%s" % html
pdf.write_html(html)

The result is the same HTML string.

No success - the pdf file remains unchanged.

It looks like Times Roman and not Sans serif.

Has anybody tried my whole example with success?

Regards, Martin

Mariano Reingart

May 2, 2013, 2:44:56 PM

to web...@googlegroups.com

Could you send me a complete example?

Are you using the updated version of pyfpdf?

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

Martin Weissenboeck

May 2, 2013, 3:13:15 PM

to web...@googlegroups.com

This is my shortest example:

def p2():
    from gluon.contrib.pyfpdf import FPDF, HTMLMixin

    class MyFPDF(FPDF, HTMLMixin): pass

    pdf=MyFPDF()
    pdf.set_font('Arial','',18)
    pdf.add_page()
    pdf.write_html(str(XML(CAT(B('hello'), I(' world')),
       sanitize=False)))

    response.headers['Content-Type'] = "application/pdf"
    return pdf.output(dest='S')

It does not show Arial, but Times.

Versions: I have checked gluon/contrib/pyfpdf once again:
fpdf: Version 1.7.1

html.py: today's version

I downloaded all the other files a few minutes ago and tried again - always the same result.

Regards, Martin

--
Mit freundlichen Grüßen / With kind regards
Martin Weissenböck
Gregor-Mendel-Str. 37, 1190 Wien
Austria / European Union
Tel +43 1 31400 00
Fax +43 1 31400 700

Mariano Reingart

May 2, 2013, 4:24:26 PM

to web...@googlegroups.com

Your code is incorrect: you need to use the <font> tag to change the font correctly in the PDF generation.

For Unicode, you'll need to load the TTF font with add_font.

Please see the standalone example, you can adapt it to run in web2py:

https://code.google.com/p/pyfpdf/source/browse/tests/html_unicode.py

Attached is the output file.

Also, remember that if you're updating Python modules in web2py, you will need to restart the webserver.

Let me know if that works,

Best regards,

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com

html_unicode.pdf

Martin Weissenboeck

May 3, 2013, 6:38:04 AM

to web...@googlegroups.com

Thank you. I had a second example program using "<font>" and it did not work either.

The answer: <font> must not be the outermost tag:

".....</fon>" works,

" ... " works too, but not:

"..."

Maybe this should be mentioned somewhere.

I'll try an enhanced version of "write_html" and send it during the next days.

I have played a little bit and everything looks fine: another color (please look at issue 59), umlauts, €-symbol and so on.
Now I have found another problem with the Zapfdingbats font: it seems that every character has a width of 0 pixels.

[Inline image 1]

I have installed zapfdingbats.ttf again, but the pdf file did not change.

Any ideas?

image.png

Martin Weissenboeck

May 3, 2013, 3:26:48 PM

to web...@googlegroups.com

Now I have a proposal for an enhanced write_html.

def write_html(self, text, image_map=None, font=None,
               size=None, color=None, newline=False):

font is a string with the name of a built-in font or a font added with add_font.

size is the size in Points (pt)

color is a string like "#ff0000" (this is red)

newline: if it is True, a line-break tag is appended to the end of the text string

You can use HTML entities like &alpha; or &auml; or &#65; or &#x42; or &#X43;

The whole file is attached to this email.
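
To see which characters such HTML entities stand for, Python 3's standard html.unescape can be used as a quick check. This is independent of the attached html.py, whose own entity handling is not shown in this thread; the sample string below is made up for illustration.

```python
import html

# Named, decimal and hexadecimal character references (sample input).
sample = '&alpha; &auml; &#65; &#x42; &#X43;'

# html.unescape replaces each reference with the character it denotes.
decoded = html.unescape(sample)
```

Characters such as α still need a Unicode TTF font (for example one added with add_font and uni=True) to be rendered, since they lie outside Latin-1.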

Example (published by Mariano Reingart):

pdf.write_html("hello world äöü ä", font="Arial", newline=True)
pdf.write_html("hello world", font='Times', size=20,
               color="#ff0000", newline=True)
pdf.write_html("hello world", font="Courier", newline=True)
pdf.write_html("hello world", font="Zapfdingbats", newline=True)

# greek
pdf.write_html("Γειά σου κόσμος", font="DejaVu", newline=True)
# russian
pdf.write_html("Здравствулте мир", font="DejaVu", newline=True)
# unicode and entities
pdf.write_html("abc äöü € éè αä<", font="DejaVu", newline=True)

Please look at my last message - there is a screenshot of the pdf file. (Yes, there is a problem with Zapfdingbats!)
Hope it could help.

Regards, Martin

html.py
