Problem with python3 / pdf output (encoding): UnicodeDecodeError utf-8 invalid continuation byte

Silvan Marco Fin

Feb 28, 2020, 4:15:04 AM
to web...@googlegroups.com
Hi!
I have a problem with PDF output when running in a Python 3 environment. The problem I'm actually trying to solve is more complex, but I managed to strip it down to a very small example. If I get this running, I'm confident I can handle my actual situation.

I created a new application with a controller like this:

# -*- coding: utf-8 -*-

def index():
    return dict(
        data=UL(
            A('Output html', _href=URL(print, extension='html')),
            A('Output pdf', _href=URL(print, extension='pdf'))
        ),
        message="hello from tiatpi.py")

def print():
    return dict(content='Data ascii characters.') # line was modified from previous post

Clicking the first link shows the expected web page.
Clicking the second link produces an internal error:

Traceback (most recent call last):
  File "/mnt/c/Users/Silvan Marco Fin/Desktop/Working/web2py/gluon/restricted.py", line 219, in restricted
    exec(ccode, environment)
  File "/mnt/c/Users/Silvan Marco Fin/Desktop/Working/web2py/applications/pdf_test/views/generic.pdf", line 9, in <module>
    pass
  File "/mnt/c/Users/Silvan Marco Fin/Desktop/Working/web2py/gluon/globals.py", line 434, in write
    self.body.write(to_native(xmlescape(data)))
  File "/mnt/c/Users/Silvan Marco Fin/Desktop/Working/web2py/gluon/packages/dal/pydal/_compat.py", line 136, in to_native
    return obj.decode(charset, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 10: invalid continuation byte
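
The last frame of the traceback shows to_native calling obj.decode(charset, errors), i.e. bytes being decoded as UTF-8. Below is a minimal sketch of how this kind of message arises when arbitrary binary data is decoded as UTF-8. The sample bytes are made up for illustration and are not the actual PDF output; whether the view really hands binary PDF bytes to this call is an assumption.

```python
# Hypothetical sample: ten ASCII bytes, then 0xd0 (a UTF-8 lead byte that
# must be followed by a continuation byte in the range 0x80-0xbf), then 'A'.
binary_data = b"0123456789\xd0A"

error = None
try:
    binary_data.decode("utf-8")
except UnicodeDecodeError as exc:
    # The decoder reports the offending lead byte and its offset.
    error = exc
    print(exc)
```

The text content of the page plays no role here: any byte sequence that is not valid UTF-8 fails in the same way.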

Before assembling this small snippet I experimented with the full setup for some time and never produced anything other than exceptions, most of them some form of decode or encode error, so I assume there is something wrong with either fpdf or the XML() helper.
I would greatly appreciate any help on this!

Kind regards,
Silvan

Additional Information:
Environment:
I'm running Ubuntu under WSL on Windows 10 Pro 1909:
silvan@Nepumuk:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic

and from the web2py admin pages: 2.18.5-stable+timestamp.2019.04.08.04.22.03
(running on Rocket 1.2.6, Python 3.6.9)

But the problem shows up on native Ubuntu Linux as well.

Edited post; the former version had the line:
    return dict(content = 'Data with "dschörmän" umlauts')

Clemens

Feb 28, 2020, 4:29:53 AM
to web2py-users
Hi Silvan,

since I also have to handle "dschörmän" umlauts, I've put a conversion into the following function:

import cgi
import sys

# -------------------------------------------------------------------------
# convert_special_chars(label)
#
# Converts string to be compatible to UTF-8
# and to HTML (e.g. ä to &#228;)
#
# parameter: label as string
# -------------------------------------------------------------------------
def convert_special_chars(label):
    if sys.version_info[0] == 2: # is Python 2.x
        label = label.decode('utf-8')
    label = cgi.escape(label)
    label = label.encode('ascii', 'xmlcharrefreplace')

    return label
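
Two caveats on the function above: cgi.escape was deprecated in Python 3.2 and removed in Python 3.8, and on Python 3 str.encode returns bytes rather than str. A sketch of a Python 3 variant follows, assuming a str result is wanted. The name convert_special_chars_py3 is made up here, and html.escape is called with quote=False to match the old default of cgi.escape.

```python
import html


def convert_special_chars_py3(label):
    """Escape &, < and >, then replace every non-ASCII character with a
    numeric character reference (e.g. ä becomes &#228;). Returns str."""
    escaped = html.escape(label, quote=False)
    # encode() yields ASCII-only bytes; decode them back to a str.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, convert_special_chars_py3('dschörmän <b>') gives 'dsch&#246;rm&#228;n &lt;b&gt;'.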


Have a try, hope it helps!

Best regards
Clemens




Silvan Marco Fin

Feb 28, 2020, 5:55:57 AM
to web2py-users
Hi Clemens,

thanks for your answer, but I don't think your code solves the problem. I just ran some more tests: even if the 'content' variable contains only ASCII characters (I tried 'Data without german umlauts.'), the same error occurs. IMHO this is not a problem with non-ASCII characters and therefore cannot be solved by escaping them.
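
A quick check of that reasoning, using only the standard library: for a pure-ASCII string, encoding with xmlcharrefreplace changes nothing, so the escaping approach cannot affect the ASCII-only test case. This is only a sketch of the string handling, not of what web2py does internally.

```python
content = "Data without german umlauts."

# No character needs replacing, so the result equals the plain ASCII encoding.
escaped = content.encode("ascii", "xmlcharrefreplace")
unchanged = escaped == content.encode("ascii")

# The same ASCII bytes also decode cleanly as UTF-8.
round_trip = escaped.decode("utf-8")
```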

Maybe I had the wrong focus while writing the post.

Kind regards,
Silvan

Silvan Marco Fin

Mar 2, 2020, 4:30:27 AM
to web2py-users

 I believe this to be a bug, so I submitted a bug report: https://github.com/web2py/web2py/issues/2289