Problem with python3 / pdf output (encoding): UnicodeDecodeError utf-8 invalid continuation byte

Silvan Marco Fin

Feb 28, 2020, 4:15:04 AM

to web...@googlegroups.com

Hi!

I have a problem with PDF output in a Python 3 environment. The problem I'm actually trying to solve is more complex, but I managed to strip it down to a very small example. If I get this running, I'm sure I can manage my actual situation.

I created a new application with a controller like this:

# -*- coding: utf-8 -*-

def index():
    return dict(
        data=UL(
            A('Output html', _href=URL(print, extension='html')),
            A('Output pdf', _href=URL(print, extension='pdf'))
        ),
        message="hello from tiatpi.py")

def print():
    return dict(content='Data ascii characters.')  # line was modified from previous post

Clicking the first link shows the expected web page.

Clicking the second link produces an internal error:

Traceback (most recent call last):
  File "/mnt/c/Users/Silvan Marco Fin/Desktop/Working/web2py/gluon/restricted.py", line 219, in restricted
    exec(ccode, environment)
  File "/mnt/c/Users/Silvan Marco Fin/Desktop/Working/web2py/applications/pdf_test/views/generic.pdf", line 9, in <module>
    pass
  File "/mnt/c/Users/Silvan Marco Fin/Desktop/Working/web2py/gluon/globals.py", line 434, in write
    self.body.write(to_native(xmlescape(data)))
  File "/mnt/c/Users/Silvan Marco Fin/Desktop/Working/web2py/gluon/packages/dal/pydal/_compat.py", line 136, in to_native
    return obj.decode(charset, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 10: invalid continuation byte
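
The last frame of the traceback suggests that a byte string is being decoded as UTF-8 text. The same class of error can be reproduced outside web2py with a minimal sketch. The bytes below are made up for illustration and are not the real PDF output, and `to_text` is a hypothetical simplified stand-in, not pydal's actual `to_native`:

```python
# Hypothetical stand-in for binary output: ten ASCII bytes followed by 0xd0
# and a byte ("A") that is not a valid UTF-8 continuation byte.
pdf_like = b"0123456789" + b"\xd0A"


def to_text(data, charset="utf-8", errors="strict"):
    """Decode bytes to str (simplified illustration only)."""
    return data.decode(charset, errors)


try:
    to_text(pdf_like)
    error = None
except UnicodeDecodeError as exc:
    # Keep a reference so the error can be inspected after the except block.
    error = exc

print(error)
```

If the generated PDF really is binary data, any strict UTF-8 decode of it can fail this way, regardless of what the `content` string holds.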

Before assembling this small snippet I experimented with the whole situation for some time, and I never produced anything other than exceptions, most of them some form of decode or encode error. So I assume there is something wrong with either fpdf or the XML() helper.

I would greatly appreciate any help on this!

Kind regards,

Silvan

Additional Information:

Environment:

I'm running Ubuntu under WSL on Windows 10 Pro 1909:

silvan@Nepumuk:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic

and from the web2py admin pages: 2.18.5-stable+timestamp.2019.04.08.04.22.03

(running on Rocket 1.2.6, Python 3.6.9)

But the problem shows up on native Ubuntu Linux as well.

Edited post; the former version had the line:

return dict(content = 'Data with "dschörmän" umlauts')

Clemens

Feb 28, 2020, 4:29:53 AM

to web2py-users

Hi Silvan,

Since I also have to handle "dschörmän" umlauts, I've put a conversion into the following procedure:

import cgi
import sys

# -------------------------------------------------------------------------
# convert_special_chars(label)
#
# Converts string to be compatible to UTF-8
# and to HTML (e.g. ä to &#228;)
#
# parameter: label as string
# -------------------------------------------------------------------------
def convert_special_chars(label):
    if sys.version_info[0] == 2: # is Python 2.x
        label = label.decode('utf-8')
    label = cgi.escape(label)
    label = label.encode('ascii', 'xmlcharrefreplace')

    return label
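
Note that `cgi.escape` has been deprecated since Python 3.2 and was removed in Python 3.8, and that `str.encode` returns `bytes` on Python 3. A Python 3 only variant might look like the sketch below. The name `convert_special_chars_py3` is made up for illustration, and returning `str` instead of `bytes` is an assumption about what the caller wants:

```python
import html


def convert_special_chars_py3(label):
    """Escape HTML metacharacters and replace non-ASCII characters with
    numeric character references. Returns str, not bytes."""
    # quote=False mirrors the default behaviour of the old cgi.escape.
    escaped = html.escape(label, quote=False)
    # Encode with xmlcharrefreplace, then decode back to a pure-ASCII str.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(convert_special_chars_py3('Data with "dschörmän" umlauts'))
```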

Have a try, hope it helps!

Best regards

Clemens

Silvan Marco Fin

Feb 28, 2020, 5:55:57 AM

to web2py-users

Hi Clemens,

Thanks for your answer, but I don't think your code solves the problem. I just ran some more tests, and if the 'content' variable contains only ASCII characters (I tried 'Data without german umlauts.'), the problem happens all the same. IMHO it is not a problem with non-ASCII characters and is therefore not solved by escaping them in any way.
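
A quick check outside web2py, as a minimal sketch, supports this reasoning: the test string from the controller encodes to pure ASCII, so the byte 0xd0 named in the traceback cannot come from the string itself:

```python
content = 'Data ascii characters.'
encoded = content.encode('utf-8')

# Every byte of an ASCII-only string is below 0x80 after UTF-8 encoding.
is_ascii = all(byte < 0x80 for byte in encoded)
# The byte reported in the traceback does not occur in the encoded string.
contains_d0 = 0xD0 in encoded

print(is_ascii, contains_d0)
```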

Maybe I had the wrong focus while writing the post.

Kind regards,

Silvan

Silvan Marco Fin

Mar 2, 2020, 4:30:27 AM

to web2py-users

I believe this to be a bug, so I submitted a bug report: https://github.com/web2py/web2py/issues/2289