Re: Unicode in URLs with CherryPy 3.2.2 and Python 3


Behold

Jul 29, 2012, 2:10:27 PM
to cherryp...@googlegroups.com
Yes, I did find a similar workaround: param.encode('latin1').decode('utf-8').
However, it would be much easier if the string were properly decoded as UTF-8
to begin with, rather than each byte first being decoded separately as latin1
and the result then being converted to UTF-8, which is what seems to be
happening at the moment. And I'm still not sure whether this is an error on my
part for not properly specifying the encoding somewhere, or a bug in CherryPy.
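
For anyone following along, here is a minimal sketch of what that workaround does, assuming the parameter really is UTF-8 that was mis-decoded as Latin-1 one byte at a time. The helper name repair_latin1_mojibake is hypothetical and not part of CherryPy; it only wraps the encode/decode round trip with a fallback for strings that were never garbled.

```python
def repair_latin1_mojibake(param: str) -> str:
    """Undo UTF-8 bytes that were mis-decoded as Latin-1.

    Hypothetical helper, not a CherryPy API. Returns the input unchanged
    when it cannot be the result of that particular mis-decoding.
    """
    try:
        return param.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return param


# Simulate the failure described in this thread.
wire_bytes = "å".encode("utf-8")         # the two bytes sent as %C3%A5
received = wire_bytes.decode("latin-1")  # one character per byte

print(repr(received.encode("utf-8")))    # the four bytes the handler reports
print(repair_latin1_mojibake(received))  # the intended character again
```

The fallback matters because a string containing characters outside Latin-1, or one that is not valid UTF-8 after re-encoding, cannot have come from this mis-decoding and is better left alone.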

On Sunday, July 29, 2012 3:57:37 PM UTC+2, Tim wrote:
You might try specifying utf8 in the encode method, like so: param.encode('utf-8').
You might also try # -*- coding: utf-8 -*- although I think that only declares the encoding of the source file itself.

I don't "know" the answer, but that's my attempt at answering it.
 

On Thursday, July 26, 2012 8:04:07 PM UTC-5, Behold wrote:
While I have played around with CherryPy off and on for about a year, I am still quite the newbie when it comes to using it properly. As such, I thought I would try my hand at building a larger web application to get better acquainted with the library. However, the project I am working on requires Unicode (UTF-8) characters in the URL, as parameters to the page handler, and I have been going nearly insane trying to get this to work; it simply will not give me a properly decoded Unicode string. For example, when I browse to '/å', my 'default' method receives 'Ã¥' as a parameter (that's b'\xc3\x83\xc2\xa5' in raw bytes: each of the two bytes in the UTF-8 encoding of 'å' gets decoded separately and then encoded as b'\xc3\x83' and b'\xc2\xa5', respectively). Here is a simple example:


#!/usr/bin/python3

import cherrypy

class Root:

    @cherrypy.expose
    def default(self, param):
        return "Got: {} ({})".format(param, repr(param.encode()))


def main():
    cherrypy.config.update({
        'server.socket_port': 8080,
        'tools.encode.on': True,
        'tools.encode.encoding': 'utf-8'
    })

    cherrypy.tree.mount(Root(), '/')

    cherrypy.server.start()
    cherrypy.engine.block()

if __name__ == "__main__":
    main()


The response when visiting localhost:8080/å is "Got: Ã¥ (b'\xc3\x83\xc2\xa5')". The GET request sent from the browser is 'GET /%C3%A5 HTTP/1.1', yet the request that CherryPy logs to stdout is 'GET /\xc3\x83\xc2\xa5 HTTP/1.1'. It would seem that the erroneous decoding takes place quite early in the process.
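
The difference between the two request lines can be reproduced with the standard library alone. The sketch below is not CherryPy's code and makes no claim about where in CherryPy the decoding happens; it only shows that percent-decoding the same path with Latin-1 instead of UTF-8 yields exactly the garbled bytes seen in the log. One possibly related fact: PEP 3333 specifies that WSGI servers on Python 3 pass request strings as ISO-8859-1-decoded text, so a byte-per-character decoding step is not unusual early in request handling.

```python
from urllib.parse import quote, unquote

# quote() percent-encodes non-ASCII text as UTF-8 by default,
# which matches what the browser sent.
path = "/" + quote("å")

as_utf8 = unquote(path, encoding="utf-8")      # intended decoding
as_latin1 = unquote(path, encoding="latin-1")  # one character per byte

print(path)
print(repr(as_utf8.encode("utf-8")))    # two bytes: the original character
print(repr(as_latin1.encode("utf-8")))  # four bytes: the garbled form
```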

I simply do not know how to progress from here, nor what I am doing wrong. I cannot find much in the way of helpful information on Google, and I do not know where to start looking in trying to solve the problem myself. I have tried playing around a little bit---for example, changing the encoding in the global configuration to 'ascii' results in the error message (406, 'Your client did not send an Accept-Charset header. We tried these charsets: ascii.')---but other than that I have no idea where to look, as I do not have sufficient experience with CherryPy.

I am using CherryPy 3.2.2 with Python 3.2.3 on 64-bit Linux.

Alan Pound

Jul 29, 2012, 3:22:46 PM
to cherryp...@googlegroups.com
Hi
I recall having encoding nightmares when I first started out with cherrypy. I ended up with the unpleasantness below, which I call just
before daemonising... Read it and you will get the idea (I haven't the foggiest how to deal with a windoze installation...)

In fact, you only have to do the sitecustomize.py thing once - but putting this in the code means it (usually) works when installed
elsewhere...

This might not have anything to do with the problem you are seeing (and there may well be a *far more elegant* way of dealing with it,
in which case, just belly-laugh and move on...)

;^)

Hope it helps (and if anyone knows better - please tell me...)

Alan


import os
import sys
import time


def check_encoding():
    # This is a really horrible kludge to ensure support of utf-8 encoding (Python 2 only).
    # Essentially, to support utf-8, we need "\nimport sys\nsys.setdefaultencoding('utf-8')" in sitecustomize.py (NOTICE THE 'z').
    # You can't just setdefaultencoding() once running - it doesn't work...
    # sitecustomize.py pops up in a number of seemingly random places... depending upon the installation.
    # The upshot is - if you change the type of linux you are running on - you must *check* that this still works...

    if sys.getdefaultencoding() != 'utf-8':
        if not os.path.exists('set_utf8.txt'):
            print '\nEncoding is *not* utf-8, trying to set it for next time...'

            fp = open('set_utf8.txt', 'w'); fp.close()  # make a file to avoid trying this twice
            sc = "\nimport sys\nsys.setdefaultencoding( 'utf-8')\n"

            flag = False
            if os.path.exists('/usr/lib/python2.6/sitecustomize.py'):  # ubuntu, append to existing file...
                print 'Patching /usr/lib/python2.6/sitecustomize.py'
                fp = open('/usr/lib/python2.6/sitecustomize.py', 'a')
                fp.write(sc); fp.close()
                flag = True

            if os.path.exists('/usr/lib/python2.6/site-packages'):  # for (fedora?), add to site-packages
                print 'Patching /usr/lib/python2.6/site-packages/sitecustomize.py'
                fp = open('/usr/lib/python2.6/site-packages/sitecustomize.py', 'a')
                fp.write(sc); fp.close()
                flag = True

            if os.path.exists('/usr/lib/python2.7/sitecustomize.py'):  # ubuntu, append to existing file...
                print 'Patching /usr/lib/python2.7/sitecustomize.py'
                fp = open('/usr/lib/python2.7/sitecustomize.py', 'a')
                fp.write(sc); fp.close()
                flag = True

            if os.path.exists('/usr/lib/python2.7/site-packages'):  # for (fedora?), add to site-packages
                print 'Patching /usr/lib/python2.7/site-packages/sitecustomize.py'
                # the open() was missing here, so the write went to an already-closed file
                fp = open('/usr/lib/python2.7/site-packages/sitecustomize.py', 'a')
                fp.write(sc); fp.close()
                flag = True

            if flag:
                # restart under the same Python version so the patched sitecustomize.py is picked up
                time.sleep(2)
                vers = '.'.join(sys.version.split('.')[:2])
                os.system('python%s %s restart' % (vers, sys.argv[0]))
                sys.exit(0)

            else:
                print 'Sorry - run out of ideas here...\n'
        else:
            print "Encoding is *not* utf-8, but we've tried to set it once - so giving up trying to change it\n"
            fp = open('set_utf8_failed.txt', 'w'); fp.close()  # leave evidence of our failure...

    print 'Info: Encoding is %s' % sys.getdefaultencoding()




Alan Pound

Jul 29, 2012, 3:24:43 PM
to cherryp...@googlegroups.com

Sorry, I just spotted you are using python3.
It might be the same, it might be different...

Alan




Behold

Jul 30, 2012, 1:08:12 PM
to cherryp...@googlegroups.com
It would seem that, fortunately (though I guess unfortunately
at the same time since I still do not have a proper solution to
my own problem), your... "solution"... isn't relevant to my
problem; sys.getdefaultencoding() does seem to return 'utf-8'
at all times. Thanks anyway.
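
A quick way to confirm this on Python 3 is sketched below. As far as I know, Python 3 always reports 'utf-8' as the default encoding and no longer provides sys.setdefaultencoding() at all, so the sitecustomize.py approach has nothing to change there.

```python
import sys

# On Python 3 the default str/bytes conversion encoding is fixed.
default_encoding = sys.getdefaultencoding()
can_change_it = hasattr(sys, "setdefaultencoding")

print("default encoding:", default_encoding)
print("setdefaultencoding available:", can_change_it)
```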