Yes, I did find a similar workaround: param.encode('latin1').decode('utf-8').
However, it would be much easier if the string were properly decoded as UTF-8
to begin with, rather than first decoding each byte separately as Latin-1
and then converting the string to UTF-8, which is what it seems to be doing at
the moment. And I'm still not sure whether this is an error on my part for not
properly specifying the encoding somewhere, or a bug in CherryPy.
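If you stick with the workaround for now, a decorator keeps it out of every handler body. This is a minimal sketch that assumes every incoming str argument was mis-decoded as Latin-1; redecode_utf8 is a hypothetical name of my own, not part of CherryPy. Values that do not survive the round trip are passed through unchanged.

```python
import functools


def redecode_utf8(func):
    """Hypothetical decorator: undo Latin-1 mis-decoding of str arguments
    by re-encoding them as Latin-1 and decoding the bytes as UTF-8."""

    def fix(value):
        if not isinstance(value, str):
            return value  # e.g. `self` on a handler method
        try:
            return value.encode('latin-1').decode('utf-8')
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not representable in Latin-1, or not valid UTF-8 underneath:
            # leave the value alone rather than raise.
            return value

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = [fix(a) for a in args]
        kwargs = {k: fix(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)

    return wrapper


@redecode_utf8
def default(param):
    return "Got: {} ({})".format(param, repr(param.encode()))


print(default('\u00c3\u00a5'))  # Got: å (b'\xc3\xa5')
```

In the CherryPy example it would sit directly below @cherrypy.expose. It is only a stopgap: if the framework ever decodes the URL as UTF-8 itself, the decorator could mangle a correctly decoded value whose Latin-1 bytes happen to form valid UTF-8.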
On Sunday, July 29, 2012 3:57:37 PM UTC+2, Tim wrote:
You might try specifying UTF-8 in the encode method, like so: param.encode('utf-8')
You might also try # -*- coding: utf-8 -*- at the top of the file, although I think that just tells Python (and the editor) which encoding the source file itself is in.
I don't "know" the answer, but that's my attempt at answering it.
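As far as I can tell, neither suggestion changes anything here. In Python 3, str.encode() already defaults to UTF-8, so param.encode('utf-8') yields the same bytes the example handler prints; the damage is done before param reaches the handler. The coding declaration (PEP 263) only affects how the interpreter reads the .py file, and UTF-8 is already the default for Python 3 source. A quick check:

```python
# The 'Ã¥' that the handler receives for '/å', written with escapes.
garbled = '\u00c3\u00a5'

# str.encode() defaults to UTF-8 in Python 3, so naming the codec
# explicitly changes nothing: both give b'\xc3\x83\xc2\xa5'.
explicit = garbled.encode('utf-8')
implicit = garbled.encode()
print(explicit == implicit)  # True
```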
On Thursday, July 26, 2012 8:04:07 PM UTC-5, Behold wrote:
While I have played around with CherryPy off and on for about a year, I am still quite the newbie when it comes to actually using it properly. As such, I thought I would try my hand at building a larger web application to get better acquainted with the library. However, the project I am working on requires the ability to use Unicode (UTF-8) characters in the URL, as parameters to the page handler, and I have been going nearly insane trying to get this to work; it simply will not give me a properly decoded Unicode string. For example, when I browse to '/å', my 'default' method receives 'Ã¥' as its parameter (that's b'\xc3\x83\xc2\xa5' in raw bytes: each byte of the two-byte UTF-8 sequence for 'å', b'\xc3\xa5', gets decoded separately as a Latin-1 character, and those characters are then encoded as b'\xc3\x83' and b'\xc2\xa5', respectively). Here is a simple example:
#!/usr/bin/python3
import cherrypy

class Root:
    @cherrypy.expose
    def default(self, param):
        return "Got: {} ({})".format(param, repr(param.encode()))

def main():
    cherrypy.config.update({
        'server.socket_port': 8080,
        'tools.encode.on': True,
        'tools.encode.encoding': 'utf-8'
    })
    cherrypy.tree.mount(Root(), '/')
    cherrypy.server.start()
    cherrypy.engine.block()

if __name__ == "__main__":
    main()
The response when visiting localhost:8080/å is "Got: Ã¥ (b'\xc3\x83\xc2\xa5')". The GET request sent from the browser is 'GET /%C3%A5 HTTP/1.1', yet the request that CherryPy logs to stdout is 'GET /\xc3\x83\xc2\xa5 HTTP/1.1'. It would seem that the erroneous decoding takes place quite early in the process.
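One plausible explanation, which I have not verified against the CherryPy 3.2.2 source: CherryPy serves applications through a WSGI server, and PEP 3333 requires WSGI servers on Python 3 to pass PATH_INFO as a "native string", i.e. the percent-decoded request bytes decoded as ISO-8859-1 (Latin-1). The sketch below simulates that step with urllib.parse.unquote(..., encoding='latin-1') and shows a plain WSGI app (not CherryPy) recovering the UTF-8 text. The helper simulate_pep3333_path and the app are hypothetical illustrations.

```python
from urllib.parse import unquote


def simulate_pep3333_path(raw_path):
    """Hypothetical helper: mimic a PEP 3333 server by percent-decoding the
    request path to bytes and exposing them as a Latin-1 native string."""
    return unquote(raw_path, encoding='latin-1')


def app(environ, start_response):
    """Minimal plain-WSGI app (not CherryPy) that re-decodes PATH_INFO."""
    # Native string -> original request bytes -> real UTF-8 text.
    path = environ['PATH_INFO'].encode('latin-1').decode('utf-8')
    body = 'Got: {}'.format(path.lstrip('/')).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8'),
                              ('Content-Length', str(len(body)))])
    return [body]


# The browser's request line was 'GET /%C3%A5 HTTP/1.1'.
environ = {'PATH_INFO': simulate_pep3333_path('/%C3%A5')}
print(repr(environ['PATH_INFO']))     # '/Ã¥'
print(environ['PATH_INFO'].encode())  # b'/\xc3\x83\xc2\xa5', same bytes as the logged request line

statuses = []
body = b''.join(app(environ, lambda status, headers: statuses.append(status)))
print(statuses[0], body.decode('utf-8'))  # 200 OK Got: å
```

By contrast, unquote('/%C3%A5') with its default UTF-8 encoding returns '/å' directly, which is the result one would expect from the framework.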
I simply do not know how to progress from here, nor what I am doing wrong. I cannot find much helpful information on Google, and I do not know where to start looking in order to solve the problem myself. I have tried playing around a little bit. For example, changing the encoding in the global configuration to 'ascii' results in the error message (406, 'Your client did not send an Accept-Charset header. We tried these charsets: ascii.'). Other than that, I have no idea where to look, as I do not have sufficient experience with CherryPy.
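The 406 message hints at what tools.encode does: as far as I can tell, it chooses a charset for the response body (negotiating with the client's Accept-Charset header when one is sent) and gives up with a 406 when the body cannot be encoded in any of the charsets it tried. The example handler's output contains non-ASCII characters, so 'ascii' cannot work, and the setting does not appear to affect how URL parameters are decoded. A rough plain-Python check of that reasoning:

```python
# The body the example handler returns for '/å' (parameter still garbled).
param = '\u00c3\u00a5'
body = "Got: {} ({})".format(param, repr(param.encode()))

# The response str has to become bytes; with only 'ascii' on offer,
# that conversion fails on the first non-ASCII character.
try:
    body.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print(exc.reason, 'at position', exc.start)  # ordinal not in range(128) at position 5

# UTF-8 can represent every character, so the configured 'utf-8' succeeds.
utf8_body = body.encode('utf-8')
```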
I am using CherryPy 3.2.2 with Python 3.2.3 on 64-bit Linux.