Request's GET and str_GET properties

Nikita Borisenko

Aug 2, 2011, 7:01:56 PM

to webapp2

Hi there,

Thanks for starting this project; it seems to be something we have
been missing since the end of webapp development.

I have a question about the Request object, namely about its GET and
str_GET properties. According to the documentation: "Both carry the same
values, but in the first they are converted to unicode, and in the
latter they are strings". This seems a bit misleading to me, because
I always thought that Unicode is just a string encoding where
each symbol is represented by 1 to 4 bytes. Does str_GET give a byte
string representation of the values?

Regards,
Nikita

Rodrigo Moraes

Aug 2, 2011, 7:29:56 PM

to webapp2

.str_GET is the raw string as transported by HTTP (str or StringType,
or "standard Python byte strings").

.GET is the string after calling .decode(request_encoding) (and
request_encoding is UTF-8 by default).
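
To make the distinction concrete, here is a minimal Python 3 sketch; it is not WebOb's code. In Python 3 terms, a Python 2 `str` corresponds to `bytes` and a Python 2 `unicode` corresponds to `str`. The hypothetical helper `parse_query` builds two dicts from a raw query string: one holding the raw byte values (analogous to `str_GET`) and one holding the values decoded with the request encoding, UTF-8 here (analogous to `GET`). For simplicity it keeps only the last value per key, unlike a real MultiDict.

```python
from urllib.parse import unquote_to_bytes


def parse_query(raw_qs: bytes, encoding: str = "utf-8"):
    """Return (raw, decoded) dicts for a URL-encoded query string.

    raw     -> bytes keys/values, like str_GET (Python 2 str)
    decoded -> text keys/values, like GET (Python 2 unicode)
    Only the last value per key is kept (no MultiDict here).
    """
    raw, decoded = {}, {}
    for pair in raw_qs.split(b"&"):
        if not pair:
            continue
        name, _, value = pair.partition(b"=")
        # '+' means space in form encoding; %XX escapes become single bytes.
        name_bytes = unquote_to_bytes(name.replace(b"+", b" "))
        value_bytes = unquote_to_bytes(value.replace(b"+", b" "))
        raw[name_bytes] = value_bytes
        # Decoding with the request encoding turns bytes into text.
        decoded[name_bytes.decode(encoding)] = value_bytes.decode(encoding)
    return raw, decoded


raw, decoded = parse_query(b"bar=%C3%A1&q=hello+world")
print(raw[b"bar"])     # b'\xc3\xa1'  (two UTF-8 bytes)
print(decoded["bar"])  # á            (one character)
```

UTF-8 is one way of encoding Unicode text as bytes: the same `á` is two bytes in the raw value and a single character once decoded.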

It was nice that you brought this up, because I was about to remove
references to the .str_* versions from the docs. They are confusing,
and they are about to be deprecated in WebOb as part of its effort to
provide Python 3 support (I think they are already deprecated in the
repo).

I hope this is clear.

-- rodrigo

Gabor Lenard

Aug 3, 2011, 3:48:46 PM

to web...@googlegroups.com

Speaking of MultiDict, what is the best practice for saving a value from the GET/POST dictionary? Sorry, I am a newbie.

Here is what I tried and the error messages I got:

Foo(bar = self.request.POST['bar'])
BadValueError: Property bar must be a str or unicode instance, not a tuple
Foo(bar = self.request.POST.get('bar'))
BadValueError: Property bar must be a str or unicode instance, not a tuple
Foo(bar = unicode(self.request.POST.get('bar')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)

Thanks in advance!

Gabor

Rodrigo Moraes

Aug 3, 2011, 5:43:20 PM

to webapp2

On Aug 3, 4:48 pm, Gabor Lenard wrote:
> That's what I try and the error message I get:
>
> Foo(bar = self.request.POST['bar'])
> BadValueError: Property bar must be a str or unicode instance, not a tuple
>
> Foo(bar = self.request.POST.get('bar'))
> BadValueError: Property bar must be a str or unicode instance, not a tuple

This is very weird. Both POST['bar'] and POST.get('bar') should work.
Can you post a full example?

> Foo(bar = unicode(self.request.POST.get('bar')))
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)

You don't need to do this as the values are already decoded (normally
using 'UTF-8', which is the default request encoding). I believe the
exception is related to the first 2 errors.

-- rodrigo

Gabor Lenard

Aug 4, 2011, 1:04:08 AM

to web...@googlegroups.com

This is very weird. Both POST['bar'] and POST.get('bar') should work.
Can you post a full example?

Sorry, my mistake: I had a comma at the end of the line, like this:

foo.bar=self.request.POST.get('bar'),
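
The trailing comma explains the earlier BadValueError: in Python, a trailing comma after an expression builds a one-element tuple, so the property received a tuple instead of a string. A minimal illustration in plain Python (no App Engine involved):

```python
value = "abc"

with_comma = value,      # trailing comma: a one-element tuple
without_comma = value    # plain assignment: the string itself

print(type(with_comma).__name__)     # tuple
print(type(without_comma).__name__)  # str
print(with_comma[0] == value)        # True
```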

But still, I am having trouble persisting a simple string with accented characters. Here is my code:

#!/usr/bin/env python
# encoding: utf-8
from google.appengine.ext import db
import webapp2

class Foo(db.Model):
    bar = db.StringProperty()

class HomeHandler(webapp2.RequestHandler):
    def get(self):
        self.response.out.write('<html><body><form action="/post" method="post" accept-charset="utf-8"><p>')
        self.response.out.write('<input type="text" name="bar" value="á"><input type="submit" value="submit">')
        self.response.out.write('</p></form></body></html>')

class PostHandler(webapp2.RequestHandler):
    def post(self):
        bar = self.request.POST['bar']
        if bar:
            bar = bar[:500]
        foo = Foo(bar=bar)
        foo.put()

app = webapp2.WSGIApplication([
    webapp2.Route('/', HomeHandler),
    webapp2.Route('/post', PostHandler),
], debug=True)

def main():
    app.run()

if __name__ == '__main__':
    main()

I always receive this:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Thanks,

Gabor

Rodrigo Moraes

Aug 4, 2011, 5:08:29 AM

to webapp2

On Aug 4, 2:04 am, Gabor Lenard wrote:
> But still, I am having trouble persisting a simple string with accented characters. Here is my code:

> [...]

> I always receive this:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Hm, I can't reproduce it; it works for me. I even tried changing the
default encoding in my browser. The page is served with utf-8 encoding
anyway, because that is set in the default Response Content-Type.

What could it be? I'm puzzled.

-- rodrigo

Lenard Gabor

Aug 4, 2011, 10:23:43 AM

to web...@googlegroups.com

Uploaded the app here: http://utf--8.appspot.com/
Try it with normal and accented characters (like á).

Gabor Lenard

Aug 4, 2011, 12:28:04 PM

to webapp2

Uploaded source code here: https://github.com/lenardgabor/utf--8

Gabor Lenard

Aug 4, 2011, 3:47:28 PM

to webapp2

Now I added this method to retrieve the posted values:

def get_posted_field(self, field, maxlength=FIELD_MAX_LENGTH):
    field = self.request.POST.get(field)
    if field:
        field = field.decode('utf-8')
        if len(field) > maxlength:
            field = field[:maxlength]
    return field

This correctly decodes from UTF-8 to unicode so its value can be stored in db.Model fields.

It would be great if this were somehow included in MultiDict.get() or somewhere else.
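
One caveat with decoding inside such a helper: once the framework returns already-decoded values (which is what the fix discussed later in this thread does), decoding a second time breaks. In Python 2, calling .decode() on a unicode value first encodes it as ASCII, which fails for characters such as á. A defensive variant decodes only when it is handed bytes. This is a Python 3 sketch with a hypothetical name, `posted_text`, and an assumed default limit of 500 (the value used in the handler above; the real FIELD_MAX_LENGTH is not shown). It is not part of webapp2.

```python
from typing import Optional, Union


def posted_text(value: Optional[Union[bytes, str]],
                max_length: int = 500,
                encoding: str = "utf-8") -> Optional[str]:
    """Return posted form data as text, truncated to max_length characters.

    Decodes only if the value is still raw bytes, so it is safe to call on
    values the framework has already decoded.
    """
    if value is None:
        return None
    if isinstance(value, bytes):
        value = value.decode(encoding)
    # Truncate after decoding so a multi-byte character is never split.
    return value[:max_length]


print(posted_text(b"\xc3\xa1rv\xc3\xadz"))  # árvíz
print(posted_text("árvíz"))                 # árvíz (already text: unchanged)
print(posted_text("abcdef", max_length=3))  # abc
```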

Gabor

Rodrigo Moraes

Aug 4, 2011, 5:29:25 PM

to webapp2

On Aug 4, 4:47 pm, Gabor Lenard wrote:
> Now I added this method to retrieve the posted values:

> [...]

> This correctly decodes from UTF-8 to unicode so its value can be stored in db.Model fields.
> Would be great if this would be somehow included in the MultiDict.get() or somewhere.

Hi,

I reproduced it and it seems to be a bug. The values in the MultiDict
are str, while they should be unicode.

The quick solution is to add this to the beginning of
Request.__init__:

# Assumes `import re` at the top of the module.
if kwargs.get('charset') is None:
    _charset_re = re.compile(r';\s*charset=([^;\s]*)', re.I)
    match = _charset_re.search(environ.get('CONTENT_TYPE', ''))
    kwargs['charset'] = match.group(1).lower() if match else 'utf-8'
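
The quick fix looks for a charset parameter in the request's Content-Type header and falls back to UTF-8. Pulled out as a standalone Python 3 function (the name `detect_charset` is hypothetical; the regular expression is the one from the snippet), the logic looks roughly like this:

```python
import re

# Same pattern as in the snippet: "; charset=<value>", case-insensitive.
_charset_re = re.compile(r';\s*charset=([^;\s]*)', re.I)


def detect_charset(content_type: str, default: str = "utf-8") -> str:
    """Return the lower-cased charset from a Content-Type value, or default."""
    match = _charset_re.search(content_type or "")
    return match.group(1).lower() if match else default


print(detect_charset("application/x-www-form-urlencoded; charset=ISO-8859-1"))
# iso-8859-1
print(detect_charset("application/x-www-form-urlencoded"))
# utf-8
```

Like the original snippet, this does not strip quotes from a quoted parameter such as `charset="utf-8"`.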

A permanent solution is coming.

Thanks for tracking this. :)

-- rodrigo

Rodrigo Moraes

Aug 5, 2011, 1:16:55 AM

to webapp2

On Aug 4, 6:29 pm, Rodrigo Moraes wrote:
> A permanent solution is coming.

Changes are in the repo. A new release is on the way, this weekend at
the latest. Sorry for the trouble. :)

-- rodrigo

Gabor Lenard

Aug 5, 2011, 1:28:53 AM

to web...@googlegroups.com

Thank you very much. I am very grateful for that!

Gabor

Gabor Lenard

Aug 6, 2011, 4:44:04 PM

to web...@googlegroups.com

Hi Rodrigo,

I've got another issue after the fix below. This looks like a side effect since the issue is just the opposite:

self.request.cookies.get() should return str but it returns unicode.

Because of that, decoding the previously encoded text fails.

This should work:

self.response.set_cookie('foo', urllib.quote(u'á').encode('utf-8'))
...
foo = urllib.unquote(self.request.cookies.get('foo')).decode('utf-8')

However, this fails because self.request.cookies.get() incorrectly returns unicode instead of str.

Could you look into this?

Thanks,

Gabor

Gabor Lenard

Aug 6, 2011, 4:53:28 PM

to webapp2

However, if you added support for transparent encoding/decoding, then request.cookies.get() could return unicode with all decoding already applied.

In this case I could remove all those quote(s.encode()) and unquote(s).decode() parts, as the framework would take care of that for me. My code would be much clearer and I could be sure that encoding/decoding only happens once.

It would not change normal ASCII cookie contents, as there would be nothing to encode/decode in them. But they would also benefit from the automatic quote/unquote in the framework, and we would not have the issue I reported earlier, where only the first word of the cookie value is returned if the value was not quoted in set_cookie().
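
The transparent encoding/decoding proposed here can be sketched as a pair of helpers that UTF-8 encode and percent-quote text on the way out, and reverse both steps exactly once on the way in. This is a Python 3 sketch using `urllib.parse` (the Python 3 counterpart of `urllib.quote`/`urllib.unquote`); the helper names are hypothetical, and this is not webapp2 or WebOb API.

```python
from urllib.parse import quote, unquote


def encode_cookie_value(text: str) -> str:
    """UTF-8 encode, then percent-quote, so the cookie holds only ASCII."""
    # safe="" also quotes "/", so the result has no ';', ',', spaces or quotes.
    return quote(text.encode("utf-8"), safe="")


def decode_cookie_value(stored: str) -> str:
    """Undo encode_cookie_value: percent-unquote and decode UTF-8 in one step."""
    return unquote(stored, encoding="utf-8")


original = "föö=bär; föo, bär, bäz=dïng;"
stored = encode_cookie_value(original)
print(stored)
print(decode_cookie_value(stored) == original)  # True
```

Because both directions live in one place, application code never has to remember which of quote/unquote and encode/decode has already been applied.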

Rodrigo Moraes

Aug 6, 2011, 7:39:02 PM

to webapp2

On Aug 6, 5:44 pm, Gabor Lenard wrote:
> This should work:
> self.response.set_cookie('foo', urllib.quote(u'á').encode('utf-8'))

Hmm. That didn't work for me:

>>> import urllib
>>> urllib.quote(u'á')
/usr/lib/python2.7/urllib.py:1238: UnicodeWarning: ...

To quote a unicode value, you'll have to encode it first.

Could you provide some test cases?

-- rodrigo

Gabor Lenard

Aug 7, 2011, 1:01:32 AM

to web...@googlegroups.com

Sorry, it was quite late yesterday. I put the encode() call one parenthesis off. This is what I meant should work:

self.response.set_cookie('foo', urllib.quote(u'á'.encode('utf-8')))

...
foo = urllib.unquote(self.request.cookies.get('foo')).decode('utf-8')

But I also updated my test project: https://github.com/lenardgabor/utf--8

You can see it running on appengine: http://utf--8.appspot.com/

Thank you,

Gabor

Rodrigo Moraes

Aug 7, 2011, 8:57:04 AM

to webapp2

On Aug 7, 2:01 am, Gabor Lenard wrote:
> Sorry, it was quite late yesterday. I put the encode() one parenthesis off. This is what I meant that it should work:
>
> self.response.set_cookie('foo', urllib.quote(u'á'.encode('utf-8')))

I see what you mean. Here it is step by step:

# Here is our test value.
x = u'föö'
# We must store cookies quoted. To quote unicode, we need to encode it.
y = urllib.quote(x.encode('utf8'))
# The encoded, quoted string looks ugly.
self.assertEqual(y, 'f%C3%B6%C3%B6')
# But it is easy to get it back to our initial value.
z = urllib.unquote(y).decode('utf8')
# And it is indeed the same value.
self.assertEqual(z, x)

# Set a cookie using the encoded/quoted value.
rsp = webapp2.Response()
rsp.set_cookie('foo', y)
cookie = rsp.headers.get('Set-Cookie')
self.assertEqual(cookie, 'foo=f%C3%B6%C3%B6; Path=/')

# Get the cookie back.
req = webapp2.Request.blank('/', headers=[('Cookie', cookie)])
self.assertEqual(req.cookies.get('foo'), y)
# Here is our original value, again. Problem: the value is decoded
# before we had a chance to unquote it.
w = urllib.unquote(req.cookies.get('foo').encode('utf8')).decode('utf8')
# And it is indeed the same value.
self.assertEqual(w, x)

I'm not sure how to "fix" this without introducing unexpected
behavior, given that webapp.Request (the original) always sets the
charset to utf-8 if it is not defined in the Content-Type header. What
do you suggest?

-- rodrigo

Rodrigo Moraes

Aug 7, 2011, 9:54:46 AM

to webapp2

Hey,

After a chat with WebOb maintainer, we have 2 solutions. It is
actually easier than we thought.

1) Use the latest WebOb. It has taken care of quoting cookie values for
about 18 months. App Engine will update WebOb with the upcoming Python 2.7
support, so this is a temporary solution.

# Most recent WebOb versions take care of quoting.
# (not the version available on App Engine though)

value = u'föö=bär; föo, bär, bäz=dïng;'
rsp = webapp2.Response()
rsp.set_cookie('foo', value)

cookie = rsp.headers.get('Set-Cookie')

req = webapp2.Request.blank('/', headers=[('Cookie', cookie)])

self.assertEqual(req.cookies.get('foo'), value)

2) Use WebOb that ships with App Engine, and quote values but get
cookies using .str_cookies:

# With quote, easy way

value = u'föö=bär; föo, bär, bäz=dïng;'
quoted_value = urllib.quote(value.encode('utf8'))
rsp = webapp2.Response()
rsp.set_cookie('foo', quoted_value)

cookie = rsp.headers.get('Set-Cookie')

req = webapp2.Request.blank('/', headers=[('Cookie', cookie)])

cookie_value = req.str_cookies.get('foo')
unquoted_cookie_value = urllib.unquote(cookie_value).decode('utf-8')
self.assertEqual(cookie_value, quoted_value)
self.assertEqual(unquoted_cookie_value, value)

And here's the chat:

<sergeys> moraes: hi
<moraes> hi sergeys
<sergeys> i guess this could be the channel, if mcdonc doesn't mind
<sergeys> i'm the webob maintainer anyway
<moraes> great.
<moraes> thanks for the good work. :)
<sergeys> did you have anything in particular to ask/suggest?
<sergeys> moraes: thanks :)
<moraes> could you please take a look at this? http://groups.google.com/group/webapp2/msg/ce5bc941b886ffa0
<sergeys> storing unicode cookies is done like this: resp.set_cookie('key', u'foo')
<moraes> thing is. probably we should not do this. we set charset to utf8 if it is not set in content-type (like webapp, from app engine)
<sergeys> you just need a non-ancient webob
<moraes> i tested with 1.0.8
<sergeys> ok, there are a lot of emails in that thread, i'll read them, but what specifically is the question?
<moraes> that last one summarizes it.
<moraes> the problem is: i set a quoted value. it is a str. because my request uses utf-8 by default, when i get the cookie it decodes it.
<moraes> and in the end, to get the original value i have:

urllib.unquote(req.cookies.get('foo').encode('utf8')).decode('utf8')

<sergeys> don't set quoted values, cookie quoting is different and weird, just let cookies module handle the quoting
<moraes> :P
<moraes> ah i thought webob didn't do any quoting
<moraes> so i always had to quote or serialize in some way
<sergeys> it didn't some time ago, but it does for something like 18 months now
<moraes> that's much better then
<moraes> yeah i guess the one gae uses still doesn't do
<sergeys> the quoting necessary is weird and i put it all in webob, so people don't have to mess with it
<moraes> but i can tell the user to use latest one (and gae will soon have it too)
<sergeys> anyway, if you really already have a cookie like that you can do urllib.unquote(req.str_cookies['foo']).decode('utf8')
<moraes> ok, thanks! i wasn't aware.
<moraes> ah and there's .str_cookies too.
<sergeys> also, cookies are stored as non-unicode strings on the browser side, they are quoted to only have a subset of ASCII characters in them, so the page encoding should not change what data gets sent as a cookie in a request
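
sergeys' advice is to let the cookie library do the quoting instead of pre-quoting values. WebOb's own quoting code is not shown in this thread; as an analogy only, Python 3's standard `http.cookies.SimpleCookie` also quotes a value containing characters such as `;` or `,` when it serializes the cookie, and unquotes it when parsing, so the caller only ever handles the original value. Its escaping scheme is not necessarily the one WebOb uses. A minimal round trip:

```python
from http.cookies import SimpleCookie

value = "föö=bär; föo, bär, bäz=dïng;"

# Serialize: SimpleCookie quotes/escapes the value itself.
outgoing = SimpleCookie()
outgoing["foo"] = value
header_value = outgoing["foo"].OutputString()  # the "name=value" part of Set-Cookie
print(header_value)

# Parse the Cookie header back: the library undoes its own quoting.
incoming = SimpleCookie()
incoming.load(header_value)
print(incoming["foo"].value == value)  # True
```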

-- rodrigo

Gabor Lenard

Aug 7, 2011, 10:00:40 AM

to web...@googlegroups.com

On 07.08.2011, at 15:54, Rodrigo Moraes wrote:

After a chat with WebOb maintainer, we have 2 solutions. It is
actually easier than we thought.