UTF-8 in query string


Pete Zaitcev

May 9, 2012, 4:02:52 PM
to paste...@googlegroups.com, Pete Zaitcev
Hello:

OpenStack Swift contains a build-time (unit) test that looks like this:

    ......
    req = Request.blank('/sda1/p/a?%s=\xce' % param,
                        environ={'REQUEST_METHOD': 'GET'})
    resp = self.controller.GET(req)
    self.assertEquals(resp.status_int, 400)
    req = Request.blank('/sda1/p/a?%s=\xce\xa9' % param,
                        environ={'REQUEST_METHOD': 'GET'})
    resp = self.controller.GET(req)
    self.assert_(resp.status_int in (204, 412), resp.status_int)

As you can see, the UTF-8 character in the first request is incomplete,
so the server is expected to reject it with a 400. The second request
carries a complete character and should succeed. Instead, tests like
these blow up:

Traceback (most recent call last):
  File "/q/zaitcev/hail/swift-tip/test/unit/account/test_server.py", line 976, in test_params_utf8
    resp = self.controller.GET(req)
  File "/q/zaitcev/hail/swift-tip/swift/account/server.py", line 220, in GET
    req.accept = 'application/%s' % query_format.lower()
  File "/usr/lib/python2.6/site-packages/webob/request.py", line 1173, in __setattr__
    object.__setattr__(self, attr, value)
  File "/usr/lib/python2.6/site-packages/webob/acceptparse.py", line 354, in fset
    val = str(val)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c9' in position 12: ordinal not in range(128)

So, two questions:

1. Is it legal to supply a query string with parameters encoded this way?

2. Any recommendations as to how to deal with applications that do it?

Thanks,
-- Pete

Sergey Schetinin

May 9, 2012, 4:07:03 PM
to Pete Zaitcev, paste...@googlegroups.com
Please provide an actual test case, so it's easier to see what's happening.

Ian Bicking

May 9, 2012, 4:08:38 PM
to Pete Zaitcev, paste...@googlegroups.com
If you look at the exception, the problem is:


   req.accept = 'application/%s' % query_format.lower()

HTTP headers are expected to be ASCII (though arguably Latin-1 would be okay). I don't know what's being stuffed in there, though it isn't \xce\xa9 (\u03c9 is \xcf\x89 in UTF-8).
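
The byte arithmetic here can be checked with a short sketch. It is written for Python 3, while the thread concerns Python 2.6, and it assumes the query value is decoded as UTF-8 before .lower() is applied; the variable names are illustrative only.

```python
# Python 3 sketch; assumes the query value is decoded as UTF-8 first.
incomplete = b'\xce'      # first byte of a two-byte UTF-8 sequence
complete = b'\xce\xa9'    # U+03A9, GREEK CAPITAL LETTER OMEGA

# The truncated sequence cannot be decoded at all.
try:
    incomplete.decode('utf-8')
    decoded_incomplete = True
except UnicodeDecodeError:
    decoded_incomplete = False

# The complete sequence decodes, and lowercasing changes the code point.
upper = complete.decode('utf-8')
lower = upper.lower()                 # U+03C9, GREEK SMALL LETTER OMEGA
lower_bytes = lower.encode('utf-8')   # no longer the bytes that were sent

# Building the header value and forcing it to ASCII fails at the first
# character after the 'application/' prefix.
header = 'application/%s' % lower
try:
    header.encode('ascii')
    failing_position = None
except UnicodeEncodeError as exc:
    failing_position = exc.start

print(decoded_incomplete, hex(ord(upper)), hex(ord(lower)),
      lower_bytes, failing_position)
```

The prefix 'application/' is 12 characters long, which is consistent with the "position 12" reported in the traceback.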


Pete Zaitcev

May 9, 2012, 4:33:17 PM
to Ian Bicking, paste...@googlegroups.com
On Wed, 9 May 2012 15:08:38 -0500
Ian Bicking <i...@ianbicking.org> wrote:

> req.accept = 'application/%s' % query_format.lower()

> I don't know what's being stuffed in there, though it isn't
> \xce\xa9 (\u03c9 is \xcf\x89 in utf8)

Oooh, yes! Thanks, Ian. I am sure now that .lower() damages it.
One of the tested CloudFiles query parameters is "format=", which
sets an HTTP header indirectly.
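
One possible way to keep such a value out of the header is to validate it before building the Accept string. The following is a minimal Python 3 sketch, not Swift's actual code: the helper name accept_for_format and the ALLOWED_FORMATS set are hypothetical, chosen only for illustration.

```python
# Hypothetical helper; the names and the allowed set are assumptions.
ALLOWED_FORMATS = {'json', 'xml'}


def accept_for_format(query_format):
    """Return an Accept header value, or raise ValueError for bad input."""
    # Reject anything that is not pure ASCII before touching its case,
    # so non-ASCII text never reaches the header.
    try:
        query_format.encode('ascii')
    except UnicodeEncodeError:
        raise ValueError('format must be ASCII')

    fmt = query_format.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError('unsupported format: %r' % fmt)
    return 'application/%s' % fmt


print(accept_for_format('JSON'))
```

A caller could map the ValueError to a 400 response, so that a non-ASCII value is rejected instead of surfacing as a UnicodeEncodeError inside the header setter.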

-- Pete