POST data causes error: <type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte X in position Y: ordinal not in range(128)

dburns

Jun 25, 2009, 9:51:08 PM
to Google App Engine
Hi,

My app normally works fine, but certain POST data is causing it to
blow up with a 500 (internal server error). The log contains the
message in the stack trace below. Trouble is, my app hasn't even got
started yet! It happens right at the run_wsgi_app line. My
webapp.RequestHandler has not received control yet, so what can I do?

The error mentions ascii, but I don't know why it assumes the POST
data was ascii. It's probably UTF-8. I say probably since the POST
data isn't directly in my control (it's from Facebook, since this is a
Facebook app).

Some thoughts:
1) Is there a bug in the run_wsgi_app code? Seems to me that data
arriving from outside shouldn't be able to cause an internal server
error.
2) Could the POST data be malformed? I don't know much about POST
data -- perhaps it's identified as ascii, in error? I'd like to trap
it and examine it but I don't know how to do that without the help of
webapp.RequestHandler (the app dies before this gets invoked).

My code is the standard mainline code

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()


Here's the stack trace from my log. Thanks for any help!


<type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte 0xc3 in position 10988: ordinal not in range(128)
Traceback (most recent call last):
  File "/base/data/home/apps/(my directory)/index.py", line 216, in <module>
    main()
  File "/base/data/home/apps/(my directory)/index.py", line 213, in main
    run_wsgi_app(application)
  File "/base/python_lib/versions/1/google/appengine/ext/webapp/util.py", line 76, in run_wsgi_app
    result = application(env, _start_response)
  File "/base/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 521, in __call__
    response.wsgi_write(start_response)
  File "/base/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 241, in wsgi_write
    body = self.out.getvalue()
  File "/base/python_dist/lib/python2.5/StringIO.py", line 270, in getvalue
    self.buf += ''.join(self.buflist)

dburns

Jun 26, 2009, 12:16:39 AM
to Google App Engine
Here's an update. I found out how to do basic CGI so I was able to
examine the POST data directly. Interestingly, I got no errors
decoding the POST data as ascii! Does this point to a bug in the
run_wsgi_app code?

Here's my new mainline. The output I got (minus the very long POST
data) is "is_ascii: True No error decoding as ascii".

import sys

def is_ascii(s):
    # True only if every character is in the 7-bit ASCII range.
    return all(ord(c) < 128 for c in s)

def main():
    print "Content-Type: text/html"  # HTML is following
    print                            # blank line, end of headers
    s = sys.stdin.read()             # raw POST body
    print 'DATA: '
    print s
    print 'is_ascii: ' + str(is_ascii(s))
    try:
        s.decode('ascii')
    except UnicodeDecodeError:
        print "UnicodeDecodeError: not an ascii-encoded string"
    else:
        print "No error decoding as ascii"
    # run_wsgi_app(application)

dburns

Jul 1, 2009, 1:20:24 AM
to Google App Engine
Here's what I found out, in case it helps someone. It turns out I
misread the stack trace. The exception happened AFTER calling my
handler, not before. (Trace messages in my handler didn't show up
because of buffering, which reinforced my incorrect theory that it
hadn't been invoked.)

The issue was that in this one case I happened to be emitting some
valid text that contained the 0xc3 byte. After my handler was
invoked, the framework tried to gather the output using a StringIO,
whose documentation contains this warning:

    The StringIO object can accept either Unicode or 8-bit strings, but
    mixing the two may take some care. If both are used, 8-bit strings
    that cannot be interpreted as 7-bit ASCII (that use the 8th bit)
    will cause a UnicodeError to be raised when getvalue() is called.

and that's exactly what happened to me. Because I was inadvertently
emitting a unicode string at one point, followed later by the 0xc3
byte, getvalue() raised the exception. Simply wrapping the unicode
string in str(variable_name) made the problem go away. I'm not 100%
sure I understand why, but emitting u"UTEST" causes the exception
while "TEST" does not (when the 0xc3 byte is also present somewhere
in the output).
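
A likely explanation, sketched here under assumptions: in Python 2, joining a list that contains even one unicode string produces a unicode result, and every 8-bit string in the list is then implicitly decoded with the 'ascii' codec. A byte such as 0xc3 cannot be decoded that way, so the join inside getvalue() fails. The helper name py2_style_join is hypothetical. It only spells that rule out explicitly so the behavior can also be observed on Python 3, and it is not the StringIO implementation.

```python
def py2_style_join(chunks):
    """Emulate Python 2's ''.join() on a mix of unicode and 8-bit strings."""
    if all(isinstance(c, bytes) for c in chunks):
        # Only 8-bit strings: bytes are concatenated untouched, so 0xc3 is fine.
        return b"".join(chunks)
    # At least one unicode string: each 8-bit string is decoded as ASCII first.
    return u"".join(
        c.decode("ascii") if isinstance(c, bytes) else c for c in chunks
    )


utf8_bytes = b"\xc3\xa9"  # a valid UTF-8 sequence containing the 0xc3 byte

# No unicode chunk in the list, so no decoding is attempted and no error occurs.
print(repr(py2_style_join([b"TEST", utf8_bytes])))

try:
    # One unicode chunk is enough to force ASCII decoding of the 0xc3 byte.
    py2_style_join([u"UTEST", utf8_bytes])
except UnicodeDecodeError as exc:
    print("mixed output fails: %s" % exc)
```

This would also explain why str(variable_name) helped: once the unicode string becomes an 8-bit string, nothing in the buffer is unicode and no decoding is attempted.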

It's a bit scary just how easy it is to lay a really obscure trap for
yourself like this. I have to be very careful not to emit any unicode
strings, it seems. Any guidance/advice appreciated.
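
One common way to avoid this trap is to pick a single type for everything written to the response and convert at the boundary. The sketch below assumes the output should be UTF-8 bytes; the helper name to_utf8 is hypothetical and not part of webapp. Unlike str(variable_name), which in Python 2 raises UnicodeEncodeError as soon as the unicode string contains a non-ASCII character, an explicit encode("utf-8") handles any text.

```python
def to_utf8(chunk):
    """Return chunk as UTF-8 encoded bytes, leaving 8-bit strings untouched."""
    if isinstance(chunk, bytes):
        return chunk  # assumption: 8-bit strings are already UTF-8
    return chunk.encode("utf-8")


# A mix of unicode text and raw UTF-8 bytes, as a handler might emit.
chunks = [u"caf\u00e9 ", b"\xc3\xa9", u"UTEST"]

# Every chunk is bytes by the time it is joined, so nothing is decoded implicitly.
body = b"".join(to_utf8(c) for c in chunks)
print(repr(body))
```

In a handler, each value would pass through such a helper before being written, for example self.response.out.write(to_utf8(value)).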