Webpy and UnicodeDecodeError

Oscar Del Ben

Dec 18, 2009, 6:05:48 AM

So I'm trying to send a file through webpy and urllib2 but I can't get
around these UnicodeErrors. Here's the code:

# controller

x = web.input(video_original={})
params = {'foo': x['foo']}

files = (('video[original]', 'test', x['video_original'].file.read()),)
client.upload(upload_url, params, files, access_token())

# client library

def __encodeMultipart(self, fields, files):
    """
    fields is a sequence of (name, value) elements for regular
    form fields.
    files is a sequence of (name, filename, value) elements for
    data to be uploaded as files
    Return (content_type, body) ready for httplib.HTTP instance
    """
    boundary = mimetools.choose_boundary()
    crlf = '\r\n'

    l = []
    for k, v in fields.iteritems():
        l.append('--' + boundary)
        l.append('Content-Disposition: form-data; name="%s"' % k)
        l.append('')
        l.append(v)
    for (k, f, v) in files:
        l.append('--' + boundary)
        l.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (k, f))
        l.append('Content-Type: %s' % self.__getContentType(f))
        l.append('')
        l.append(v)
    l.append('--' + boundary + '--')
    l.append('')
    body = crlf.join(l)

    return boundary, body

def __getContentType(self, filename):
    return mimetypes.guess_type(filename)[0] or 'application/octet-stream'

def upload(self, path, post_params, files, token=None):

    if token:
        token = oauth.OAuthToken.from_string(token)

    url = "http://%s%s" % (self.authority, path)

    (boundary, body) = self.__encodeMultipart(post_params, files)

    headers = {'Content-Type': 'multipart/form-data; boundary=%s' % boundary,
               'Content-Length': str(len(body))
               }

    request = oauth.OAuthRequest.from_consumer_and_token(
        self.consumer,
        token,
        http_method='POST',
        http_url=url,
        parameters=post_params
    )

    request.sign_request(oauth.OAuthSignatureMethod_HMAC_SHA1(),
                         self.consumer, token)

    request = urllib2.Request(request.http_url, postdata=body,
                              headers=headers)
    request.get_method = lambda: 'POST'

    return urllib2.urlopen(request)

Unfortunately I get two kinds of unicode error, the first one in the
crlf.join(l):

Traceback (most recent call last):
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 242, in process
    return self.handle()
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 233, in handle
    return self._delegate(fn, self.fvars, args)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 412, in _delegate
    return handle_class(cls)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 387, in handle_class
    return tocall(*args)
  File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in POST
    return simplejson.load(client.upload(upload_url, params, files, access_token()))
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 131, in upload
    (boundary, body) = self.__encodeMultipart(post_params, files)
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 111, in __encodeMultipart
    body = crlf.join(l)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position 42: ordinal not in range(128)

And here's another one:

Traceback (most recent call last):
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 242, in process
    return self.handle()
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 233, in handle
    return self._delegate(fn, self.fvars, args)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 412, in _delegate
    return handle_class(cls)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 387, in handle_class
    return tocall(*args)
  File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in POST
    return simplejson.load(client.upload(upload_url, params, files, access_token()))
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 131, in upload
    (boundary, body) = self.__encodeMultipart(post_params, files)
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 111, in __encodeMultipart
    body = crlf.join(l)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position 42: ordinal not in range(128)

Does anyone know why these errors happen and what I should do to
prevent them? Many thanks.

Oscar

Dave Angel

Dec 18, 2009, 10:43:21 AM

to Oscar Del Ben, pytho...@python.org

I did a short test to demonstrate the likely problem, without all the
other libraries and complexity.

lst = ["abc"]
lst.append("def")
lst.append(u"abc")
lst.append("g\x48\x82\x94i")
print lst
print "**".join(lst)

That fragment of code generates (in Python 2.6) the following output and
traceback:

['abc', 'def', u'abc', 'gH\x82\x94i']

Traceback (most recent call last):
  File "M:\Programming\Python\sources\dummy\stuff2.py", line 10, in <module>
    print "**".join(lst)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 2: ordinal not in range(128)

You'll notice that one of the strings is a unicode one, and another one
has the byte 0x82 in it. Once join() discovers a unicode element, it needs
to produce a unicode result, so it implicitly decodes the byte strings, and
by default it uses the ASCII codec to do that. Any byte above 0x7f then
makes the decode fail.
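
To make that hidden step visible, here is a small sketch that does the same decode by hand. It is only an illustration of what I believe join() is doing internally; the variable names are mine, and it is written so that it should run on 2.6 as well as on Python 3.

```python
# The decode that join() performs implicitly in Python 2, done explicitly.
raw = b"gH\x82\x94i"  # same bytes as the last element of the test list

try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    # err.start is the offset of the first byte the codec rejected
    failure = (err.encoding, err.start)
    print("decode failed: codec=%s position=%d" % failure)
```

The position reported is an offset inside the one byte string that failed, not an offset into the joined result.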

If you print your 'l' list (bad name, by the way, looks too much like a
'1'), you can see which element is Unicode, and which one has the \xb7
in position 42. You'll have to decide which is the problem, and solve
it accordingly. Was the fact that one of the strings is unicode an
oversight? Or did you think that all characters would be 0x7f or less?
Or do you want to handle all possible characters, and if so, with what
encoding?
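
If printing the whole list is too noisy (an uploaded video will be), something like the following sketch may help narrow it down. describe_parts is a hypothetical helper of my own, not part of web.py or any library; it reports each element's type and the offset of its first non-ASCII byte or character, if any.

```python
def describe_parts(parts):
    """Return (index, type name, offset of first non-ASCII item or None)."""
    report = []
    for index, part in enumerate(parts):
        if isinstance(part, bytes):
            codes = list(bytearray(part))      # byte values on Python 2 and 3
        else:
            codes = [ord(ch) for ch in part]   # code points of a text string
        first_bad = next((i for i, c in enumerate(codes) if c > 0x7F), None)
        report.append((index, type(part).__name__, first_bad))
    return report

sample = [b"abc", u"abc", b"gH\x82\x94i"]
for row in describe_parts(sample):
    print(row)
```

Run against your own list, the row whose type is unicode and the row whose offset is 42 are the two elements to look at.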

DaveA

Oscar Del Ben

Dec 18, 2009, 10:51:29 AM

Thanks for your reply DaveA.

Since I'm dealing with file uploads, I guess I should only care about
those. I understand the fact that I'm trying to concatenate a unicode
string with a binary, but I don't know how to deal with this. Perhaps
the uploaded file should be encoded in some way? I don't think this is
the case though.

Dave Angel

Dec 18, 2009, 10:18:34 PM

to Oscar Del Ben, pytho...@python.org

Oscar Del Ben wrote:
> <snip>

>> You'll notice that one of the strings is a unicode one, and another one
>> has the character 0x82 in it. Once join() discovers Unicode, it needs
>> to produce a Unicode string, and by default, it uses the ASCII codec to
>> get it.
>>
>> If you print your 'l' list (bad name, by the way, looks too much like a
>> '1'), you can see which element is Unicode, and which one has the \xb7
>> in position 42. You'll have to decide which is the problem, and solve
>> it accordingly. Was the fact that one of the strings is unicode an
>> oversight? Or did you think that all characters would be 0x7f or less?
>> Or do you want to handle all possible characters, and if so, with what
>> encoding?
>>
>> DaveA
>>
>
> Thanks for your reply DaveA.
>
> Since I'm dealing with file uploads, I guess I should only care about
> those. I understand the fact that I'm trying to concatenate a unicode
> string with a binary, but I don't know how to deal with this. Perhaps
> the uploaded file should be encoded in some way? I don't think this is
> the case though.
>
>

You have to decide what the format of the file is to be. If you have
some parts in bytes, and some in Unicode, you have to be explicit about
how you merge them. And that depends on who's going to use the file, and
for what purpose.

Before you try to do a join(), you have to convert the Unicode
string(s) to bytes. Try the encode() method of the unicode object,
where you get to specify what encoding to use.
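
As a rough sketch of that idea, assuming UTF-8 is acceptable for the header lines and field values (that choice is yours, not something the protocol or I can decide for you): join_as_bytes is a made-up helper name and "boundary" is a placeholder string. Text elements get encoded, the file contents are left alone, and the join then only ever sees bytes.

```python
def join_as_bytes(parts, sep=b"\r\n", encoding="utf-8"):
    """Encode any text parts, leave byte parts untouched, then join."""
    encoded = []
    for part in parts:
        if not isinstance(part, bytes):
            part = part.encode(encoding)  # text -> bytes, codec chosen explicitly
        encoded.append(part)
    return sep.join(encoded)

parts = [
    b"--boundary",
    u'Content-Disposition: form-data; name="video[original]"; filename="test"',
    b"",
    b"\x00\xb7\xff",   # raw file bytes, never decoded
    b"--boundary--",
]
body = join_as_bytes(parts)
```

The point of the design is that the uploaded file is never decoded at all, so it does not matter which byte values it contains.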

In general, you want to use the same encoding for all the bytes in a
given file. But as I just said, that's entirely up to you.

DaveA