UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 11


mcl

Jul 15, 2008, 3:57:37 AM
to Google App Engine
The above gives a server error, and the message does not identify the
statement causing the error.

I am reading data from a CSV file which has the odd character with a
byte value above 127. In the above error case the byte is 0x92.

I am using normal string functions and concatenation with 'string +
string'.

How do I solve the above, apart from converting all characters above
127 offline?

At the moment I have no code / statements indicating what character
set I am using.

I have no serious knowledge of character sets other than knowing
Unicode is 16 bits or 2 bytes, but I did not think I needed Unicode
for 0x92.

I probably need some idiot's guide.

Any help / guidance appreciated.

Richard

chenbaiping

Jul 15, 2008, 4:11:49 AM
to google-a...@googlegroups.com
Save your CSV file encoded as UTF-8, then try:
Your_data.decode('utf-8', 'ignore')

Or, if you know the charset of the CSV file, try:
Your_data.decode(csv_file_charset, 'ignore')
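
For what it's worth, here is a minimal sketch of the difference between the two calls. It assumes the CSV came from a Windows program and is encoded as Windows-1252 (cp1252), where byte 0x92 is the curly apostrophe; the field content is made up for illustration and is not from the original file.

```python
# Hypothetical CSV field, still undecoded; 0x92 sits at byte offset 11.
raw = b"The company\x92s name"

# Decoding as ASCII fails, which is the error from the subject line.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# 0x92 on its own is not valid UTF-8, so 'ignore' silently drops it.
dropped = raw.decode("utf-8", "ignore")

# Assumption: the file is really cp1252, where 0x92 maps to U+2019.
text = raw.decode("cp1252")
```

With 'ignore' the apostrophe is lost without any warning, so naming the file's real charset is usually the safer choice when it is known.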

From: google-a...@googlegroups.com
[mailto:google-a...@googlegroups.com] On Behalf Of mcl
Sent: July 15, 2008 15:58
To: Google App Engine
Subject: [google-appengine] UnicodeDecodeError: 'ascii' codec can't decode byte
0x92 in position 11



anteater_sa

Jul 15, 2008, 9:50:51 AM
to Google App Engine

anteater_sa

Jul 15, 2008, 10:32:17 AM
to Google App Engine
Welcome to the wonderful world of Unicode, my friend:

http://boodebr.org/main/python/all-about-python-and-unicode

Good luck.


Haitao

Jul 15, 2008, 4:26:14 PM
to Google App Engine
Another lesson I have learned: when you deal with text input within
<form> tags, use request.str_GET[] and request.str_POST[] instead of
request.get(). request.get() can return either a unicode string or an
encoded byte string, depending on the charset header sent by the
client. Firefox and IE handle that header differently, so you will get
UnicodeDecodeError here and there. str_GET and str_POST return the raw
byte string, and you can always call decode() on it to get a unicode
string.
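
A rough sketch of the "take the raw bytes, then decode them yourself" idea. The helper name to_unicode and the list of encodings to try are assumptions for illustration, not part of the App Engine or WebOb API; the raw value would come from something like request.str_GET['what'].

```python
def to_unicode(raw, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: decode a raw byte string, trying each encoding in turn."""
    if not isinstance(raw, bytes):
        return raw  # already a text string, nothing to decode
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    # Last resort: keep going, marking undecodable bytes with U+FFFD.
    return raw.decode("utf-8", "replace")


utf8_input = to_unicode(b"cr\xc3\xb3nicas")  # bytes as a UTF-8 browser would send them
latin_input = to_unicode(b"cr\xf3nicas")     # same word sent as a single-byte charset
```

Trying UTF-8 first matters here: valid UTF-8 is rarely produced by accident, whereas cp1252 accepts almost any byte sequence, so it only makes sense as the fallback.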

Haitao Li

=======================================
http://my-life.appspot.com - Your private online journal.

mcl

Jul 15, 2008, 7:26:06 PM
to Google App Engine
Thank you all for your useful replies.

I obviously need to up my game, with regard to charsets and unicode.

anteater_sa

Jul 17, 2008, 1:09:11 PM
to Google App Engine
I think this can be fixed by setting the correct charset in your HTML
document, i.e.:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">



José Oliver Segura

Jul 18, 2008, 5:38:25 AM
to google-a...@googlegroups.com
On Thu, Jul 17, 2008 at 7:09 PM, anteater_sa <mr.an...@gmail.com> wrote:
>
> I think this can be fixed by setting the correct charset in your html
> document i.e. <meta http-equiv="Content-Type" content="text/html;
> charset=UTF-8">
>

I'm having a similar problem, with some slightly weird behaviour:

I have an entity stored in the datastore with the name "Las crónicas
de Narnia: El príncipe Caspian" (a movie, The Chronicles of Narnia). I
can view it without any problem in the data viewer and in the
corresponding detail view in my app. I can even view it in a results
list page when I search for entities containing "narnia". The problem
only arises when I use non-ASCII characters in the search box; for
example, when I search for "crónicas", I get the typical error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in
position 2: ordinal not in range(128)

(but the curious thing is that it comes from:)

  File "/home/joseo/Dev/google_appengine/google/appengine/ext/db/__init__.py", line 1301, in fetch
    raw = self._get_query().Get(limit, offset)
  File "/home/joseo/Dev/google_appengine/google/appengine/api/datastore.py", line 928, in Get
    return self._Run(limit, offset)._Next(limit)
  File "/home/joseo/Dev/google_appengine/google/appengine/api/datastore.py", line 869, in _Run
    apiproxy_stub_map.MakeSyncCall('datastore_v3', 'RunQuery', pb, result)
  File "/home/joseo/Dev/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 46, in MakeSyncCall
    stub.MakeSyncCall(service, call, request, response)
  File "/home/joseo/Dev/google_appengine/google/appengine/api/datastore_file_stub.py", line 264, in MakeSyncCall
    (getattr(self, "_Dynamic_" + call))(request, response)
  File "/home/joseo/Dev/google_appengine/google/appengine/api/datastore_file_stub.py", line 531, in _Dynamic_RunQuery
    if clone in self.__query_history:
  File "/home/joseo/Dev/google_appengine/google/appengine/api/datastore_file_stub.py", line 65, in <lambda>
    datastore_pb.Query.__hash__ = lambda self: hash(self.Encode())
  File "/home/joseo/Dev/google_appengine/google/net/proto/ProtocolBuffer.py", line 50, in Encode
    self.Output(e)
  File "/home/joseo/Dev/google_appengine/google/net/proto/ProtocolBuffer.py", line 147, in Output
    self.OutputUnchecked(e)
  File "/home/joseo/Dev/google_appengine/google/appengine/datastore/datastore_pb.py", line 654, in OutputUnchecked
    self.filter_[i].OutputUnchecked(out)
  File "/home/joseo/Dev/google_appengine/google/appengine/datastore/datastore_pb.py", line 222, in OutputUnchecked
    self.property_[i].OutputUnchecked(out)
  File "/home/joseo/Dev/google_appengine/google/appengine/datastore/entity_pb.py", line 1030, in OutputUnchecked
    self.value_.OutputUnchecked(out)
  File "/home/joseo/Dev/google_appengine/google/appengine/datastore/entity_pb.py", line 702, in OutputUnchecked
    out.putPrefixedString(self.stringvalue_)
  File "/home/joseo/Dev/google_appengine/google/net/proto/ProtocolBuffer.py", line 328, in putPrefixedString
    a.fromstring(v)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfa' in position 0: ordinal not in range(128)

That is, it fails when trying to get a value from the datastore that
is fine before and after (it can be retrieved and rendered OK; it only
fails when the search string includes non-ASCII characters).
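
For reference, a small sketch that reproduces the first error message outside App Engine. It does not diagnose search.Searchable; it only shows what the message describes (a text string being encoded with the ASCII codec) and, as an assumption about what a byte-oriented API would want instead, an explicit UTF-8 encode.

```python
query = u"cr\u00f3nicas"  # the search term "crónicas" as a text string

# Encoding with the ASCII codec fails on the accented character at index 2.
try:
    query.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

# Encoding explicitly as UTF-8 succeeds and round-trips back to the same text.
encoded = query.encode("utf-8")
round_trip = encoded.decode("utf-8")
```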

I've tried request.str_GET[] instead of request.get(), and all my
pages include the meta UTF-8 tag, but I can't get it to work. I've
tried all of these combinations:

a)

what = self.request.str_GET['what']
if what is not None:
    what = what.decode('utf-8')

b)

what = self.request.get('what')

c)

what = self.request.str_GET['what']

d)

what = self.request.get('what')
if what is not None:
    what = what.decode('utf-8')

The only unusual thing is that this entity is a search.Searchable
subclass, not a db.Model subclass, and I'm performing the filter using
the search(what) method.

If I change the entity to be a db.Model subclass and use a
filter('tags =', what) filter, it works without any problem using just
the string from self.request.get('what') (it fails with any of the
str_GET/decode UTF-8 variants).

So it looks like search.Searchable produces errors where db.Model
passes without any problem?

I'm a little confused by this behaviour... any hints?

best,
Jose
