Unicode Support for Python 2.5

Prakhil Samar

Jun 30, 2011, 3:41:26 AM
to Google App Engine
Hi All

Please help me out with a Unicode issue on Google App Engine.

GAE uses Python 2.5, whose default encoding is ASCII, whereas in
Python 3.0 strings are Unicode and the default encoding is UTF-8.

I have created a CSV file that contains some Unicode characters such
as à á â. These characters are outside the ASCII range, so when I
import the CSV file I get the following error:

" <type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode
byte 0xe0 in position 0: ordinal not in range(128) "

I have set the default encoding to UTF-8 in the Lib folder of
Python 2.5, and everything works fine on my local server, but when I
deploy the application to Google App Engine it gives me the error
above.

Is there any way to set the default encoding for my application on
the Google App Engine platform?

Is there a newer version of Google App Engine that supports Python 3.0?

Could anyone please help me resolve this issue?

Thanks in Advance
Prakhil :)

T. Abilo

Jun 30, 2011, 5:26:38 AM
to google-a...@googlegroups.com
I had the same issue. It is caused by your unicode text wrongly being handed around as a str. You can avoid it simply by forcing the value to be unicode, for example u'mystring'.
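
A minimal sketch of what the 'ascii' codec error means. Python 2 decodes a byte str with the 'ascii' codec whenever it gets mixed with unicode; the helper below (a hypothetical name, not part of any library) performs that decode explicitly so the failure is visible. The sample bytes are an assumption: the Latin-1 encoding of "à á â", chosen because it matches the 0xe0 in the error message.

```python
import sys


def first_non_ascii(raw):
    """Return (position, byte value) of the first byte that the 'ascii'
    codec rejects, or None if the whole byte string is plain ASCII."""
    try:
        raw.decode('ascii')
    except UnicodeDecodeError:
        err = sys.exc_info()[1]  # works on old and new Python versions alike
        return err.start, ord(raw[err.start:err.start + 1])
    return None


# Assumed sample: "à á â" stored as Latin-1 bytes (0xe0, 0xe1, 0xe2).
sample = u'\xe0 \xe1 \xe2'.encode('latin-1')
print(first_non_ascii(sample))                  # (0, 224), i.e. byte 0xe0 at position 0
print(first_non_ascii(u'abc'.encode('ascii')))  # None
```

A u'...' literal avoids the problem only for text typed into the source code; bytes that arrive from a file or a request still need an explicit decode with the right codec.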

Julian Namaro

Jun 30, 2011, 9:15:59 PM
to Google App Engine
Python 2.5 supports Unicode, and GAE already uses it pretty much
everywhere. I'd say you just have to tell the Python method opening your
CSV file to use UTF-8 encoding, or call decode('utf-8') on the raw data.
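
For a file read from disk, one way to tell Python to use UTF-8 is codecs.open from the standard library, which decodes while reading. A small sketch, assuming the file really is UTF-8; the temporary path and the sample contents are made up for illustration.

```python
import codecs
import os
import tempfile

# Made-up sample: one CSV line containing "à á â", written as UTF-8 bytes.
sample_text = u'\xe0 \xe1 \xe2,New York\n'
fd, path = tempfile.mkstemp(suffix='.csv')
os.write(fd, sample_text.encode('utf-8'))
os.close(fd)

# codecs.open decodes on the fly, so read() returns unicode text, not bytes.
f = codecs.open(path, 'r', encoding='utf-8')
try:
    text = f.read()
finally:
    f.close()
    os.remove(path)

print(repr(text))
```

For bytes that are already in memory, such as an uploaded file, raw.decode('utf-8') gives the same result without touching the filesystem.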

Prakhil Samar

Jul 1, 2011, 2:45:44 AM
to Google App Engine
Hi Julian,

Thanks for your information.

I am trying to upload a CSV file which contains some special
characters (Unicode characters which are not in the ASCII range). The
following is the data which the CSV file contains:

Name  | Address             | City     | Country                  | State      | Zip Code | Region
à á â | 5334 Swenson Avenue | New York | United States of America | California | 12345    | North America


I am using the following code to read the uploaded file from the HTML
form and process the CSV data:

HTML page:
<input type='file' name="file1" id="idfileupload" />

Myfile.py:
csv_file = self.request.get('file1')
csv_reader = unicode_csv_reader(csv_file)
fileReader = csv.reader(csv_file.split("\r\n"), skipinitialspace=True,
                        quotechar='"', quoting=csv.QUOTE_MINIMAL)
for reader in fileReader:
    for read in reader:
        r = read.strip()
        r = unicode(r, 'utf-8')
        <process the data>

Now, when I try to upload the file, I get the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position
0: ordinal not in range(128)

My understanding is that after reading the data from the CSV file, the
variable "read" in the code above holds a 2-byte string, and that a
2-byte string cannot be converted to Unicode, which is why I get the
error above.

I also tried the decode() and encode() methods, but I get the same
error.

Please help me read the data from the CSV file, convert it to
Unicode, and upload it to the database.

Looking forward to hearing from you

Regards
Prakhil


Julian Namaro

Jul 4, 2011, 7:26:32 AM
to Google App Engine
Prakhil,
Did you resolve this?
You need to do the conversion to unicode before any split or strip.
Try this on your first line:
    csv_file = self.request.get('file1').decode('utf-8')
or, if it is urlencoded:
    csv_file = urllib.unquote(self.request.get('file1')).decode('utf-8')
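
A rough sketch of that order of operations: unquote if needed, decode once, and only then split and strip. The function name and the sample payload are assumptions for illustration, and the import shim is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib import unquote as unquote_to_bytes  # Python 2: str in, byte str out
except ImportError:
    from urllib.parse import unquote_to_bytes       # Python 3


def decode_upload(raw, urlencoded=False):
    """Turn the raw request value into a list of unicode lines.

    raw is a byte string, or a percent-encoded string if urlencoded is True.
    """
    if urlencoded:
        raw = unquote_to_bytes(raw)
    text = raw.decode('utf-8')  # decode once, before any split or strip
    return [line.strip() for line in text.splitlines()]


# Assumed sample: the percent-encoded UTF-8 form of "à á â\r\nNew York".
print(decode_upload('%C3%A0%20%C3%A1%20%C3%A2%0D%0ANew%20York', urlencoded=True))
```

This only works if the payload really is UTF-8; bytes in another encoding raise UnicodeDecodeError at the decode step.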

Prakhil Samar

Jul 5, 2011, 2:34:22 AM
to Google App Engine
Thanks again Julian

I tried the code you suggested, but it does not solve the problem.

This is the error I get now when I use
csv_file = self.request.get('file1').decode('utf-8'):

Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 702, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/appsarp56/12.351609311322083742/csv123.py", line 183, in post
    csv_file = self.request.get('file1').decode('utf-8')
  File "/base/python_runtime/python_dist/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 164-165: invalid data

Please suggest a way to read the Unicode data from the CSV file
in Python 2.5.


Regards
Prakhil
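
One hedged reading of the two errors: byte 0xe0 is how Latin-1 and Windows-1252 encode "à", while UTF-8 encodes it as the two bytes 0xc3 0xa0. A file saved in one of those legacy encodings would therefore fail to decode as UTF-8. Whether that is what happened here is an assumption; the sketch below, with a hypothetical helper name, only shows the difference.

```python
def decode_with_fallback(raw, encodings=('utf-8', 'cp1252', 'latin-1')):
    """Try each candidate encoding in turn; return (text, encoding used)."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings matched')


as_utf8 = u'\xe0 \xe1 \xe2'.encode('utf-8')      # 8 bytes: c3 a0 20 c3 a1 20 c3 a2
as_latin1 = u'\xe0 \xe1 \xe2'.encode('latin-1')  # 5 bytes: e0 20 e1 20 e2

print(decode_with_fallback(as_utf8)[1])    # utf-8
print(decode_with_fallback(as_latin1)[1])  # cp1252
```

Latin-1 accepts any byte sequence, so it has to come last, and a successful decode there does not prove the guess was right. Checking how the CSV was saved is more reliable than guessing.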



Prakhil Samar

Jul 8, 2011, 3:08:12 AM
to Google App Engine
Hey, is there any GAE expert who can help resolve this issue?
I am stuck on this and need to resolve it ASAP.

Thanks


Branko Vukelic

Jul 8, 2011, 5:39:13 AM
to google-a...@googlegroups.com
Have you tried unicode(self.request.get('file1'))? There's also a
force_unicode function in Django. I use a simplified version of it:

def force_unicode(s):
    # Empty or missing input becomes an empty unicode string.
    if not s:
        return u''
    if isinstance(s, unicode):
        return s
    try:
        # Uses the default codec, so this succeeds only for ASCII data.
        return unicode(s)
    except UnicodeError:
        # Otherwise assume UTF-8; the error propagates if that is wrong.
        return s.decode('UTF-8')

So far it has served me well, but on one condition: that the incoming
data really is unicode.



Adam Sah

Jul 9, 2011, 4:55:28 PM
to google-a...@googlegroups.com
Here's something I've used that has worked reasonably well, though not
perfectly. Suggestions/improvements welcome.

adam


import logging

# Module-level count of failed decode attempts.
ENCODING_ERRORS = 0


def force_to_utf8(text):
  """ This has been very complicated and painful for us to get this right in Python 2
  and App engine.  See preso: http://farmdev.com/talks/unicode/ """
  global ENCODING_ERRORS
  if isinstance(text, unicode):
    return force_to_utf8(text.encode('utf8'))
  if not isinstance(text, basestring):
    text = str(text)
  try:
    # detect if it's already utf8
    text.decode('utf8')
    return text
  except Exception:
    ENCODING_ERRORS += 1
  try:
    # latin1 maps every byte value, so this decode does not fail on a
    # byte string; the fallbacks below are effectively never reached
    res = text.decode('latin1').encode('utf8')
    #logging.error("decode(latin1) worked: "+repr(res))
    return res
  except Exception:
    ENCODING_ERRORS += 1
  try:
    res = text.decode('8859-1').encode('utf8')
    #logging.error("decode(8859-1) worked: "+repr(res))
    return res
  except Exception:
    ENCODING_ERRORS += 1
  try:
    res = text.decode('utf16').encode('utf8')
    #logging.error("decode(utf16) worked: "+repr(res))
    return res
  except Exception:
    ENCODING_ERRORS += 1
  logging.error("tried every encoding method for '"+repr(text)[:50]+"'")
  return text


Ross M Karchner

Jul 9, 2011, 5:46:26 PM
to google-a...@googlegroups.com
The CSV library docs include an example Unicode CSV reader:
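
A rough sketch in the spirit of that example, not a copy of it. Python 2's csv module works on byte strings, so the usual pattern there is to hand it UTF-8 encoded lines and decode each cell afterwards, whereas Python 3's csv module takes text directly. The function name and the sample data are assumptions for illustration.

```python
import csv
import sys

PY2 = sys.version_info[0] == 2


def read_unicode_csv(raw, encoding='utf-8'):
    """Parse CSV bytes into rows of unicode cells."""
    lines = raw.decode(encoding).splitlines()
    if PY2:
        # Python 2 csv wants byte strings: feed UTF-8, decode cells afterwards.
        reader = csv.reader([line.encode('utf-8') for line in lines],
                            skipinitialspace=True)
        return [[cell.decode('utf-8') for cell in row] for row in reader]
    # Python 3 csv works on text directly.
    return [row for row in csv.reader(lines, skipinitialspace=True)]


# Made-up sample echoing the data in this thread, encoded as UTF-8.
sample = u'Name,City\r\n\xe0 \xe1 \xe2, New York\r\n'.encode('utf-8')
rows = read_unicode_csv(sample)
print(len(rows))  # 2
```

Splitting into lines before parsing keeps the sketch short, but it does not handle quoted fields that contain newlines.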
