I've finally gotten UTF-8 data to import. Here's the diff:
--- google_appengine/google/appengine/ext/bulkload/__init__.py	2008-04-03 09:05:25.000000000 +0900
+++ google_appengine-fixed/google/appengine/ext/bulkload/__init__.py	2008-04-11 23:10:43.000000000 +0900
@@ -225,7 +225,7 @@
     entity = datastore.Entity(self.__kind)
     for (name, converter), val in zip(self.__properties, values):
-      entity[name] = converter(val)
+      entity[name] = converter(val.decode('utf-8'))
     entities = self.HandleEntity(entity)
@@ -349,7 +349,7 @@
       output.append('Error: no Loader defined for kind %s.' % kind)
       return (httplib.BAD_REQUEST, ''.join(output))
-    buffer = StringIO.StringIO(data)
+    buffer = StringIO.StringIO(data.encode('utf-8'))
     reader = csv.reader(buffer, skipinitialspace=True)
     entities =
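For anyone curious why both changes are needed: Python 2's csv module only works on byte strings, so text has to be encoded to UTF-8 going into the reader and each value decoded back afterwards. A minimal round-trip of the decode side, written in Python 3 syntax (the sample rows are made up, not App Engine data):

```python
import csv
import io

# The patch decodes each incoming CSV value from UTF-8 bytes before the
# property converter sees it. Standalone sketch of that round-trip:
raw = "이름,name\n댄,Dan\n".encode("utf-8")  # bytes as they arrive on the wire
text = raw.decode("utf-8")                   # the patch's val.decode('utf-8')
reader = csv.reader(io.StringIO(text), skipinitialspace=True)
rows = list(reader)
print(rows)  # [['이름', 'name'], ['댄', 'Dan']]
```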
On Apr 11, 6:42 pm, Dan Bravender <dan.braven...@gmail.com> wrote:
> Brian,
>
> Thanks a ton. I almost slammed my head through my computer last night.
> ^^
>
> 댄
>
> On Apr 11, 2:17 pm, gearhead <brianob...@gmail.com> wrote:
>
> > Dan,
>
> > Yeah, this is a common problem; see this article on solving it:
> > http://www.amk.ca/python/howto/unicode
>
> > In short, you need to specify 'ignore', 'strict', etc in the error
> > parameter, e.g.,
> > val = unicode('\x80abc', errors='ignore')
> > I believe that the default is to be strict.
>
> > Brian
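(Side note on Brian's tip for anyone reading later: in Python 3 the unicode() builtin is gone, but bytes.decode() takes the same errors parameter. '\x80' is not a valid UTF-8 start byte, so:)

```python
bad = b"\x80abc"  # leading byte is invalid UTF-8

# 'ignore' silently drops the undecodable byte
print(bad.decode("utf-8", errors="ignore"))   # 'abc'

# 'replace' substitutes U+FFFD, the replacement character
print(bad.decode("utf-8", errors="replace"))  # '\ufffdabc'

# the default is errors='strict', which raises
try:
    bad.decode("utf-8")
except UnicodeDecodeError:
    print("strict mode raises UnicodeDecodeError")
```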
>
> > > On Apr 10, 7:13 am, Dan Bravender <dan.braven...@gmail.com> wrote:
>
> > > I'm looking at a way around this, but for the time being UTF-8 isn't
> > > working with bulk_client.py:
>
> > > Traceback (most recent call last):
> > >   File "/Users/dbravender/Desktop/google_appengine/google/appengine/
> > > ext/webapp/__init__.py", line 486, in __call__
> > >     handler.post(*groups)
> > >   File "/Users/dbravender/Desktop/google_appengine/google/appengine/
> > > ext/bulkload/__init__.py", line 287, in post
> > >     self.request.get(constants.CSV_PARAM))
> > >   File "/Users/dbravender/Desktop/google_appengine/google/appengine/
> > > ext/bulkload/__init__.py", line 357, in Load
> > >     for columns in reader:
> > > UnicodeEncodeError: 'ascii' codec can't encode characters in position
> > > 0-2: ordinal not in range(128)
>
> > > The csv file that I'm trying to update has Korean and Chinese
> > > characters. I found some code for reading a UTF-8 csv file on
> > > python.org:
>
> > > def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
> > >     # csv.py doesn't do Unicode; encode temporarily as UTF-8:
> > >     csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
> > >                             dialect=dialect, **kwargs)
> > >     for row in csv_reader:
> > >         # decode UTF-8 back to Unicode, cell by cell:
> > >         yield [unicode(cell, 'utf-8') for cell in row]
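That recipe is Python 2 specific (unicode() no longer exists). For anyone who needs the same wrapper under Python 3, where csv.reader consumes text directly, the conversion flips to decoding the raw lines up front. A sketch under that assumption (the names below are mine, not from the python.org recipe):

```python
import csv

def utf_8_decoder(byte_lines):
    # Python 3's csv.reader wants str, so decode each raw UTF-8 line first
    for line in byte_lines:
        yield line.decode("utf-8")

def unicode_csv_reader(byte_lines, dialect=csv.excel, **kwargs):
    # csv.reader accepts any iterable of text lines, including a generator
    return csv.reader(utf_8_decoder(byte_lines), dialect=dialect, **kwargs)

rows = list(unicode_csv_reader(["댄,Dan\n".encode("utf-8")]))
print(rows)  # [['댄', 'Dan']]
```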