It looks to me like this is actually an issue with the JSON exporter,
rather than the spider. Under the covers, the Scrapy JSON exporter
uses json.JSONEncoder, whose constructor takes an argument called
ensure_ascii. From
http://docs.python.org/library/json.html#json.JSONEncoder :
"If ensure_ascii is True (the default), the output is guaranteed to be
str objects with all incoming unicode characters escaped. If
ensure_ascii is False, the output will be a unicode object."
Scrapy is not setting this parameter, so it's defaulting to escaping
unicode. It looks like most of the underlying code supports passing
in arguments to the JSONEncoder, all the way up to
FeedExporter._get_exporter in scrapy/contrib/feedexport.py. However,
that method is only called from FeedExporter.open_spider, which
doesn't pass in any arguments other than the temporary file.
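To make the quoted behaviour concrete, here is a small standalone sketch using only the standard library (this is not Scrapy code). The sample item is made up for illustration. It is written for Python 3, where both results are str; under Python 2, as the docs quoted above say, the second result would be a unicode object.

```python
import json

# Made-up item containing non-ASCII text.
item = {"name": "Максим"}

# Default encoder: ensure_ascii=True, so non-ASCII characters become \uXXXX escapes.
escaped = json.JSONEncoder().encode(item)

# ensure_ascii=False leaves the characters as they are.
readable = json.JSONEncoder(ensure_ascii=False).encode(item)

print(escaped)
print(readable)

# Both strings decode back to the same data; only the representation differs.
print(json.loads(escaped) == json.loads(readable))
```

So the data is not being lost or corrupted by the default exporter, it is just written in escaped form.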
A quick workaround would be to implement your own feed exporter to
replace JsonItemExporter or JsonLinesItemExporter, and override the
built-in one using the FEED_EXPORTERS setting.
A more long-term solution would be to provide a way to pass feed
exporter parameters in via the settings file.
-Scotty
import json

from scrapy.contrib.exporter import BaseItemExporter


class UnicodeJsonLinesItemExporter(BaseItemExporter):

    def __init__(self, file, **kwargs):
        self._configure(kwargs)
        self.file = file
        self.encoder = json.JSONEncoder(ensure_ascii=False, **kwargs)

    def export_item(self, item):
        itemdict = dict(self._get_serialized_fields(item))
        self.file.write(self.encoder.encode(itemdict) + '\n')
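To have Scrapy pick this class up, the project settings would need to point the jsonlines format at it via FEED_EXPORTERS. A minimal sketch, assuming the class lives in a module importable as myproject.exporters (that path is hypothetical; use wherever you actually put the exporter):

```python
# settings.py (sketch)
# 'myproject.exporters' is a hypothetical module path, not something
# that ships with Scrapy; point it at your own module.
FEED_EXPORTERS = {
    'jsonlines': 'myproject.exporters.UnicodeJsonLinesItemExporter',
}
```

With that in place, a jsonlines feed should be written by the custom exporter rather than the built-in one.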
-Scotty
2011/10/30 Максим Горковский <ragzo...@gmail.com>: