Removing escape characters

Bob

Dec 28, 2010, 10:41:47 PM
to scrapy-users
I must be missing something obvious, but I'm having a hell of a time
removing escape characters.

#items.py
from scrapy.item import Item, Field
from scrapy.contrib.loader.processor import MapCompose
from scrapy.utils.markup import replace_escape_chars

class FisherItem(Item):
    item = Field(
        input_processor=MapCompose(replace_escape_chars)
    )

#FisherSpider.py
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    items = []
    item = FisherItem()
    item['item'] = hxs.select('//h4/a/text()').extract()
    items.append(item)
    return items

but my JSON still looks like:

[{"item": ["N-Methyl-p-toluenesulfonamide, 98%, Acros Organics\n\t\t\t
\t\t\t"]},

Bob

Dec 29, 2010, 10:45:27 AM
to scrapy-users
I should note, I've also tried:
input_processor=MapCompose(unicode.strip)

jms415

Dec 30, 2010, 6:03:01 PM
to scrapy-users
You have to use item loaders to use the input processors.
http://doc.scrapy.org/topics/loaders.html
Try this:

#items.py
from scrapy.item import Item, Field
from scrapy.contrib.loader.processor import MapCompose, Join

class FisherItem(Item):
    item = Field(
        input_processor=MapCompose(unicode.strip),
        output_processor=Join(),
    )

#FisherSpider.py
from scrapy.contrib.loader import XPathItemLoader

def parse(self, response):
    load = XPathItemLoader(item=FisherItem(), response=response)
    load.add_xpath('item', '//h4/a/text()')
    return load.load_item()
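
For anyone wondering what the two processors do to the extracted value, here is a rough, library-free sketch. The helpers map_compose and join are hypothetical stand-ins written for illustration, not Scrapy's own MapCompose and Join (which do more, such as dropping None values), and Python 3's str.strip is used in place of unicode.strip. The sample string assumes the whitespace from the JSON output in the first message.

```python
# Library-free sketch; map_compose and join are hypothetical stand-ins,
# not Scrapy's actual MapCompose and Join implementations.

def map_compose(*functions):
    """Return a processor that applies each function to every value in turn."""
    def process(values):
        for func in functions:
            values = [func(value) for value in values]
        return values
    return process


def join(separator=" "):
    """Return a processor that joins a list of strings into one string."""
    def process(values):
        return separator.join(values)
    return process


# Value similar to what the XPath in the thread extracts (trailing whitespace).
extracted = ["N-Methyl-p-toluenesulfonamide, 98%, Acros Organics\n\t\t\t\t\t\t"]

input_processor = map_compose(str.strip)  # Python 3 equivalent of unicode.strip
output_processor = join()

# The loader runs the input processor when values are added and the
# output processor when the item is built; here both are called by hand.
cleaned = output_processor(input_processor(extracted))
print(cleaned)
```

The point is that the processors only run when something calls them. Assigning item['item'] directly, as in the original spider, stores the raw list and never touches input_processor.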