How to split items?

Hara

Oct 17, 2010, 5:28:23 AM
to scrapy-users
Hi everyone, I'm a new Scrapy user and I have been trying to extract
items from my crawl for several days.

#######################################################
Spider.py
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.loader import XPathItemLoader

from trip.items import TripItem, Field

class TripSpider(BaseSpider):
    name = "tripadvisor"
    domain_name = "tripadvisor.com"
    start_urls = ['http://www.tripadvisor.com/ShowUserReviews-g293916-d311043-r59187869.html']

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        selector = hxs.select('//div[contains(@class,"deckC")]')
        items = []
        ## ItemLoader
        l = XPathItemLoader(item=TripItem(), selector=selector)
        l.add_xpath('place', '//div[@id and @class]/h2[@class="name hotel"]/a/text()')
        l.add_xpath('quote', '//div[@class and @id]/div[1][@class="quote"]/text()')
        ## l.add_xpath('comment','//p[@id]/text()')
        l.add_xpath('date', '//div[@id and @class]/div[2][@class]/div[@class="profile"]/div[6][@class="date "]/text()')
        yield l.load_item()

SPIDER = TripSpider()
##########################################################
Item.py
from scrapy.item import Item, Field

class TripItem(Item):
    # define the fields for your item here like:
    # name = Field()
    quote = Field()
    date = Field()
    place = Field()
##########################################################
pipeline.py
import csv

class CsvWriterPipeline(object):

    def __init__(self):
        self.csvwriter = csv.writer(open('items.csv', 'wb'))

    def process_item(self, item, spider):
        self.csvwriter.writerow([item['quote'], item['date']])
        return item
#########################################################
Here is my output:

2010-10-17 16:12:43+0700 [tripadvisor] INFO: Passed TripItem(
    date=[u'\nMar 22, 2010\n', u'\nMar 4, 2010\n', u'\nFeb 11, 2010\n', u'\nFeb 4, 2010\n', u'\nFeb 2, 2010\n'],
    quote=[u'Large, impressive Buddha, but little quietness...', u'wat pho thailand nice toursit spot', u'Size queens will love it!', u'Favourite temple in Thailand', u'Beautiful'],
    place=[u'Temple of the Reclining Buddha (Wat Pho)', u'Temple of the Reclining Buddha (Wat Pho)', u'Temple of the Reclining Buddha (Wat Pho)', u'Temple of the Reclining Buddha (Wat Pho)', u'Temple of the Reclining Buddha (Wat Pho)'])

But I want to export to an XML or CSV file, and my expected output is one item per review:

TripItem( ## First Item
date=[u'\nMar 22, 2010\n'],
quote=[u'Large, impressive Buddha, but little quietness...'],
place=[u'Temple of the Reclining Buddha (Wat Pho)'])
TripItem( ## Second Item
date=[u'\nMar 4, 2010\n'],
quote=[u'wat pho thailand nice toursit spot'],
place=[u'Temple of the Reclining Buddha (Wat Pho)'])
....

I don't know how to split them, and I have been stuck on this for
several days. I'm a bachelor's student and this is one part of my senior
project. I have tried many of the Scrapy tutorials, but they have not
helped.

I really hope someone can help me.
Thank you in advance.




Pablo Hoffman

Oct 18, 2010, 6:45:46 PM
to scrapy...@googlegroups.com
Have you tried Feed Exports?
http://doc.scrapy.org/topics/feed-exports.html

Exporting as XML would be as easy as running:

scrapy crawl myspider --set FEED_URI=items.xml --set FEED_FORMAT=xml

Or, for CSV:

scrapy crawl myspider --set FEED_URI=items.csv --set FEED_FORMAT=csv

Pablo.
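
Feed exports write one record per item the spider yields, so a single item holding five-element lists is still exported as one record. To get one record per review, the parallel lists have to be split first. A minimal sketch of that split in plain Python, with no Scrapy dependency; `split_item` and `page_item` are hypothetical names, the sample data is copied from the log output in the question, and the code assumes every field list has the same length and order:

```python
def split_item(item):
    """Yield one dict per review from a dict of parallel lists."""
    fields = sorted(item.keys())
    # zip(*lists) walks the lists in step: all first values, then all second values, ...
    for values in zip(*(item[f] for f in fields)):
        # strip() removes the newlines surrounding each extracted text node
        yield {f: v.strip() for f, v in zip(fields, values)}


page_item = {
    'date': ['\nMar 22, 2010\n', '\nMar 4, 2010\n'],
    'quote': ['Large, impressive Buddha, but little quietness...',
              'wat pho thailand nice toursit spot'],
    'place': ['Temple of the Reclining Buddha (Wat Pho)',
              'Temple of the Reclining Buddha (Wat Pho)'],
}

reviews = list(split_item(page_item))
for review in reviews:
    print(review)
```

Note that zip() stops at the shortest list, so if one review on the page has no date, the later dates would be paired with the wrong quotes. Extracting each field relative to a per-review node in the spider avoids that, at the cost of changing the XPath expressions.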


Hara

Oct 20, 2010, 3:24:18 AM
to scrapy-users
Thank you for your reply.

I solved this problem yesterday by modifying my own pipeline.
I tried the XML export, but the result was not what I expected.

Thank you again :3
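
The modified pipeline itself was not posted in this thread. A minimal sketch of one way such a pipeline could look, writing one CSV row per review instead of one row per page: `SplitCsvWriterPipeline` and its `path` argument are hypothetical, the code targets Python 3 (the original pipeline is Python 2), and `close_spider` is the hook name in current Scrapy releases, which may not match the version used here.

```python
import csv


class SplitCsvWriterPipeline(object):
    """Writes one CSV row per review found in a page-level item."""

    def __init__(self, path='items.csv'):
        # newline='' lets the csv module control line endings (Python 3)
        self.file = open(path, 'w', newline='')
        self.csvwriter = csv.writer(self.file)

    def process_item(self, item, spider):
        # zip() pairs the i-th quote with the i-th date and the i-th place
        for quote, date, place in zip(item['quote'], item['date'], item['place']):
            self.csvwriter.writerow([quote.strip(), date.strip(), place.strip()])
        self.file.flush()
        return item

    def close_spider(self, spider):
        # Assumed hook: called once when the spider finishes
        self.file.close()
```

The item is still returned unchanged, so any later pipeline stage keeps receiving the original list-valued item.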
