Scraping multiple items per page

Chris

Nov 7, 2010, 10:19:52 AM
to scrapy-users
I'm sure this is fairly easy to do, but the things I've been able to
find all seem to discuss the issue in generalities, without providing
specific examples. Given my low level of proficiency with Python, I'm
feeling a little lost.

All I'm trying to do is to pull multiple items off of a page. Each
page that I use my "callback='parse_item'" rule on actually has
several items. But I only know how to write the parse_item function
where it returns a single item at the end. How would I get multiple
items parsed? Do I set my rule to call a new function called
'parse_page' and then have parse_page loop through the items calling
parse_item each time?

Here's a simplified version of my code. It grabs a single product name
from the first xpath that matches my selector. What I want to be able
to do is grab all of the product names, and create a new item for each
one.

http://dpaste.org/T9zA/

Any help greatly appreciated! :)

Javier

Nov 7, 2010, 2:03:34 PM
to scrapy-users
Create a list of items and return the list. There's an example in the
tutorial:

http://doc.scrapy.org/intro/tutorial.html#using-our-item
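
For readers following along, here is a minimal sketch of the "build a list
and return it" shape. It uses only the standard library as a stand-in for
Scrapy's selectors so that it runs on its own; the page markup, the XPath,
and the "name" field are assumptions made for illustration, not the code
from the original paste.

```python
import xml.etree.ElementTree as ET

# Hypothetical page markup, standing in for the real response body.
PAGE = """
<html><body>
  <div class="product"><h2>Widget</h2></div>
  <div class="product"><h2>Gadget</h2></div>
  <div class="product"><h2>Gizmo</h2></div>
</body></html>
""".strip()


def parse_item(page_source):
    """Return one item (here a plain dict) per product on the page."""
    root = ET.fromstring(page_source)
    items = []
    # Loop over every matching node instead of taking only the first match.
    for product in root.findall(".//div[@class='product']"):
        item = {"name": product.findtext("h2")}
        items.append(item)
    return items


items = parse_item(PAGE)
```

In a real spider the loop would run over the selector results for the
response, and each item would be an instance of the project's own item
class rather than a dict.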

Chris

Nov 8, 2010, 12:47:30 AM
to scrapy-users
Thanks for that. That's the sort of thing I was looking for.

I had done some more searching and ended up just running a loop and
using "yield item" on each pass instead of a single "return item" at
the end.
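
A rough sketch of that loop-and-yield shape, again with the standard
library standing in for Scrapy's selectors. The markup, the XPath, and the
"name" field are hypothetical and chosen only to make the example
self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical page markup, standing in for the real response body.
PAGE = (
    "<html><body>"
    "<div class='product'><h2>Widget</h2></div>"
    "<div class='product'><h2>Gadget</h2></div>"
    "</body></html>"
)


def parse_item(page_source):
    """Yield one item per product; calling this returns a generator."""
    root = ET.fromstring(page_source)
    for product in root.findall(".//div[@class='product']"):
        # Each pass through the loop hands one item back to the caller.
        yield {"name": product.findtext("h2")}


# The caller simply iterates, exactly as it would over a list.
names = [item["name"] for item in parse_item(PAGE)]
```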

Is there a performance difference between multiple "yield" calls vs.
returning a list of items?

Shahin

Nov 8, 2010, 2:15:10 AM
to scrapy-users
My understanding is that generators are preferred to in-memory data
structures when you're expecting a large amount of data and the caller
only needs to process it sequentially. Since callbacks only need to
return an iterable
(http://doc.scrapy.org/topics/spiders.html#scrapy.spider.BaseSpider.parse),
I think yield is as good as or better than returning the full data
structure in one go -- hopefully someone will correct me if I'm wrong.
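
To make the difference concrete, here is a small standard-library sketch
(not Scrapy code) that counts how many items each approach actually builds
when the caller only consumes the first two. The function names and the
"product-N" values are made up for the illustration.

```python
from itertools import islice

built = []  # records every item index that actually gets constructed


def items_as_list(n):
    """Build all n items up front and return them in a list."""
    result = []
    for i in range(n):
        built.append(i)
        result.append({"name": "product-%d" % i})
    return result


def items_as_generator(n):
    """Build items one at a time, only when the caller asks for the next."""
    for i in range(n):
        built.append(i)
        yield {"name": "product-%d" % i}


built.clear()
first_two_from_list = list(islice(items_as_list(1000), 2))
built_by_list = len(built)  # every item was built before any was consumed

built.clear()
first_two_from_gen = list(islice(items_as_generator(1000), 2))
built_by_generator = len(built)  # only the items the caller consumed
```

Both calls hand the caller the same two items; the list version has built
all 1000 by then, while the generator has built only the two that were
requested. For a page with a handful of items the difference is unlikely to
matter in practice.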

I found this writeup helpful:
http://stackoverflow.com/questions/231767/the-python-yield-keyword-explained/231855#231855

Shahin