Problem with BaseSpider, does not follow Requests from parse()


byzantine

Nov 20, 2010, 1:28:39 PM
to scrapy-users
I am trying to get a spider to extract each link and some additional
data related to the link and then process each link with that related
data. I have something like:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request

class MySpider(BaseSpider):
    name = "example.org"
    allowed_domains = ["example.org"]
    start_urls = [
        "http://example.org/index.html"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        links = hxs.select('//div/ul/li/a')
        for link in links:
            url = link.select('@href').extract()[0]
            myarg = link.select('b/text()').extract()[0]
            print url, myarg
            yield Request(url, callback=lambda r: self.parse2(r, myarg))

    def parse2(self, response, myarg):
        print myarg

When I run Scrapy, it prints each URL and the associated myarg, but
never fetches the URLs: parse2() is never called. It seems as if the
Requests that parse() yields are never processed. Is there some
obvious error that I have made? What is the easiest way to debug how
Scrapy handles the requests?

Pablo Hoffman

Nov 21, 2010, 8:40:08 PM
to scrapy...@googlegroups.com
There are many things that may be going on here, but it's hard to tell without
more info. A couple of possible issues off the top of my head:

* the links being extracted are not absolute, but relative - you need to
absolutize them (CrawlSpider link extractor already does this)
* the links being extracted are not in the allowed domains (but you should see
a warning message about this, for the first offsite link of each domain)
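
As a rough illustration of the two checks above, here is a minimal sketch
using only the standard library. The helper names absolutize() and
is_allowed() are made up for this example; is_allowed() only approximates
an offsite check and is not Scrapy's own implementation.

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse      # Python 2, as used in this thread


def absolutize(base_url, href):
    """Resolve a possibly relative href against the URL of the page it came from."""
    return urljoin(base_url, href)


def is_allowed(url, allowed_domains):
    """Hypothetical check: the host is an allowed domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)
```

Inside parse(), the equivalent step would be something like
url = urljoin(response.url, url) before building the Request, so that an
href such as "page1.html" becomes "http://example.org/page1.html".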

Pablo.

