I have a very simple Scrapy CrawlSpider with a single rule: crawl/follow any link that contains '/search/listings'. But the spider is not crawling/following any of these links.

I have confirmed that the start URL contains many links whose href includes '/search/listings', so the links are definitely there. Any idea what's going wrong?
from scrapy import log
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor


class MySpider(CrawlSpider):
    name = "MySpider"
    allowed_domains = ["mywebsite.com"]
    start_urls = ["http://www.mywebsite.com/results"]

    rules = [Rule(LinkExtractor(allow=['/search/listings(.*)']), callback="parse2")]

    def parse2(self, response):
        # This callback is never invoked
        log.start("log.txt")
        log.msg("Page crawled: " + response.url)
Here are a few of the matching links from the start page:
<a href='/search/listings?clue=healthcare&eventType=sort&p=2' class='button button-pagination' data-page='2' >2</a>
<a href='/search/listings?clue=healthcare&eventType=sort&p=3' class='button button-pagination' data-page='3' >3</a>
<a href='/search/listings?clue=healthcare&eventType=sort&p=4' class='button button-pagination' data-page='4' >4</a>
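For what it's worth, the allow pattern itself does match those hrefs. A minimal check with plain re, outside Scrapy, against the first pagination link (LinkExtractor's allow patterns are applied with re.search, so this is only a sanity check of the regex, not of the extractor as a whole):

```python
import re

# The same pattern passed to LinkExtractor(allow=...)
pattern = r'/search/listings(.*)'

# One of the pagination hrefs from the page
href = '/search/listings?clue=healthcare&eventType=sort&p=2'

# re.search returns a match object when the pattern occurs anywhere in href
print(re.search(pattern, href) is not None)  # True -- the regex matches
```

So the pattern is fine; the problem must be elsewhere in how the rule is being applied.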
--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.