Scrapy Simple Rule Doesn't Follow Links

jillabra...@gmail.com

May 17, 2015, 2:09:34 AM
to scrapy...@googlegroups.com
Hello

I have a very simple Scrapy CrawlSpider, and I have given it a simple rule: "Crawl/Follow any link that contains '/search/listings'". However, the spider does not crawl/follow any of these links.

I have confirmed that the start URL contains many links with the href '/search/listings', so the links are there.

Any idea what's going wrong?

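from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class MySpider(CrawlSpider):

    name = "MySpider"
    allowed_domains = ["mywebsite.com"]
    start_urls = ["http://www.mywebsite.com/results"]
    # Write the crawl log to log.txt
    custom_settings = {"LOG_FILE": "log.txt"}
    # Send every extracted link whose URL matches /search/listings to parse2
    rules = [Rule(LinkExtractor(allow=['/search/listings(.*)']), callback="parse2")]

    def parse2(self, response):
        # This function is never called
        self.logger.info("Page crawled: %s", response.url)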

The start URL "http://www.mywebsite.com/results" contains these links, which I want the rule to apply to:

<a href='/search/listings?clue=healthcare&amp;eventType=sort&amp;p=2' class='button button-pagination' data-page='2' >2</a>
<a href='/search/listings?clue=healthcare&amp;eventType=sort&amp;p=3' class='button button-pagination' data-page='3' >3</a>
<a href='/search/listings?clue=healthcare&amp;eventType=sort&amp;p=4' class='button button-pagination' data-page='4' >4</a>
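One way to rule out the pattern itself is to replay the filtering offline. The sketch below uses only the Python standard library to approximate what the rule does with these anchors: collect each href, resolve it against the start URL, test the absolute URL with `re.search` against the `allow` pattern, then apply an `allowed_domains`-style host check (exact host or subdomain). It is an approximation under those assumptions, not Scrapy's own code; `HrefCollector`, `host_allowed`, `matching_links` and `SAMPLE_HTML` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

START_URL = "http://www.mywebsite.com/results"
ALLOW = [r"/search/listings(.*)"]
ALLOWED_DOMAINS = ["mywebsite.com"]

# The anchors quoted above from the start page.
SAMPLE_HTML = """
<a href='/search/listings?clue=healthcare&amp;eventType=sort&amp;p=2' class='button button-pagination' data-page='2' >2</a>
<a href='/search/listings?clue=healthcare&amp;eventType=sort&amp;p=3' class='button button-pagination' data-page='3' >3</a>
<a href='/search/listings?clue=healthcare&amp;eventType=sort&amp;p=4' class='button button-pagination' data-page='4' >4</a>
"""


class HrefCollector(HTMLParser):
    """Collects href attributes of <a> tags; HTMLParser unescapes &amp; to &."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def host_allowed(url, domains):
    # Rough allowed_domains check: the host equals a domain or is a subdomain of it.
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in domains)


def matching_links(html, base_url, allow, domains):
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    patterns = [re.compile(p) for p in allow]
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # relative href -> absolute URL
        if patterns and not any(p.search(url) for p in patterns):
            continue  # rejected by the allow patterns
        if host_allowed(url, domains):
            links.append(url)
    return links


for url in matching_links(SAMPLE_HTML, START_URL, ALLOW, ALLOWED_DOMAINS):
    print(url)
```

If all three `/search/listings` URLs come out, neither the pattern nor the `www.` subdomain is what blocks the crawl, which shifts suspicion to the response Scrapy actually downloaded for the start URL.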


Asheesh Laroia

May 17, 2015, 2:29:41 AM
to scrapy...@googlegroups.com
Hi there,

Glad you're emailing the Scrapy list!

First question -- if you remove the `allow=` parameter, do things start to work? That is, if you send all responses to parse2, do things seem OK?

With this question, I hope to make sure that the page is at least being parsed OK.
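The check above is a bisection: drop the `allow` filter and see whether anything gets through at all. The standard-library sketch below mimics that offline. With no patterns it keeps every link, which matches how an empty `allow` behaves in Scrapy's LinkExtractor (apart from its other filters, such as ignored file extensions). `links_on_page` and `diagnose` are hypothetical helpers, and `page` stands in for whatever body the spider really received.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _Anchors(HTMLParser):
    """Collects the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def links_on_page(page_html, base_url, allow=()):
    """Absolute link URLs; with no allow patterns, every link is kept."""
    parser = _Anchors()
    parser.feed(page_html)
    parser.close()
    urls = [urljoin(base_url, h) for h in parser.hrefs]
    if not allow:
        return urls
    return [u for u in urls if any(re.search(p, u) for p in allow)]


def diagnose(page_html, base_url, allow):
    everything = links_on_page(page_html, base_url)
    filtered = links_on_page(page_html, base_url, allow)
    if not everything:
        return "no links at all: check the downloaded page, not the rule"
    if not filtered:
        return "links found, but allow= rejects all of them"
    return f"{len(filtered)} of {len(everything)} links pass allow="


page = "<a href='/about'>About</a><a href='/search/listings?p=2'>2</a>"
print(diagnose(page, "http://www.mywebsite.com/results", [r"/search/listings(.*)"]))
# prints: 1 of 2 links pass allow=
```

The first branch corresponds to "things still don't work without `allow=`": the problem is then in fetching or parsing the page, not in the rule.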

I'm not sure I'll be able to answer or ask any follow-up questions, but I wanted to at least ask this one. Hopefully I will; if I'm not around, please don't take it personally. It would only mean that the circumstances that gave me time to write this reply didn't recur.

Thanks and good luck,

Asheesh.

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.