from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request

class Spider(BaseSpider):
    name = "lonmark"
    allowed_domains = ["lonmark.org"]
    # Request.meta = {'dont_redirect': True,
    #                 'handle_httpstatus_list': [302]}
    start_urls = ["http://www.lonmark.org/membership/directory/partners"]

    def parse(self, response):
        print response.url
        hxs = HtmlXPathSelector(response)
        company_links = hxs.select(
            "//*[@id='page_content']/table/tbody/tr[1]/td[1]/a/@href").extract()
        for link in company_links:
            yield Request("http://www.lonmark.org/membership/directory/" + link,
                          callback=self.parse_company_info)
Sounds like the site is detecting that you're scraping and trying to prevent it. I'd suggest looking into user-agent middleware to mimic a browser UA string.
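For example, a minimal sketch of what that could look like in a project's settings.py (the UA string below is just an illustrative desktop-browser string, and the middleware path is hypothetical):

```python
# settings.py (sketch): send a browser-like User-Agent instead of
# Scrapy's default, so simple bot detection is less likely to trigger.
USER_AGENT = ('Mozilla/5.0 (Windows NT 6.1; WOW64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/34.0.1847.131 Safari/537.36')

# Or rotate UA strings with a custom downloader middleware
# (RotateUserAgentMiddleware here is a hypothetical class name):
# DOWNLOADER_MIDDLEWARES = {
#     'myproject.middlewares.RotateUserAgentMiddleware': 400,
# }
```

If the site still blocks you, throttling (DOWNLOAD_DELAY) is usually the next thing to try alongside the UA change.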
--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scrapy-users...@googlegroups.com.
To post to this group, send email to scrapy...@googlegroups.com.
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.