Re: Please find the attached document, I am having trouble crawling the next-page link?

Lhassan Baazzi

Sep 2, 2014, 4:48:36 AM
to scrapy...@googlegroups.com
Hi,

For this notice:
scrapy_demo\spiders\test.py:43: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.

Just replace select with xpath.
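For example, using the pagination selector from your code (a minimal sketch; only the method call changes, the XPath expression stays the same):

    # deprecated in Scrapy 0.24-era code:
    hrefs = hxs.select('//div[@class="paggingNext"]/a[@class="blue"]/@href').extract()
    # preferred:
    hrefs = hxs.xpath('//div[@class="paggingNext"]/a[@class="blue"]/@href').extract()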



Regards.
---------
Lhassan Baazzi | Web Developer PHP / Python - Symfony - JS - Scrapy
Email/Gtalk: baazzi...@gmail.com - Skype: baazzilhassan - Twitter: @baazzilhassan



2014-09-02 9:45 GMT+01:00 james josh <jamesjo...@gmail.com>:
I am debugging this code and get a different job count each run, but it's not
giving me all the jobs, as they are spread across multiple pages.

I also get the following error, but I'm not sure what to do about it:


scrapy_demo\spiders\test.py:43: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.


next_page = None
if hxs.select('//div[@class="paggingNext"]/a[@class="blue"]/@href').extract():
    next_page = hxs.select('//div[@class="paggingNext"]/a[@class="blue"]/@href').extract()[0]
if next_page:
    yield Request(urlparse.urljoin(response.url, next_page), self.parse)



from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import urlparse
from scrapy.http.request import Request
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.item import Item, Field


class ScrapyDemoSpiderItem(Item):
    link = Field()
    title = Field()
    city = Field()
    salary = Field()
    content = Field()


class ScrapyDemoSpider(BaseSpider):
    name = 'eujobs77'
    allowed_domains = ['eujobs77.com']
    start_urls = ['http://www.eujobs77.com/jobs']

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        listings = hxs.select('//div[@class="jobSearchBrowse jobSearchBrowsev1"]')
        links = []
        # scrape the listings page to get listing links
        for listing in listings:
            link = listing.select('//h2[@class="jobtitle"]/a[@class="blue"]/@href').extract()
            links.extend(link)
        # follow each listing URL to get the content of the listing page
        for link in links:
            item = ScrapyDemoSpiderItem()
            item['link'] = link
            yield Request(urlparse.urljoin(response.url, link),
                          meta={'item': item}, callback=self.parse_listing_page)
        # get the next-button link
        next_page = None
        if hxs.select('//div[@class="paggingNext"]/@href').extract():
            next_page = hxs.select('//div[@class="paggingNext"]/@href').extract()
        if next_page:
            yield Request(urlparse.urljoin(response.url, next_page), self.parse)

    # scrape the listing page to get its content
    def parse_listing_page(self, response):
        hxs = HtmlXPathSelector(response)
        item = response.request.meta['item']
        item['link'] = response.url
        item['title'] = hxs.select("//h1[@id='share_jobtitle']/text()").extract()
        item['city'] = hxs.select("//html/body/div[3]/div[3]/div[2]/div[1]/div[3]/ul/li[1]/div[2]/text()").extract()
        item['salary'] = hxs.select("//html/body/div[3]/div[3]/div[2]/div[1]/div[3]/ul/li[3]/div[2]/text()").extract()
        item['content'] = hxs.select("//div[@class='detailTxt deneL']/text()").extract()
        yield item
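For reference, an untested sketch of how the pagination block at the end of parse() could look after the suggested change: .xpath() instead of the deprecated .select(), and the first href taken from the list that extract() returns (it assumes the same paggingNext/blue markup used elsewhere in this spider; hxs, Request, and urlparse are as defined above):

    # Sketch only: extract() returns a list of strings, so urljoin() needs
    # the first element, not the list itself.
    hrefs = hxs.xpath('//div[@class="paggingNext"]/a[@class="blue"]/@href').extract()
    if hrefs:
        next_page = hrefs[0]
        yield Request(urlparse.urljoin(response.url, next_page), self.parse)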

