Scrapy sometimes downloads the response and sometimes does not


shiva krishna

Jul 17, 2012, 8:15:36 AM
to scrapy...@googlegroups.com
I am currently working with Scrapy. Below is my spider.py code:

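    # Scrapy 0.14/0.16-era API (BaseSpider, HtmlXPathSelector), Python 2
    from urlparse import urljoin

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from scrapy.http import Request


    class ExampleSpider(BaseSpider):
        name = "example"
        # allowed_domains is a list of domain names
        allowed_domains = ["careers-preftherapy.icims.com"]

        # Start page inferred from the referer shown in the crawl logs
        start_urls = [
            "https://careers-preftherapy.icims.com/jobs/search",
        ]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            # Page count: the last two characters of the paging text (breaks at 100+ pages)
            pageCount = hxs.select('//td[@class = "iCIMS_JobsTablePaging"]/table/tr/td[2]/text()').extract()[0].strip()[-2:].strip()
            # Request every result page: ?pr=1, ?pr=2, ...
            for i in range(1, int(pageCount) + 1):
                yield Request("https://careers-preftherapy.icims.com/jobs/search?pr=%d" % i, callback=self.parsePage)

        def parsePage(self, response):
            hxs = HtmlXPathSelector(response)
            # Job links sit in alternating rows whose cells have "Odd" or "Even" classes
            urls_list_odd_id = hxs.select('//table[@class="iCIMS_JobsTable"]/tr/td[@class="iCIMS_JobsTableOdd iCIMS_JobsTableField_1"]/a/@href').extract()
            print urls_list_odd_id, ">>>>>>>odddddd>>>>>>>>>>>>>>>>"
            urls_list_even_id = hxs.select('//table[@class="iCIMS_JobsTable"]/tr/td[@class="iCIMS_JobsTableEven iCIMS_JobsTableField_1"]/a/@href').extract()
            # Print the even-row links
            print urls_list_even_id, ">>>>>>>Evennnn>>>>>>>>>>>>>>>>"
            urls_list = urls_list_odd_id + urls_list_even_id
            for href in urls_list:
                # Resolve relative hrefs against the page URL before requesting them
                yield Request(urljoin(response.url, href.encode('utf-8')), callback=self.parseJob)

        def parseJob(self, response):
            pass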

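After opening the first page, I build the pagination URLs like this:

    https://careers-preftherapy.icims.com/jobs/search?pr=1
    https://careers-preftherapy.icims.com/jobs/search?pr=2

and so on.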
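The spider takes the last two characters of the paging text as the page count, which gives the wrong number once there are 100 or more pages (for "Page 1 of 120" it yields "20"). A minimal sketch of a sturdier alternative, assuming the paging cell text ends with the total page count (the sample text "Page 1 of 6" is hypothetical): it pulls out the last integer with a regular expression and builds the same `?pr=N` URLs. The helper names `page_count_from_text` and `page_urls` are illustrative, not part of Scrapy.

```python
import re

# Search URL taken from the spider's Request; "pr" selects the result page.
SEARCH_URL = "https://careers-preftherapy.icims.com/jobs/search?pr=%d"


def page_count_from_text(text):
    """Return the last integer in the paging text, e.g. 6 for 'Page 1 of 6'.

    Assumption (hypothetical): the paging cell ends with the total page count.
    """
    numbers = re.findall(r"\d+", text)
    if not numbers:
        raise ValueError("no page count found in %r" % text)
    return int(numbers[-1])


def page_urls(page_count):
    """Build one search URL per page, numbered from 1 as in the spider."""
    return [SEARCH_URL % i for i in range(1, page_count + 1)]


urls = page_urls(page_count_from_text("  Page 1 of 6 \n"))
```

In `parse`, the extracted text would be passed to `page_count_from_text` instead of slicing `[-2:]`.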

I yield a request for each URL (suppose there are 6 pages). When Scrapy reaches the first URL, I collect all the href links on that page; when it reaches the second URL, I do the same, and so on.


As you can see in my code, each page has 20 href links in total: 10 of them are under `td[@class="iCIMS_JobsTableOdd iCIMS_JobsTableField_1"]` and the remaining 10 are under `td[@class="iCIMS_JobsTableEven iCIMS_JobsTableField_1"]`.
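Because the odd and even cells differ only in one class name, the two href lists can be merged, de-duplicated, and resolved against the page URL before any requests are yielded; resolving also covers relative `href` values, which `Request` would reject as missing a scheme. A minimal sketch in plain Python; `merge_job_links` is a hypothetical helper and the example hrefs are made up.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as used by 2012-era Scrapy


def merge_job_links(page_url, odd_hrefs, even_hrefs):
    """Combine odd- and even-row hrefs into absolute, de-duplicated URLs.

    Order is preserved (odd rows first, then even rows), and relative
    hrefs are resolved against the URL of the page they came from.
    """
    seen = set()
    merged = []
    for href in list(odd_hrefs) + list(even_hrefs):
        url = urljoin(page_url, href.strip())
        if url not in seen:
            seen.add(url)
            merged.append(url)
    return merged


links = merge_job_links(
    "https://careers-preftherapy.icims.com/jobs/search?pr=2",
    ["/jobs/101/job", "/jobs/103/job"],  # hypothetical odd-row hrefs
    ["/jobs/102/job", "/jobs/101/job"],  # hypothetical even-row hrefs
)
```

Inside `parsePage` this would be called as `merge_job_links(response.url, urls_list_odd_id, urls_list_even_id)`, yielding one `Request` per returned URL.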


The problem is that Scrapy sometimes extracts these links and sometimes does not, and I don't know why. When I run the same spider file several times, some runs collect the links and others return an empty list, like below:


**First run:**

    2012-07-17 17:05:20+0530 [Preferredtherapy] DEBUG: Crawled (200) <GET https://careers-preftherapy.icims.com/jobs/search?pr=2> (referer: https://careers-preftherapy.icims.com/jobs/search)
    [] >>>>>>>odddddd>>>>>>>>>>>>>>>>
    [] >>>>>>>Evennnn>>>>>>>>>>>>>>>>


**Second run:**

    2012-07-17 17:05:20+0530 [Preferredtherapy] DEBUG: Crawled (200) <GET https://careers-preftherapy.icims.com/jobs/search?pr=2> (referer: https://careers-preftherapy.icims.com/jobs/search)
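Both logs show `Crawled (200)` for the same URL, so the download itself succeeds; an empty list means the XPath found no matching cells in the HTML returned on that run. One way to see what the server actually sent is to save the body whenever nothing was extracted and compare it with what a browser shows. A minimal sketch; `dump_if_empty` and the `empty_pages` directory are hypothetical, and in the spider it would be called from `parsePage` with `response.url`, `response.body` (raw bytes) and the merged link list.

```python
import hashlib
import os


def dump_if_empty(url, body, links, directory="empty_pages"):
    """Save the raw HTML of a page that yielded no links; return the file path.

    Returns None and writes nothing when links were found. The file name is
    a SHA-1 hash of the URL, so each distinct page gets its own file.
    """
    if links:
        return None
    if not os.path.isdir(directory):
        os.makedirs(directory)
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(directory, name)
    # body is raw bytes (response.body), so write in binary mode
    with open(path, "wb") as f:
        f.write(body)
    return path
```

Opening a saved file in a browser, or searching it for `iCIMS_JobsTable`, shows whether the job table was present on the failing run at all.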

My question is: why does it sometimes extract the links and sometimes not? Any reply would be really helpful.


Thanks in advance.
   
