Scrapy sometimes downloading response and some times not
From: shiva krishna <shivakrsh...@gmail.com>
Date: Tue, 17 Jul 2012 05:15:36 -0700 (PDT)
Local: Tues, Jul 17 2012 8:15 am
Subject: Scrapy sometimes downloading response and some times not

I am currently working with Scrapy; below is my spider.py code:

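    # Imports for the Scrapy 0.x API used below
    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from scrapy.http import Request

    class ExampleSpider(BaseSpider):
        name = "example"
        allowed_domains = ["careers-preftherapy.icims.com"]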

        start_urls = [
            "https://careers-preftherapy.icims.com/jobs/search"
        ]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            pageCount = hxs.select('//td[@class="iCIMS_JobsTablePaging"]/table/tr/td[2]/text()').extract()[0].rstrip().lstrip()[-2:].strip()
            for i in range(1,int(pageCount)+1):
                yield Request("https://careers-preftherapy.icims.com/jobs/search?pr=%d" % i,
                              callback=self.parsePage)

        def parsePage(self, response):
            hxs = HtmlXPathSelector(response)
            urls_list_odd_id = hxs.select('//table[@class="iCIMS_JobsTable"]/tr/td[@class="iCIMS_JobsTableOdd iCIMS_JobsTableField_1"]/a/@href').extract()
            print urls_list_odd_id,">>>>>>>odddddd>>>>>>>>>>>>>>>>"
            urls_list_even_id = hxs.select('//table[@class="iCIMS_JobsTable"]/tr/td[@class="iCIMS_JobsTableEven iCIMS_JobsTableField_1"]/a/@href').extract()
            print urls_list_even_id,">>>>>>>Evennnn>>>>>>>>>>>>>>>>"
            urls_list = []
            urls_list.extend(urls_list_odd_id)
            urls_list.extend(urls_list_even_id)
            for i in urls_list:
                yield Request(i.encode('utf-8'), callback=self.parseJob)

        def parseJob(self, response):
            pass

After the start page loads, I handle pagination by requesting URLs like

    https://careers-preftherapy.icims.com/jobs/search?pr=1
    https://careers-preftherapy.icims.com/jobs/search?pr=2

and so on.
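
To make the pagination step concrete, here is a minimal standalone sketch (plain Python 3, no Scrapy needed) of how the page URLs can be built from the paging text. The helper names `parse_page_count` and `page_urls` are hypothetical, and the sample text "Page 1 of 6" is made up; the only assumption taken from the spider is that the paging cell ends with the total page count. It reads the trailing number with a regular expression rather than the last two characters, so it would also cope with page counts of three or more digits.

```python
import re

BASE_URL = "https://careers-preftherapy.icims.com/jobs/search"


def parse_page_count(paging_text):
    """Return the trailing integer in the paging text, or 1 if none is found."""
    match = re.search(r"(\d+)\s*$", paging_text)
    return int(match.group(1)) if match else 1


def page_urls(page_count):
    """Build one search URL per results page, numbered from 1."""
    return ["%s?pr=%d" % (BASE_URL, i) for i in range(1, page_count + 1)]


# "Page 1 of 6" is a made-up sample; the real paging text is not shown in this post.
urls = page_urls(parse_page_count("  Page 1 of 6 \n"))
```

In the spider, each URL in such a list would be wrapped in a `Request` with `callback=self.parsePage`, as in the `parse` method above.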

I yield a request for each page URL (suppose there are 6 pages). When Scrapy
reaches the first URL
(`https://careers-preftherapy.icims.com/jobs/search?pr=1`),
I try to collect all the `href` values from it, and when it reaches the
second URL I collect them in the same way.

As you can see in my code, each page has 20 `href` links in total: 10 are
under `td[@class="iCIMS_JobsTableOdd iCIMS_JobsTableField_1"]`
and the remaining 10 are under
`td[@class="iCIMS_JobsTableEven iCIMS_JobsTableField_1"]`.
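
Since both kinds of cell share the `iCIMS_JobsTableField_1` class, the odd and even links could also be collected in one pass by matching on that shared class token instead of on the two full class strings. The following is only a standalone sketch of that idea in plain Python 3, using the standard library's `html.parser` rather than Scrapy's selectors; the class `JobLinkParser` is hypothetical and the sample HTML is made up to mimic the two class strings described above.

```python
from html.parser import HTMLParser


class JobLinkParser(HTMLParser):
    """Collect hrefs of <a> tags inside <td> cells that carry a given class token."""

    def __init__(self, token):
        HTMLParser.__init__(self)
        self.token = token
        self.td_stack = []  # one bool per open <td>: does it carry the token?
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            classes = (attrs.get("class") or "").split()
            self.td_stack.append(self.token in classes)
        elif tag == "a" and self.td_stack and self.td_stack[-1] and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "td" and self.td_stack:
            self.td_stack.pop()


# Made-up sample rows; the example.com URLs are placeholders, not real job links.
SAMPLE = """
<table class="iCIMS_JobsTable">
  <tr><td class="iCIMS_JobsTableOdd iCIMS_JobsTableField_1"><a href="https://example.com/jobs/1/job">A</a></td></tr>
  <tr><td class="iCIMS_JobsTableEven iCIMS_JobsTableField_1"><a href="https://example.com/jobs/2/job">B</a></td></tr>
  <tr><td class="iCIMS_JobsTableOdd iCIMS_JobsTableField_2"><a href="https://example.com/other">C</a></td></tr>
</table>
"""

parser = JobLinkParser("iCIMS_JobsTableField_1")
parser.feed(SAMPLE)
job_links = parser.links  # links from the odd and even cells, in document order
```

In an XPath selector, the analogous test would probably be `contains(@class, "iCIMS_JobsTableField_1")` on the `td`, though I have not verified that against the real page.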

The problem is that Scrapy sometimes extracts the links and sometimes does
not, and I don't know what is happening. On some runs of the spider it
extracts them, and on other runs the same page returns an empty list, as
shown below.

**First run:**

    2012-07-17 17:05:20+0530 [Preferredtherapy] DEBUG: Crawled (200) <GET https://careers-preftherapy.icims.com/jobs/search?pr=2> (referer: https://careers-preftherapy.icims.com/jobs/search)
    [] >>>>>>>odddddd>>>>>>>>>>>>>>>>
    [] >>>>>>>Evennnn>>>>>>>>>>>>>>>>

**Second run:**

    2012-07-17 17:05:20+0530 [Preferredtherapy] DEBUG: Crawled (200) <GET https://careers-preftherapy.icims.com/jobs/search?pr=2> (referer: https://careers-preftherapy.icims.com/jobs/search)
    [u'https://careers-preftherapy.icims.com/jobs/1836/job', u'https://careers-preftherapy.icims.com/jobs/1813/job', u'https://careers-preftherapy.icims.com/jobs/1763/job']>>>>>>>odddddd>>>>>>>>>>>>>>>>
    [preftherapy.icims.com/jobs/1811/job', u'https://careers-preftherapy.icims.com/jobs/1787/job']>>>>>>>Evennnn>>>>>>>>>>>>>>>>

My question is: why does it sometimes extract the links and sometimes not?
Any reply would be really helpful.

Thanks in advance.

