I am currently working with Scrapy. Below is my spider.py code:

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from scrapy.http import Request


    class ExampleSpider(BaseSpider):
        name = "example"
        start_urls = [
        ]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            # Page count is read from the last two characters of the paging cell text.
            pageCount = hxs.select('//td[@class = "iCIMS_JobsTablePaging"]/table/tr/td[2]/text()').extract()[0].strip()[-2:].strip()
            for i in range(1, int(pageCount) + 1):
                # A Request for page i is yielded here (URL omitted), handled by parsePage.
                pass

        def parsePage(self, response):
            hxs = HtmlXPathSelector(response)
            # Job links in the odd-numbered table rows.
            urls_list_odd_id = hxs.select('//table[@class="iCIMS_JobsTable"]/tr/td[@class="iCIMS_JobsTableOdd iCIMS_JobsTableField_1"]/a/@href').extract()
            print urls_list_odd_id, ">>>>>>>odddddd>>>>>>>>>>>>>>>>"
            # Job links in the even-numbered table rows.
            urls_list_even_id = hxs.select('//table[@class="iCIMS_JobsTable"]/tr/td[@class="iCIMS_JobsTableEven iCIMS_JobsTableField_1"]/a/@href').extract()
            print urls_list_even_id, ">>>>>>>Evennnn>>>>>>>>>>>>>>>>"
            urls_list = []
            urls_list.extend(urls_list_odd_id)
            urls_list.extend(urls_list_even_id)
            for i in urls_list:
                yield Request(i.encode('utf-8'), callback=self.parseJob)

        def parseJob(self, response):
            pass

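As a side note on the page count: slicing the last two characters only works for counts below 100. A minimal sketch of a more tolerant approach, assuming the paging cell text looks something like "Page 1 of 6" (that format and the helper name `parse_page_count` are assumptions, since the actual text is not shown here):

```python
import re


def parse_page_count(paging_text):
    """Return the last integer found in the paging text, or 1 if there is none."""
    numbers = re.findall(r"\d+", paging_text)
    return int(numbers[-1]) if numbers else 1


print(parse_page_count("  Page 1 of 6  "))  # 6
print(parse_page_count("Page 3 of 125"))    # 125
```

With the two-character slice, the second example would have produced 25 instead of 125.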
After opening the page, I handle pagination with URLs like
... and so on.
I yield a request for each page URL (suppose there are 6 pages here). When Scrapy reaches the first URL,
I try to collect all the `href` values from that page,
and when it reaches the second URL, I collect all the `href` values there in the same way.
As you can see in my code, each page has 20 `href` values in total: 10 are under `td[@class="iCIMS_JobsTableOdd iCIMS_JobsTableField_1"]`
and the remaining 10 are under `td[@class="iCIMS_JobsTableEven iCIMS_JobsTableField_1"]`.
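Since the odd and even cells share the class token `iCIMS_JobsTableField_1`, both sets could probably be collected in one pass. In Scrapy that would likely be a single XPath using `contains(@class, "iCIMS_JobsTableField_1")`. The standalone sketch below shows the same idea with only the standard library, on a simplified, hypothetical snippet of markup (`SAMPLE` and `job_links` are made-up names, and the real page is certainly more complex):

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real jobs table.
SAMPLE = """
<table class="iCIMS_JobsTable">
  <tr><td class="iCIMS_JobsTableOdd iCIMS_JobsTableField_1"><a href="/jobs/1">A</a></td></tr>
  <tr><td class="iCIMS_JobsTableEven iCIMS_JobsTableField_1"><a href="/jobs/2">B</a></td></tr>
  <tr><td class="iCIMS_JobsTableOdd iCIMS_JobsTableField_2"><a href="/other">C</a></td></tr>
</table>
"""


def job_links(markup):
    """Collect hrefs from every cell whose class list includes iCIMS_JobsTableField_1."""
    root = ET.fromstring(markup.strip())
    links = []
    for cell in root.iter("td"):
        classes = cell.get("class", "").split()
        if "iCIMS_JobsTableField_1" in classes:
            links.extend(a.get("href") for a in cell.findall("a"))
    return links


print(job_links(SAMPLE))  # ['/jobs/1', '/jobs/2']
```

Matching on the shared class token also keeps the links in page order, rather than all odd rows followed by all even rows.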
The problem is that Scrapy sometimes extracts these links and sometimes does not, and I don't know what is happening. When I run the spider several times, some runs return the links, while other runs return empty lists, as shown below.
**First run:**

    [] >>>>>>>odddddd>>>>>>>>>>>>>>>>
    [] >>>>>>>Evennnn>>>>>>>>>>>>>>>>

**Second run:**
My question is: why does it sometimes extract the links and sometimes not? Any reply would be really helpful.
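To narrow this down, one option might be to log a short summary of each response, so that runs with and without links can be compared. This is only a sketch: `describe_response` is a hypothetical helper, and the marker string assumes the jobs table class name appears literally in the page source.

```python
def describe_response(url, status, body, marker="iCIMS_JobsTable"):
    """Summarize a response: URL, status code, body size, and whether the marker is present."""
    if isinstance(body, bytes):
        # Decode leniently so that odd bytes do not break the check.
        body = body.decode("utf-8", "replace")
    return "%s status=%s bytes=%d has_table=%s" % (url, status, len(body), marker in body)


# Example with a made-up URL and body.
print(describe_response("http://example.com/jobs", 200, "<table class='iCIMS_JobsTable'></table>"))
```

Inside `parsePage` this could be called with `response.url`, `response.status`, and `response.body`. If the empty runs show a different status or a body without the table, the page itself differs between runs and the XPath is probably not the cause.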
Thanks in advance.