Pause the first parse function when a new request is yielded

Χρατς Χρουτς

May 25, 2017, 3:42:49 PM

to scrapy-users

Hello. I'm writing a spider for a website that gathers some hyperlinks, then visits each one, checks whether something exists on the page, and writes the results to a text file.

I have a for loop that yields the requests, with a parse2 function as the callback that checks the link and updates the text file.

    evenselectorlist = response.css('table[id="result_table"] tr.even')
    for evenselector in evenselectorlist:
        relative = evenselector.css('a[title="Link"]::attr(href)').extract_first()
        yield scrapy.Request(response.urljoin(relative), callback=self.parse2, meta={'item': item}, dont_filter=True)

def parse2(self, response):
    # txt file stuff

Is there a way to pause the first parse function when a request is yielded? I would like to do some further work only AFTER the new requests have completed.

For example, I'd like to have a counter to see how many links have the information I want, which is available only after all the links have been visited.
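
To make the counter idea concrete, here is a rough sketch of what I have in mind. It does not pause parse; it keeps the counters on the spider and reports them in closed(), which Scrapy calls once when the spider finishes. The spider name, the 'div.wanted' selector, and the summary attribute are placeholders I made up for illustration, and the try/except fallback is only there so the sketch can run without Scrapy installed.

```python
try:
    import scrapy
    SpiderBase = scrapy.Spider
except ImportError:
    # Fallback so the sketch can be run without Scrapy installed.
    SpiderBase = object


class LinkCheckSpider(SpiderBase):
    name = "linkcheck"  # hypothetical spider name

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.visited = 0  # links whose response reached parse2
        self.matched = 0  # links that had the information

    def parse2(self, response):
        self.visited += 1
        # Hypothetical check; replace with the real test for the wanted data.
        if response.css("div.wanted").extract_first() is not None:
            self.matched += 1

    def closed(self, reason):
        # Called by Scrapy once the spider closes, i.e. after all
        # scheduled requests have been processed.
        self.summary = "%d of %d links had the information" % (
            self.matched, self.visited)
        print(self.summary)
```

The for loop in parse would stay as it is; only the final count moves out of parse and into closed().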