Maintain relationships in scrapy


Rajeswari Rajkumar

Apr 18, 2017, 2:02:43 AM
to scrapy-users
Is there a way to maintain the relationship between pages? For example, we need to crawl a page listing shows, then the seasons and episodes pages, while keeping track of which show, season, and episode belong together at every stage.


Palash Kulshrestha

Apr 18, 2017, 6:19:23 AM
to scrapy-users
Hi Rajeswari

One way is to pass the details you want to percolate from the shows page to the seasons page, and from the seasons page to the episodes page.

For example, you won't be crawling every URL on the shows page, only URLs of a certain type. For those you will be yielding new requests for the seasons pages from the shows page.
If rows holds all the URLs for the seasons, then you can do something like:

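        # Inside the callback for the shows page (needs: from scrapy import Request)
        for row in rows:
            # row[0] is the season URL, row[2] is the show name
            yield Request(url=row[0], meta={'show': row[2]}, callback=self.parse)

    def parse(self, response):
        print(response.meta['show'])  # prints the show's name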

This way you pass the name of the show along with the request for each season URL. You can do the same when crawling episodes from the seasons page.

I hope this helps.

Rajeswari Rajkumar

Apr 26, 2017, 9:57:36 AM
to scrapy-users
Hi Palash,

Thanks for the input. Let me try this, and I will reach out to you if I have further questions.