Proper way of constructing scrapy start_requests()


Petar Pilipovic

Jun 10, 2015, 5:07:35 AM
to scrapy...@googlegroups.com
I want to scrape the TOP TEN RELEASES table from torrenting.com, and I have written a crawler for that purpose, but you first need to be logged in to the site. The data I scraped initially was basically nothing, so I started rebuilding my `torrent_spider.py` to handle the login, and because I am new to web scraping I am stuck on this issue.

I have been reading the Scrapy docs on this, and I found that `start_requests()` will let me log in to torrenting.com before scraping the table.

My question is: can someone explain how to request the `https://www.torrenting.com/browse.php` page after my spider has logged in, so I can start scraping the data I want?

This is `torrent_spider.py`:

 
from scrapy import Spider
from scrapy.http import FormRequest
from scrapy.selector import Selector


class TorrentSpider(Spider):
    """TorrentSpider, which will scrape the Top Ten Releases table."""
    name = "torrenting"
    allowed_domains = ["torrenting.com"]
    start_urls = [
        "https://www.torrenting.com/browse.php",
    ]

    def start_requests(self):
        # Must be named start_requests (plural); Scrapy never calls
        # a method named start_request().
        return [FormRequest("https://www.torrenting.com/login.php?returnto=Login",
                            formdata={'user': 'example', 'pass': 'somepass'},
                            callback=self.logged_in)]

    def logged_in(self, response):
        pass

    def parse(self, response):
        pass
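
For reference, here is a minimal sketch of how the login could be chained into the browse.php request. It rests on assumptions not confirmed by the site: the class name TorrentSpiderSketch is hypothetical, the "Login failed" marker is a placeholder (not the site's actual failure text), and it assumes a successful login sets a session cookie that Scrapy's cookie middleware carries along on later requests.

from scrapy import Spider
from scrapy.http import FormRequest, Request


class TorrentSpiderSketch(Spider):
    """Sketch only: chains login -> browse.php -> parse."""
    name = "torrenting_sketch"
    allowed_domains = ["torrenting.com"]

    def start_requests(self):
        # Log in first instead of fetching browse.php directly.
        return [FormRequest("https://www.torrenting.com/login.php?returnto=Login",
                            formdata={'user': 'example', 'pass': 'somepass'},
                            callback=self.logged_in)]

    def logged_in(self, response):
        # Placeholder check: inspect the real login response and adjust
        # this marker to whatever torrenting.com actually returns on failure.
        if b"Login failed" in response.body:
            self.logger.error("Login failed; check the credentials")
            return
        # The cookie middleware re-sends the session cookie automatically,
        # so this request is made as the logged-in user.
        yield Request("https://www.torrenting.com/browse.php",
                      callback=self.parse)

    def parse(self, response):
        # The Top Ten Releases rows would be extracted here,
        # e.g. with response.xpath(...).
        pass

With this shape, parse is only reached through the request yielded from logged_in, so start_urls is not needed at all; the spider runs as usual with scrapy crawl torrenting_sketch.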