Proper way of constructing scrapy start_requests()


Petar Pilipovic

Jun 10, 2015, 5:07:35 AM
to scrapy...@googlegroups.com
I want to scrape the TOP TEN RELEASES table from torrenting.com, and I have written a crawler for that purpose, but you first need to be logged in to the site. The data I scraped initially was basically nothing, so I started rebuilding my `torrent_spider.py` to handle the login, and because I am new to web scraping I am stuck on this issue.

I have been reading the Scrapy docs on this, and I found that `start_requests()` will let me log in to torrenting.com before scraping the table.

My question is: can someone explain how to request the `https://www.torrenting.com/browse.php` page after my spider has logged in, so I can start scraping the data I want?

This is `torrent_spider.py`:

 
from scrapy import Spider
from scrapy.http import FormRequest
from scrapy.selector import Selector


class TorrentSpider(Spider):
    """TorrentSpider, which will scrape the Top Ten Releases table."""
    name = "torrenting"
    allowed_domains = ["torrenting.com"]
    start_urls = [
        "https://www.torrenting.com/browse.php",
    ]

    def start_requests(self):
        # Must be named start_requests (plural); Scrapy never calls
        # a method named start_request().
        return [FormRequest("https://www.torrenting.com/login.php?returnto=Login",
                            formdata={'user': 'example', 'pass': 'somepass'},
                            callback=self.logged_in)]

    def logged_in(self, response):
        pass

    def parse(self, response):
        pass
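
For reference, here is a minimal sketch of how the login could be chained into the browse.php request. It rests on assumptions not confirmed by the site: the class name TorrentSpiderSketch is hypothetical, the "Login failed" marker is a placeholder (not the site's actual failure text), and it assumes a successful login sets a session cookie that Scrapy's cookie middleware carries along on later requests.

from scrapy import Spider
from scrapy.http import FormRequest, Request


class TorrentSpiderSketch(Spider):
    """Sketch only: chains login -> browse.php -> parse."""
    name = "torrenting_sketch"
    allowed_domains = ["torrenting.com"]

    def start_requests(self):
        # Log in first instead of fetching browse.php directly.
        return [FormRequest("https://www.torrenting.com/login.php?returnto=Login",
                            formdata={'user': 'example', 'pass': 'somepass'},
                            callback=self.logged_in)]

    def logged_in(self, response):
        # Placeholder check: inspect the real login response and adjust
        # this marker to whatever torrenting.com actually returns on failure.
        if b"Login failed" in response.body:
            self.logger.error("Login failed; check the credentials")
            return
        # The cookie middleware re-sends the session cookie automatically,
        # so this request is made as the logged-in user.
        yield Request("https://www.torrenting.com/browse.php",
                      callback=self.parse)

    def parse(self, response):
        # The Top Ten Releases rows would be extracted here,
        # e.g. with response.xpath(...).
        pass

With this shape, parse is only reached through the request yielded from logged_in, so start_urls is not needed at all; the spider runs as usual with scrapy crawl torrenting_sketch.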