I'm trying to scrape some data from a site that uses a cookie to select the language of the site. How do I pass this cookie value for the language:
Cookie: code_pays=2; code_region=0;
to my spider? I don't know where to set up the CookiesMiddleware shown here:
http://doc.scrapy.org/topics/downloader-middleware.html
Thanks for your help to a new, "lost" Scrapy user ;-)
Scrapy is awesome!
If you want to set cookies from your spider, you can set the cookies attribute on the Request objects you're returning.
For example:
request.cookies['code_pays'] = '2'
request.cookies['code_region'] = '0'
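In practice you'd usually pass the cookies when constructing the Request rather than mutating them afterwards; both end up in the same place. A minimal sketch (the URL and callback name here are just placeholders):

from scrapy.http import Request

def parse(self, response):
    # hand the language cookies to the engine together with the request;
    # the cookies middleware turns them into a Cookie header
    yield Request('http://www.example.com/?page=2',
                  cookies={'code_pays': '2', 'code_region': '0'},
                  callback=self.parse_item)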
Thanks for your answer. Sorry if it's a dumb question, but I'm a novice at this...
Do I need to add this in my spider file? Whereabouts?
Here is my spider (I used the generic template):
import re
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from anzse.items import AnzseItem

class ExampleSpider(CrawlSpider):
    name = 'example'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']

    rules = (
        Rule(SgmlLinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        i = AnzseItem()
        #i['domain_id'] = hxs.select('//input[@id="sid"]/@value').extract()
        #i['name'] = hxs.select('//div[@id="name"]').extract()
        #i['description'] = hxs.select('//div[@id="description"]').extract()
        return i
If you want to set the cookies manually, you'll have to create the requests yourself in your spider, instead of using the requests created automatically by the CrawlSpider. So you'll probably have to inherit from BaseSpider instead of CrawlSpider, and construct the crawling logic yourself instead of relying on the crawlspider rules.
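Something along these lines, for example. This is only a rough sketch: the link XPath is a placeholder, and you'd put your real extraction code in parse_item.

import urlparse
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider

LANG_COOKIES = {'code_pays': '2', 'code_region': '0'}

class ExampleBaseSpider(BaseSpider):
    name = 'example'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']

    def start_requests(self):
        # build the initial requests ourselves so we can attach the cookies
        return [Request(url, cookies=LANG_COOKIES) for url in self.start_urls]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # hand-rolled crawling logic replacing the CrawlSpider rules:
        # follow the 'Items/' links ourselves, re-sending the cookies
        for href in hxs.select('//a[contains(@href, "Items/")]/@href').extract():
            yield Request(urlparse.urljoin(response.url, href),
                          cookies=LANG_COOKIES, callback=self.parse_item)

    def parse_item(self, response):
        pass  # extract your AnzseItem here, as in your spider above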
Hope this helps,
Pablo.
Actually, you can keep your CrawlSpider and just override make_requests_from_url, so that the start requests carry the cookies:

from scrapy.http import Request

def make_requests_from_url(self, url):
    return Request(url, cookies={'lang': 'en'}, dont_filter=True)
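Applied to the spider above, that would look something like this (a sketch with your cookie values substituted; the cookies middleware should remember cookies from the first requests and keep sending them on the follow-up requests, so setting them on the start requests is usually enough):

from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.http import Request

class ExampleSpider(CrawlSpider):
    name = 'example'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']
    rules = (
        Rule(SgmlLinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def make_requests_from_url(self, url):
        # send the language cookies with the very first requests
        return Request(url, cookies={'code_pays': '2', 'code_region': '0'},
                       dont_filter=True)

    def parse_item(self, response):
        pass  # your extraction code, as before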