I'm trying to scrape some data from a site that uses a cookie to select the language of the site. How do I pass this cookie value for the language:
Cookie: code_pays=2; code_region=0;
to my spider? I don't know where to set up the CookiesMiddleware shown here:
http://doc.scrapy.org/topics/downloader-middleware.html
Thanks for your help to a new, "lost" Scrapy user ;-)
Scrapy is awesome!
If you want to set cookies from your spider, you can set the cookies attribute on the Request objects you're returning.
For example:
request.cookies['code_pays'] = '2'
request.cookies['code_region'] = '0'
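In practice you'd usually pass the cookies when constructing the Request rather than mutating them afterwards; both end up in the same place. A minimal sketch (the URL and callback name here are just placeholders):

from scrapy.http import Request

def parse(self, response):
    # hand the language cookies to the engine together with the request;
    # the cookies middleware turns them into a Cookie header
    yield Request('http://www.example.com/?page=2',
                  cookies={'code_pays': '2', 'code_region': '0'},
                  callback=self.parse_item)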
Thanks for your answer. Sorry if it's a dumb question, but I'm a novice at this...
Do I need to add this in my spider file? Whereabouts?
Here is my spider (I used the generic template):
import re
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from anzse.items import AnzseItem

class ExampleSpider(CrawlSpider):
    name = 'example'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']

    rules = (
        Rule(SgmlLinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        i = AnzseItem()
        #i['domain_id'] = hxs.select('//input[@id="sid"]/@value').extract()
        #i['name'] = hxs.select('//div[@id="name"]').extract()
        #i['description'] = hxs.select('//div[@id="description"]').extract()
        return i
If you want to set the cookies manually, you'll have to create the requests yourself in your spider, instead of using the requests created automatically by the CrawlSpider. So you'll probably have to inherit from BaseSpider instead of CrawlSpider, and construct the crawling logic yourself instead of relying on the crawlspider rules.
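Something along these lines, for example. This is only a rough sketch: the link XPath is a placeholder, and you'd put your real extraction code in parse_item.

import urlparse
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider

LANG_COOKIES = {'code_pays': '2', 'code_region': '0'}

class ExampleBaseSpider(BaseSpider):
    name = 'example'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']

    def start_requests(self):
        # build the initial requests ourselves so we can attach the cookies
        return [Request(url, cookies=LANG_COOKIES) for url in self.start_urls]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # hand-rolled crawling logic replacing the CrawlSpider rules:
        # follow the 'Items/' links ourselves, re-sending the cookies
        for href in hxs.select('//a[contains(@href, "Items/")]/@href').extract():
            yield Request(urlparse.urljoin(response.url, href),
                          cookies=LANG_COOKIES, callback=self.parse_item)

    def parse_item(self, response):
        pass  # extract your AnzseItem here, as in your spider above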
Hope this helps,
Pablo.
Actually, you can keep your CrawlSpider and just override make_requests_from_url, so that the start requests carry the cookies:

from scrapy.http import Request

def make_requests_from_url(self, url):
    return Request(url, cookies={'lang': 'en'}, dont_filter=True)
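Applied to the spider above, that would look something like this (a sketch with your cookie values substituted; the cookies middleware should remember cookies from the first requests and keep sending them on the follow-up requests, so setting them on the start requests is usually enough):

from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.http import Request

class ExampleSpider(CrawlSpider):
    name = 'example'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']
    rules = (
        Rule(SgmlLinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def make_requests_from_url(self, url):
        # send the language cookies with the very first requests
        return Request(url, cookies={'code_pays': '2', 'code_region': '0'},
                       dont_filter=True)

    def parse_item(self, response):
        pass  # your extraction code, as before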