How to override user_agent with UserAgentMiddleware

Brad

Jan 26, 2012, 1:21:32 PM
to scrapy-users
Hi, I have a question about how to override user_agent in my spider
(CrawlSpider).

I added this to the spider class:

    SPIDER_MIDDLEWARES = {
        'scrapy.contrib.spidermiddleware.depth.DepthMiddleware': 900,
        'scrapy.contrib.downloadermiddleware.redirect.RedirectMiddleware': 800,
        'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': 700,
    }



    def __init__(self, name=None, **kwargs):
        user_agent = 'My user agent'
        settings.overrides['USER_AGENT'] = user_agent


-------

I'm wondering if this is the correct way, and also how I can verify if
the correct user_agent is being sent in the requests.

Thank you,
Brad

Rolando Espinoza La Fuente

Feb 1, 2012, 7:43:28 PM
to scrapy...@googlegroups.com
You can use the `user_agent` attribute on the spider class.

    class MySpider(...):

        user_agent = 'foo/1.0'

        ...

You can also roll your own user-agent middleware to customize the
value across all spiders.
If you want to set the user-agent dynamically in your spider, you just
need to set the request header.

    def parse(...):
        req = Request(...)
        req.headers["User-Agent"] = self.get_random_ua()
        ...
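
A custom user-agent middleware along these lines might look like the
sketch below. It assumes the usual downloader-middleware hook,
process_request(request, spider), and a dict-like request.headers. The
class name RandomUserAgentMiddleware and the USER_AGENTS list are
hypothetical and not part of Scrapy.

```python
import random

# Hypothetical pool of user-agent strings; replace with your own values.
USER_AGENTS = [
    'foo/1.0',
    'bar/2.0',
]


class RandomUserAgentMiddleware(object):
    """Sketch of a downloader middleware that sets a User-Agent per request."""

    def __init__(self, user_agents=None):
        self.user_agents = list(user_agents or USER_AGENTS)

    def process_request(self, request, spider):
        # Respect a header that the spider already set on the request.
        if 'User-Agent' not in request.headers:
            request.headers['User-Agent'] = random.choice(self.user_agents)
        # Returning None lets the request continue through the downloader.
        return None
```

A class like this would be enabled through the DOWNLOADER_MIDDLEWARES
setting (not SPIDER_MIDDLEWARES), using whatever module path it lives
under in your project. To check what was actually sent, one option is
to log response.request.headers in a callback.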

Regards,

~Rolando

Btron

Feb 1, 2012, 8:02:47 PM
to scrapy-users
Thanks very much for the tip, Rolando. One issue was that I needed to
pass the user agent dynamically and assign the value in the spider's
init method. That didn't seem to work, I think because by then it was
too late to assign the user agent. So I ended up using a script to call
scrapy and passed the user agent in as a setting there (e.g. -s
USER_AGENT="$3"), and that's working for now. I might look into
customizing the user-agent middleware as a next step.
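
For anyone who wants the same wrapper approach from Python rather than
a shell script, a rough sketch follows. It only builds and runs the
"scrapy crawl ... -s USER_AGENT=..." command line. The function names
and the spider name passed in are hypothetical.

```python
import subprocess


def build_crawl_command(spider_name, user_agent):
    """Return the argument list for `scrapy crawl` with USER_AGENT overridden."""
    # -s NAME=VALUE overrides a setting for this run only.
    return ['scrapy', 'crawl', spider_name, '-s', 'USER_AGENT=%s' % user_agent]


def run_crawl(spider_name, user_agent):
    """Run the crawl as a child process and return its exit code."""
    # Passing a list avoids shell quoting problems with spaces in the value.
    return subprocess.call(build_crawl_command(spider_name, user_agent))
```

Calling run_crawl('myspider', 'My user agent') from the project
directory would then start the crawl with that user agent, assuming
the scrapy command is on the PATH.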
