Using settings in spider __init__

Andre King

Apr 5, 2017, 6:33:55 PM
to scrapy-users
So far, I've read that settings should be read in the from_crawler method and passed as args or kwargs to the spider's __init__ method.

I wasn't able to figure out how to do this, so my solution has been to simply call get_project_settings() in the spider's __init__ method instead and go from there.

Is there anything wrong with this approach? If so, can someone provide an example of how to pass settings to __init__ from from_crawler?

Thanks,
Andre

Andre King

Apr 5, 2017, 7:00:27 PM
to scrapy-users
Just answered my own questions:

(1) Do NOT call get_project_settings() in the spider's __init__ method: it retrieves only the settings in settings.py, not any custom spider settings or command-line settings.
(2) In order to pass arguments into the spider's __init__ method from the from_crawler method, make sure both methods accept *args and **kwargs as input. Then you can do something like this in from_crawler:

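    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        my_setting = crawler.settings.get('MY_SETTING')
        kwargs['foo'] = my_setting
        # from_crawler is a classmethod, so super() takes cls, not self
        obj = super(MySpider, cls).from_crawler(crawler, *args, **kwargs)
        return obj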
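To make point (1) concrete, here is a stdlib-only model of how layered settings resolve. It is not Scrapy's Settings class: the layer dicts and the values of MY_SETTING and DOWNLOAD_DELAY are hypothetical, and the only thing it assumes about Scrapy is the priority order (settings.py < spider custom_settings < command-line -s options). Scrapy's defaults are left out for brevity.

```python
# Stdlib-only model of layered settings; NOT Scrapy's real Settings class.
# Assumption: layers are passed from lowest to highest priority, and a
# higher-priority layer overrides any key it defines.

def merge_layers(*layers):
    merged = {}
    for layer in layers:
        merged.update(layer)  # later (higher-priority) layers win
    return merged

# Hypothetical values for the three sources mentioned in point (1).
project = {"MY_SETTING": "from settings.py", "DOWNLOAD_DELAY": 0}
spider = {"MY_SETTING": "from custom_settings"}  # spider class attribute
cmdline = {"DOWNLOAD_DELAY": 2}                  # e.g. -s DOWNLOAD_DELAY=2

# Roughly what get_project_settings() can see: the project layer only.
project_only = merge_layers(project)

# Roughly what crawler.settings holds once the crawler is set up: every layer.
crawler_view = merge_layers(project, spider, cmdline)

print(project_only["MY_SETTING"], project_only["DOWNLOAD_DELAY"])  # from settings.py 0
print(crawler_view["MY_SETTING"], crawler_view["DOWNLOAD_DELAY"])  # from custom_settings 2
```

The practical upshot is the one drawn in point (1): read settings from crawler.settings, not from a fresh get_project_settings() call.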

Andre King

Apr 5, 2017, 7:12:11 PM
to scrapy-users
Actually, this causes a problem: my command-line arguments get clobbered unless I first check whether 'foo' is already in kwargs and, if so, leave it alone. It seems like I'm doing something wrong. Any tips?
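A stdlib-only sketch of the clobbering described above, with hypothetical names (MY_SETTING, foo) and a plain dict standing in for crawler.settings. Unconditional assignment always overwrites a value passed with `-a foo=...`; `dict.setdefault` only fills the key in when it is missing, which is a compact form of the "if 'foo' in kwargs, do nothing" check.

```python
# Stdlib-only sketch; MY_SETTING and foo are hypothetical names, and the
# plain dict `settings` stands in for crawler.settings.

def build_kwargs_overwrite(settings, **kwargs):
    # Unconditional assignment: the setting always wins, so a spider
    # argument passed on the command line is silently replaced.
    kwargs["foo"] = settings.get("MY_SETTING")
    return kwargs

def build_kwargs_default(settings, **kwargs):
    # setdefault only fills in 'foo' when the caller did not supply it.
    kwargs.setdefault("foo", settings.get("MY_SETTING"))
    return kwargs

settings = {"MY_SETTING": "from settings"}

print(build_kwargs_overwrite(settings, foo="from -a"))  # {'foo': 'from settings'}
print(build_kwargs_default(settings, foo="from -a"))    # {'foo': 'from -a'}
print(build_kwargs_default(settings))                   # {'foo': 'from settings'}
```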

Jakob de Maeyer

Apr 6, 2017, 4:28:32 AM
to scrapy-users
Hey Andre,

the default from_crawler method sets the Settings object as the settings attribute of your spider, so you can just do this in your __init__ (and leave from_crawler untouched):
    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        if self.settings['FOO']:
            # Enter panic mode
            pass
Cheers!
-Jakob

Andre King

Apr 6, 2017, 10:48:30 AM
to scrapy-users
Hi Jakob,

Actually, I get an error that the spider has no attribute "settings". Please see the small project I created to show this: https://github.com/andreking/hounder

- Andre

Jakob de Maeyer

Apr 6, 2017, 11:03:46 AM
to scrapy...@googlegroups.com
Ah, my apologies, I'm an idiot. ;) Of course the settings attribute only gets set after the __init__() call.
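The ordering can be reproduced without Scrapy. In the sketch below, StubCrawler, StubSpider and EagerSpider are illustrative stand-ins, not Scrapy classes; StubSpider.from_crawler mirrors, in simplified form, the flow described here: the spider is constructed first and only then has the crawler (and its settings) attached, so reading self.settings inside __init__ raises AttributeError.

```python
# Illustrative stand-ins only; not Scrapy's real Spider/Crawler classes.

class StubCrawler:
    def __init__(self, settings):
        self.settings = settings

class StubSpider:
    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = cls(*args, **kwargs)  # 1. __init__ runs first...
        spider._set_crawler(crawler)   # 2. ...settings are attached afterwards
        return spider

    def _set_crawler(self, crawler):
        self.crawler = crawler
        self.settings = crawler.settings

class EagerSpider(StubSpider):
    def __init__(self, *args, **kwargs):
        super().__init__()
        # Fails: _set_crawler() has not been called yet at this point.
        self.foo = self.settings["FOO"]

crawler = StubCrawler({"FOO": True})
try:
    EagerSpider.from_crawler(crawler)
except AttributeError as exc:
    print("AttributeError:", exc)
```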

In that case, if you need the settings inside __init__(), it's probably simplest to override from_crawler and pass the settings object to __init__():

    import scrapy

    class MySpider(scrapy.Spider):
        name = "myspider"

        def __init__(self, settings, *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            if settings['FOO']:
                # Enter panic mode
                pass

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = cls(crawler.settings, *args, **kwargs)
            spider._set_crawler(crawler)
            return spider
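This override also sidesteps the earlier clobbering question. The stub-based sketch below runs the same flow without Scrapy: StubCrawler and StubSpider are illustrative stand-ins (StubSpider.__init__ loosely imitates scrapy.Spider by turning keyword arguments into attributes), and FOO/foo are hypothetical names. Because the settings travel as their own parameter, they are usable inside __init__ and never collide with spider arguments such as foo.

```python
# Illustrative stand-ins only; not Scrapy's real Spider/Crawler classes.

class StubCrawler:
    def __init__(self, settings):
        self.settings = settings

class StubSpider:
    def __init__(self, name=None, **kwargs):
        self.name = name
        # Loosely like scrapy.Spider: spider arguments become attributes.
        self.__dict__.update(kwargs)

    def _set_crawler(self, crawler):
        self.crawler = crawler
        self.settings = crawler.settings

class MySpider(StubSpider):
    def __init__(self, settings, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Settings arrive as a separate parameter, so they are available
        # here and do not touch the spider arguments in kwargs.
        self.panic = bool(settings.get("FOO"))

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = cls(crawler.settings, *args, **kwargs)
        spider._set_crawler(crawler)
        return spider

crawler = StubCrawler({"FOO": True})
spider = MySpider.from_crawler(crawler, foo="from -a")
print(spider.panic, spider.foo, spider.settings is crawler.settings)
```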


Sorry about the noise ;)
-Jakob


Andre King

Apr 6, 2017, 2:36:14 PM
to scrapy-users
Thanks! This is what I was looking for :)

Andre