You can't override settings from inside your spiders the way your code does:

class FirstSpider(CrawlSpider):
    settings.overrides['ITEM_PIPELINES'] = ...
And you can't customize the item pipelines per spider.
What you could do is check which spider is running in the process_item()
method of your pipeline, and pass items through untouched for certain
spiders. For example:

def process_item(self, item, spider):
    if spider.name not in ['myspider1', 'myspider2', 'myspider3']:
        return item
    # ... pipeline code here ...
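As a minimal sketch of that check (the spider class and the 'processed' key are stand-ins for illustration, not real Scrapy objects):

```python
# Stand-in for a Scrapy spider; only the .name attribute matters here.
class FakeSpider(object):
    def __init__(self, name):
        self.name = name

class MyPipeline(object):
    # Only items from these spiders get processed by this pipeline.
    enabled_spiders = ['myspider1', 'myspider2', 'myspider3']

    def process_item(self, item, spider):
        if spider.name not in self.enabled_spiders:
            return item  # pass the item through untouched
        item['processed'] = True  # ... pipeline code here ...
        return item

pipeline = MyPipeline()
pipeline.process_item({}, FakeSpider('myspider1'))  # {'processed': True}
pipeline.process_item({}, FakeSpider('otherspider'))  # {}
```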
Hope this helps,
Pablo.
But there are some nice alternatives for achieving that functionality. For
example, you can choose a spider attribute to define which pipelines will be
enabled for each spider, and then check that attribute in your pipelines.
Here's how your spiders would look:
class SomeSpider(CrawlSpider):
    pipelines = ['first']

class AnotherSpider(CrawlSpider):
    pipelines = ['first', 'second']
And your pipelines:
class FirstPipeline(object):
    def process_item(self, item, spider):
        if 'first' not in getattr(spider, 'pipelines', []):
            return item
        # ... pipeline code here ...

class SecondPipeline(object):
    def process_item(self, item, spider):
        if 'second' not in getattr(spider, 'pipelines', []):
            return item
        # ... pipeline code here ...
Btw, this code can easily be made more performant by using sets instead of
lists for the pipelines attribute, and by caching the result per spider.
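A rough sketch of what that optimization could look like. The spider class and the cache attribute name (_enabled_cache) are illustrative assumptions, not Scrapy API:

```python
# Stand-in for a Scrapy spider carrying the 'pipelines' attribute.
class FakeSpider(object):
    def __init__(self, name, pipelines):
        self.name = name
        self.pipelines = pipelines

class FirstPipeline(object):
    def __init__(self):
        # Cache of spider name -> whether this pipeline is enabled for it,
        # so the membership test runs once per spider, not once per item.
        self._enabled_cache = {}

    def _enabled_for(self, spider):
        try:
            return self._enabled_cache[spider.name]
        except KeyError:
            # Build a set so repeated lookups would be O(1).
            enabled = 'first' in set(getattr(spider, 'pipelines', []))
            self._enabled_cache[spider.name] = enabled
            return enabled

    def process_item(self, item, spider):
        if not self._enabled_for(spider):
            return item
        item['first'] = True  # ... pipeline code here ...
        return item
```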
Pablo.