CloseSpider from pipeline


klaus

Jan 16, 2012, 5:26:02 AM
to scrapy...@googlegroups.com
Hi,
I have a problem using the CloseSpider function. In my pipeline I insert records into a MySQL database, and I also check whether a record already exists. If the number of existing records exceeds a certain value, I want to close the spider.

In the Scrapy documentation I read that CloseSpider can only be raised from a spider callback, not from a pipeline. How can I communicate from the pipeline to the spider class? Can I invoke CloseSpider from a pipeline?

Thanks 

Pablo Hoffman

Jan 16, 2012, 8:49:46 AM
to scrapy...@googlegroups.com
You can close the spider by calling the crawler.engine.close_spider() method, similar to how the built-in CloseSpider extension does it. Check its code:
https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/closespider.py
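
A minimal sketch of what this can look like in a pipeline; the class name, the item's 'id' field, and the 20-duplicate threshold are illustrative assumptions, not something from this thread:

from scrapy.exceptions import DropItem

class DuplicateLimitPipeline(object):

    def __init__(self, crawler):
        self.crawler = crawler
        self.seen = set()
        self.duplicates = 0
        self.max_duplicates = 20  # illustrative threshold

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_item(self, item, spider):
        key = item['id']  # assumes each item carries a unique 'id' field
        if key in self.seen:
            self.duplicates += 1
            if self.duplicates >= self.max_duplicates:
                # Same call the built-in CloseSpider extension uses
                self.crawler.engine.close_spider(spider, 'too_many_duplicates')
            raise DropItem('Duplicate item: %s' % key)
        self.seen.add(key)
        return item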

Fer

Jul 18, 2013, 4:41:53 PM
to scrapy...@googlegroups.com
I had the same problem and I solved it as Pablo Hoffman suggested:
I added these lines to my settings file:

EXTENSIONS = {
    'ProjectFolder.droppeditem.DroppedItemCloseSpider': 500,
}
# My pipeline drops an item whenever it already exists in the database, so after 20 dropped items the spider will close:
CLOSESPIDER_DROPPEDITEMCOUNT = 20

And I created a file called droppeditem.py with the following code:

from collections import defaultdict
from scrapy import signals

class DroppedItemCloseSpider(object):
    """Close the spider once CLOSESPIDER_DROPPEDITEMCOUNT items have been dropped."""

    def __init__(self, crawler):
        self.crawler = crawler

        self.close_on = {
            'droppeditemcount': crawler.settings.getint('CLOSESPIDER_DROPPEDITEMCOUNT'),
            }

        self.counter = defaultdict(int)

        # Only listen for dropped items if a threshold is configured.
        if self.close_on.get('droppeditemcount'):
            crawler.signals.connect(self.item_dropped, signal=signals.item_dropped)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def item_dropped(self, item, spider):
        self.counter['droppeditemcount'] += 1
        if self.counter['droppeditemcount'] == self.close_on['droppeditemcount']:
            # Same call the built-in CloseSpider extension uses to stop the crawl.
            self.crawler.engine.close_spider(spider, 'closespider_droppeditemcount')
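
For completeness, the pipeline that feeds this extension only has to raise DropItem when a record already exists; a minimal sketch, where an in-memory set stands in for the real MySQL lookup (the class name and the 'id' field are illustrative assumptions):

from scrapy.exceptions import DropItem

class MysqlDuplicatePipeline(object):

    def __init__(self):
        self.seen = set()  # stands in for the real duplicate check against MySQL

    def process_item(self, item, spider):
        key = item['id']  # assumes each item carries a unique 'id' field
        if key in self.seen:
            # Raising DropItem fires the item_dropped signal that
            # DroppedItemCloseSpider counts.
            raise DropItem('Already in database: %s' % key)
        self.seen.add(key)
        return item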

Megido _

Jul 22, 2013, 12:48:10 PM
to scrapy...@googlegroups.com
This raises an "engine not running" error.

On Monday, January 16, 2012 at 15:49:46 UTC+2, Pablo Hoffman wrote: