CloseSpider from pipeline


klaus

Jan 16, 2012, 5:26:02 AM
to scrapy...@googlegroups.com
Hi,
I have a problem using the CloseSpider function. In my pipeline I insert records into a MySQL database, and I also check whether a record already exists. If the number of existing records exceeds a certain value, I want to close the spider.

In the Scrapy documentation I read that CloseSpider can only be raised from a spider callback, not from a pipeline. How can I communicate from the pipeline to the spider class? Can I invoke CloseSpider from a pipeline?

Thanks 

Pablo Hoffman

Jan 16, 2012, 8:49:46 AM
to scrapy...@googlegroups.com
You can close the spider by calling the crawler.engine.close_spider() method, similar to how the built-in CloseSpider extension does it. Check its code:
https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/closespider.py
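
A minimal sketch of what this can look like in a pipeline; the class name, the item's 'id' field, and the 20-duplicate threshold are illustrative assumptions, not something from this thread:

from scrapy.exceptions import DropItem

class DuplicateLimitPipeline(object):

    def __init__(self, crawler):
        self.crawler = crawler
        self.seen = set()
        self.duplicates = 0
        self.max_duplicates = 20  # illustrative threshold

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_item(self, item, spider):
        key = item['id']  # assumes each item carries a unique 'id' field
        if key in self.seen:
            self.duplicates += 1
            if self.duplicates >= self.max_duplicates:
                # Same call the built-in CloseSpider extension uses
                self.crawler.engine.close_spider(spider, 'too_many_duplicates')
            raise DropItem('Duplicate item: %s' % key)
        self.seen.add(key)
        return item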

Fer

Jul 18, 2013, 4:41:53 PM
to scrapy...@googlegroups.com
I had the same problem and I solved it as Pablo Hoffman suggested:
I added these lines to my settings file:

EXTENSIONS = {
    'ProjectFolder.droppeditem.DroppedItemCloseSpider': 500,
}
# My pipeline drops an item whenever it already exists in the database, so after 20 dropped items the spider will close:
CLOSESPIDER_DROPPEDITEMCOUNT = 20

And I created a file called droppeditem.py with the following code:

from collections import defaultdict
from scrapy import signals

class DroppedItemCloseSpider(object):
    """Close the spider once CLOSESPIDER_DROPPEDITEMCOUNT items have been dropped."""

    def __init__(self, crawler):
        self.crawler = crawler

        self.close_on = {
            'droppeditemcount': crawler.settings.getint('CLOSESPIDER_DROPPEDITEMCOUNT'),
            }

        self.counter = defaultdict(int)

        # Only listen for dropped items if a threshold is configured.
        if self.close_on.get('droppeditemcount'):
            crawler.signals.connect(self.item_dropped, signal=signals.item_dropped)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def item_dropped(self, item, spider):
        self.counter['droppeditemcount'] += 1
        if self.counter['droppeditemcount'] == self.close_on['droppeditemcount']:
            # Same call the built-in CloseSpider extension uses to stop the crawl.
            self.crawler.engine.close_spider(spider, 'closespider_droppeditemcount')
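
For completeness, the pipeline that feeds this extension only has to raise DropItem when a record already exists; a minimal sketch, where an in-memory set stands in for the real MySQL lookup (the class name and the 'id' field are illustrative assumptions):

from scrapy.exceptions import DropItem

class MysqlDuplicatePipeline(object):

    def __init__(self):
        self.seen = set()  # stands in for the real duplicate check against MySQL

    def process_item(self, item, spider):
        key = item['id']  # assumes each item carries a unique 'id' field
        if key in self.seen:
            # Raising DropItem fires the item_dropped signal that
            # DroppedItemCloseSpider counts.
            raise DropItem('Already in database: %s' % key)
        self.seen.add(key)
        return item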

Megido _

Jul 22, 2013, 12:48:10 PM
to scrapy...@googlegroups.com
This raises an "engine not running" error.

On Monday, January 16, 2012 at 15:49:46 UTC+2, Pablo Hoffman wrote: