elegant way to quit scrapy from middleware process_request() function


Sungmin Lee

Nov 2, 2014, 11:50:20 PM
to scrapy...@googlegroups.com
Hi all,

I've built my own middleware, and I would like to stop the crawler when it meets a certain condition.
I know that from a spider I can terminate the process by raising a CloseSpider() exception, but this does not work at all from the middleware's process_request() function.

I googled this topic for a while, but none of the solutions I found have worked so far
(raise CloseSpider('message'), spider.close_down = True, etc.).
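For context, here is the pattern that does work from a spider callback (a sketch, not taken from the thread): raising CloseSpider lets Scrapy's engine handle the shutdown. The spider class and the stop condition below are illustrative placeholders, and a fallback definition of CloseSpider is included only so the sketch runs without Scrapy installed.

```python
# Sketch: raising CloseSpider from a spider callback, where Scrapy's
# engine catches it and shuts the spider down gracefully.
try:
    from scrapy.exceptions import CloseSpider
except ImportError:  # fallback so this sketch runs without Scrapy
    class CloseSpider(Exception):
        def __init__(self, reason='cancelled'):
            super(CloseSpider, self).__init__(reason)
            self.reason = reason

class MySpider(object):  # stand-in for a scrapy.Spider subclass
    name = 'myspider'

    def parse(self, response):
        # Placeholder stop condition: bail out on server errors.
        if getattr(response, 'status', 200) >= 500:
            raise CloseSpider('server_errors')
        # normal item/request extraction would continue here
```

This works because the engine wraps callback execution and treats CloseSpider specially; the same exception raised inside a downloader middleware is not handled that way, which is the problem described below.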

Even a sys.exit(0) call doesn't work; it just produces an exception traceback:

Traceback (most recent call last):
 File "/usr/local/Cellar/python/2.7.6_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/core/engine.py", line 137, in _next_request_from_scheduler
   d = self._download(request, spider)
 File "/usr/local/Cellar/python/2.7.6_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/core/engine.py", line 213, in _download
   dwld = self.downloader.fetch(request, spider)
 File "/usr/local/Cellar/python/2.7.6_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 87, in fetch
   dfd = self.middleware.download(self._enqueue_request, request, spider)
 File "/usr/local/Cellar/python/2.7.6_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 65, in download
   deferred = mustbe_deferred(process_request, request)
--- <exception caught here> ---
 File "/usr/local/Cellar/python/2.7.6_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/utils/defer.py", line 39, in mustbe_deferred
   result = f(*args, **kw)
 File "/usr/local/Cellar/python/2.7.6_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 32, in process_request
   response = method(request=request, spider=spider)
 File "/Users/username/Projects/projectname/mymiddleware.py", line 56, in process_request
   sys.exit(0)
exceptions.SystemExit: 0

Is there any way to finish the scraper elegantly from the process_request() function of a custom middleware?
Thanks!
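For reference, one commonly suggested pattern for this situation (a sketch, not verified against the 2014-era Scrapy in the traceback above): obtain the crawler via the standard from_crawler hook, ask the engine to close the spider, and raise IgnoreRequest so the request that triggered the shutdown is dropped instead of downloaded. The should_stop() condition is a hypothetical placeholder, and a fallback definition of IgnoreRequest is included only so the sketch runs without Scrapy installed.

```python
# Sketch: shutting down the crawl from process_request() by asking the
# engine to close the spider, then dropping the current request.
try:
    from scrapy.exceptions import IgnoreRequest
except ImportError:  # fallback so this sketch runs without Scrapy
    class IgnoreRequest(Exception):
        pass

class StopCrawlMiddleware(object):
    def __init__(self, crawler):
        self.crawler = crawler
        self.error_count = 0

    @classmethod
    def from_crawler(cls, crawler):
        # Standard Scrapy hook: gives the middleware access to the crawler.
        return cls(crawler)

    def should_stop(self):
        # Hypothetical placeholder condition.
        return self.error_count > 10

    def process_request(self, request, spider):
        if self.should_stop():
            # Ask the engine for a graceful shutdown rather than sys.exit(),
            # which (as the traceback shows) is caught by the deferred machinery.
            self.crawler.engine.close_spider(spider, 'stop_condition_met')
            # Drop this request so no download is attempted for it.
            raise IgnoreRequest('stop condition met')
        return None  # continue normal downloading
```

IgnoreRequest is the one exception process_request() is documented to raise, so this avoids the unhandled-exception tracebacks that CloseSpider and sys.exit produce there.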

Nicolás Alejandro Ramírez Quiros

Nov 3, 2014, 8:25:44 AM
to scrapy...@googlegroups.com