Middleware to avoid downloading new requests while parse callbacks are still being processed


Leonardo Casanova

Jun 12, 2015, 12:42:49 PM6/12/15
to scrapy...@googlegroups.com
Hi,

As the title says, my problem is that I need to limit the number of parse callbacks being processed concurrently, as they are quite memory intensive (some of them create a Selenium instance).
To do this, I want to prevent Scrapy from downloading new requests while parse callbacks are still being processed.

I have written this middleware to reschedule requests once they exceed a certain limit.
However, at some point the scraping seems to stop and no new requests are made.
Could you help me check whether I missed something?

Best Regards
Leo
limit_requests_middleware.py
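The attachment itself isn't reproduced in the thread. A minimal sketch of the approach described, assuming a Scrapy downloader middleware that counts in-flight downloads and reschedules requests over a limit, might look like the following. The class name, the `max_in_flight` parameter, and the limit value are illustrative, not taken from the original file:

```python
# Hypothetical sketch of a request-limiting downloader middleware.
# Names and the limit value are assumptions, not the original attachment.

class LimitRequestsMiddleware:
    """Defer new downloads while too many requests are already in flight."""

    def __init__(self, max_in_flight=4):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def process_request(self, request, spider):
        if self.in_flight >= self.max_in_flight:
            # Returning a Request from process_request makes Scrapy
            # reschedule it instead of downloading it now. dont_filter
            # should be set, or the dupefilter will drop the rescheduled
            # request -- one way a spider can appear to stall.
            request.dont_filter = True
            return request
        self.in_flight += 1
        return None  # continue through the middleware chain normally

    def process_response(self, request, response, spider):
        self.in_flight -= 1
        return response

    def process_exception(self, request, exception, spider):
        # Without this, failed downloads leak the counter and the
        # middleware eventually defers every request forever -- a
        # plausible cause of the stall described above.
        self.in_flight -= 1
        return None
```

One pitfall with this design is that a downloader middleware only observes downloads, not how long the spider's callback takes afterwards, so the counter is at best an approximation of "callbacks still being processed".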

Leonardo Casanova

Jun 15, 2015, 3:25:25 AM6/15/15
to scrapy...@googlegroups.com

Mikhail Korobov

Jun 20, 2015, 8:05:04 AM6/20/15
to scrapy...@googlegroups.com
Hi Leonardo,

Scrapy doesn't execute parse callbacks in parallel; only a single parse callback is executed at a time.
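Since callbacks already run one at a time, a custom middleware shouldn't be needed to cap memory use; Scrapy's built-in concurrency settings limit how many requests are downloaded at once. A sketch of the relevant settings (the values here are illustrative, not a recommendation):

```python
# settings.py -- illustrative values; tune for your memory budget.
CONCURRENT_REQUESTS = 2             # global cap on concurrent downloads
CONCURRENT_REQUESTS_PER_DOMAIN = 2  # per-domain cap
```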

On Monday, June 15, 2015 at 12:25:25 UTC+5, Leonardo Casanova wrote: