Middleware to avoid downloading new requests while parse callbacks are still being processed


Leonardo Casanova

Jun 12, 2015, 12:42:49 PM6/12/15
to scrapy...@googlegroups.com
Hi,

As the title says, my problem is that I need to limit the number of parse callbacks being processed concurrently, as they are quite memory intensive (some of them create a Selenium instance).
To do this, I want to prevent Scrapy from downloading new requests while parse callbacks are still being processed.

I have written this middleware to reschedule requests once they exceed a certain limit.
However, at some point the scraping seems to stop and no new requests are made.
Could you help me check whether I missed something?

Best Regards
Leo
limit_requests_middleware.py
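The attachment itself isn't reproduced in the thread. A minimal sketch of the approach described, assuming a Scrapy downloader middleware that counts in-flight downloads and reschedules requests over a limit, might look like the following. The class name, the `max_in_flight` parameter, and the limit value are illustrative, not taken from the original file:

```python
# Hypothetical sketch of a request-limiting downloader middleware.
# Names and the limit value are assumptions, not the original attachment.

class LimitRequestsMiddleware:
    """Defer new downloads while too many requests are already in flight."""

    def __init__(self, max_in_flight=4):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def process_request(self, request, spider):
        if self.in_flight >= self.max_in_flight:
            # Returning a Request from process_request makes Scrapy
            # reschedule it instead of downloading it now. dont_filter
            # should be set, or the dupefilter will drop the rescheduled
            # request -- one way a spider can appear to stall.
            request.dont_filter = True
            return request
        self.in_flight += 1
        return None  # continue through the middleware chain normally

    def process_response(self, request, response, spider):
        self.in_flight -= 1
        return response

    def process_exception(self, request, exception, spider):
        # Without this, failed downloads leak the counter and the
        # middleware eventually defers every request forever -- a
        # plausible cause of the stall described above.
        self.in_flight -= 1
        return None
```

One pitfall with this design is that a downloader middleware only observes downloads, not how long the spider's callback takes afterwards, so the counter is at best an approximation of "callbacks still being processed".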

Leonardo Casanova

Jun 15, 2015, 3:25:25 AM6/15/15
to scrapy...@googlegroups.com

Mikhail Korobov

Jun 20, 2015, 8:05:04 AM6/20/15
to scrapy...@googlegroups.com
Hi Leonardo,

Scrapy doesn't execute parse callbacks in parallel; only a single parse callback is executed at a time.
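Since callbacks already run one at a time, a custom middleware shouldn't be needed to cap memory use; Scrapy's built-in concurrency settings limit how many requests are downloaded at once. A sketch of the relevant settings (the values here are illustrative, not a recommendation):

```python
# settings.py -- illustrative values; tune for your memory budget.
CONCURRENT_REQUESTS = 2             # global cap on concurrent downloads
CONCURRENT_REQUESTS_PER_DOMAIN = 2  # per-domain cap
```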

On Monday, June 15, 2015 at 12:25:25 UTC+5, Leonardo Casanova wrote: