scrapy + redis = awesome


Rolando Espinoza La Fuente

Aug 29, 2011, 12:47:14 AM
to scrapy...@googlegroups.com
I've published a scrapy+redis integration. This allows you to:
* run many crawlers for the same spider and share the workload
* run many post-processing workers to consume the items
* persist request queue, therefore pause/resume crawling

Certainly this is best suited for CPU-bound scrapers.

It requires the latest development version of Scrapy and hasn't been
tested in production.

Source code: https://github.com/darkrho/scrapy-redis

Regards,

~Rolando

Daniel Graña

Aug 29, 2011, 1:51:17 PM
to scrapy...@googlegroups.com
great work :)

so, to use redis as scheduler backend:

- set SCHEDULER to scrapy_redis.scheduler.Scheduler
- set SCHEDULER_PERSIST to True

and to store scraped items in redis for further post-processing:
- add scrapy_redis.pipelines.RedisPipeline to ITEM_PIPELINES

right?
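The two settings summarized above would look like this in a project's settings.py (a minimal sketch of the configuration described in this thread; the Redis connection itself is assumed to be the default local one):

```python
# settings.py -- sketch of the scrapy-redis setup summarized above.

# Use the Redis-backed scheduler so multiple crawler processes
# share one request queue for the same spider.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Optional: keep the request queue in Redis across runs (pause/resume).
SCHEDULER_PERSIST = True

# Push scraped items into a Redis list for post-processing workers.
ITEM_PIPELINES = ["scrapy_redis.pipelines.RedisPipeline"]
```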

Rolando Espinoza La Fuente

Aug 29, 2011, 2:02:25 PM
to scrapy...@googlegroups.com

Yes. But SCHEDULER_PERSIST is optional; set it to True only if you want the
pause/resume feature.

The RedisPipeline serializes the item using JSON, so the
post-processing can be done
in any language that supports Redis and JSON.
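Since the items land in Redis as JSON strings, a post-processing worker only needs to pop and decode them. A minimal Python sketch (the list key "myspider:items" and connection details are assumptions for illustration, not confirmed in this thread):

```python
import json

def decode_item(raw):
    """Deserialize one JSON-encoded item popped from the Redis list."""
    return json.loads(raw)

# Worker loop -- requires a running Redis server and the `redis` package:
#
#   import redis
#   r = redis.StrictRedis(host="localhost", port=6379)
#   while True:
#       # blpop blocks until RedisPipeline pushes the next item
#       _, raw = r.blpop("myspider:items")
#       item = decode_item(raw)
#       print(item)
```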

Regards

~Rolando

shahidashraff

Dec 16, 2012, 7:45:01 AM
to scrapy...@googlegroups.com
How do I set the Redis server address for the scheduler, and how do I run it?
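For later readers: the connection is configured through project settings. A hedged sketch, assuming the REDIS_HOST/REDIS_PORT setting names documented in the scrapy-redis README (the values shown are assumptions for a local setup):

```python
# settings.py -- point scrapy-redis at your Redis server.
# REDIS_HOST / REDIS_PORT are the setting names used by scrapy-redis;
# the values below assume a Redis server on the local machine.
REDIS_HOST = "localhost"
REDIS_PORT = 6379

SCHEDULER = "scrapy_redis.scheduler.Scheduler"
```

The crawl itself is then started the usual way (`scrapy crawl <spidername>`); launching more processes with the same spider adds workers against the shared queue.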

Andres Douglas

Apr 25, 2013, 5:12:20 AM
to scrapy...@googlegroups.com
Rolando, thanks for sharing; this is really interesting. How stable is it at this point? It's been a while since you published it, but the code on GitHub still has a warning about not being production-ready?

Tuấn Lê

Nov 25, 2014, 2:48:26 AM
to scrapy...@googlegroups.com
Hi,

I'm researching scrapy-redis, but I don't know how it works. I need a step-by-step example.
Can you help me?

Thank you
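For later readers, the steps asked for above can be sketched end-to-end under stated assumptions (a local Redis server, scrapy and scrapy-redis installed; the spider name and the Redis key are hypothetical examples):

```python
# 1. settings.py -- route scheduling and items through Redis:
#
#    SCHEDULER = "scrapy_redis.scheduler.Scheduler"
#    ITEM_PIPELINES = ["scrapy_redis.pipelines.RedisPipeline"]
#    REDIS_HOST = "localhost"
#    REDIS_PORT = 6379
#
# 2. Define a spider that reads its start URLs from a Redis list:

from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
    name = "myspider"
    redis_key = "myspider:start_urls"  # list the crawlers pop URLs from

    def parse(self, response):
        # Yield a trivial item; RedisPipeline will JSON-encode it
        # and push it into Redis for post-processing workers.
        yield {"url": response.url}

# 3. Start one or more crawler processes:  scrapy crawl myspider
# 4. Seed the queue by pushing a URL into Redis:
#    redis-cli lpush myspider:start_urls http://example.com
```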