scrapy + redis = awesome


Rolando Espinoza La Fuente

Aug 29, 2011, 12:47:14 AM
to scrapy...@googlegroups.com
I've published a scrapy+redis integration. This allows you to:
* run many crawlers for the same spider and share the workload
* run many post-processing workers to consume the items
* persist request queue, therefore pause/resume crawling

Certainly this is best suited for CPU-bound scrapers.

It requires the latest development version of Scrapy and hasn't been
tested in production.

Source code: https://github.com/darkrho/scrapy-redis

Regards,

~Rolando

Daniel Graña

Aug 29, 2011, 1:51:17 PM
to scrapy...@googlegroups.com
great work :)

so, to use redis as scheduler backend:

- set SCHEDULER to scrapy_redis.scheduler.Scheduler
- set SCHEDULER_PERSIST to True

and to store scraped items in redis for further post-processing:
- add scrapy_redis.pipelines.RedisPipeline to ITEM_PIPELINES

right?
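The two settings summarized above would look like this in a project's settings.py (a minimal sketch of the configuration described in this thread; the Redis connection itself is assumed to be the default local one):

```python
# settings.py -- sketch of the scrapy-redis setup summarized above.

# Use the Redis-backed scheduler so multiple crawler processes
# share one request queue for the same spider.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Optional: keep the request queue in Redis across runs (pause/resume).
SCHEDULER_PERSIST = True

# Push scraped items into a Redis list for post-processing workers.
ITEM_PIPELINES = ["scrapy_redis.pipelines.RedisPipeline"]
```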

Rolando Espinoza La Fuente

Aug 29, 2011, 2:02:25 PM
to scrapy...@googlegroups.com

Yes. But SCHEDULER_PERSIST is optional; set it to True only if you want the
pause/resume feature.

The RedisPipeline serializes the item using JSON, so the
post-processing can be done
in any language that supports Redis and JSON.
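Since the items land in Redis as JSON strings, a post-processing worker only needs to pop and decode them. A minimal Python sketch (the list key "myspider:items" and connection details are assumptions for illustration, not confirmed in this thread):

```python
import json

def decode_item(raw):
    """Deserialize one JSON-encoded item popped from the Redis list."""
    return json.loads(raw)

# Worker loop -- requires a running Redis server and the `redis` package:
#
#   import redis
#   r = redis.StrictRedis(host="localhost", port=6379)
#   while True:
#       # blpop blocks until RedisPipeline pushes the next item
#       _, raw = r.blpop("myspider:items")
#       item = decode_item(raw)
#       print(item)
```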

Regards

~Rolando

shahidashraff

Dec 16, 2012, 7:45:01 AM
to scrapy...@googlegroups.com
How do I set the Redis server address for the scheduler, and how do I run it?
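For later readers: the connection is configured through project settings. A hedged sketch, assuming the REDIS_HOST/REDIS_PORT setting names documented in the scrapy-redis README (the values shown are assumptions for a local setup):

```python
# settings.py -- point scrapy-redis at your Redis server.
# REDIS_HOST / REDIS_PORT are the setting names used by scrapy-redis;
# the values below assume a Redis server on the local machine.
REDIS_HOST = "localhost"
REDIS_PORT = 6379

SCHEDULER = "scrapy_redis.scheduler.Scheduler"
```

The crawl itself is then started the usual way (`scrapy crawl <spidername>`); launching more processes with the same spider adds workers against the shared queue.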

Andres Douglas

Apr 25, 2013, 5:12:20 AM
to scrapy...@googlegroups.com
Rolando, thanks for sharing; this is really interesting. How stable is it at this point? It's been a while since you published it, but the code on GitHub still has a warning about not being production-ready?

Tuấn Lê

Nov 25, 2014, 2:48:26 AM
to scrapy...@googlegroups.com
Hi,

I'm researching scrapy-redis, but I don't know how it works. I need a step-by-step example.
Can you help me?

Thank you
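For later readers, the steps asked for above can be sketched end-to-end under stated assumptions (a local Redis server, scrapy and scrapy-redis installed; the spider name and the Redis key are hypothetical examples):

```python
# 1. settings.py -- route scheduling and items through Redis:
#
#    SCHEDULER = "scrapy_redis.scheduler.Scheduler"
#    ITEM_PIPELINES = ["scrapy_redis.pipelines.RedisPipeline"]
#    REDIS_HOST = "localhost"
#    REDIS_PORT = 6379
#
# 2. Define a spider that reads its start URLs from a Redis list:

from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
    name = "myspider"
    redis_key = "myspider:start_urls"  # list the crawlers pop URLs from

    def parse(self, response):
        # Yield a trivial item; RedisPipeline will JSON-encode it
        # and push it into Redis for post-processing workers.
        yield {"url": response.url}

# 3. Start one or more crawler processes:  scrapy crawl myspider
# 4. Seed the queue by pushing a URL into Redis:
#    redis-cli lpush myspider:start_urls http://example.com
```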