Yes, you can run as many instances of a single spider in parallel as you want; Scrapyd spawns a separate process per run, so it can make use of many cores. This was one of the design goals of Scrapyd: to work around Python's concurrency limitations and take advantage of multiple cores. The max_proc setting of Scrapyd lets you set how many concurrent processes you want to run, and it defaults to the number of cores available on the system.
Of course, each run is a separate process, completely isolated from the others, so it doesn't share the request queue. If you have a predefined list of URLs to crawl, you can partition the start URLs and give each run a different partition. Another option that has been mentioned (but which I haven't tested myself) is
scrapy-redis, which lets you spawn many runs of a spider that share the same request queue.
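To make the partitioning idea concrete, here is a rough, untested sketch that schedules several runs of the same spider through scrapyd's schedule.json endpoint, giving each run a different slice of the start URLs via spider arguments. The project name "myproject", the spider name "myspider" and the "part"/"total_parts" arguments are placeholders; your spider would have to read those arguments and build its own start URLs from its slice:

    import requests  # assumes the requests package is installed

    SCRAPYD_URL = "http://localhost:6800/schedule.json"  # default scrapyd endpoint
    NUM_PARTS = 4  # roughly one run per core; scrapyd's max_proc caps how many run at once

    for part in range(NUM_PARTS):
        # Each POST to schedule.json queues an independent spider process.
        # Extra form fields are passed to the spider as arguments.
        response = requests.post(SCRAPYD_URL, data={
            "project": "myproject",         # placeholder project name
            "spider": "myspider",           # placeholder spider name
            "part": str(part),              # hypothetical arg: which slice of URLs to crawl
            "total_parts": str(NUM_PARTS),  # hypothetical arg: how many slices there are
        })
        print(response.json())  # scrapyd returns a job id for each scheduled run

Inside the spider, start_requests() would then yield only the URLs whose index modulo total_parts equals part. For the scrapy-redis route (again, I haven't tried it), the setup as I understand it is mostly a matter of pointing Scrapy's scheduler and duplicate filter at the Redis-backed implementations in your project's settings.py, something along these lines (check the scrapy-redis README for the exact setting names):

    SCHEDULER = "scrapy_redis.scheduler.Scheduler"
    DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
    REDIS_URL = "redis://localhost:6379"  # the shared request queue lives in Redis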
Good luck!
On Sat, Sep 29, 2012 at 11:47 AM, Ilya Persky
<ilya....@gmail.com> wrote:
Hello guys!
Recently I came across the documentation section that describes scrapyd. It says that scrapyd can run several spiders in parallel. So my questions are:
1) Can scrapyd run several instances of _one_ spider at a time? Say I have a CPU-bound spider, and it would be great to see it running on two cores. If so, how exactly can I do that?
2) Again, if this is possible, would these spiders share one request queue, or would I need some additional programming to make this work?
Thank you in advance!
Regards,
Ilya.