Multi-threaded spider in Scrapy?


Republic

Mar 8, 2010, 12:44:10 AM
to scrapy-users
Hi,

I am trying to crawl a specific blogsite for text only.

Would it be possible to enable multi-threading in Scrapy? Meaning:
different spiders would be sent out to crawl different pages, but on
the same blogsite.

Is that possible?

Hope you can advise.


FM

Daniel Graña

Mar 8, 2010, 6:13:14 AM
to scrapy...@googlegroups.com
I wouldn't call it multi-threading, but using multiple spiders to crawl the same domain is certainly possible.
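
For illustration, a minimal sketch of two spiders crawling the same site in one process, using Scrapy's CrawlerProcess API (an API that postdates this thread; the class names, URLs, and selectors below are made up):

import scrapy
from scrapy.crawler import CrawlerProcess

class PostsSpider(scrapy.Spider):
    # Hypothetical spider for the blog's post pages.
    name = "posts"
    start_urls = ["http://example-blog.com/posts"]

    def parse(self, response):
        # Extract visible text only, per the original question.
        yield {"text": " ".join(response.css("article ::text").getall())}

class ArchiveSpider(scrapy.Spider):
    # Hypothetical spider for the blog's archive pages.
    name = "archive"
    start_urls = ["http://example-blog.com/archive"]

    def parse(self, response):
        yield {"text": " ".join(response.css("article ::text").getall())}

process = CrawlerProcess()
process.crawl(PostsSpider)    # both crawls are scheduled...
process.crawl(ArchiveSpider)  # ...and run concurrently
process.start()               # a single Twisted reactor drives them all

Note that Scrapy runs all of this in a single thread: concurrency comes from Twisted's asynchronous I/O, not from threads.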



Victor Mireyev

Mar 8, 2010, 8:50:40 AM
to scrapy-users
There's also the interesting SEP-011: Process models for Scrapy
(http://dev.scrapy.org/wiki/SEP-011), which proposes "running each
spider on a separate process" in order to fight against memory leaks.
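
As a rough sketch of that idea (not the actual SEP-011 implementation), you can already get one-process-per-spider with the stdlib multiprocessing module; the spider below is a made-up placeholder:

from multiprocessing import Process

import scrapy
from scrapy.crawler import CrawlerProcess

class BlogSpider(scrapy.Spider):
    # Placeholder spider; substitute your own.
    name = "blog"
    start_urls = ["http://example-blog.com/"]

    def parse(self, response):
        yield {"title": response.css("title::text").get()}

def run_spider(spider_cls):
    process = CrawlerProcess()
    process.crawl(spider_cls)
    process.start()  # blocks until this crawl finishes

if __name__ == "__main__":
    # One OS process per crawl: whatever memory the crawl leaks is
    # reclaimed by the operating system when the process exits.
    p = Process(target=run_spider, args=(BlogSpider,))
    p.start()
    p.join()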

On Mar 8, 13:13, Daniel Graña <dan...@gmail.com> wrote:


> On Mon, Mar 8, 2010 at 3:44 AM, Republic <ngfoom...@gmail.com> wrote:
> > Hi, I am trying to crawl a specific blogsite for text only.
>
> > Would it be possible to enable multi-threading in Scrapy? Meaning:
> > different spiders would be sent out to crawl different pages, but on
> > the same blogsite.
>
> > Is that possible?
>
> > Hope you can advise.
>
>
>
> I wouldn't call it multi-threading, but using multiple spiders to crawl
> the same domain is certainly possible.
>

> see: http://groups.google.com/group/scrapy-users/browse_thread/thread/efe4...

Leonardo Lazzaro

Mar 12, 2010, 2:20:33 PM
to scrapy...@googlegroups.com
Hello,
I needed something similar, and I found Celery.
Celery will queue tasks (and it's also distributed!), so you could run crawling tasks from different worker servers.

It's really easy to use, and it's a really powerful piece of software!

It was originally for Django, but now it can be added to any project. Check out the link: http://ask.github.com/celery/introduction.html
Hope this helps anyone!
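
A rough sketch of the idea, written against today's Celery API (the API back then differed); the broker URL and module layout are assumptions:

import subprocess

from celery import Celery

# Hypothetical tasks.py; the Redis broker is just one possible choice.
app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def crawl(spider_name):
    # Run the spider via the "scrapy crawl" CLI in a child process, so
    # the Twisted reactor (which can only be started once per Python
    # process) never clashes with the long-lived Celery worker.
    subprocess.run(["scrapy", "crawl", spider_name], check=True)

Then crawl.delay("myspider") from any machine that can reach the broker enqueues a crawl, and any running worker (started with "celery -A tasks worker") picks it up.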






Foo Meng Ng

Mar 13, 2010, 1:19:00 AM
to scrapy...@googlegroups.com
So you are saying that we could now use Celery as a form of task
scheduler that dishes out Scrapy crawlers to crawl in parallel?

Interesting, man!

FM

--
Ng Foo Meng

Leonardo Lazzaro

Mar 13, 2010, 10:48:00 PM
to scrapy...@googlegroups.com
I am using it plus Django to control (schedule, etc.) spiders via a web interface. It's really cool :)
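
Roughly along these lines (a simplified guess at the shape of such a setup, with invented names, using today's Django and Celery APIs rather than my exact code):

from django.http import JsonResponse

from tasks import crawl  # the hypothetical Celery task sketched above

def start_crawl(request, spider_name):
    # Enqueue the crawl and return immediately; a Celery worker
    # somewhere picks up the task, so the web process never blocks.
    result = crawl.delay(spider_name)
    return JsonResponse({"task_id": result.id})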

Foo Meng Ng

Mar 14, 2010, 3:48:01 AM
to scrapy...@googlegroups.com
Cool!

I am really interested to know more about your work. Are there any
papers that talk about it? Or do you have some pics? I'd really like
to know more.


FM

