Scrapy X Django asynchronous operations

lu...@noa.one

Jul 19, 2016, 7:04:43 AM
to scrapy-users
Folks, I am having a tough time integrating Scrapy's asynchronous operations into Django.

I see little support in the Django world for Twisted, Scrapy's underlying asynchronous library.

I have tried the hendrix and crochet packages, but found their documentation and examples lacking.

Naturally, I would expect to use Celery with Django for asynchronous tasks, since the two integrate very well.


My questions, therefore, are:
  - Can I use Celery to run an asynchronous spider crawl, given that Twisted is used underneath? Is this even the right way to go?
  - If I do use Scrapy X Celery, will I run into scaling issues (say, if I wanted to crawl 1000 pages)?

Has anyone got any experience here?

Rolando Espinoza

Jul 19, 2016, 9:42:17 PM
to scrapy...@googlegroups.com
FWIW, here is a crochet + Scrapy integration: https://github.com/rolando/scrapydo/blob/master/scrapydo/api.py#L90

Rolando
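The crochet approach linked above works by starting Twisted's reactor once, in a background thread, so that ordinary blocking code (such as a Django view) can hand work to it and wait for the result with a timeout. In crochet itself those pieces are `crochet.setup()` and the `@wait_for(timeout=...)` decorator. Purely as an illustration of the pattern, here is a stdlib sketch that uses asyncio in place of Twisted; it is not crochet's or Scrapy's API. `BackgroundLoop`, `fake_crawl` and `crawl_view` are hypothetical names, and `fake_crawl` only simulates network I/O where a real integration would start a Scrapy crawl.

```python
import asyncio
import threading


class BackgroundLoop:
    """Runs one asyncio event loop in a daemon thread (analogous to crochet.setup())."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def call_blocking(self, coro, timeout: float):
        """Schedule `coro` on the loop thread and block the calling thread until it
        finishes or `timeout` seconds pass (analogous to @wait_for(timeout=...))."""
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result(timeout=timeout)

    def stop(self) -> None:
        """Stop the loop from another thread, wait for the thread, then close the loop."""
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


async def fake_crawl(urls):
    """Hypothetical stand-in for a crawl: 'fetches' all URLs concurrently."""

    async def fetch(url):
        await asyncio.sleep(0.01)  # simulated network I/O
        return {"url": url, "status": 200}

    # gather() returns results in the same order as the input URLs
    return list(await asyncio.gather(*(fetch(u) for u in urls)))


# One loop per process, started once (e.g. when Django starts up).
LOOP = BackgroundLoop()


def crawl_view(urls):
    """Synchronous entry point, e.g. called from a Django view."""
    return LOOP.call_blocking(fake_crawl(urls), timeout=10.0)


if __name__ == "__main__":
    print(crawl_view(["https://example.com/a", "https://example.com/b"]))
    # → [{'url': 'https://example.com/a', 'status': 200}, {'url': 'https://example.com/b', 'status': 200}]
```

Starting the loop once per process matters even more with Twisted: its reactor cannot be restarted after it stops (Twisted raises `ReactorNotRestartable`), so crawls triggered repeatedly from Django or Celery need either a long-lived reactor thread like this one or a fresh process per crawl.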
