Folks, I am having a tough time integrating scrapy's asynchronous operations into django.
I see little support for twisted, scrapy's underlying asynchronous library, in the django world.
I have tried to use the hendrix and crochet packages but found the documentation and examples lacking.
Naturally, I would expect to use celery with django for asynchronous tasks, since the two integrate very well.
My questions therefore are:
- Can I use celery to run an asynchronous spider crawl, given that twisted is used underneath? Is this even the right way to go?
- If I do combine scrapy with celery, will I run into scaling issues? (say, if I wanted to crawl 1000 pages)
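To make the first question concrete, here is a minimal sketch of the shape I have in mind. It assumes each crawl is launched in its own child process, so that every crawl gets a fresh twisted reactor instead of reusing one inside a long-lived worker. The helper names (`build_crawl_command`, `run_spider`), the spider name and the project path are hypothetical, and the celery wiring is deliberately left out; `run_spider` is only meant to stand in for the body of a celery task.

```python
import subprocess
import sys


def build_crawl_command(spider_name, settings=None):
    """Build the argv for `scrapy crawl <spider_name>` with optional -s overrides."""
    # Run scrapy as a module of the current interpreter so the same
    # virtualenv is used by the child process.
    command = [sys.executable, "-m", "scrapy", "crawl", spider_name]
    for key, value in (settings or {}).items():
        # `-s NAME=VALUE` overrides a scrapy setting for this run only.
        command += ["-s", f"{key}={value}"]
    return command


def run_spider(spider_name, project_dir, settings=None):
    """Run one crawl in a child process and return its exit code.

    Hypothetical helper: in a celery setup this would be the body of a task.
    project_dir is assumed to be the directory containing scrapy.cfg.
    """
    completed = subprocess.run(
        build_crawl_command(spider_name, settings),
        cwd=project_dir,
        check=False,  # report failures through the return code instead of raising
    )
    return completed.returncode
```

For the 1000-page case I would call something like `run_spider("quotes", "/path/to/scrapy_project", {"CLOSESPIDER_PAGECOUNT": 1000})`, where the spider name and path are placeholders. I do not know whether one process per crawl is the right trade-off, which is part of what I am asking.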
Has anyone got any experience here?