Hi Beto,
I just wanted to check whether you had any success integrating Scrapy
with Celery.
Right now I'm trying to integrate Scrapy into my celeryd workers. I've
been following
http://stackoverflow.com/questions/11528739/running-scrapy-spiders-in-a-celery-task
but so far I haven't been able to get the multiprocessing approach working.
Any help you can offer?
Pablo?
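For what it's worth, the approach in that Stack Overflow answer boils down to running each crawl in a fresh child process, so the (non-restartable) Twisted reactor lives and dies with the process. A minimal stdlib sketch of the pattern, with a stand-in `crawl` function in place of the real Scrapy entry point:

```python
# Sketch of the "crawl in a child process" pattern from the linked
# Stack Overflow answer. `crawl` and `run_crawl_task` are illustrative
# stand-ins; a real task would configure a Crawler and start the
# reactor inside the child process.
from multiprocessing import get_context

def crawl(domain, results):
    # Pretend to crawl and report back through the queue.
    results.put("crawled:%s" % domain)

def run_crawl_task(domain):
    """Body of a Celery task: isolate the crawl in a child process."""
    ctx = get_context("fork")    # fork context; not available on Windows
    results = ctx.Queue()
    worker = ctx.Process(target=crawl, args=(domain, results))
    worker.start()
    item = results.get()         # read before join to avoid a feeder deadlock
    worker.join()
    return item
```

Because the reactor cannot be restarted within one process, paying the fork cost per task is what makes repeated crawls from a long-lived celeryd worker possible.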
On Mar 26, 1:02 am, Beto Boullosa <
b...@boullosa.org> wrote:
> Hi, Pablo,
>
> I've already figured out what you meant. I've succeeded in disabling the
> signals through reactor.run(installSignalHandlers=False).
>
> Doing this, we were able to run the reactor in another thread without
> problems. And thus we've successfully managed to control the flow of
> execution of the reactor from Celery in the main thread,
> using reactor.callFromThread.
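That trick can be sketched with only the standard library: a plain queue-draining loop stands in for the reactor, and a helper plays the role of reactor.callFromThread (illustrative names only, nothing here is Twisted's API):

```python
# Stdlib analogue of running the reactor in a secondary thread.
# reactor.run(installSignalHandlers=False) matters because Python only
# allows signal.signal() calls in the main thread; with handlers
# disabled, the loop below can live anywhere.
import queue
import threading

calls = queue.Queue()    # stand-in for the reactor's thread-safe call queue
results = []

def loop():
    """Runs in a secondary thread, as reactor.run(...) would."""
    while True:
        fn = calls.get()
        if fn is None:   # sentinel, playing the role of reactor.stop()
            break
        fn()

def call_from_thread(fn):
    """Stand-in for reactor.callFromThread: hand fn to the loop thread."""
    calls.put(fn)

t = threading.Thread(target=loop)
t.start()
call_from_thread(lambda: results.append("crawl scheduled"))
calls.put(None)          # shut the loop down
t.join()
```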
>
> Thanks again!
> Beto
>
> On Fri, Mar 22, 2013 at 10:33 AM, Beto Boullosa <
b...@boullosa.org> wrote:
> > Thanks for your quick answer! :)
>
> > What do you mean by the OS signals? Anything configurable in twisted
> > itself?
>
> > On Fri, Mar 22, 2013 at 10:19 AM, Pablo Hoffman <
pablohoff...@gmail.com>wrote:
>
> >> I think you *should* be able to run the Twisted reactor in a non-main
> >> thread, if you disable (OS) signals.
>
> >> On Fri, Mar 22, 2013 at 10:13 AM, Beto Boullosa <
b...@boullosa.org>wrote:
>
> >>> Hi, Pablo,
>
> >>> Thanks for your answer. Option 2 with callFromThread is working fine. :)
>
> >>> Now we're doing some tests to integrate it with Celery, but we haven't
> >>> succeeded yet, mainly because we haven't found a way of running Celery
> >>> outside the main thread.
>
> >>> Cheers,
> >>> Beto
>
> >>> On Tue, Mar 19, 2013 at 12:04 AM, Pablo Hoffman <
pablohoff...@gmail.com>wrote:
>
> >>>> If the RabbitMQ library you are using provides a blocking API, you have
> >>>> two options:
>
> >>>> 1. poll (instead of doing a blocking read) to check for more work
> >>>> 2. do a blocking read, but do it in a thread. Leave the main thread for
> >>>> running twisted reactor (and scrapy Crawlers).
>
> >>>> Don't spawn multiple threads for multiple Crawlers, that's not how it
> >>>> works. All crawlers should run (asynchronously) in the same thread where
> >>>> the Twisted reactor is running.
>
> >>>> For option 2, you should probably use callFromThread to ensure thread
> >>>> safety (see Using Threads in Twisted<
http://twistedmatrix.com/documents/12.0.0/core/howto/threading.html>
> >>>> )
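Option 2 can be sketched with only the standard library; here one queue stands in for the blocking RabbitMQ channel and a second one plays the role of the reactor's callFromThread mailbox (illustrative names, not pika's or Twisted's API):

```python
# Stdlib sketch of option 2: a worker thread does the blocking reads
# and hands each message to the main-thread loop, the way
# reactor.callFromThread hands work to the Twisted reactor.
import queue
import threading

broker = queue.Queue()         # stand-in for the blocking RabbitMQ channel
reactor_calls = queue.Queue()  # stand-in for the reactor's call queue
crawled = []

def consumer():
    """Blocking reads happen here, never in the reactor thread."""
    while True:
        domain = broker.get()  # blocks, like a blocking consume
        if domain is None:
            reactor_calls.put(None)
            break
        # schedule the "crawl" on the reactor thread, thread-safely
        reactor_calls.put(lambda d=domain: crawled.append(d))

threading.Thread(target=consumer).start()
for d in ("example.com", "example.org", None):
    broker.put(d)

# the main thread plays the reactor: drain the scheduled calls
while True:
    call = reactor_calls.get()
    if call is None:
        break
    call()
```

The key property is that the reactor thread never blocks on the broker; it only ever executes short callbacks handed over from the consumer thread.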
>
> >>>> On Thu, Mar 14, 2013 at 2:08 PM, Beto Boullosa <
b...@boullosa.org>wrote:
>
> >>>>> Hi, Pablo,
>
> >>>>> Thanks for your answer. I had already figured out this piece of
> >>>>> documentation, nice one. I understand now that I can have multiple crawlers
> >>>>> in my process, I've made some tests and it works fine.
>
> >>>>> Nevertheless, I'm facing a bigger problem now: I'm trying to develop
> >>>>> some kind of "crawler consumer" that would consume a queue with the
> >>>>> description of the domains to be crawled. I'm using RabbitMQ to do all the
> >>>>> queue stuff: the consumer pops from the queue the next domain to be
> >>>>> crawled, then instantiates a new scrapy crawler, then runs it and so on.
>
> >>>>> Problem is: for some odd reason, when I integrate both RabbitMQ and
> >>>>> Scrapy, the crawler that I instantiate never crawls anything, although its
> >>>>> spider is initiated ok and everything else seems fine. It's as though the
> >>>>> crawler callbacks for scraping the items are never reached.
>
> >>>>> Incidentally, if I forcibly close the connection with the RabbitMQ
> >>>>> queue (or if I shutdown RabbitMQ completely), then the crawlers work again.
>
> >>>>> So, it looks like there is some kind of interference between the
> >>>>> RabbitMQ mechanism and the scrapy/twisted internals that, for some reason,
> >>>>> blocks the reactor from working properly when both are running.
>
> >>>>> I've also tested creating every spider in its own thread, but the
> >>>>> problem remains.
>
> >>>>> Would you have any ideas to share on that?
>
> >>>>> Thanks a lot,
> >>>>> Beto
>
> >>>>> On Thu, Mar 14, 2013 at 1:22 PM, Pablo Hoffman <
pablohoff...@gmail.com
> >>>>> > wrote:
>
> >>>>>> Hi Beto,
>
> >>>>>> These ideas have been pretty much implemented by now. Scrapy 0.16 is
> >>>>>> singleton-free and you can have multiple Crawler objects running in a
> >>>>>> single process/twisted-reactor. There's a small section in the
> >>>>>> documentation that explains how:
>
> >>>>>>
http://doc.scrapy.org/en/latest/topics/practices.html#running-multipl...
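As an illustration of the single-thread model those docs describe (not Scrapy's API; plain generators stand in for the asynchronous callbacks), one loop can interleave several "crawlers" without any extra threads:

```python
# Sketch of "many crawlers, one reactor": each crawler is a cooperative
# task, and a single loop interleaves them. Generators stand in for
# Twisted's deferreds; all names here are illustrative.
from collections import deque

def crawler(domain, pages, out):
    for i in range(pages):
        out.append((domain, i))  # pretend each step fetches one page
        yield                    # yield control back, like a deferred

tasks = deque()
fetched = []
for domain, pages in (("example.com", 2), ("example.org", 3)):
    tasks.append(crawler(domain, pages, fetched))

# single-threaded loop, the role the Twisted reactor plays
while tasks:
    task = tasks.popleft()
    try:
        next(task)
        tasks.append(task)       # not finished: reschedule it
    except StopIteration:
        pass                     # this crawler is done
```

Both crawlers make progress in one thread, which is why spawning a thread per Crawler buys nothing.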
> >>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>
> >>> --
> >>> You received this message because you are subscribed to the Google
> >>> Groups "scrapy-users" group.