Is it possible to make Scrapy react to external stimuli?


Paulo Borges

Jun 5, 2016, 8:11:32 AM
to scrapy...@googlegroups.com
Hi,

I'm trying to integrate Scrapy with Selenium to scrape JavaScript-heavy webapps. I've already found a package [1] that provides a `WebdriverRequest`, and it works well in my tests.
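For reference, my spider currently looks roughly like this (a minimal sketch; the import path assumes the scrapy-webdriver package, adjust it to whatever [1] actually is):

```python
# Sketch only: assumes the package from [1] exposes WebdriverRequest
# under scrapy_webdriver.http; the spider name and URL are placeholders.
from scrapy import Spider
from scrapy_webdriver.http import WebdriverRequest


class AppSpider(Spider):
    name = "app"

    def start_requests(self):
        # The page is rendered by a real browser before parse() sees it.
        yield WebdriverRequest("https://example.com/app", callback=self.parse)

    def parse(self, response):
        # Normal Scrapy parsing on the browser-rendered HTML.
        for title in response.css("h1::text").extract():
            yield {"title": title}
```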

But I want to handle not only the requests I explicitly make with `WebdriverRequest`, but also the requests the browser itself makes while navigating to the target page (e.g. JavaScript and CSS files, XHR calls).

My current idea is:

1. Make the browser use a "proxy" that calls an HTTP endpoint to log every request/response pair it sees.
2. A Scrapy extension would create this HTTP endpoint using `twisted.web.server.Site` and plug it into the existing Twisted reactor. (I have never used Twisted before, so I don't know whether this is possible; a rough sketch of what I mean is below.)
3. Somehow let the spider know that there is data for it to process every time this endpoint is called.
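
This is roughly what I have in mind for step 2: an extension (enabled through the `EXTENSIONS` setting) that starts a small Twisted web site on the crawler's own reactor when the engine starts. It is untested and the names are mine; step 3 is exactly the part I don't know how to fill in:

```python
from twisted.internet import reactor
from twisted.web import resource, server

from scrapy import signals


class ProxyLogResource(resource.Resource):
    """Receives request/response pairs POSTed by the logging proxy."""
    isLeaf = True

    def __init__(self, crawler):
        resource.Resource.__init__(self)
        self.crawler = crawler

    def render_POST(self, request):
        body = request.content.read()
        # Step 3 is the part I'm unsure about: how do I hand `body`
        # over to the spider from here?
        return b"OK"


class ProxyLogEndpoint(object):
    """Scrapy extension that listens on an HTTP port inside the crawler's reactor."""

    def __init__(self, crawler, port=8099):
        self.crawler = crawler
        self.port = port
        self.listening_port = None

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls(crawler)
        crawler.signals.connect(ext.engine_started, signal=signals.engine_started)
        crawler.signals.connect(ext.engine_stopped, signal=signals.engine_stopped)
        return ext

    def engine_started(self):
        # Reuse the reactor Scrapy is already running instead of starting a new one.
        site = server.Site(ProxyLogResource(self.crawler))
        self.listening_port = reactor.listenTCP(self.port, site)

    def engine_stopped(self):
        if self.listening_port is not None:
            self.listening_port.stopListening()
```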

Is that possible? How can I implement the last step? Does anyone have another suggestion?

Thank you,
Paulo
