Is it possible to make Scrapy react to external stimuli?


Paulo Borges

Jun 5, 2016, 8:11:32 AM
to scrapy...@googlegroups.com
Hi,

I'm trying to integrate Scrapy with Selenium to scrape JavaScript-heavy webapps. I've already found a package [1] that provides a `WebdriverRequest`, and it works well in my tests.
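For reference, my spider currently looks roughly like this (a minimal sketch; the import path assumes the scrapy-webdriver package, adjust it to whatever [1] actually is):

```python
# Sketch only: assumes the package from [1] exposes WebdriverRequest
# under scrapy_webdriver.http; the spider name and URL are placeholders.
from scrapy import Spider
from scrapy_webdriver.http import WebdriverRequest


class AppSpider(Spider):
    name = "app"

    def start_requests(self):
        # The page is rendered by a real browser before parse() sees it.
        yield WebdriverRequest("https://example.com/app", callback=self.parse)

    def parse(self, response):
        # Normal Scrapy parsing on the browser-rendered HTML.
        for title in response.css("h1::text").extract():
            yield {"title": title}
```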

But I want to handle not only the requests I explicitly make with `WebdriverRequest`, but also the requests the browser itself makes while navigating to the target page (e.g. JavaScript and CSS files, XHR calls).

My current idea is:

1. Make the browser use a "proxy" that calls an HTTP endpoint to log every request/response pair it sees.
2. A Scrapy extension would create this HTTP endpoint using `twisted.web.server.Site` and plug it into the existing Twisted reactor. (I have never used Twisted before, so I don't know whether this is possible; a rough sketch of what I mean is below.)
3. Somehow let the spider know that there is data for it to process every time this endpoint is called.
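
This is roughly what I have in mind for step 2: an extension (enabled through the `EXTENSIONS` setting) that starts a small Twisted web site on the crawler's own reactor when the engine starts. It is untested and the names are mine; step 3 is exactly the part I don't know how to fill in:

```python
from twisted.internet import reactor
from twisted.web import resource, server

from scrapy import signals


class ProxyLogResource(resource.Resource):
    """Receives request/response pairs POSTed by the logging proxy."""
    isLeaf = True

    def __init__(self, crawler):
        resource.Resource.__init__(self)
        self.crawler = crawler

    def render_POST(self, request):
        body = request.content.read()
        # Step 3 is the part I'm unsure about: how do I hand `body`
        # over to the spider from here?
        return b"OK"


class ProxyLogEndpoint(object):
    """Scrapy extension that listens on an HTTP port inside the crawler's reactor."""

    def __init__(self, crawler, port=8099):
        self.crawler = crawler
        self.port = port
        self.listening_port = None

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls(crawler)
        crawler.signals.connect(ext.engine_started, signal=signals.engine_started)
        crawler.signals.connect(ext.engine_stopped, signal=signals.engine_stopped)
        return ext

    def engine_started(self):
        # Reuse the reactor Scrapy is already running instead of starting a new one.
        site = server.Site(ProxyLogResource(self.crawler))
        self.listening_port = reactor.listenTCP(self.port, site)

    def engine_stopped(self):
        if self.listening_port is not None:
            self.listening_port.stopListening()
```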

Is that possible? How can I implement the last step? Does anyone have another suggestion?

Thank you,
Paulo
