Hi,
I'm trying to integrate Scrapy and Selenium to scrape JavaScript-heavy web apps. I've already found a package [1] that provides a `WebdriverRequest`, and it works well in my tests.
But I want to handle not only the requests I explicitly make with `WebdriverRequest`, but also the requests the browser itself makes while navigating to the target page (e.g. JavaScript and CSS files, XHR calls).
My current idea is:
1. Make the browser use a "proxy" that calls an HTTP endpoint to log every request/response pair it sees.
2. Have a Scrapy extension create this HTTP endpoint using `twisted.web.server.Site` and register it with the existing Twisted reactor. (I've never used Twisted before, so I don't know whether this is possible.)
3. Somehow let the spider know that there is data for it to process every time this endpoint is called.
Is that possible? How can I implement the last step? Does anyone have another suggestion?
Thank you,
Paulo