David,
I've written a downloader middleware that intercepts JS-specific requests before they are processed. I haven't used WaitFor.js, so I can't help you there, but I can help get you started with PhantomJS.
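    from scrapy.http import HtmlResponse
    from selenium import webdriver

    class JSMiddleware(object):  # Scrapy middlewares need no special base class
        def process_request(self, request, spider):
            if request.meta.get('js'):  # you probably want a conditional trigger
                driver = webdriver.PhantomJS()
                try:
                    driver.get(request.url)
                    body = driver.page_source
                    return HtmlResponse(driver.current_url, body=body,
                                        encoding='utf-8', request=request)
                finally:
                    driver.quit()  # always shut down the PhantomJS process
            return None  # let the normal downloader handle the request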
That's the simplest approach. You will probably end up passing options to the webdriver.PhantomJS() call, such as desired_capabilities that include SSL processing options or a user agent string. You may also want to wrap the driver.get() call in a try/except block so that a failed page load is handled cleanly. Additionally, you should carry over the cookies that come back from PhantomJS, which you can read with driver.get_cookies(), if later requests depend on them.
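As a rough sketch of the options and cookie handling mentioned above: the helper names build_phantomjs_kwargs and cookies_to_dict are made up for illustration, the user agent value is a placeholder, and this assumes a Selenium release that still ships the PhantomJS driver (2.x or 3.x). The capability key and command-line switches are the ones I know PhantomJS/GhostDriver to accept, but check them against your versions.

```python
def build_phantomjs_kwargs(base_capabilities, user_agent=None, ignore_ssl_errors=True):
    """Build keyword arguments for webdriver.PhantomJS() (hypothetical helper).

    base_capabilities is expected to be a copy of
    selenium's DesiredCapabilities.PHANTOMJS dict.
    """
    capabilities = dict(base_capabilities)  # never mutate the caller's dict
    if user_agent:
        # GhostDriver reads page settings from 'phantomjs.page.settings.*' keys.
        capabilities['phantomjs.page.settings.userAgent'] = user_agent
    service_args = []
    if ignore_ssl_errors:
        # PhantomJS command-line switches for relaxed SSL handling.
        service_args += ['--ignore-ssl-errors=true', '--ssl-protocol=any']
    return {'desired_capabilities': capabilities, 'service_args': service_args}


def cookies_to_dict(selenium_cookies):
    """Flatten driver.get_cookies() output into a name -> value dict."""
    return {cookie['name']: cookie['value'] for cookie in selenium_cookies}


# Usage inside the middleware (not run here, since it needs PhantomJS installed):
#   caps = dict(DesiredCapabilities.PHANTOMJS)
#   driver = webdriver.PhantomJS(**build_phantomjs_kwargs(caps, user_agent='my-bot/1.0'))
#   driver.get(request.url)
#   cookies = cookies_to_dict(driver.get_cookies())
```

The resulting dict can then be passed as the cookies argument of a follow-up Request, if that is how you want to keep the session going.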
Also, if you want every request to go through JS, you can remove the request.meta.get('js') conditional. Otherwise, you could set that flag on the initial requests in a spider.make_requests_from_url override, or you could add a spider instance method like spider.run_js(request) that looks at the request and decides, based on whatever criteria you come up with, whether it needs JS.
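To give an idea of what such a check might look like, here is a minimal sketch of the decision logic that a spider.run_js(request) method could delegate to. The function name needs_js and the path prefixes are purely hypothetical, and it is written for Python 3 (on Python 2 the import would come from urlparse instead).

```python
from urllib.parse import urlparse

# Hypothetical path prefixes for pages known to need JavaScript rendering.
JS_PATH_PREFIXES = ('/search', '/app/')


def needs_js(url, meta=None):
    """Return True if the request should be rendered with PhantomJS."""
    meta = meta or {}
    if 'js' in meta:
        # An explicit flag set by the spider always wins.
        return bool(meta['js'])
    # Otherwise fall back to a simple URL-based rule.
    return urlparse(url).path.startswith(JS_PATH_PREFIXES)
```

In the middleware you would then call something like needs_js(request.url, request.meta) in place of the plain request.meta.get('js') test.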
There are a lot of options for you with PhantomJS, so it's really up to you, but this should be a decent starting point. I hope this answers your question.
--
Respectfully,
Joey Espinosa