Hi!

I would like to scrape the Google News archive for a given keyword.

I have set up Scrapy on an Amazon EC2 instance running Ubuntu (CLI only), and it all works fine: Scrapy collects the data and saves it into a MySQL table. My problem is that I would like to use the Google News filter to, for example, scrape all news articles published between 2010 and 2011. Google News has this option (after you type in a keyword, there is a little gear button in the input field, or you can use the options on the left-hand side). However, Scrapy does not execute JavaScript, and the Google filter can only be applied through JavaScript. If I configure everything in my browser and pass the resulting URL to Scrapy, I get a non-JavaScript version of Google News with no filters applied.

So far I have been trying to understand how WebKit or Selenium work, but all I understood is that they use a real browser that is opened in the OS, and that they use an API to access this browser. Since I am using the CLI only, I don't have the option of running a browser.

Can you help me out and point me in the right direction on how I could use Scrapy with JavaScript without having to use a "real" browser?

Thank you very much.

Best,
naaboo