How to scrape a Google results page with all sections using CasperJS?

HP

Apr 12, 2016, 1:27:16 AM
to CasperJS
Hello All,
I am new to CasperJS and still learning its syntax. I have gone through the demo that fetches links with CasperJS. I want to scrape the following from a Google results page:

1. Organic search result links with title & description
2. Paid search results with title & description
3. News, images & videos on the result page
4. Knowledge graphs, snippets, related keywords, product listings

I also want to fetch results page by page and set up a proxy to avoid being banned for automated queries. Are there any libraries or existing code I can use to fetch the data listed above? Any help will be appreciated.

Thank you!

bruce

Apr 12, 2016, 11:51:22 AM
to casp...@googlegroups.com
I'm pretty sure Google has a number of defensive processes in place to deal with automated scraping of any real quantity.

Also, as I recall, Google has an API for extracting a limited number of items per day.

That said, I'm also sure that a few people have tackled the issue(s) of scraping Google.
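
For the API route mentioned above, here is a minimal sketch of how a paged request URL could be built. It assumes the API in question is Google's Custom Search JSON API; the `buildSearchUrl` helper, the `KEY` and `CX` placeholders, and the page size of 10 are illustrative assumptions, not something from this thread. It is plain JavaScript, so it should also run inside a CasperJS script.

```javascript
// Hypothetical helper (not part of CasperJS or any Google library):
// builds a request URL for the Custom Search JSON API, one page at a time.
// apiKey and engineId are placeholders for credentials obtained from Google.
function buildSearchUrl(apiKey, engineId, query, page) {
    var perPage = 10;                      // assumed page size (10 results per request)
    var start = (page - 1) * perPage + 1;  // `start` is a 1-based result index
    var params = {
        key: apiKey,
        cx: engineId,
        q: query,
        num: perPage,
        start: start
    };
    var parts = [];
    for (var name in params) {
        if (Object.prototype.hasOwnProperty.call(params, name)) {
            // Percent-encode both names and values so spaces etc. are safe
            parts.push(encodeURIComponent(name) + '=' + encodeURIComponent(params[name]));
        }
    }
    return 'https://www.googleapis.com/customsearch/v1?' + parts.join('&');
}
```

Calling `buildSearchUrl('KEY', 'CX', 'casperjs scraping', 2)` asks for results 11 to 20. As far as I know, the JSON response lists results under `items` with `title`, `link` and `snippet` fields, so this would only cover the organic-style links with title and description, not ads, knowledge graphs or the other sections.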




HP

Apr 13, 2016, 5:21:12 AM
to CasperJS
So you mean fetching results using CasperJS won't work? What if we use proxies to fetch the data?

bruce

Apr 13, 2016, 9:07:11 AM
to casp...@googlegroups.com
If a company wishes to "protect" its site from crawlers, there are a number of processes it can implement to scare off the vast majority of people wanting the data. Of course, there are costs associated with implementing these preventive processes, and most companies wouldn't bother unless they think their sites have content that needs protecting. Google is one of these companies.

Google also has the resources in spades to prevent scraping, given that most people wanting to scrape Google couldn't work at Google!

If you need to get a bunch of data from Google for your project, I'd suggest you look elsewhere.



HP

Apr 14, 2016, 1:55:09 AM
to CasperJS
Thanks for the reply. I want the data from the Google results page. Can you suggest the best method to get it, apart from the API?