Google Website Scraping getting blocked after few requests


Params Raman via StackOverflow

Jun 21, 2014, 10:01:05 AM
to google-appengin...@googlegroups.com

We are developing a simple application that calls one of Google's services, Reverse Image Search (http://www.google.com/insidesearch/features/images/searchbyimage.html): we submit an image by URL or by upload and get back the entity name for the image. Essentially, we fetch the results page (as HTML) that Google returns and scrape the results with a simple parser.

We hosted this on Google App Engine and found that after a while Google blocked our app (identified by its IP address) and returned a message saying the block is meant to prevent bots from sending requests to its websites. Below is the message I found in the web server's logs:

This page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the Terms of Service (http://www.google.com/policies/terms/). The block will expire shortly after those requests stop. In the meantime, solving the above CAPTCHA will let you continue to use our services.

This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests. If you share your network connection, ask your administrator for help — a different computer using the same IP address may be responsible. Learn more (http://support.google.com/websearch/answer/86640)

Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.
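
Since the message says the block expires shortly after the requests stop, one thing a client could do is recognize this page and pause instead of retrying immediately. Below is a minimal sketch of that idea, not what our app currently does. The marker phrases are copied from the message above, and using them for detection is an assumption because the wording may change. The names `looks_blocked`, `backoff_delays`, `fetch_with_backoff`, and the `fetch` callable are hypothetical, and the delay values are arbitrary.

```python
import time

# Phrases taken from the block page quoted above. Treating them as a
# detection marker is an assumption; the wording may change at any time.
BLOCK_MARKERS = (
    "automatically detects requests coming from your computer network",
    "solving the above CAPTCHA",
)


def looks_blocked(html):
    """Return True if the response body resembles the block page."""
    text = html.lower()
    return any(marker.lower() in text for marker in BLOCK_MARKERS)


def backoff_delays(base_seconds=60.0, factor=2.0, max_seconds=3600.0, attempts=5):
    """Return the wait times (in seconds) to use between successive retries."""
    delays = []
    delay = base_seconds
    for _ in range(attempts):
        # Grow the delay geometrically, but never beyond max_seconds.
        delays.append(min(delay, max_seconds))
        delay *= factor
    return delays


def fetch_with_backoff(fetch, sleep=time.sleep, **backoff_options):
    """Call fetch() until it returns a non-blocked page or retries run out.

    fetch is a hypothetical zero-argument callable returning the HTML body.
    Returns the HTML on success, or None if every attempt was blocked.
    """
    html = fetch()
    for delay in backoff_delays(**backoff_options):
        if not looks_blocked(html):
            return html
        sleep(delay)  # stop sending requests for a while before retrying
        html = fetch()
    return None if looks_blocked(html) else html
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting. This only slows the client down once the block page shows up; it does not solve the CAPTCHA or prevent the block itself.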

I wanted to check whether there is a way to solve this, or any workaround. Since Google doesn't expose a Reverse Image Search API, we do not see any other way to get the information we want than to issue an HTTP request and scrape the response.

Any leads will be helpful.



Please DO NOT REPLY directly to this email but go to StackOverflow:
http://stackoverflow.com/questions/24333257/google-website-scraping-getting-blocked-after-few-requests