Google App Engine, rogue crawlers, and PageSpeed Insights


jswap

Jul 26, 2012, 8:34:43 PM
to google-a...@googlegroups.com
I run a website containing lots of doctor-related data.  We get crawled by rogue crawlers from thousands of IP addresses DAILY (mostly in Russia) and we sometimes see our content show up on other websites.  I define a crawler as "rogue" when it does not obey robots.txt exclusions, and the crawling company offers no benefit to us and just sucks up system resources.

Google App Engine is hosting a crawler (appid: s~steprep) that is similar to the Russian ones we block.  This crawler crawls us aggressively, sucks up system resources, ignores the robots.txt file, and offers no benefit to us.  Per our normal policy, we have been blocking the dozens of Google IP addresses that this crawler uses.  The problem is that one or more of these IP addresses also serve requests from Google's "PageSpeed Insights" tool, located here: https://developers.google.com/speed/pagespeed/insights

My questions for Google are: 
1 - Is it your intention that websites be unable to block crawlers that you host?
2 - Is it your intention that websites must allow the steprep crawler in exchange for using the PageSpeed Insights tool?

Some people may ask, "Why not just ask the company that is crawling you to stop?"
1 - Some companies ignore the request.
2 - Some companies temporarily stop crawling, then show up again a few days or weeks later, at which point I have to waste time dealing with it all over again.

If we were to allow every crawler to crawl our site, our server would be brought to its knees.  Website owners need a mechanism for blocking rogue crawlers, even when they are hosted by Google App Engine.
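
One possible workaround, sketched here rather than offered as a confirmed fix, is to block on the App Engine app ID instead of the shared IP addresses. It assumes that outbound requests made through App Engine's URL Fetch service carry a User-Agent suffix of the form "AppEngine-Google; (+http://code.google.com/appengine; appid: s~steprep)", which is presumably where the appid above shows up in the logs. The names `extract_appid`, `is_blocked`, and `BLOCKED_APPIDS` are hypothetical, and whether PageSpeed Insights requests arrive with a different User-Agent is also an assumption that should be checked against real log entries.

```python
import re

# Assumption: App Engine's URL Fetch service appends a token like
#   "AppEngine-Google; (+http://code.google.com/appengine; appid: s~steprep)"
# to the User-Agent of outbound requests. Verify against your own access logs.
APPID_PATTERN = re.compile(r"AppEngine-Google;.*?appid:\s*([A-Za-z0-9~_.:\-]+)")

# Hypothetical blocklist; "s~steprep" is the app ID named in this post.
BLOCKED_APPIDS = {"s~steprep"}


def extract_appid(user_agent):
    """Return the App Engine app ID found in a User-Agent string, or None."""
    match = APPID_PATTERN.search(user_agent or "")
    return match.group(1) if match else None


def is_blocked(user_agent):
    """True if the request identifies itself as coming from a blocked app ID."""
    return extract_appid(user_agent) in BLOCKED_APPIDS


if __name__ == "__main__":
    ua = ("Mozilla/5.0 AppEngine-Google; "
          "(+http://code.google.com/appengine; appid: s~steprep)")
    print(extract_appid(ua), is_blocked(ua))
```

A check like this would run in the web server or application layer and return a 403 when `is_blocked` is true, leaving the shared Google IP addresses unblocked. It only helps if the crawler keeps using URL Fetch under the same app ID; a new app ID would need a new blocklist entry.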

jswap

Jul 26, 2012, 8:42:05 PM
to google-a...@googlegroups.com
Sorry for posting this thread again (I didn't think the first try was successful, but I see it now).  Is there a way to delete this post?