Discussions > Crawling, indexing, and ranking > Googlebot Requesting Too Many Times
Linoge  
From: Linoge
Date: Tue, 29 Jul 2008 13:08:23 -0700 (PDT)
Local: Tues, Jul 29 2008 4:08 pm
Subject: Googlebot Requesting Too Many Times
I run www.wallsofthecity.net, and my hosting company (DreamHost)
recently brought to my attention the fact that my site has been
receiving sufficient numbers of page requests to crash the shared
server I am on.  They sent me the below information by way of
evidence:

ocean: 04:55 PM# tail -10000 access.log | awk '{print $1}' | sort | uniq -c | sort -n
     1 208.36.144.6
     1 66.249.70.151
     1 83.7.130.117
    24 63.225.83.211
   249 66.249.71.122
   266 66.249.71.123
   278 66.249.71.121
ocean: 04:55 PM# pwd
/home/linoge/logs/tdzkwiki.wallsofthecity.net/http
ocean: 04:55 PM# host 66.249.71.121
Name: crawl-66-249-71-121.googlebot.com
Address: 66.249.71.121

ocean: 04:55 PM# host 66.249.71.123
Name: crawl-66-249-71-123.googlebot.com
Address: 66.249.71.123

ocean: 04:56 PM# host 66.249.71.122
Name: crawl-66-249-71-122.googlebot.com
Address: 66.249.71.122
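
For readers who cannot follow the pipeline above: it takes the last 10,000 lines of the access log, pulls out the first field of each line (the client IP), and counts how often each IP appears. A minimal Python sketch of the same count follows. The sample log lines are made up for illustration (only the IP addresses come from the output above), and the function assumes the IP is the first whitespace-separated field, as in the common Apache log format.

```python
from collections import Counter


def requests_per_ip(log_lines):
    """Count requests per client IP.

    Assumes the client IP is the first whitespace-separated field of each
    line. Blank lines are skipped (a small difference from the awk version).
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    # Ascending by count, like the final `sort -n`.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Invented sample lines; only the IP addresses are taken from the thread.
sample = [
    '66.249.71.121 - - [29/Jul/2008:16:55:01 -0700] "GET /A HTTP/1.1" 200 512',
    '63.225.83.211 - - [29/Jul/2008:16:55:02 -0700] "GET /B HTTP/1.1" 200 734',
    '66.249.71.121 - - [29/Jul/2008:16:55:03 -0700] "GET /C HTTP/1.1" 200 128',
]

for ip, count in requests_per_ip(sample):
    print(f"{count:7d} {ip}")
```

To mimic `tail -10000` on a real file, the lines could be read with `collections.deque(open(path), maxlen=10000)` before being passed to the function.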

Now, to me, that does not mean a whole lot, aside from Googlebots
being very, *very* interested in my site.

Given that DreamHost's solution is to simply block all Googlebots
forever, I am looking for an alternative fix.  What would/could be
causing the bots to go this spastic?  Is there something on my site?
I have already logged into Webmaster Tools and requested a slower
indexing rate, but will that actually accomplish anything?  Thanks for
whatever assistance you can provide.


Robbo
From: Robbo
Date: Tue, 29 Jul 2008 13:44:19 -0700 (PDT)
Local: Tues, Jul 29 2008 4:44 pm
Subject: Re: Googlebot Requesting Too Many Times
I do not see how those figures prove anything, but let's work
on the assumption that Googlebot is spending a lot of time crawling
your site and you want to know what might be contributing to this.

Your server is giving incorrect responses for some requests.

-- If I request a non-existent page:
http://www.wallsofthecity.net/zxzxzxzxzx
your server correctly responds with a 404 Not Found.

But if I do the same with your tdzkwiki subdomain,
http://tdzkwiki.wallsofthecity.net/zxzxzxzxzx
the response given is wrong: it says 301 Moved Permanently to:
http://tdzkwiki.wallsofthecity.net/Zxzxzxzxzx
(note the uppercase Z)
and requesting that URL (with the uppercase Z) gets a 200 success
response from your server, which is obviously NOT right.

If your server repeatedly says 301 and then 200 for ANY AND ALL
nonexistent URLs, including perhaps ones that DID previously exist but
have been deleted, you are making it very hard for Google to
"understand" your site and to keep its index up to date.
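
The check described above (request a URL that should not exist, follow any redirect, and look at the final status) can be sketched in Python. A missing page that ends in a 200 is often called a "soft 404". This is only an illustration: `fetch` is a hypothetical caller-supplied function that returns the status code and `Location` header without following redirects, and the canned responses below simply replay the three responses reported in this post rather than contacting the site.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def classify_missing_page(url, fetch, max_hops=5):
    """Follow redirects for a URL that should not exist and judge the result.

    `fetch(url)` must return (status_code, location_or_None) and must not
    follow redirects itself.
    """
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((status, url))
        if status in REDIRECT_CODES and location:
            url = location  # take one redirect hop and try again
            continue
        break
    final_status = chain[-1][0]
    if final_status in (404, 410):
        verdict = "ok: missing page reported as missing"
    elif final_status == 200:
        verdict = "soft 404: missing page reported as success"
    else:
        verdict = "inconclusive"
    return verdict, chain


# Canned responses matching what was observed above (no network access).
observed = {
    "http://www.wallsofthecity.net/zxzxzxzxzx": (404, None),
    "http://tdzkwiki.wallsofthecity.net/zxzxzxzxzx":
        (301, "http://tdzkwiki.wallsofthecity.net/Zxzxzxzxzx"),
    "http://tdzkwiki.wallsofthecity.net/Zxzxzxzxzx": (200, None),
}


def replay_fetch(url):
    return observed[url]


for start in ("http://www.wallsofthecity.net/zxzxzxzxzx",
              "http://tdzkwiki.wallsofthecity.net/zxzxzxzxzx"):
    verdict, chain = classify_missing_page(start, replay_fetch)
    print(start, "->", verdict)
```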

There are probably other issues but I hope this is a useful start.

Robbo


Linoge
From: Linoge
Date: Tue, 29 Jul 2008 15:13:52 -0700 (PDT)
Local: Tues, Jul 29 2008 6:13 pm
Subject: Re: Googlebot Requesting Too Many Times
Well, I guess that just goes to show what I know, then :).  I just
copy-pasted the information my hosting company sent me - I honestly do
not understand a great deal of it.

Thanks for the information, though - that was honestly educational and
useful.  At this point, I think I will just keep the Googlebot out of
the "tdzkwiki" section, and then let it back into the root domain...
that should not cause any problems as long as things keep working.

Thanks again!

On Jul 29, 4:44 pm, Robbo wrote:


Shades1
From: Shades1
Date: Tue, 29 Jul 2008 20:06:47 -0700 (PDT)
Local: Tues, Jul 29 2008 11:06 pm
Subject: Re: Googlebot Requesting Too Many Times
If Google is crashing that server, then the host is either full to the
max or has other issues. Maybe try new hosting or a dedicated server?

On Jul 29, 4:13 pm, Linoge wrote:


JohnMu Google employee
From: JohnMu
Date: Wed, 30 Jul 2008 05:13:23 -0700 (PDT)
Local: Wed, Jul 30 2008 8:13 am
Subject: Re: Googlebot Requesting Too Many Times
Hi Linoge and welcome to the groups!

It does look like that wiki section of your site has a lot of unique
URLs which can be crawled. I imagine this could put a bit of load on
your server. Perhaps the simplest solution would be to disallow
crawling of that subdomain using a "disallow" robots.txt directive.
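
As a rough illustration of that suggestion: robots.txt is read per host, so a file served from the root of the tdzkwiki subdomain would affect only that subdomain and leave www.wallsofthecity.net crawlable. The sketch below uses Python's standard `urllib.robotparser` to check what a blanket disallow would do. The `/Main_Page` path is a hypothetical example URL, and the two-line robots.txt is one possible form of the directive, not necessarily the exact file to use.

```python
from urllib.robotparser import RobotFileParser

# One possible robots.txt for the subdomain: block all crawlers from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical wiki URL, used only to exercise the rule.
wiki_url = "http://tdzkwiki.wallsofthecity.net/Main_Page"
print(is_allowed(ROBOTS_TXT, "Googlebot", wiki_url))
```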

Hope it helps!
John


Linoge
From: Linoge
Date: Wed, 30 Jul 2008 13:52:37 -0700 (PDT)
Local: Wed, Jul 30 2008 4:52 pm
Subject: Re: Googlebot Requesting Too Many Times
Dedicated hosting is the "solution" I am trying to avoid - hey, I am
cheap :).  This is the first real hiccup I have had with them, so I
dunno.

On Jul 29, 11:06 pm, Shades1 wrote:


Linoge
From: Linoge
Date: Wed, 30 Jul 2008 13:54:04 -0700 (PDT)
Local: Wed, Jul 30 2008 4:54 pm
Subject: Re: Googlebot Requesting Too Many Times
Will Googlebot actually obey that?  I know there is evidence that it
does not obey Crawl-Delay (something else I have implemented in an
attempt to make things run better), but I am not sure about other
directives.

On Jul 30, 8:13 am, JohnMu wrote:


JohnMu Google employee
From: JohnMu
Date: Wed, 30 Jul 2008 15:55:10 -0700 (PDT)
Local: Wed, Jul 30 2008 6:55 pm
Subject: Re: Googlebot Requesting Too Many Times
Hi Linoge

You're right, we do not use the "crawl-delay" robots.txt directive.
However, we will honor any "disallow" directives that you have in
there. The disallow blocks crawlers from accessing the URLs - they
might still remain in the index for a while regardless (it does not
block indexing of those URLs).

If you feel that your wiki has valuable content that you would like
indexed, it might be worth the trouble to work out a list of specific
kinds of URLs that you would like disallowed. That way, the URLs where
you have unique and compelling content can still be crawled and
indexed. You will generally be able to spot patterns when you look at
your server log files. You can use those patterns to create disallow
directives for your robots.txt file.

More information about using patterns in your robots.txt file can be
found at http://www.google.com/support/webmasters/bin/answer.py?answer=40367
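
To try out candidate patterns before putting them in robots.txt, a small matcher can help. Google's pattern syntax supports `*` (any sequence of characters) and a trailing `$` (end of URL), which Python's built-in `urllib.robotparser` does not interpret, so the sketch below translates patterns to regular expressions by hand. The patterns and paths are hypothetical wiki-style examples, not taken from the actual site, and the sketch ignores Allow rules and rule precedence.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using `*` and a trailing `$` to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" where the pattern had "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path, disallow_patterns):
    """Return True if any Disallow pattern matches the path from its start."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# Hypothetical patterns: block edit/history style URLs and special pages.
PATTERNS = ["/*?action=", "/*&action=", "/Special:"]

# Hypothetical paths of the kind one might spot in a wiki's access log.
for path in ("/Main_Page",
             "/index.php?title=Foo&action=edit",
             "/Special:RecentChanges"):
    print(path, "blocked" if is_disallowed(path, PATTERNS) else "crawlable")
```

Each pattern that behaves as intended would then go on its own `Disallow:` line in the subdomain's robots.txt.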

Hope it helps!
John

