Discussions > Google webmaster tools > robots.txt analysis size bug
From: Marc2042
Date: Tue, 10 Apr 2007 01:42:26 -0700
Local: Tues, Apr 10 2007 4:42 am
Subject: robots.txt analysis size bug
Hi.

I put a large robots.txt file (5000 lines) into the root directory.
Clearly it was a mistake, and I have since corrected it by using
wildcard characters. This is what happened.

While my old, large robots.txt was online, the "robots.txt
analysis" tool was truncating the file. It seems that the
robots file was simply too big for this tool's text box.

Suddenly, according to the webmaster tool, URLs were blocked that
were not in the robots file. Some of these URLs still appeared in the
Google index, some without a "cached version" and some without a title
and description.

My questions are simple:

1. Does the "robots.txt analysis" tool really reflect what happens in
Google, given that a robots.txt file for Google can be larger than 5000 lines?

2. Is it possible that the tool shows a certain URL as blocked although
it was not blocked by Google's engines?

3. If it was really blocked for 10-20 days, how long will it be
until it is reindexed properly? (The page has PR 4.)

Thank you,
Marc
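
To illustrate the wildcard approach mentioned above, here is a minimal Python sketch of how `*` (any run of characters) and a trailing `$` (end of URL) in a Disallow pattern can be matched against a path. The rules and paths are hypothetical, not Marc's actual file, and the helper names `pattern_to_regex` and `is_disallowed` are made up for this example. It ignores Allow lines and rule precedence, so it is not a full model of how Googlebot evaluates a robots.txt file.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the
    pattern to the end of the URL. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# Hypothetical rules: three wildcard lines standing in for many per-URL lines.
rules = ["/print/*", "/*?sessionid=", "/*.pdf$"]

for path in ["/print/page1.html", "/shop/item?sessionid=abc",
             "/docs/a.pdf", "/docs/a.pdf?x=1", "/index.html"]:
    print(path, "blocked" if is_disallowed(path, rules) else "allowed")
```

A handful of patterns like these can replace thousands of per-URL lines, which keeps the file small enough for tools that truncate long input.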


From: softplus
Date: Tue, 10 Apr 2007 09:57:50 -0000
Local: Tues, Apr 10 2007 5:57 am
Subject: Re: robots.txt analysis size bug
Hi Marc
As far as I know, the analysis tool is just a general tool that does
something similar to what Googlebot does when it parses the
robots.txt. As far as I remember, it previously had some smaller
issues that have since been cleaned up. It is possible that it has a
limit (to restrict resource usage) while the real Googlebot parser
does not.

Also keep in mind that the tools pages cache error messages for some
time; this can be for a few days or longer. So even if the URLs are
marked as being blocked, it is possible that they are now being
crawled again. What do your server logs say?

The URLs that were blocked and whose indexing status changed
should return to the index within the next few crawl cycles. This
depends a lot on how your pages are currently being crawled; it's
impossible to give a number based on PR alone. It makes little sense
to track that too closely, since you can't influence it (other than by
promoting the pages, which is of course always a good idea :-)).

John
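
One way to answer the server log question above is to scan the access log for Googlebot requests to the affected URLs. The following is a minimal Python sketch, assuming the server writes the common "combined" log format; the sample lines, IP addresses, and paths are invented for illustration, and `googlebot_hits` is a hypothetical helper name. Note that a user-agent string can be spoofed, so a match only suggests that Googlebot fetched the page.

```python
import re

# Combined log format (assumed):
# host ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(log_lines, paths):
    """Map each path to the (time, status) pairs of Googlebot requests for it."""
    hits = {p: [] for p in paths}
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and "Googlebot" in m.group("agent") and m.group("path") in hits:
            hits[m.group("path")].append((m.group("time"), int(m.group("status"))))
    return hits


# Invented sample lines, for illustration only.
sample = [
    '66.249.66.1 - - [12/Apr/2007:08:15:02 +0000] "GET /products/widget.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [12/Apr/2007:08:16:40 +0000] "GET /products/widget.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows; U)"',
    '66.249.66.1 - - [12/Apr/2007:08:17:11 +0000] "GET /robots.txt HTTP/1.1" '
    '200 310 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

print(googlebot_hits(sample, ["/products/widget.html", "/about.html"]))
```

If a URL that the tool reports as blocked shows recent Googlebot requests with status 200, the report is probably stale rather than a reflection of current crawling.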


From: Marc2042
Date: Thu, 12 Apr 2007 23:52:16 -0700
Local: Fri, Apr 13 2007 2:52 am
Subject: Re: robots.txt analysis size bug
Dear John,

thanks a lot for your response.

You are right, there is a delay of about 2-3 days in the error
messages. Today, April 13, Webmaster Tools indicated that Google
last visited on April 10. This message appeared just today... -> delay!

A few pages that seemed to have been dropped from the index are back now
(after I changed the first robots.txt).
This, however, implies that there might be a correlation between
Googlebot and Webmaster Tools.

I simply can't understand why they were dropped. All other tools
(except Webmaster Tools) showed that the first robots.txt was valid
and that the dropped URLs were allowed! Only Webmaster Tools showed them as
blocked, because the robots.txt was too long (?), and they were
suddenly not cached in Google anymore, or their title disappeared, or
site:www.website.com would not show them.

So is it a bug in the webmaster tool or pure coincidence?

Help from the team is appreciated...

We want to know: How much can we really rely on the webmaster tools as
an assisting development tool?

Have a great weekend,
Marc


From: Jonathan Simon (Google employee)
Date: Tue, 17 Apr 2007 20:36:11 -0000
Local: Tues, Apr 17 2007 4:36 pm
Subject: Re: robots.txt analysis size bug
Your robots.txt file being too long for the robots.txt analysis tool
in Webmaster Tools is not the reason a few of your pages were dropped
from the index. The information gathered by this tool is for reporting
only and does not provide input into the Googlebot's interpretation of
your robots.txt file.

If your robots.txt file exceeded the 100k size limit, then that could
cause unexpected results when being crawled by the Googlebot.
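
As a rough check against the size limit mentioned above, this minimal Python sketch reports the byte size and line count of a robots.txt body. It assumes "100k" means 100 * 1024 bytes, which the post does not specify, and `robots_size_report` is a hypothetical helper. The 5000-line file is generated for illustration and is not the original poster's file.

```python
# Assumption: "100k" is read as 100 KiB; the exact byte threshold is not stated.
SIZE_LIMIT_BYTES = 100 * 1024


def robots_size_report(text: str, limit: int = SIZE_LIMIT_BYTES) -> dict:
    """Return the encoded size, the line count, and whether the limit is exceeded."""
    size = len(text.encode("utf-8"))
    return {
        "bytes": size,
        "lines": len(text.splitlines()),
        "over_limit": size > limit,
    }


# Generated example: one User-agent line plus 5000 per-URL Disallow lines.
big = "User-agent: *\n" + "".join(
    f"Disallow: /page-{i:04d}.html\n" for i in range(5000)
)
report = robots_size_report(big)
print(report)
```

Whether a 5000-line file crosses the threshold depends on how long each line is, so measuring bytes says more than counting lines.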

