Any tinkering with the access to your robots.txt file would likely do
more harm than good. Your best course is to do nothing. The presence
of your robots.txt file in the index is unusual, but won't hurt your
site's performance in Google. And I would expect it to fall out of
the index on its own over time. Good luck!
You can also disallow your robots.txt in your robots.txt :). We'll
read the robots.txt (since we have to) to find that you don't want us
to crawl the robots.txt, so it'll still get processed. That said, if
your robots.txt is being indexed, I assume there must be a link to it
somewhere. It would probably be good to double-check that this link is
not on your site (and remove it if it is). Without links & being
disallowed, it will generally fall out of the results over time (and
to be honest, I doubt any user is going to find it with your
keywords :-)).
John
PS Another possibility would be to use the x-robots-tag HTTP headers,
but that's a bit more complicated and probably not necessary.
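As a rough illustration of the "disallow your robots.txt in your robots.txt" idea, here is a minimal sketch using Python's standard urllib.robotparser. The crawler name "ExampleBot" and the example.com URLs are placeholders I made up, and this only shows how a generic parser reads the rule; it says nothing about how Googlebot handles it internally.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that disallows itself (hypothetical file content).
ROBOTS_TXT = """\
User-agent: *
Disallow: /robots.txt
"""

parser = RobotFileParser()
# The file has to be read and parsed before the rule inside it can be known.
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and the URLs below are placeholders for illustration.
robots_allowed = parser.can_fetch("ExampleBot", "https://www.example.com/robots.txt")
index_allowed = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")

print("robots.txt crawlable:", robots_allowed)
print("index.html crawlable:", index_allowed)
```

The rule only becomes visible after the file has been fetched and parsed, which is why the file still gets processed even though it disallows itself.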
Good catch, JLH! Yes, it would be possible to use the "noindex"
directive here as well. However, since the "noindex" directive is not
widely used (I don't know which other search engines support it
at the moment) and since its behavior might change
in the future, I wouldn't recommend relying on it in the long run. If all
you need is to get the URL removed just this one time, then that
would be a good use for it (you could follow up later with a
"disallow" instead of the "noindex"). Hope it helps!
Hi John,
Does the Noindex rule in the robots.txt file have a similar
effect to the X-Robots-Tag noindex in the HTTP header?
For example, would it have the same effect
(removal of robots.txt from the Google index)
if, instead of having these lines in the robots.txt file
    User-agent: Googlebot
    Noindex: /robots.txt
the robots.txt URL were served with this HTTP response header?
    X-Robots-Tag: noindex
I suppose the difference is that within the robots.txt file
one can specify the bot the rule applies to, but not
with the x-robots-tag in the HTTP header.
Also, it is not known whether or how other search engines
support the X-Robots-Tag in the HTTP header,
or how search engines would treat
    X-Robots-Tag: noindex
in the HTTP response header of the robots.txt file.
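For anyone wondering what serving that header looks like in practice, here is a minimal sketch as a plain WSGI application in Python. The file body, the app and call names, and the choice of WSGI are all assumptions for illustration; on a real site the header would more likely be set in the web server configuration.

```python
# Hypothetical robots.txt content; an empty Disallow allows everything.
ROBOTS_BODY = b"User-agent: *\nDisallow:\n"


def app(environ, start_response):
    """Serve /robots.txt with an X-Robots-Tag: noindex response header."""
    if environ.get("PATH_INFO") == "/robots.txt":
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(ROBOTS_BODY))),
            # The header discussed above; it applies to this URL only.
            ("X-Robots-Tag", "noindex"),
        ]
        start_response("200 OK", headers)
        return [ROBOTS_BODY]

    body = b"Not found\n"
    start_response(
        "404 Not Found",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


def call(path):
    """Call the app directly (no network) and return status, headers, body."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call("/robots.txt")
print(status, headers.get("X-Robots-Tag"))
```

To try it against a real client, the app could be passed to wsgiref.simple_server.make_server from the standard library. Note that the header carries no user-agent condition here, which matches the difference raised in the question.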
The "noindex" in the robots.txt is generally treated the same as a
robots "noindex" meta tag or an "x-robots-tag" with "noindex." The big
difference (in my opinion) is that it's easier to remove too much with
the robots.txt than it is with either a meta tag (that has to be
placed on all pages) or the HTTP header tag (which usually takes some
manual work as well). That's one reason I'm always a bit careful with
suggesting it :)
At the moment, I'm not aware of other search engines processing
"noindex" in the robots.txt or using the "x-robots-tag" HTTP header
tag.
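To make the "easier to remove too much" point concrete, here is a small sketch that assumes a robots.txt rule is matched as a simple path prefix. The paths and the helper name are invented for illustration, and real crawlers may match rules in more elaborate ways.

```python
def matched_paths(paths, rule_prefix):
    """Return the paths that a prefix-style rule would cover."""
    return [p for p in paths if p.startswith(rule_prefix)]


# Hypothetical site paths.
SITE_PATHS = ["/press/", "/pricing", "/products/widget", "/about", "/robots.txt"]

# A precise rule touches only the intended URL.
narrow = matched_paths(SITE_PATHS, "/robots.txt")

# A short or mistyped rule sweeps in unrelated pages.
broad = matched_paths(SITE_PATHS, "/p")

print("narrow rule covers:", narrow)
print("broad rule covers:", broad)
```

A single short line in robots.txt can cover a large part of a site, whereas a meta tag or HTTP header has to be added URL by URL.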
About noindex:
doesn't the noindex directive (via meta tag,
x-robots-tag in the HTTP header, or the Noindex line in the
robots.txt file)
tell Googlebot not to consider other data in those URLs in any
way?
I thought that Googlebot does not process or store data from URLs
that have noindex.
Couldn't it confuse Googlebot to apply noindex to the
robots.txt file itself
(via the line Noindex: /robots.txt or via
a noindex x-robots-tag in the HTTP header of robots.txt)?
What I meant in my previous post was that I was wondering
whether the noindex x-robots-tag
in the HTTP header of the robots.txt file
would have the same effect as the line
Noindex: /robots.txt
inside the robots.txt file.
Of course, the main difference would be that,
in the case of the Noindex: /robots.txt line within the robots.txt
file, Googlebot first has to parse the file to read the rule.
I repeat that I think applying noindex to
the robots.txt file might be confusing.
Either way (via the x-robots-tag or through the robots.txt file) we'll
have to access the robots.txt to see what we're allowed to crawl, so
both ways would get found at about the same time. We will always check
the robots.txt file regardless of whether or not it's indexed -- for
all we know, it might have changed since the last time we checked, and
that would be important to know.
Hi John,
Thank you for your reply, it is very helpful.
I hope you do not mind if I add that
there is still the question of whether all other
search engines/bots that respect robots.txt
will process the robots.txt correctly
if they encounter a noindex directive for the robots.txt file itself.
Hi Cristina,
I'm not aware of how the other search engines handle a "noindex"
directive in the robots.txt. I assume they'll likely ignore it if they
don't understand it :).