Crawling, indexing, and ranking > robots.txt returned in search engine results - how to fix?

From: Frod
Date: Fri, 22 Aug 2008 10:35:27 -0700 (PDT)
Subject: robots.txt returned in search engine results - how to fix?
Hi, folks:

Regarding the client site http://www.davidlevinent.com, here is a
search for "http://www.davidlevinent.com":

http://www.google.com/search?q=http://www.davidlevinent.com&hl=en&fil...

Then view all search results.  These results include the entry:

----
User-agent: * Disallow: /cgi-bin/ Disallow: /logs/ Disallow ...
- 7:10am
User-agent: * Disallow: /cgi-bin/ Disallow: /logs/ Disallow: /Images/
Disallow: /_notes/ Disallow: /images/ Disallow: /oldfiles/ Disallow: /Scanned Photos/ ...
www.davidlevinent.com/robots.txt - 1k - Cached - Similar pages - Note this...
----

robots.txt and sitemap.xml live at www.davidlevinent.com

Note that the first entry in robots.txt contains:
Sitemap: http://www.davidlevinent.com/sitemap.xml

Any suggestions as to how to eliminate the above search engine result?

Thanks,

Fred
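A minimal sketch of how a standards-following parser reads these rules, using Python's standard `urllib.robotparser`. The rules are only the ones visible in the search snippet above (the snippet is truncated, so the real file may contain more), the blank line after the Sitemap line is an assumption, and the test paths are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Rules as far as they are visible in the search snippet. The snippet is
# truncated ("..."), so the real file may contain more lines.
ROBOTS_TXT = """\
Sitemap: http://www.davidlevinent.com/sitemap.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /logs/
Disallow: /Images/
Disallow: /_notes/
Disallow: /images/
Disallow: /oldfiles/
Disallow: /Scanned Photos/
"""

SITE = "http://www.davidlevinent.com"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `path` under the rules above."""
    return parser.can_fetch(agent, SITE + path)


# Illustrative paths: only the Disallow prefixes come from the snippet.
for path in ["/robots.txt", "/index.html", "/cgi-bin/form.cgi", "/Scanned Photos/1.jpg"]:
    print(path, "allowed" if allowed(path) else "blocked")
```

No rule's path is a prefix of `/robots.txt`, so under these rules the file itself is as crawlable as any ordinary page.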


 
From: RainboRick
Date: Fri, 22 Aug 2008 11:29:28 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
Any tinkering with the access to your robots.txt file would likely do
more harm than good.  Your best course is to do nothing.  The presence
of your robots.txt file in the index is unusual, but won't hurt your
site's performance in Google.  And I would expect it to fall out of
the index on its own over time.  Good luck!

From: JohnMu (Google employee)
Date: Fri, 22 Aug 2008 15:03:29 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
Hi Fred

You can also disallow your robots.txt in your robots.txt :). We'll
read the robots.txt (since we have to) and find that you don't want us
to crawl it, so the file will still get processed. That said, if
your robots.txt is being indexed, I assume there must be a link to it
somewhere. It would probably be good to double-check that this link is
not on your site (and remove it if it is). With no links pointing to it
and the URL disallowed, it will generally fall out of the results over
time (and to be honest, I doubt any user is going to find it with your
keywords :-)).

John

PS Another possibility would be to use the X-Robots-Tag HTTP header,
but that's a bit more complicated and probably not necessary.
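The suggestion to disallow robots.txt inside robots.txt can be checked the same way. A minimal sketch with Python's `urllib.robotparser` and a hypothetical, cut-down file (`www.example.com` is a placeholder): the parser has to read the whole file before it knows any rule, and once it has, `/robots.txt` is reported as off-limits for crawling. Per John, the URL then tends to drop out of results over time rather than immediately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows itself, as suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /robots.txt
"""

parser = RobotFileParser()
# The rules only take effect after the file has been fetched and parsed,
# so a crawler always reads robots.txt even though it is disallowed.
parser.parse(ROBOTS_TXT.splitlines())

robots_blocked = not parser.can_fetch("Googlebot", "http://www.example.com/robots.txt")
home_blocked = not parser.can_fetch("Googlebot", "http://www.example.com/")
print("robots.txt blocked:", robots_blocked)  # robots.txt blocked: True
print("home page blocked:", home_blocked)     # home page blocked: False
```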


 
From: JLH
Date: Fri, 22 Aug 2008 15:15:03 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
How about noindex in the robots.txt (which I didn't know existed) as
Susan suggests here:

http://groups.google.com/group/Google_Webmaster_Help-Tools/msg/8cebd8...

From: JohnMu (Google employee)
Date: Fri, 22 Aug 2008 16:26:05 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
Good catch, JLH! Yes, it would be possible to use the "noindex"
directive here as well. However, since the "noindex" directive is not
a widely used one (I don't know which other search engines support it
at the moment) and since its behavior might change in the future,
I wouldn't recommend using it for the long run. If all
you need is to get the URL removed just this one time, then that
would be a good use for it (you could follow up later on with a
"disallow" instead of the "noindex"). Hope it helps!

John


 
From: Frod
Date: Sat, 30 Aug 2008 12:14:18 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
Thanks, John and others.  Robots.txt is not referenced anywhere on the
site.  I added the following to robots.txt:

User-agent: *
Noindex: /

Let's see what happens...

Fred


 
From: JLH
Date: Sat, 30 Aug 2008 12:35:07 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
That will noindex the entire site. Do you want that?

To do just the robots.txt...

User-agent: *
Noindex: /robots.txt
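Why `Noindex: /` covers the whole site: path values in robots.txt are matched as prefixes of the URL path, and every path starts with `/`. The helpers below are a hypothetical sketch that applies plain prefix matching to `Noindex` lines, the same way `Disallow` is matched; for simplicity they ignore user-agent groups and wildcards. `Noindex` was never part of the robots.txt standard, and Google announced in 2019 that it would stop supporting it, so treat this only as an illustration of the matching.

```python
def noindex_prefixes(robots_txt: str) -> list[str]:
    """Collect the values of all Noindex lines (simplified: user-agent groups ignored)."""
    prefixes = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "noindex" and value.strip():
            prefixes.append(value.strip())
    return prefixes


def is_noindexed(path: str, prefixes: list[str]) -> bool:
    """A path is covered if it starts with any Noindex prefix."""
    return any(path.startswith(prefix) for prefix in prefixes)


whole_site = noindex_prefixes("User-agent: *\nNoindex: /\n")
just_robots = noindex_prefixes("User-agent: *\nNoindex: /robots.txt\n")

for path in ["/", "/index.html", "/robots.txt"]:
    print(path, is_noindexed(path, whole_site), is_noindexed(path, just_robots))
```

Prefix matching also means `Noindex: /robots.txt` would cover a path such as `/robots.txt.bak`; it is not an exact-match rule.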

From: Frod
Date: Sat, 30 Aug 2008 12:46:13 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?

On Aug 30, 1:35 pm, JLH wrote:

> that will noindex the entire site.  Do you want that?

> To do just the robots.txt...

> User-agent: *
> Noindex: /robots.txt

Thanks - good catch!  I realized the error and updated robots.txt.

Fred


 
From: cristina
Date: Sun, 31 Aug 2008 06:16:49 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
Hi John,
Does the Noindex rule in the robots.txt file have a similar
effect to the X-Robots-Tag noindex in the HTTP header?

For example, would it have the same effect
(removing robots.txt from the Google index)
if, instead of putting this in the robots.txt file:

User-agent: Googlebot
Noindex: /robots.txt

the robots.txt URL returned this HTTP response header:

X-Robots-Tag: noindex

I suppose the difference is that within the robots.txt file
one can specify which bot the rule applies to, but not
with the X-Robots-Tag in the HTTP header.
Also, it is not known how, or whether, other search engines
support the X-Robots-Tag HTTP header,
and how search engines would treat
X-Robots-Tag: noindex
in the HTTP response header of the robots.txt file.

Cristina.
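Serving the header Cristina describes is a web-server setting rather than a change to the file. A minimal sketch using Python's standard `http.server` as a stand-in for the site's real server (the robots.txt body is hypothetical): only the `/robots.txt` response carries `X-Robots-Tag: noindex`, and crawlers can still fetch and parse the file as usual.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ROBOTS_TXT = b"User-agent: *\nDisallow: /cgi-bin/\n"  # hypothetical file body


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/robots.txt":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            # Ask search engines not to index this URL; it can still be fetched.
            self.send_header("X-Robots-Tag", "noindex")
            self.send_header("Content-Length", str(len(ROBOTS_TXT)))
            self.end_headers()
            self.wfile.write(ROBOTS_TXT)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/robots.txt"
with urllib.request.urlopen(url) as response:
    header = response.headers.get("X-Robots-Tag")
    body = response.read()
server.shutdown()
server.server_close()

print(header)  # noindex
```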


 
From: JohnMu (Google employee)
Date: Sun, 31 Aug 2008 15:33:34 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
Hi Cristina

The "noindex" in the robots.txt is generally treated the same as a
robots "noindex" meta tag or an "x-robots-tag" with "noindex." The big
difference (in my opinion) is that it's easier to remove too much with
the robots.txt than it is with either a meta tag (that has to be
placed on all pages) or the HTTP header tag (which usually takes some
manual work as well). That's one reason I'm always a bit careful with
suggesting it :)

At the moment, I'm not aware of other search engines processing
"noindex" in the robots.txt or using the "x-robots-tag" HTTP header
tag.

Hope it helps!
John
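John describes the robots.txt `noindex`, the robots meta tag, and the `X-Robots-Tag` header as generally treated the same. A hypothetical sketch of a checker that handles the two per-URL forms alike, using Python's standard `html.parser`: a page counts as noindexed if either the header or a `<meta name="robots">` tag lists `noindex`. It ignores crawler-specific forms such as `<meta name="googlebot">`, so it is a simplification, not a description of how any search engine implements this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content values of <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.directives.append(attributes.get("content") or "")


def has_noindex(value: str) -> bool:
    """True if a comma-separated list such as "noindex, follow" contains noindex."""
    return "noindex" in [token.strip().lower() for token in value.split(",")]


def is_noindexed(headers: dict, html: str) -> bool:
    """True if the X-Robots-Tag header or a robots meta tag says noindex."""
    # Header names are case-insensitive, so normalize them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    meta = RobotsMetaParser()
    meta.feed(html)
    values = [lowered.get("x-robots-tag", "")] + meta.directives
    return any(has_noindex(value) for value in values)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindexed({}, page))                                    # True
print(is_noindexed({"X-Robots-Tag": "noindex"}, "<p>hi</p>"))    # True
print(is_noindexed({"Content-Type": "text/html"}, "<p>hi</p>"))  # False
```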


 
From: wreilly
Date: Sun, 31 Aug 2008 17:30:34 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
Isn't the Sitemap line supposed to go at the end, after a blank line?
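On the Sitemap question: the sitemaps.org protocol says the `Sitemap` directive is independent of the user-agent line, so it does not matter where it is placed in the file. A quick check with Python's `urllib.robotparser` (`site_maps()` needs Python 3.8 or later), using two hypothetical versions of the file that differ only in where the Sitemap line sits:

```python
from urllib.robotparser import RobotFileParser

RULES = "User-agent: *\nDisallow: /cgi-bin/\n"
SITEMAP = "Sitemap: http://www.davidlevinent.com/sitemap.xml\n"

sitemap_first = SITEMAP + "\n" + RULES  # Sitemap line at the top
sitemap_last = RULES + "\n" + SITEMAP   # Sitemap line at the end, after a blank line


def summarize(robots_txt: str):
    """Return the sitemap list and one crawl decision for a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    decision = parser.can_fetch("Googlebot", "http://www.davidlevinent.com/cgi-bin/x")
    return parser.site_maps(), decision


print(summarize(sitemap_first))
print(summarize(sitemap_last))
```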

From: cristina
Date: Mon, 1 Sep 2008 05:30:35 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
Hi John,
Thank you very much for your reply.

About noindex:
doesn't the noindex directive (via a meta tag,
an X-Robots-Tag in the HTTP header, or a Noindex line in the
robots.txt file)
tell Googlebot not to use any other data from those URLs in any way?

I thought that Googlebot does not process or store data from URLs
that have noindex.

Couldn't it be confusing to Googlebot to apply noindex to the
robots.txt file itself
(via the line Noindex: /robots.txt or via
a noindex X-Robots-Tag in the HTTP header of robots.txt)?

What I meant in my previous post was that I was wondering
whether the noindex X-Robots-Tag
in the HTTP header of the robots.txt file
would have the same effect as the line
Noindex: /robots.txt
inside the robots.txt file.
Of course, the main difference would be that
with the Noindex: /robots.txt line inside the robots.txt file,
Googlebot first has to parse the file to read it.
I repeat that I think applying noindex to
the robots.txt file itself might be confusing.

Cristina.

From: JohnMu (Google employee)
Date: Tue, 2 Sep 2008 05:23:44 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
Hi Cristina

Either way (via the x-robots-tag or through the robots.txt file) we'll
have to access the robots.txt to see what we're allowed to crawl, so
both ways would get found at about the same time. We will always check
the robots.txt file regardless of whether or not it's indexed -- for
all we know, it might have changed since the last time we checked, and
that would be important to know.

Cheers
John
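John's point that robots.txt is re-checked whether or not it is indexed amounts to a cache with an expiry. A hypothetical sketch (the `RobotsCache` class, the one-day default lifetime, and the injected `fetch` and `clock` functions are all assumptions, not a description of Googlebot): cached rules are reused until they go stale, then fetched again, so an edit such as a new Disallow line takes effect on the next refresh.

```python
import time
from typing import Callable
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Re-fetches robots.txt once the cached copy is older than max_age seconds."""

    def __init__(self, fetch: Callable[[], str], max_age: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch      # returns the current robots.txt body (hypothetical)
        self.max_age = max_age  # assumed lifetime of a cached copy
        self.clock = clock
        self.parser = None
        self.fetched_at = 0.0

    def rules(self) -> RobotFileParser:
        now = self.clock()
        if self.parser is None or now - self.fetched_at >= self.max_age:
            parser = RobotFileParser()
            parser.parse(self.fetch().splitlines())
            self.parser, self.fetched_at = parser, now
        return self.parser

    def allowed(self, agent: str, url: str) -> bool:
        return self.rules().can_fetch(agent, url)


# Simulated site file and clock, so the example runs without a network.
site_file = {"body": "User-agent: *\nDisallow: /cgi-bin/\n"}
now = [0.0]
cache = RobotsCache(fetch=lambda: site_file["body"], max_age=3600, clock=lambda: now[0])

url = "http://www.example.com/robots.txt"
before = cache.allowed("Googlebot", url)        # nothing blocks it yet
site_file["body"] += "Disallow: /robots.txt\n"   # the webmaster edits the file
still_cached = cache.allowed("Googlebot", url)  # old copy is still fresh
now[0] += 3600                                  # an hour later the copy is stale
after = cache.allowed("Googlebot", url)         # the new rule is picked up
print(before, still_cached, after)
```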


 
From: cristina
Date: Tue, 2 Sep 2008 05:57:35 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
Hi John,
Thank you for your reply, it is very helpful.

I hope you do not mind if I add that
there is still the question of whether all other
search engines/bots that respect robots.txt
will process the robots.txt correctly
if they encounter a noindex directive for the robots.txt file itself.

Cristina.

From: JohnMu (Google employee)
Date: Tue, 2 Sep 2008 06:26:29 -0700 (PDT)
Subject: Re: robots.txt returned in search engine results - how to fix?
Hi Cristina
I'm not aware of how the other search engines handle a "noindex"
directive in the robots.txt. I assume they'll likely ignore it if they
don't understand it :).

John
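John's guess that other parsers simply ignore a directive they don't understand holds for at least one common implementation: Python's standard `urllib.robotparser` skips unknown fields such as `Noindex` without raising an error. This parser only models crawl permissions, so it says nothing about any search engine's indexing behavior. A small check with a hypothetical file (`www.example.com` is a placeholder):

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Noindex: /robots.txt
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # no error for the unknown Noindex field

# The Noindex line has no effect on crawl decisions, and the Disallow
# line after it in the same group is still applied.
print(parser.can_fetch("Googlebot", "http://www.example.com/robots.txt"))     # True
print(parser.can_fetch("Googlebot", "http://www.example.com/cgi-bin/a.cgi"))  # False
```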


 