Message from discussion Sitemap auto-discovery
Maile Ohye Google employee  
From: Maile Ohye
Date: Wed, 18 Apr 2007 18:54:16 -0000
Local: Wed, Apr 18 2007 2:54 pm
Subject: Re: Sitemap auto-discovery
Hi guys,

> Questions for those who feel like answering or guessing along:

We'd like to answer! :)
First, thanks to everyone for their contributions on this thread.

> - does it work by user-agent? ie could you submit a sitemap for each
> engine separately?

Yes.  Although the Autodiscovery "Sitemap:" directive is independent
of the User-Agent, general robots.txt rules such as "Disallow" remain
applicable, so you can use them to control which crawler may fetch
which Sitemap.

For example, in the following robots.txt, any sitemaps.org-compliant
search engine can download two Sitemaps, sitemap1.xml and
sitemap2.xml.

******* Start of example1 robots.txt *******

User-agent: *
Disallow:

Sitemap:  http://www.example.com/sitemap1.xml
Sitemap:  http://www.example.com/sitemap2.xml

******* End of example1 robots.txt *******

However, in example2 (listed below), Google is aware that there are
two Sitemap files, but we'll only retrieve sitemap1.xml because the
robots.txt restricts Googlebot from sitemap2.xml.

******* Start of example2 robots.txt *******

User-agent: googlebot
Disallow: /sitemap2.xml

User-agent: MSNBot
Disallow: /sitemap1.xml

Sitemap:  http://www.example.com/sitemap1.xml
Sitemap:  http://www.example.com/sitemap2.xml

******* End of example2 robots.txt *******

More information can be found in our Help Center, "How can I control
which search engines see my Sitemap in my robots.txt file?"
http://www.google.com/support/webmasters/bin/answer.py?answer=65289&t...
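
To make the difference between the two examples concrete, here is a
minimal Python sketch (an illustration only, not how Googlebot is
actually implemented). It uses the standard library's
urllib.robotparser to list the Sitemaps a given crawler would be
allowed to download. The helper name fetchable_sitemaps and the
crawler name "ExampleBot" are made up for this illustration, and
site_maps() needs Python 3.8 or later. Python's user-agent matching is
also simpler than what real crawlers do, so treat the result as a
rough check.

```python
from urllib.robotparser import RobotFileParser

# The example2 robots.txt from this post, as a string.
EXAMPLE2 = """\
User-agent: googlebot
Disallow: /sitemap2.xml

User-agent: MSNBot
Disallow: /sitemap1.xml

Sitemap:  http://www.example.com/sitemap1.xml
Sitemap:  http://www.example.com/sitemap2.xml
"""


def fetchable_sitemaps(robots_txt, user_agent):
    """Return the listed Sitemap URLs that user_agent may fetch.

    Hypothetical helper: every "Sitemap:" line is collected regardless
    of User-agent groups, then each URL is checked against the
    Disallow rules that apply to user_agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    listed = parser.site_maps() or []  # site_maps() returns None if empty
    return [url for url in listed if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    for agent in ("Googlebot", "MSNBot", "ExampleBot"):
        print(agent, fetchable_sitemaps(EXAMPLE2, agent))
```

With example2 as input, this should list only sitemap1.xml for
Googlebot, only sitemap2.xml for MSNBot, and both files for a crawler
that has no matching User-agent group.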

> - how does it handle canonical issues - or would it be a simple
> "preferred domain" chooser? Eg if your site www.domain.com and
> domain.com share files (including the robots.txt), would including a
> sitemap for "www.domain.com" automatically devalue "domain.com"? How
> about https/http sites? Could that be the solution to getting the
> right version indexed?

Currently Sitemap Autodiscovery is not a solution to canonicalization/
preferred domain issues.  (Though we note your potential feature
request.)

We still recommend implementing 301s to the canonical URLs, as well as
utilizing the "Preferred Domain" setting in Webmaster Tools.  Sitemaps
should only contain the canonical version of the URL.
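
One way to check that last point before submitting a Sitemap is to
scan it for URLs that are not on your canonical scheme and host. The
sketch below assumes a standard sitemaps.org 0.9 <urlset> document.
The function name non_canonical_urls and the sample URLs are
hypothetical, and comparing only scheme and host is a simplification
(it does not look at trailing slashes, parameters, and so on).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# XML namespace defined by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Made-up sample Sitemap mixing canonical and non-canonical URLs.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://example.com/page</loc></url>
  <url><loc>https://www.example.com/secure</loc></url>
</urlset>
"""


def non_canonical_urls(sitemap_xml, scheme, host):
    """Return <loc> URLs whose scheme or host differs from the canonical one."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        if parts.scheme != scheme or parts.hostname != host:
            flagged.append(url)
    return flagged


if __name__ == "__main__":
    # Treat http://www.example.com as the canonical version of the site.
    for url in non_canonical_urls(SAMPLE_SITEMAP, "http", "www.example.com"):
        print("non-canonical:", url)
```

Any URL this flags is a candidate for removal from the Sitemap and for
a 301 to its canonical version.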

Should you happen to list the non-canonical URL in your Sitemap, it'll
be treated similarly to non-canonical URLs found through our
discovery-based web crawl.  Sample outcomes:

1. If at crawl time we're able to determine that it's non-canonical,
we'll interpret it as the canonical version.
2. It may be considered a separate URL (thus webmasters should
redirect to the canonical URL if possible and set the "Preferred
Domain").  This will not devalue the canonical URL, but it does dilute
the canonical URL's potential PageRank.

Please let us know if we can provide clarification.

Thanks again,
Maile


 