Sitemaps appear to be a 'favourite' for ...Phil Payne... - and one
area that we seem to get a lot of 'ignorant behaviour' on...
Such as stunningly ridiculous change frequencies
all priorities set to high levels
etc.
... yet ...
no one from G seems to have said whether these can be negatives, or
used as potential indicators... or in any way have any influence at
all.
There are little 'hints' for things like preferred URL if there is
on-site duplication....
but that's about it.
So, if at all possible, could someone from Google please shed a little
light on the whole SiteMaps area please?
Come on... show us they are worth having... if used correctly!
(otherwise people a) might as well not bother with them and b) we can
stop telling people how stupid their sitemaps are ;))
Yes, we do use Sitemaps. Quite a lot of them, actually. In Zurich, I
work together with part of the Sitemaps team, so I feel very
comfortable in saying that if you can set up and maintain them, it'll
generally be worth your time ("generally" because I don't know what
you charge per hour, how much time you'd spend on them and what it's
worth to you to be properly indexed :-)).
One of the subjects Phil brings up regularly is the quality of the
meta data within the Sitemap file. Obviously, if we use Sitemaps, it
would make sense to give us valid data. Let's take a quick look at the
kinds of data that we have in a Sitemap file:
URL - If you give us the wrong URLs, well, that's not really going to
help your "right" URLs :-). In some cases, we may use these URLs to
determine the canonical versions to display for your pages, so it
makes sense to give us not only working URLs, but if possible the ones
which you want crawled and indexed. If you have the choice between "/"
and "/index.html", I would suggest giving us only one of them,
probably just "/". However, this is just one signal for determining
the canonical version, so make sure that your navigation also goes to
that URL and (if necessary) you have 301 redirects in place to move
all misinformed crawlers there.
I would only include URLs to indexable content like (X)HTML pages and
other documents. If you can't give us all your URLs then just give us
whatever you can. If you use multiple CMS within your site, it's fine
to give us multiple Sitemap files or Sitemap Index files (say one for
the forum, one for the blog and one for the shop).
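
To illustrate the "one Sitemap per CMS" idea, here is a minimal sketch of a Sitemap Index file built in Python. The example.com URLs and the function name `build_sitemap_index` are made up for the example; only the element names and namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(sitemap_urls):
    """Return a Sitemap Index document (as a string) listing the given Sitemap files."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")

# Hypothetical site with a forum, a blog and a shop, each with its own Sitemap
index_xml = build_sitemap_index([
    "https://www.example.com/forum/sitemap.xml",
    "https://www.example.com/blog/sitemap.xml",
    "https://www.example.com/shop/sitemap.xml",
])
print(index_xml)
```

Each CMS can then keep regenerating its own file without touching the others.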
Last modification date - This one is a bit tougher. Many websites
return either no date or the current date/time for dynamic pages
(PHP/ASP scripts, etc.) and because of that, they sometimes give us the
wrong last modification date for their URLs. This brings us into a bit
of a dilemma: do we trust your data (all pages just changed) or do we
assume it was a mistake? The problem being that if we assume that all
pages have just changed, we might not have enough time left to crawl
the few pages that actually did change. If possible, it would really
make sense for you to either submit the actual change date for the
content or just leave this field out. If your Sitemap generator
doesn't allow that, we'll try to make sure that you don't run into any
problems with the "wrong" last modification dates (that can also
include old dates for Sitemap files that haven't been updated as often
as the content).
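
A rough sketch of the "real change date or nothing" rule, in Python. It assumes your generator can look up when the content itself last changed (passed in here as `content_changed_at`, a hypothetical name); when that is unknown, the `<lastmod>` field is simply left out rather than filled with the current time.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

def url_entry(loc, content_changed_at=None):
    """Build one <url> element; <lastmod> is written only when a real change date is known."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if content_changed_at is not None:
        # Timezone-aware datetime expected; written as a W3C datetime in UTC
        stamp = content_changed_at.astimezone(timezone.utc).isoformat(timespec="seconds")
        ET.SubElement(url, "lastmod").text = stamp
    return url

# Static page with a known change date
known = url_entry("https://www.example.com/", datetime(2008, 3, 1, 12, 0, tzinfo=timezone.utc))
# Dynamic page where the true change date is unknown: no <lastmod> at all
unknown = url_entry("https://www.example.com/page.php")

print(ET.tostring(known, encoding="unicode"))
print(ET.tostring(unknown, encoding="unicode"))
```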
Priority, Change frequency - These values are similar to the last one,
only there is no "correct" value :-). Obviously, giving us "1.0" for
priority and "always" for the change frequency is not going to make us
crawl your whole site 24/7. As with the last modification date, if you
can't give us something that makes sense, it would be best to leave
the fields out. If that's not possible, we'll try to make do with the
information that we have.
When setting up a Sitemap generator, I would make sure that the URLs
you list are perfect in that they are the ones which you want indexed
and that no duplicates are listed. In the past I have used these URLs
as a basis for making sure that a site has the preferred canonicals in
the navigation and redirects as appropriate (you can compare your
Sitemap URLs with the URLs found by crawling your site). In many
cases, crawl problems can be recognized ahead of time if this is done
regularly.
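
The comparison mentioned above (Sitemap URLs versus URLs found by crawling your own site) could be sketched like this. It assumes you already have both lists from your own tools; the function name and the example URLs are invented for the illustration.

```python
from collections import Counter

def compare_urls(sitemap_urls, crawled_urls):
    """Report mismatches between the URLs in a Sitemap and the URLs a crawl of the site found."""
    sitemap_set, crawled_set = set(sitemap_urls), set(crawled_urls)
    counts = Counter(sitemap_urls)
    return {
        # Listed in the Sitemap but not reachable through the navigation
        "only_in_sitemap": sorted(sitemap_set - crawled_set),
        # Linked in the navigation but not the version listed in the Sitemap
        "only_in_crawl": sorted(crawled_set - sitemap_set),
        # Listed more than once in the Sitemap
        "duplicates_in_sitemap": sorted(u for u, n in counts.items() if n > 1),
    }

report = compare_urls(
    ["https://www.example.com/", "https://www.example.com/about", "https://www.example.com/about"],
    ["https://www.example.com/", "https://www.example.com/index.html", "https://www.example.com/about"],
)
print(report)
```

Here the crawl turns up "/index.html", which suggests the navigation links to a non-preferred version of the homepage.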
For the other meta data it depends a bit on how you will be
maintaining your Sitemap file. If you are certain that you can keep up
with all changes in your site (perhaps if your Sitemap file is
generated automatically or you are certain that you will keep up
manually), then I would suggest including the last modification date
(and would generally leave the change frequency out). If you know
ahead of time that you most likely won't be able to keep up, I would
just use the change frequency and not include a last modification
date. Priority is a bit harder to judge so personally I like to keep
it out, but if you think you can give us good data, then by all means
give it to us :).
A short summary for those that fell asleep during the above:
- Yes, please send us Sitemap files, preferably sitemaps.org XML files!
- Work on good URLs & use them to double-check your site's navigation
- Optional: Date or change frequency? depends on how you work.
- Also optional: Priority
Does that make sense? Let me know if you have more questions!
I see the main benefits of sitemap.xml to webmasters as:
-- a way of ensuring that google discovers all the URLs;
-- can indicate which URL you want indexing if google discovers two
that appear to be same content;
-- if there is any problem with crawling time allocation, it helps
identify which URLs are more recent or higher priority for indexing
compared with others on the same site;
-- anything which assists efficient crawling is likely to increase the
likelihood of being fully indexed;
-- if google decides that a site's sitemap is so defective that it is
"useless", it *might* be one of many many possible negative signals.
Beyond that, I don't think that sitemap.xml should give any particular
boost to ranking.
I do think that some things about the way a site "presents" are
symptomatic of lack of care, even sloppiness, and I think if you
cannot take care of simple basics, there are probably many deeper
problems. There are certainly many things that I did years ago that I
would not dream of doing deliberately now.
When glancing at a site, some things are simply easy to spot: a
www/non-www canonicalization issue; multiple URLs for the same page or
homepage; an invalid robots.txt; impossible sitemap.xml data; poor use
of header and title tags; repetitive titles and description snippets;
etc.
And although I think it's good to be polite and non-confrontational,
it is important to assert that some defects are obvious - especially
appropriate when the OP takes an aggressive or "why is google
punishing my site when everything I do is perfect" sort of attitude.
I feel sure that there are far more serious issues about the way most
sites "present" but some things are easier to spot and to be
definitive about, and defective sitemaps are one of them.
"On one hand, by giving Google a sitemap file with all the changes,
and a list of your most important pages - you are also giving those
details to your competitors. Yes, that Sitemap file is public for
Google and for your competitors."
There's a way to avoid that. Sure, if your Sitemap file is called
/sitemap.xml or is listed in your robots.txt file, then other people
are going to be able to view it. That said, they will also be able to
crawl your site to find these URLs as well -- though they wouldn't
know the "meta data". However, you can also keep your Sitemap files
private: just use an obscure file name and submit the URLs directly to
the search engines.
For the other search engines, you can generally also use a similar
"ping" method. Some also support a direct submission through a similar
dashboard for webmasters.
Another item mentioned in Barry's post is "I have always been a
believer that well on-page optimized sites do not require or even
benefit much from Google Sitemaps." Obviously most websites do not
need Sitemap files -- otherwise we'd be doing a pretty bad job of
crawling the web :-). However, good and "well-optimized" sites are
often also fairly large and hopefully updated regularly. With a
website like that, it can take a bit of time for us to discover new or
changed content, so helping the search engines with a good Sitemap
file really makes a lot of sense. Imagine if we had to recrawl big
auction sites in order to discover all the new items ... Yikes!
>> "On one hand, by giving Google a sitemap file with all the changes,
>> and a list of your most important pages - you are also giving those
>> details to your competitors. Yes, that Sitemap file is public for
>> Google and for your competitors."
He has a point, but we are still in the VERY early days of sitemap
adoption. Relatively few sites have them, and far fewer have sensible
ones.
I like the robots.txt discovery mechanism. So far I haven't seen
anyone write a piece of code that scans domains for a robots.txt file
containing a sitemap definition and then uses that for marketing
purposes.
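
For reference, the robots.txt discovery mechanism is just a "Sitemap:" line followed by the full URL of the Sitemap file. A minimal sketch of reading it in Python, assuming the robots.txt text has already been fetched (the function name and sample content are made up for the example):

```python
def sitemaps_from_robots(robots_txt):
    """Return the Sitemap URLs declared in the text of a robots.txt file."""
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" on the first colon only
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            found.append(value.strip())
    return found

sample = """User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
sitemap: https://www.example.com/blog/sitemap.xml  # blog
"""
print(sitemaps_from_robots(sample))
```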
"Dear webmaster@.. - did you know that your sitemap is garbage and the
page you set as highest priority has 1,723 W3C validation errors?"
Actually I have received a few emails claiming they can get rid of all
validation errors from my site.
I bet they can, since I rarely have any and when I do they only exist
for a few minutes until I fix them LOL
You can have the #1 position on Yahoo! AND Google today.
Our company is unmatched in the Search Engine Optimization industry.
If you want the top position on
ALL the major search engines without headache, expense or time _ we
are your solution. Our
technology is exclusive and our results are proven. We WILL get you
to the top and keep you there.
Even better, we will beat any competitors pricing AND we guarantee our
work. If you have lost money,
time and position and you want unlimited traffic with guaranteed
results, contact us today for a
free quote at: [email address] or simply reply to this
message. Please include the
Website(s) you are interested in promoting and the best way to contact
you.
THIS IS NOT PAY PER CLICK. Examples/Demo will be provided.
Sincerely,
Sarah Dixon
Search Placement Specialists
So, cat amongst the pigeons.
In the various Google references to the Sitemaps... there are no
guarantees.
The URLs 'may not be crawled' ...
So it's kind of an 'Option' for Google?
It's referenced that there is no 'SEO gain' ...
So it has no 'positive' influence on Ranking... what about Negative?
Could a poorly constructed and/or incorrectly informed sitemap prove
possibly Negative?
They are one of the tools used by Google to examine sites ...
So is it a 'measuring' device?
If so, what is it measuring?
Can it help spot problem sites or raise trouble flags?
.
Other things (now off the top of my head ;)) ...
Can a sitemap help improve the crawl rate on a site?
Does a sitemap improve the crawl rate on a site?
Can a sitemap improve the chances of crawling 'newer' content?
Does a sitemap help the crawling of 'newer' content?
Does a sitemap make it easier/more efficient to index pages/sites?
If a page has no internal links - will a Sitemap assist in getting the
page crawled?
[Yes - I'm aware of the similarity of some of the questions - but they
are different]
@John
In my opinion, the most important point you made is only listing site
nav version URLs in sitemaps. Site nav, as in user goes to homepage
clicks level 1, level 2, level 3 and then clicks a product URL. It's
that URL that should be included in sitemaps and not a "site search",
Endeca, "flat" version or other form of a URL. I see this issue come
up all the time and especially in major, enterprise level, ecommerce
sites using combined technologies. It is always a good idea to look
at sites from the user perspective bookmarking URLs along the way and
then to open your xml sitemap, do a find for the bookmarked URL to be
certain what you think is happening actually is reality.
Sorry for getting a little fired up but it really drives me nuts
explaining the definition of a "Sitemap" to CTOs! XML, HTML, TXT or
other it is still a sitemap....
> The URLs 'may not be crawled' ...
> So it's kind of an 'Option' for Google?
Yes, it always is, even if you submit them through the "Add URL"
form... There are a ton of factors that go into deciding how we crawl
your site and the content in the Sitemap file can play an important
role in that. That said, if your site is barely (or not) crawlable, I
would recommend fixing that over just trying to use a Sitemap file to
cover up those issues. Having a crawlable site is always important.
> It's referenced that there is no 'SEO gain' ...
> So it has no 'positive' influence on Ranking... what about Negative?
> Could a poorly constructed and/or incorrectly informed sitemap prove
> possibly Negative?
Way before I joined Google, when Sitemaps was still very new, I think
I may have seen a negative impact from really bad Sitemap files.
However, I was never able to confirm that and I have not seen anything
similar since then. I feel pretty confident that, at least on our
side, having a Sitemap file will not negatively impact your site's
ranking. Having a really bad Sitemap file won't really help your site,
but at least it generally won't harm it either.
That said, I see a lot of room for positive changes: having new and
changed content indexed quicker than other sites can really make an
impact. This may not help ranking for the main keyword, but it could
bring visitors looking for something new. Depending on your site,
those visitors could turn into new regular visitors or even convert
into customers.
> They are one of the tools used by Google to examine sites ...
> So is it a 'measuring' device? If so, what is it measuring? Can it help spot problem sites or raise trouble flags?
Sure. There are a lot of problems that can be recognized with
Sitemaps; most of these problems are reported to the webmaster through
Webmaster Tools (which is one reason to make sure that Sitemap files
are registered there). For instance, I've seen a lot of webmasters
that found out that a particular hosting setup was blocking the
Googlebot from crawling their sites thanks to messages returned for
Sitemaps in Webmaster Tools.
> Can a sitemap help improve the crawl rate on a site?
> Does a sitemap improve the crawl rate on a site?
No and yes. Heh :-). No, it can't really influence the crawl rate for
a site in general. However, it can influence how we crawl the site.
Say I have time to get 20 URLs from your site tonight, I could either
guess and pick some or you could tell me which ones to look at. I
still won't get more than 20 URLs, but it could be that I end up
getting the more relevant (or new / changed) ones. You tell me :-).
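
The "20 URLs tonight" picture can be put into a toy sketch. This is only an illustration of the idea of a fixed budget spent on the URLs the Sitemap reports as changed; it is not a description of how Googlebot actually schedules crawls, and every name in it is hypothetical.

```python
from datetime import date

def pick_urls(entries, last_crawled, budget=20):
    """Choose up to `budget` URLs whose Sitemap lastmod is newer than the last crawl.

    entries: list of (url, lastmod) pairs, lastmod being a date or None
    last_crawled: dict mapping url -> date of the previous crawl
    """
    changed = [
        (lastmod, url)
        for url, lastmod in entries
        if lastmod is not None and lastmod > last_crawled.get(url, date.min)
    ]
    changed.sort(reverse=True)  # most recently changed first
    return [url for _, url in changed[:budget]]

entries = [("/a", date(2008, 3, 1)), ("/b", date(2008, 1, 1)), ("/c", None), ("/d", date(2008, 3, 5))]
last_crawled = {"/a": date(2008, 2, 1), "/b": date(2008, 2, 1), "/d": date(2008, 2, 1)}
print(pick_urls(entries, last_crawled, budget=1))
```

The budget stays the same either way; the Sitemap data only changes which URLs it is spent on.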
> Can a sitemap improve the chances of crawling 'newer' content?
> Does a sitemap help the crawling of 'newer' content?
Sure.
> Does a sitemap make it easier/more efficient to index pages/sites?
I'm not really sure what you mean here.
> If a page has no internal links - will a Sitemap assist in getting the
> page crawled?
Possibly. There are some reports that this has happened, though
personally I think this is a somewhat strange point to make (and it's
brought up often): if you know that your pages don't have any internal
links, then just fix it already :-). If you think that your site might
accidentally contain pages that are not linked internally, then make
sure that it doesn't :-).
Someone else asked about the various Sitemap formats. I would
generally recommend using the sitemaps.org XML format from the start,
even if you are only listing your URLs at the moment. The optional
meta data in the XML format can be added anytime later on and there
are a lot of good generators out there that can help you create them
easily and quickly.
> No and yes. Heh :-). No, it can't really influence the crawl rate for
> a site in general. However, it can influence how we crawl the site.
> Say I have time to get 20 URLs from your site tonight, I could either
> guess and pick some or you could tell me which ones to look at. I
> still won't get more than 20 URLs, but it could be that I end up
> getting the more relevant (or new / changed) ones. You tell me :-).
> Any more questions?
Yeah. Can you (internally) find even one site on which that's
happened?
Because I've been submitting correct and accurate sitemaps for several
sites for over two years and I haven't ONCE seen the Googlebot come
for the one page that a sitemap has identified as changed.
Amazing... not only did I make the topic ... it's got a whack of
attention...
and I keep missing it!
Apologies.
Okay... thank you very very much ...JohnMu..., greatly appreciate the
responses :D
.
Okay... I must admit... I love ...Phil Payne's... question... as that
was one I was going to ask (just not quite as 'targeted').
As pointed out by ...Phil Payne..., and even myself....
the sitemap doesn't seem to have had any impact that shows the bots
paying attention to 'newer' content or 'recently modified' content.
So... though you say that it can influence how a site is crawled...
does it?
Would you be willing to confirm that it does happen?
.
The misunderstood question...
"Does a sitemap make it easier/more efficient to index pages/sites? ..."
basically... does it make Google's job that bit easier to crawl and
index sites?
(the theory here being that if it's 'easier', then there is more
chance of the bot actually crawling and indexing, compared to the bot
having to waste time dithering about and thus maybe index less ;)).
???
The last modification date is one of the items of meta-data that can
tell us a lot about a page, but it can also be extremely misleading &
even has the potential to cause "harm" (say you forgot to update the
Sitemap file but have changed the page). Because of that we take many
factors into account when picking the optimal time to recrawl an
existing page. Sadly, it's not possible to give a general yes or no.
However, before I joined Google I set up a series of test sites
explicitly to test this behaviour and there was a strong correlation
between changes in the last modification date and improved recrawling
of changed URLs. So in the end, if you can provide Sitemap files with
correct last modification dates, I would really recommend submitting
them like that.
One way to help promote those modification dates is to leverage
"If-Modified-Since" conditional HTTP headers in your site as well. By
doing that you can not only save bandwidth, but it also helps us to
keep track of the actual changes in content within your site.
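
As a rough sketch of the conditional header handling: the server compares the date in the request's If-Modified-Since header with the content's real change date and answers 304 (Not Modified) when nothing has changed. The function name is made up, and `content_changed_at` is assumed to be a timezone-aware UTC datetime supplied by your own site code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_response(if_modified_since, content_changed_at):
    """Return (status, headers) for a GET, honouring an If-Modified-Since request header."""
    # HTTP dates have one-second resolution
    changed = content_changed_at.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(changed, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: ignore it and send the full page
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if changed <= since:
                return 304, headers  # unchanged since the client's copy: no body needed
    return 200, headers

changed = datetime(2008, 3, 1, 12, 0, tzinfo=timezone.utc)
print(conditional_response("Sat, 01 Mar 2008 12:00:00 GMT", changed))
print(conditional_response("Fri, 29 Feb 2008 12:00:00 GMT", changed))
```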
Regarding your other question - a Sitemap file certainly makes it
easier for us to crawl your site -- but it generally plays no role in
the following steps (indexing and ranking).
Hope that helps, keep up the good work & good questions :)
As of yesterday and today, sitemaps seem to be coming into their own.
Two users have posted about seeing messages concerning their sitemaps
on the Webmaster Tools console.
In one case, Google noted that all the priorities were the same.
Since priority is relative only within a site, this achieves nothing -
but the warning is interesting.
In the second, Google noted that all pages in the site were "dynamic"
and that such sites can present problems for crawling. The reason was
that the sitemap had a changefreq of 'always' on every page.
These are the first occasions I've seen Google react to the actual
content of a sitemap, apart from 404s on URLs in sitemaps.
It looks to me very much like Google is preparing to use or might even
_be_ using the priority and changefreq information.
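
For anyone who wants to check their own file for the two conditions described above before Webmaster Tools does, here is a minimal sketch in Python. The checks and their wording are my own guess at what triggers the messages, not Google's actual rules, and the function name is made up.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def lint_sitemap(xml_text):
    """Return warnings for a sitemap whose priority or changefreq values carry no information."""
    urls = ET.fromstring(xml_text).findall("sm:url", NS)
    # Values are compared as written in the file; a missing element shows up as None
    priorities = [u.findtext("sm:priority", namespaces=NS) for u in urls]
    changefreqs = [u.findtext("sm:changefreq", namespaces=NS) for u in urls]
    warnings = []
    if len(urls) > 1 and None not in priorities and len(set(priorities)) == 1:
        warnings.append("all URLs share the same priority")
    if urls and all(c == "always" for c in changefreqs):
        warnings.append("every URL has changefreq 'always'")
    return warnings

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc><priority>1.0</priority><changefreq>always</changefreq></url>
  <url><loc>https://www.example.com/about</loc><priority>1.0</priority><changefreq>always</changefreq></url>
</urlset>"""
print(lint_sitemap(sample))
```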