My webhost is 1&1, and they have a sitemap creator that created a
sitemap for my site that has 24,163 URLs! That apparently includes
all the image files, of which I have quite a few.
Do I want my image file URLs in my sitemap? Or just the HTML pages?
It's cool to see your hoster already making Sitemap files, I like
that! In general, it doesn't make much sense to include content in
your Sitemap files which we can't show in web search results. Content
which is shown in other kinds of search results (News, Video and Geo)
generally has its own Sitemap format, which you could use if those
files are very relevant to your site. Images are a bit special since
we generally need to have context for the images -- we need to find
the pages on your site which include the images so that we know what
the images are about.
That said, it certainly won't do any harm to keep those URLs in there
if they're added and maintained automatically by software that your
hoster runs. We'll generally skip over the image URLs and concentrate
on the indexable ones instead :-).
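If you would rather trim the image URLs out of an existing Sitemap file yourself, something like the following might work. It is only a rough sketch: the function names and the list of image extensions are my own assumptions, not anything 1&1 or Google provides, and it assumes a standard sitemaps.org urlset file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed list of image extensions; adjust it to match your own site.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")


def is_image_url(url):
    """Return True if the URL path ends in one of the image extensions."""
    path = urlparse(url).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)


def strip_image_urls(sitemap_xml):
    """Return the Sitemap XML with every <url> entry for an image removed."""
    # Keep the default namespace unprefixed when serializing again.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    for url_el in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url_el.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and loc.text and is_image_url(loc.text.strip()):
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")
```

You would read the existing file into a string, pass it to strip_image_urls(), and write the result back out.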
With significantly fewer than 50,000 URLs, it probably doesn't matter
which format you use; both compressed and uncompressed would be fine.
The easiest way to swap out the Sitemap file is just to use the same
name as the previous one you used. That way, you don't have to tell
anyone that you changed it: we (and Yahoo & Microsoft, if they know
about it) will just pick up the new one and continue working with
that file.
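As a rough illustration of "same name, either format", here is a small sketch that writes a Sitemap to a given path, gzip-compressing it only when the name ends in .gz. The function name, the MAX_URLS constant and the url_count argument are hypothetical and just for this example.

```python
import gzip

MAX_URLS = 50000  # per-file URL limit mentioned above


def write_sitemap(path, sitemap_xml, url_count):
    """Overwrite the Sitemap at `path`, compressing it if the name ends in .gz."""
    if url_count > MAX_URLS:
        raise ValueError(f"{url_count} URLs exceeds the {MAX_URLS} per-file limit")
    data = sitemap_xml.encode("utf-8")
    # gzip.open and open both accept (path, "wb") and return a writable file.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wb") as f:
        f.write(data)
```

Calling it with the same path as the old file simply replaces the old contents, so the submitted Sitemap URL stays unchanged.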
Well, I got a little over-eager :o) and went ahead and deleted the 1&1
sitemap, and submitted the new GSiteCrawler sitemap in Google
Webmaster Tools.
GSiteCrawler is pretty neato! I disallowed a few directories, and
told it some file extensions to ignore, and the resulting sitemap only
has 849 URLs in it, which seems a lot more reasonable.
I guess the only thing left to do is wait, eh? Any idea how long it
takes to index the sitemap?
Also, if you don't mind... there's a line in the sitemap file that
prevents it from opening in a browser window. The line in the file
is:
<?xml-stylesheet type="text/xsl" href="gss.xsl"?>
Any idea why GSiteCrawler would put that line in there? Might it cause
a problem?
You might also want to check out Enhanced image search in Google
Webmaster Tools (Dashboard/Tools). I have one image on my humor site
that gets about 10 hits a day from Google Image Search and I've done
nothing more (no separate sitemap, etc.) than check the box there.
Remember to have relevant alt text on your images.
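If you want a quick way to spot images that have no alt text, a sketch along these lines might help. The class and function names are made up for this example, and note that it also flags images with an empty alt="", which can be perfectly fine for purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag with a missing or blank alt."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = attr_map.get("alt")
        if not (alt and alt.strip()):
            self.missing.append(attr_map.get("src") or "(no src)")


def images_missing_alt(html):
    """Return the src values of images in `html` that lack alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing
```

Feed it the HTML of one page at a time and it returns a list of the image sources you may want to look at.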
That line specifies a stylesheet that can be used to display the
Sitemap file in a browser. The search engines ignore it, but your
browser will try to use it. You can either upload the stylesheet as
well or remove that line (I think there's an option to do that).
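If you would rather remove the line by hand after the file is generated, a small script can do it. This is just a sketch that treats the Sitemap as text and drops the first xml-stylesheet instruction; the function name is my own, and this is not a GSiteCrawler feature.

```python
import re

# Matches an <?xml-stylesheet ...?> instruction plus any whitespace after it.
# It does not match the <?xml version="1.0" ...?> declaration.
STYLESHEET_PI = re.compile(r"<\?xml-stylesheet\b.*?\?>\s*", re.DOTALL)


def remove_stylesheet_line(sitemap_xml):
    """Return the Sitemap text without its xml-stylesheet instruction."""
    return STYLESHEET_PI.sub("", sitemap_xml, count=1)
```

The rest of the file, including the XML declaration and all the URL entries, is left as it was.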
Keep in mind that submitting a Sitemap file does not guarantee that
we'll crawl and index all of your pages right away. The Sitemap file
just gives us additional information about your URLs so that when it
does come time to crawl more of them, we'll be better informed and be
able to make better choices during the crawl process.