Discussions > Sitemap Protocol > sitemap and extra characters
AlanJay  
From: AlanJay
Date: Mon, 17 Sep 2007 07:22:49 -0700
Local: Mon, Sep 17 2007 10:22 am
Subject: sitemap and extra characters
I have created sitemaps for my various sites and parts of sites, and
all seemed fine. I have custom scripts to spit out new versions every
day.

Recently I discovered that Google seemed to be misreading the files
and adding extra characters to the URLs, usually things like a quote (")
or a space (' '), i.e. things like %20 and %22.

Has there been any change to the Google parser in the last 3 months
that might have caused this to happen?

And any tips on trying to ensure it doesn't happen in the future?
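
One check that can be run on the generated files before they are submitted is to look at the raw text of every `<loc>` element for stray whitespace or quote characters, since those would show up percent-encoded as %20 and %22. The following is only a minimal sketch: the function names `check_loc` and `find_suspect_locs` are hypothetical, and it assumes the sitemap is a single well-formed XML document whose URLs sit in elements named `loc`, whatever the namespace.

```python
import xml.etree.ElementTree as ET


def check_loc(raw):
    """Return a list of problems found in one raw <loc> value."""
    problems = []
    if raw != raw.strip():
        problems.append("leading or trailing whitespace")
    if any(ch.isspace() for ch in raw.strip()):
        problems.append("whitespace inside URL")
    if '"' in raw or "%22" in raw:
        problems.append("double quote")
    return problems


def find_suspect_locs(sitemap_xml):
    """Return (raw_loc, problems) pairs for every suspicious <loc> value."""
    root = ET.fromstring(sitemap_xml)
    findings = []
    for element in root.iter():
        # Compare the local tag name so any sitemap namespace is accepted.
        if element.tag.rsplit("}", 1)[-1] != "loc":
            continue
        raw = element.text or ""
        problems = check_loc(raw)
        if problems:
            findings.append((raw, problems))
    return findings
```

An empty result only means the file itself is clean. It says nothing about how a crawler came by a bad URL from some other source.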

Regards


JohnMu Google employee  
From: JohnMu
Date: Mon, 01 Oct 2007 22:39:12 -0000
Local: Mon, Oct 1 2007 6:39 pm
Subject: Re: sitemap and extra characters
Hi AlanJay
I haven't seen a similar issue posted here in the groups. Could you
tell us where you see these URLs and what the URL of your sitemap file
is? Are you certain that these URLs are not just from malformed links
somewhere outside of your site?

For more information about "not found" crawl errors, please see our
FAQ:
http://groups.google.com/group/Google_Webmaster_Help/web/faqs-for-web...

Thanks!
John


Phil Payne  
From: Phil Payne
Date: Tue, 02 Oct 2007 04:45:46 -0700
Local: Tues, Oct 2 2007 7:45 am
Subject: Re: sitemap and extra characters

> I haven't seen a similar issue posted here in the groups.

http://groups.google.com/group/Google_Webmaster_Help/search?group=Goo...

It's a common problem.  It is NOT malformed URIs on the site or indeed
on any other site - if that were the case, other search engine bots
would try to find them too.

It's exclusively a Googlebot problem, and it's been around for at
least two years and possibly longer. The solution is entirely within
Google's purview - just analyse a few thousand of your 404 responses
and try to work out where you got the URI from. Do Google's systems
have no audit trails?
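
A site owner can run the same kind of tally on their own server logs to see which bad URIs are being requested and by whom. This is a rough sketch, not a description of anything Google does: it assumes the Apache "combined" log format, matches the crawler by a plain substring of the user-agent string, and the function name `suspicious_404s` is hypothetical.

```python
import re
from collections import Counter

# Assumed Apache "combined" format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def suspicious_404s(lines, agent_substring="Googlebot"):
    """Count 404 requests from one crawler whose path contains %20 or %22."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # line is not in the assumed format
        if match.group("status") != "404":
            continue
        if agent_substring not in match.group("agent"):
            continue
        path = match.group("path")
        if "%20" in path or "%22" in path:
            counts[path] += 1
    return counts
```

Running it once with the default and once with another crawler's name as `agent_substring` gives a direct comparison of whether other bots request the same malformed URIs.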

Some of these things are flat ridiculous and make Google look really
stupid.  About six months ago, the Googlebot tried to download every
URI on my site, but substituting .asp for all of my .html files.  What
bolleaux! No other bot has ever done this.  And I've never had a .asp
page on the site.

It seems stupefying.  It happens on all my sites - why doesn't Google
put up a simple site and see what happens?  Or check Matt Cutts'
server logs?

