Hi All,
I'm investigating an issue at the moment where GoogleBot appears to be
asking for files on one domain that exist on another and could do with
some input from other brains.
The issue is that, for example, a file exists at
www.domain-one.co.nz/files/123/executive-summary.pdf (and has for some
time), and GoogleBot is now also trying to index that file at
www.domain-two.co.nz/files/123/executive-summary.pdf, where it does not
exist.
In the last day I've seen 20 different occurrences of this (different
domain names, same file name) for a total of 183 requests. The
requests have been consistent: whenever a non-existent file is
requested, it's always the same files on the same domains. I can't
spot a pattern in the domains and files being requested. According to
our Apache access logs, these requests started yesterday at 1:36pm NZ
time and don't appear to have been happening before that.
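For anyone who wants to run the same kind of tally on their own logs,
here is a minimal sketch of how the counts could be pulled out. It is
not the exact script I used. It assumes the access log is written in
Apache's "vhost_combined" layout (virtual host first, then the usual
combined fields); the regex and the function name googlebot_404s are
illustrative and would need adjusting for a different LogFormat.

```python
import re
from collections import Counter

# Assumption: lines follow Apache's "vhost_combined" layout, e.g.
#   host:port client - - [time] "GET /path HTTP/1.1" status bytes "referer" "agent"
# Adjust the pattern if your LogFormat differs.
LINE_RE = re.compile(
    r'^(?P<vhost>\S+?)(?::\d+)? \S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 responses served to a Googlebot user agent,
    grouped by (virtual host, requested path)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        if match.group("status") == "404" and "Googlebot" in match.group("agent"):
            counts[(match.group("vhost"), match.group("path"))] += 1
    return counts
```

Usage would be along the lines of opening the log file and passing the
file object in, e.g. googlebot_404s(open(path_to_log)), then printing
counts.most_common(). Note that this only matches on the user agent
string, so anything claiming to be Googlebot is counted.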
Some relevant points:
1. The sites in question are all running on a custom in-house CMS and
on the same server.
2. The files being requested are stored in separate directories and in
separate MySQL databases.
3. I cannot find any reference to these files on the "bad" domain
(grepped the database and file contents).
Possible explanations:
1. Google has changed something (e.g. looking harder for duplicate
content) and is now asking for files in a way it hadn't previously.
2. We've screwed something up and are unwittingly telling Google that
those files exist with some kind of site map.
3. Somewhere someone has made an incorrect index of the sites and
GoogleBot is treating those links as authoritative.
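To rule explanation 2 in or out, one option is to take whatever site
map the "bad" domain serves and check whether any of its entries point
at the paths that are returning 404s. A rough sketch, assuming the
site map is a standard sitemaps.org XML urlset; the function name
sitemap_hits and the bad_paths argument are just for illustration, and
how you obtain the XML (file on disk, HTTP fetch) is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_hits(sitemap_xml, bad_paths):
    """Return the <loc> URLs in a sitemap whose path is one of bad_paths.

    sitemap_xml: the sitemap document as text (assumed to be a urlset).
    bad_paths:   set of request paths known to 404 on this domain,
                 e.g. taken from the access log.
    """
    root = ET.fromstring(sitemap_xml)
    hits = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if urlparse(url).path in bad_paths:
            hits.append(url)
    return hits
```

An empty result for every affected domain would suggest the site maps
are not the source, at least not directly.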
What I think I want:
When we see a request come in for a non-existent file, we want to know
the *reason* why GoogleBot thinks it's OK to ask for that file. I've
looked in Google's Webmaster Tools (we don't have it set up yet on
an affected domain) and can't find that information there anyway.
Knowing the reason should get us closer to an answer. I'm hoping it's
just a facepalm issue, but the multiple domain-ness of it is just weird.
Any other suggestions? I'll post the actual URLs if required but
would prefer not to disclose them at this point.
Cheers,
- Bob -
--
Bob Brown, [L|W]AMP Web Developer
gur...@gmail.com,
http://www.guru.net.nz