Weird GoogleBot Issue


Bob Brown

May 10, 2012, 8:56:10 PM
to nzp...@googlegroups.com
Hi All,

I'm investigating an issue at the moment where GoogleBot appears to be
asking for files on one domain that exist on another and could do with
some input from other brains.

The issue is that, for example, a file exists at
www.domain-one.co.nz/files/123/executive-summary.pdf (and has for some
time), and GoogleBot is now also trying to index that file at
www.domain-two.co.nz/files/123/executive-summary.pdf, where it does not
exist.

In the last day I've seen 20 different occurrences of this (different
domain names, same file names) for a total of 183 requests. These
requests have been consistent: when a non-existent file is requested,
it's always the same file on the same domains. I can't spot a pattern in
the domains and files being requested. According to our Apache access
logs, these started yesterday at 1:36pm NZ time and didn't
appear to be happening before that.
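Counts like these can be pulled straight out of the access logs. This is a minimal Python sketch, assuming the logs use Apache's vhost_combined format (so each line records which virtual host served the request); the regex and the `googlebot_404s` name are illustrative, not part of our setup. Note it trusts the User-Agent string, so "Googlebot" here means "claims to be Googlebot".

```python
import re
from collections import Counter

# ASSUMPTION: the access log uses Apache's "vhost_combined" format, i.e.
#   %v:%p %h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"
# so each line starts with the virtual host that served the request.
LOG_LINE = re.compile(
    r'^(?P<vhost>[^:\s]+):\d+ (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 responses sent to clients claiming to be Googlebot,
    keyed by (virtual host, requested path)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # not in the assumed format; ignore it
        if m.group("status") == "404" and "Googlebot" in m.group("agent"):
            counts[(m.group("vhost").lower(), m.group("path"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python googlebot_404s.py path/to/access.log [more logs...]
    for log_path in sys.argv[1:]:
        with open(log_path, encoding="utf-8", errors="replace") as fh:
            for (vhost, path), n in googlebot_404s(fh).most_common():
                print(f"{n:5d}  {vhost}{path}")
```

Sorting by count makes it easy to see whether the same file/domain pairs keep coming back.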

Some relevant points:

1. The sites in question are all running on a custom in-house CMS and
on the same server.
2. The files being requested are stored in separate directories and in
separate MySQL databases.
3. I cannot find any reference to these files on the "bad" domain
(grepped the database and file contents)

Possible explanations:

1. Google has changed something (e.g. looking harder for duplicate
content) and are now asking for files in a way they hadn't previously.
2. We've screwed something up and are unwittingly telling Google that
those files exist with some kind of site map.
3. Somewhere someone has made an incorrect index of the sites and
GoogleBot is treating those links as authoritative.
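Explanation 2 is fairly cheap to rule out: fetch whatever sitemap the CMS produces for each affected domain and check every `<loc>` against what that site actually holds. A minimal Python sketch, assuming a standard sitemaps.org-style `<urlset>`; `sitemap_problems` and the `path_exists` callback are hypothetical names, and in practice `path_exists` would query that site's own file directory or database.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_problems(sitemap_xml, expected_host, path_exists):
    """Return (url, reason) pairs for <loc> entries that point at another
    host, or at a path that does not exist on this site.

    `path_exists` is a caller-supplied callable (hypothetical here); in the
    CMS it would look the path up in that site's own files/database.
    """
    root = ET.fromstring(sitemap_xml)
    problems = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        # urlsplit().hostname is already lower-cased
        if parts.hostname != expected_host.lower():
            problems.append((url, "wrong host"))
        elif not path_exists(parts.path or "/"):
            problems.append((url, "no such file"))
    return problems


if __name__ == "__main__":
    # Hypothetical sitemap as the CMS might render it for domain two.
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://www.domain-two.co.nz/about</loc></url>
      <url><loc>http://www.domain-two.co.nz/files/123/executive-summary.pdf</loc></url>
    </urlset>"""
    for url, reason in sitemap_problems(sample, "www.domain-two.co.nz",
                                        lambda p: p == "/about"):
        print(reason, url)
```

Any "no such file" hit for one of the mystery paths would point squarely at the sitemap generator pulling from the wrong site's data.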

What I think I want:

When we see a request come in for a non-existent file, we want to know
the *reason* GoogleBot thinks it's OK to ask for that file. I've
looked in Google's Webmaster Tools (we don't have it installed yet on
an affected domain) and can't find this there anyway. Hopefully this will
get us closer to an answer. I'm hoping it's just a facepalm
issue, but the multiple-domain-ness of it is just weird.

Any other suggestions? I'll post the actual URLs if required, but
would prefer not to disclose them at this point.

Cheers,

- Bob -

--
Bob Brown, [L|W]AMP Web Developer
gur...@gmail.com, http://www.guru.net.nz

David Neilsen

May 11, 2012, 9:24:48 AM
to nzp...@googlegroups.com
I would recommend putting a 301 permanent redirect on any content that has been accessed at an incorrect URL.
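One way to apply the 301 suggestion: when a file is missing on the requested host but is known to live on another one, redirect there; otherwise fall through to a 404. A minimal Python/WSGI sketch of just that decision (the real CMS is PHP, so treat this as language-neutral pseudocode that happens to run); `FILE_OWNERS`, `missing_file_response` and `app` are hypothetical, and plain HTTP is assumed.

```python
# HYPOTHETICAL lookup table: path -> the one host that really serves it.
# In the CMS this would come from the per-site file tables, not a literal.
FILE_OWNERS = {
    "/files/123/executive-summary.pdf": "www.domain-one.co.nz",
}


def missing_file_response(host, path, owners):
    """Decide what to send for `path` on `host` when the file isn't there:
    301 to the host that owns it, otherwise a plain 404."""
    owner = owners.get(path)
    if owner is not None and owner != host:
        # Assumes plain HTTP, as the sites in question appear to use.
        return "301 Moved Permanently", [("Location", f"http://{owner}{path}")]
    return "404 Not Found", [("Content-Type", "text/plain; charset=utf-8")]


def app(environ, start_response):
    """Minimal WSGI stand-in for the CMS's 'file not found' branch."""
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    path = environ.get("PATH_INFO", "/")
    status, headers = missing_file_response(host, path, FILE_OWNERS)
    start_response(status, headers)
    return [b""] if status.startswith("301") else [b"Not Found"]
```

For local poking, `app` can be served with the standard library's `wsgiref.simple_server.make_server`. The design choice is to redirect only when ownership is known; guessing a target for arbitrary paths would just move the problem.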

You can't always guess why this happens, but it is likely someone linked to it incorrectly.

There are also other things you could do if it fits your needs:
  • If the URL is incorrect and totally irrelevant to the domain, issue a 404, as the file is not found in the current context.
  • If the URL was old but has since been updated, issue a 3xx redirect and/or implement canonical URLs.
  • If you truly think this is an issue on Google's part, you could try reporting it to them, but I don't think you would get very far, so I would recommend one of the above.
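For the "implement canonical URLs" option: a PDF has no HTML <head> to hold a <link rel="canonical">, but Google also accepts the canonical URL as an HTTP Link header. A minimal Python sketch of the response headers, assuming the CMS knows which host is canonical for each file; `file_response_headers` is a hypothetical helper.

```python
import mimetypes


def file_response_headers(path, canonical_host, content_length, scheme="http"):
    """Headers for a file that may be reachable on more than one host.

    The Link header names the single URL search engines should index;
    it is the HTTP-header form of rel="canonical", usable for PDFs.
    """
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    canonical_url = f"{scheme}://{canonical_host}{path}"
    return [
        ("Content-Type", content_type),
        ("Content-Length", str(content_length)),
        ("Link", f'<{canonical_url}>; rel="canonical"'),
    ]


if __name__ == "__main__":
    for name, value in file_response_headers(
            "/files/123/executive-summary.pdf", "www.domain-one.co.nz", 20480):
        print(f"{name}: {value}")
```

This suits files that are meant to be served from several domains; for files that should only ever exist on one, a 404 or 301 is the cleaner fix.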

David Neilsen | 07 834 3366 | PANmedia ®


--
NZ PHP Users Group: http://groups.google.com/group/nzphpug
To post, send email to nzp...@googlegroups.com
To unsubscribe, send email to
nzphpug+u...@googlegroups.com
