I recently deleted some pages on a client's site. So that users who
followed links to those pages would still have something to see, I put
a 301 (Moved Permanently) header at the top of each page, redirecting
the browser to the site's home page.
However, a few weeks later I got an email from the client complaining
that those pages were still showing up in Google's search results. I
read Google's advice here:
http://www.google.com/support/webmasters/bin/answer.py?answer=64033
and decided to remove the content from the index by setting up a
robots.txt file. I think that this is preferable to setting the
return code to 404 or 410 or adding a meta tag, as they suggest. It
makes more sense for a visitor to the site to be bounced to the
homepage. Possibly.
An issue has already been raised about HTTP codes for non-existent
pages:
http://code.google.com/p/haddock-cms/issues/detail?id=35
In the DB pages plug-in, I think that we should go beyond that. When a
page is deleted, the admin should be able to choose how it is
deleted.
If a DB page does not exist, then 404 (Not Found) should be returned.
If it has been deleted, then 410 (Gone) should be returned. The admin
should also have the option to delete the page and have visitors
redirected to some other page. Another way to "delete" a page would be
to leave the content in place but remove the page from the search
engines' indexes.
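The options above boil down to choosing a status code per request. Here is a minimal Python sketch of that decision. DeletionMode, response_for and the 'deletion_mode' / 'redirect_to' fields are hypothetical names for illustration, not part of Haddock CMS.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class DeletionMode(Enum):
    """Hypothetical options an admin could choose when deleting a DB page."""
    GONE = "gone"            # content removed for good -> 410 Gone
    REDIRECT = "redirect"    # content removed, visitors sent elsewhere -> 301
    UNINDEXED = "unindexed"  # content kept, but hidden from search engines


def response_for(page: Optional[dict]) -> Tuple[int, Dict[str, str]]:
    """Return (status code, extra headers) for a request to a DB page.

    `page` is None if no such page has ever existed; otherwise it is a
    hypothetical record that may carry 'deletion_mode' and 'redirect_to'.
    """
    if page is None:
        return 404, {}  # never existed: 404 Not Found
    mode = page.get("deletion_mode")
    if mode is DeletionMode.GONE:
        return 410, {}  # deliberately removed: 410 Gone
    if mode is DeletionMode.REDIRECT:
        # 301 Moved Permanently; RFC 2616 asks for an absolute URI here.
        return 301, {"Location": page["redirect_to"]}
    # Live pages and UNINDEXED pages are both served normally; UNINDEXED
    # ones are kept out of the index via robots.txt or a robots meta tag.
    return 200, {}
```

For example, response_for({"deletion_mode": DeletionMode.REDIRECT, "redirect_to": "http://www.example.com/"}) gives a 301 pointing at the home page, which is the behaviour described at the top of this post.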
I've already raised an issue to do with the robots.txt file:
http://code.google.com/p/haddock-cms/issues/detail?id=12
Part of the robots.txt file could be generated dynamically with a list
of pages that should not be indexed.
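A rough sketch of the dynamic part of robots.txt follows, assuming the list of unindexed page paths has already been fetched from the DB pages table (that query is not shown). The function name is hypothetical, and the hand-written part of the file would be kept separately.

```python
from typing import Iterable


def dynamic_robots_section(unindexed_paths: Iterable[str]) -> str:
    """Build a robots.txt group that disallows each unindexed page.

    Paths are normalised to start with '/', de-duplicated and sorted so
    the generated file is stable between requests.
    """
    paths = sorted({p if p.startswith("/") else "/" + p for p in unindexed_paths})
    if not paths:
        return ""  # nothing to block, so emit no group at all
    lines = ["User-agent: *"] + [f"Disallow: {p}" for p in paths]
    return "\n".join(lines) + "\n"
```

Bear in mind that Disallow values are prefix matches, so "Disallow: /old-page" would also block "/old-page-2".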
Those pages could also have robots meta tags added, as described here:
http://www.google.com/support/webmasters/bin/answer.py?answer=61050
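The meta tag itself is just a line in the page's head. A small helper for it could look like this; robots_meta_tag is a hypothetical name, while the "noindex" and "nofollow" values are the standard robots meta directives.

```python
def robots_meta_tag(noindex: bool, nofollow: bool = False) -> str:
    """Return a robots meta tag for the page head, or '' if none is needed."""
    directives = []
    if noindex:
        directives.append("noindex")    # keep the page out of the index
    if nofollow:
        directives.append("nofollow")   # don't follow links on the page
    if not directives:
        return ""
    return '<meta name="robots" content="%s">' % ", ".join(directives)
```

A crawler can only see this tag on pages it is allowed to fetch, so a page that is also disallowed in robots.txt would never have its meta tag read.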
See also:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html