That Google IP range is not sufficient to ensure all Googlebots are
allowed.
There are many, many other IP ranges for Googlebot.
You really should not block any.
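To see how little a single allowed range covers, here is a minimal Python sketch using the 66.249.64.0/19 block your host quoted. The function name is_allowed and the test address 203.0.113.5 (a reserved documentation address, standing in for "any other crawler IP") are my own illustrations, not anything from your server setup.

```python
import ipaddress

# The single range the hosting provider said it allowed.
ALLOWED = ipaddress.ip_network("66.249.64.0/19")

def is_allowed(ip: str) -> bool:
    """Return True if ip falls inside the one allowed block."""
    return ipaddress.ip_address(ip) in ALLOWED

# First address, last address, and size of the block.
print(ALLOWED[0], ALLOWED[-1], ALLOWED.num_addresses)

print(is_allowed("66.249.66.1"))   # inside the /19
print(is_allowed("203.0.113.5"))   # placeholder address outside the /19
```

Any crawler request arriving from an address outside that one block would be treated like the second example, which is why an allow-list built from a single range is fragile.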
Your robots.txt file is pretty useless. All those rogue robots you want
to block will never obey the robots.txt, so it's pointless to even try
to disallow them that way.
You need to simplify it and concentrate on how you want the major
robots to crawl the site - what they are to be disallowed from. Never mind
the bad robots.
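The point that robots.txt is only advisory can be sketched with Python's standard urllib.robotparser. The rules and the /private/ path below are hypothetical, made up for illustration and not taken from your file. A well-behaved crawler asks the parser before fetching; a rogue one simply never asks.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, simplified rules: one record for all robots.
RULES = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

def polite_fetch_allowed(user_agent: str, url: str) -> bool:
    """What a compliant crawler checks before requesting a URL."""
    return parser.can_fetch(user_agent, url)

print(polite_fetch_allowed("Googlebot", "http://www.innovation-point.com/"))
print(polite_fetch_allowed("Googlebot", "http://www.innovation-point.com/private/page.htm"))
```

Nothing in this check runs on your server. It happens inside the crawler, so listing bad robots by name only affects robots that choose to look.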
As for your site, you have a canonical domain problem.
Your URLs are accessible both as www and non-www.
Your homepage is at 4 URLs:
http://www.innovation-point.com/
http://www.innovation-point.com/index.htm
http://innovation-point.com/
http://innovation-point.com/index.htm
You need to decide if you want to use www or non-www URLs and 301
redirect the "wrong" form to the preferred one.
You also need to refer to the homepage (i.e. the root) as "/" (or
using the preferred domain name, e.g. http://www.example.com/ ), never
as index.htm. So you need to change your navigation accordingly, and 301
redirect index.htm back to the root.
Since you are on an Apache server the solution is simple using
the .htaccess file:
Options +Indexes +FollowSymLinks
RewriteEngine On
RewriteBase /
### redirect index.htm to root / ###
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.htm\ HTTP/
RewriteRule ^(.*)index\.htm$ /$1 [R=301,L]
### redirect non-www to www ###
# Server variables are written in upper case, as in the Apache docs
RewriteCond %{HTTP_HOST} ^innovation-point\.com [NC]
RewriteRule ^(.*)$ http://www.innovation-point.com/$1 [R=301,L]
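As a rough illustration of what the redirects are meant to achieve, here is a small Python sketch of the intended mapping. It is not a replica of mod_rewrite, and the function name canonical_url is my own. It assumes www is the preferred form.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.innovation-point.com"
BARE_HOST = "innovation-point.com"

def canonical_url(url: str) -> str:
    """Map a URL to the single form the redirects should end up at."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST          # non-www -> www
    path = parts.path or "/"
    if path.endswith("/index.htm"):
        path = path[: -len("index.htm")]  # .../index.htm -> .../
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))

for u in (
    "http://www.innovation-point.com/",
    "http://www.innovation-point.com/index.htm",
    "http://innovation-point.com/",
    "http://innovation-point.com/index.htm",
):
    print(u, "->", canonical_url(u))
```

All four homepage URLs listed above should come out as the one www root URL, which is the outcome to check for once the .htaccess rules are live.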
You have some easy-to-fix validation errors:
http://validator.w3.org/check?verbose=1&uri=http%3A%2F%2Fwww.innovati...
I also suggest you remove the first line in the source code:
<!-- saved from url=(0022)http://internet.e-mail -->
It can put the browser into quirks mode (in Internet Explorer, content
before the doctype does that) and you may have unexpected layout problems.
Also remember for the future that sending (and receiving) web pages by
email may change the source code. It is best to zip them before emailing them.
On Dec 18, 11:13 pm, sokap wrote:
> I have been troubleshooting this for hours and hours and this seems
> like a real stumper to me, so any help would be greatly appreciated.
> Here's the scenario:
> My domain is www.innovation-point.com and it's a direct domain without
> any redirects. For the last few years many pages on our site have
> risen and ranked quite high in Google. After opening a Google
> Webmaster Tools account six weeks ago, all of our URLs have become
> "unreachable" and just about every page has dropped completely out of
> Google search rankings. Over the past week the number of unreachable
> URLs has gone from 40 down to 35 and now it's up to 38.
> There are two errors that are consistent:
> 1. Almost all of our pages/URLs are listed as unreachable due to
> "robots.txt unreachable"
> 2. The xml sitemap is returning an error due to "Network unreachable:
> robots.txt unreachable"
> Here's what we've done to remedy this:
> 1. Created a new robots.txt file which is located at www.innovation-point.com/robots.txt
> 2. Created a new sitemap using a tool from http://www.xml-sitemaps.com/
> which is now located on our site at www.innovation-point.com/sitemap.xml
> 3. While we don't currently have any "penalties" by Google, we ran a
> website spam detector at http://tool.motoricerca.info/spam-detector/
> to see if our designer from a few years ago did anything with hidden
> text etc. that we didn't know about - the scan came up perfectly
> clean.
> 4. Ran an HTML validator located at http://validator.w3.org/. Here,
> we discovered a few VERY MINOR html issues which would be VERY
> surprising if they were causing the major problems we're experiencing
> now. We've fixed most of these, though some very nominal small items
> still exist (e.g., like a few image spacers didn't have alt-text).
> 5. We shared all this with our hosting provider
> (www.totalhosting.com) and they said they changed the server
> configuration to "allow Google IPs into the firewall". After that
> didn't work, they then did the following: "verified and checked that
> there was no block on the server which could cause problem for
> robots.txt inaccessibility. However after reviewing few articles on
> googlewebmastercentral blogspot we have allowed google IP range
> 66.249.64.0/19 on the server."
> After all this, our sitemap submission is still receiving the same
> error and the number of unreachable URLs is going up weekly due to the
> unreachable robots.txt error. According to Google Webmaster Tools,
> Googlebot continues to attempt to reach our URLs (most of the
> unreachable attempts are from the last two weeks) with no success.
> I'm not a technical person myself, but I sure am learning a lot about
> networks, html, and Google from this painful process. Any help with
> this would be GREATLY appreciated, and I'll be super impressed with
> anyone who can pin this one down! Thanks in advance for your help.