I'm getting this error in the Sitemaps section of Webmaster Tools, and
I'm at a complete loss to figure it out. I've worked with my host
extensively on this issue, and it has been happening for almost a
month. We've moved my site to a completely different server in case
a firewall was blocking the Sitemap Googlebot, but I have 2
other sites with my host and I don't have this problem with those
sites, only this one. We've looked into the DNS records and everything
checked out. We can't figure out anything that would keep the Sitemap
Googlebot from accessing my site.
Your sitemap is being served with some extra invisible characters at
the top.
Use http://www.rexswain.com/httpview.html (pick text) to view what is
being served.
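If you would rather check the raw bytes yourself, here is a minimal sketch. It assumes the sitemap is UTF-8 XML and that you already have the response body as bytes (for example, saved from the tool above). The helper names `leading_junk` and `has_utf8_bom` and the sample data are made up for this example.

```python
import codecs


def leading_junk(body: bytes) -> bytes:
    """Return whatever bytes come before the first '<' of the XML."""
    first_tag = body.find(b"<")
    return body if first_tag == -1 else body[:first_tag]


def has_utf8_bom(body: bytes) -> bool:
    """True if the body starts with the (invisible) UTF-8 byte order mark."""
    return body.startswith(codecs.BOM_UTF8)


# Made-up sample: a BOM and a blank line in front of the XML declaration.
sample = codecs.BOM_UTF8 + b"\r\n<?xml version=\"1.0\" encoding=\"UTF-8\"?><urlset/>"

print(repr(leading_junk(sample)))  # shows the otherwise invisible bytes
print(has_utf8_bom(sample))
```

Anything `leading_junk` returns is content that a text editor would probably not show you.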
Hi Dave
You should be able to tell if we're generally able to reach your site
if Webmaster Tools has accepted your Sitemap file and is able to
process it accordingly. It's possible that the errors that you see are
some older errors (you should be able to tell by the date next to
them). In that case, they'll disappear as we continue to recrawl your
site.
Hi John, Dave
I have the same robots.txt timeout error despite having no robots.txt
file and I've noticed that this issue seems to crop up again and again
in forums and groups.
Isn't this a problem with Googlebot? I've changed nothing on my
site (www.inglesmadrid.com) apart from adding a couple of new pages, and
the hosting company (Arsys, the largest in Spain) swears that they are
not blocking Googlebot. But after a year with no problems I suddenly
have this timeout error for each access Googlebot has made in
September.
Sorry to accuse Googlebot in this way! Might it be a good idea if
Google Webmaster service offered us a bit more information on the
timeout problem?
best regards
chris
Hi Chris
It looks like we started running into problems reaching your site
around the 3rd of September. Keep in mind that an "unreachable
robots.txt" error does not mean that your robots.txt has to exist --
it's just that we couldn't even get a proper response back from the
server when we tried to check the robots.txt (or to see if one
exists).
In general, this is not an issue of adding / changing a few pages,
it's almost always (at least all the times I've checked :-)) been an
issue with either the server using a security module or a firewall/
router that is too strict. The best way to check this is to submit a
Sitemap file and to see how it's being processed (this usually goes
fairly quickly, so you should be able to see if it's working or not in
much less than an hour).
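To make the distinction concrete, here is a rough sketch of the difference between "no robots.txt" and "robots.txt unreachable". It is only an illustration built on the Python standard library, not how Googlebot actually works. The labels and the function names `classify_status` and `fetch_status` are invented for this example.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status (or None for 'no response') to a rough label."""
    if status is None:
        return "unreachable"   # no HTTP response at all: the case described above
    if 200 <= status < 300:
        return "found"         # a robots.txt exists and was served
    if status == 404:
        return "absent"        # no robots.txt, but the server answered properly
    return "unexpected"        # anything else needs a closer look


def fetch_status(url, timeout=10.0):
    """Return the HTTP status for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code        # the server answered, just not with a 2xx
    except OSError:
        return None            # refused, timed out, DNS failure, ...


for status in (200, 404, None):
    print(status, "->", classify_status(status))
```

You could run `classify_status(fetch_status("http://www.example.com/robots.txt"))` against your own site, but keep in mind that this only tests access from your own IP address. A firewall that blocks specific crawler IPs would still let your request through.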
Hi John,
Thanks very much for your advice. I will add the sitemap and report
back.
I always worked on the basis that the simpler the better, so no
sitemap, no robots.txt file. That way, I figured, there would be fewer
things to go wrong. But I'll try this!
best regards
chris
Thought I would update this thread in case anyone comes across it
while searching for a solution.
In the end I decided to do... er... absolutely nothing. And that
seems to have done the trick! Since Sept 20 I've had no more "time-out"
errors and the web site (http://www.inglesmadrid.com) seems to be
(very) slowly climbing back up the rankings.
Conclusion?
Really I'm not sure. I'd say either this was a temporary problem with
Google, or my web hosting company had in fact changed the server
configuration (although they denied it) and then changed it back
(again, without saying anything).
Recommendation?
1. If you have fancy things like robots.txt files / site maps / meta
no-follow tags, obviously you’d want to check these first.
2. Talk to your web hosting company ASAP. The sooner they know
there's a problem the better.
3. Wait at least 7 days to see if you stop getting new timeout errors
(listed in Google Webmaster Tools).
4. If after 10 days you’re still getting timeout errors… panic.
Consider replacing the website with alternative media: brochures,
letters-in-a-bottle, smoke signals, yodels, Chinese whispers etc.
If you are still stuck, feel free to contact me (details on the
website) as I might yet find a definitive answer!
It looks like this might be the same problem as Chris had. Regardless
of Chris' recommendation to wait, I would start asking the hoster
questions -- who knows, maybe then it will also disappear
automatically ;-). In the cases I've diagnosed, I have yet to see one
where the unreachable issue is not something on the hosting side of
things, so that's what I would look at first.
Hope it helps!
John
Hmmm, it seems I've started getting the error too, since October 5, on one
of four domains I have listed in Webmaster Tools. The site has been
indexed fine by Google for more than 2 years. The Sitemap gets downloaded
regularly without errors, as does the robots.txt file I had in place.
I always kept just a blank (i.e. empty) robots.txt uploaded to stop
404 errors when bots went looking for one.
Having just logged in, I noticed, for the first time ever, some URL
unreachable errors reported for Googlebot visits over the past week.
There were about 70 errors out of around 300 pages for the site, with
different pages unreachable on different days, oddly enough. I figured
something was wrong, so I resubmitted the Sitemap. There are
no errors on the 3 other sites I have set up and indexed in Webmaster
Tools, all at the same host and on the same FTP.
This error came back after resubmitting the Sitemap:
Network unreachable: robots.txt unreachable
We were unable to crawl your Sitemap because we found a robots.txt
file at the root of your site but were unable to download it. Please
ensure that it is accessible or remove it completely.
The advice is to fix or remove the robots.txt, so I figured I'd delete
it, since I essentially had nothing in it anyway. Two more
resubmits of the Sitemap over the past hour or so came back with the same
message: you found a robots.txt file but were unable to download it.
Since there's no robots.txt file anymore, this surprises me. You
should be getting a 404 redirect to the homepage.
Are you sure this isn't something going wrong at Google's end? Like
others posting in this thread, I haven't had these errors before in over
two years, and I haven't changed anything in the Sitemap/robots.txt
configuration. My other domains with the same host, on the same FTP,
aren't getting any errors like it.
It looks like my host had a similar, though not identical, problem to
that guy's. They don't have any kind of firewall on the
web server network segment, but they do have something that blocks IPs
that have tried to brute-force logins. After I contacted my host, they
said: "I can disable this for a few days to see if it might have
blocked Google, except we have them on a whitelist so they shouldn't
be blocked."
Googlebot's IPs were whitelisted and shouldn't have been blocked, but it
looks increasingly like they might have been. On the first resubmit of the
Sitemap after my host switched off the brute-force IP blocking, I thankfully
got an OK on submission and no more robots.txt unreachable errors.
I'll check in again tomorrow to see if all is still OK.
Incidentally, does the Googlebot IP range always stay pretty static?
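On the whitelist question: as far as I know, Google's own guidance is to verify Googlebot with a reverse DNS lookup followed by a forward lookup, rather than relying on a fixed IP list. Below is a small sketch of that check. The function names are made up for this example, and the list of accepted hostname suffixes is my assumption of what to accept. A host would need to adapt this to their own tooling.

```python
import socket

# Assumed hostname suffixes for genuine Google crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def looks_like_google_host(hostname: str) -> bool:
    """True if the hostname ends in one of the assumed Google crawler domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def verify_googlebot(ip: str) -> bool:
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]             # reverse DNS lookup
    except OSError:
        return False
    if not looks_like_google_host(hostname):
        return False
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)  # forward lookup
    except OSError:
        return False
    return ip in addresses


print(looks_like_google_host("crawl-66-249-71-42.googlebot.com"))
print(looks_like_google_host("googlebot.com.example.net"))
```

`verify_googlebot` needs working DNS, so its result depends on where you run it. The point is only that a whitelist can be rebuilt from DNS instead of assuming the IPs never change.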
Hello,
My hosting company (Arsys) swears that they are not blocking
Googlebot, and says the problem is "on Google's side."
They recommend that we contact Google and ask them to review their programming!
What can I do? How can I prove where the problem is?
David & Chris (and others who have the same issue on this thread),
the EXACT same thing happened to me, and yes, there are other threads
about this at these forums and elsewhere, and, as you alluded, Chris,
it IS A GOOGLE PROBLEM but they are not going to admit it (nor will
"webado"). It resulted in my being ruined. Suddenly, with NO
DOWNTIME on my server nor any changes on my part, 95% of my pages were
"not reachable" like yours, either from "robots.txt timed out" or
"network unreachable", which resulted in ALL of my MAIN PAGES getting
removed from their index. THANKS GOOGLE. The unreachable list even
included long-gone webpages that I 301'd to new webpages years ago.
My hosts long ago whitelisted all Google IPs.
My pages that were trashed were TEN YEARS OLD, all appeared on the
FIRST page of results for their respective search phrases, most of
them NUMBER ONE, (all PR3 and PR4 pages). So G not only screws site
owners with this "yearly event", but screws their users as well by
doing them this disservice--a disservice that NO OTHER search engine
will do.
Notice too that if you check the PR of these pages, it's DROPPING.
They remove the pages from the index, and your PR will eventually drop
to ZERO. That's another Google screw-up because *decent* PR CANNOT
DROP TO ZERO OVERNIGHT. It's impossible to lose ALL IBLs overnight,
so obviously that is NOT what is happening. (PR check at http://oyoy.eu/google/pr/ still not working).
Yes, it has happened before and will happen every year. When?
Guess.....every Christmas!! Every single year, year after year for
the last FIVE years, google has ruined millions of websites by
removing them from the index. It's usually small mom & pop business
sites that get screwed so Amazon, eBay, Buy.com and the like will get
all the Christmas business. The attack usually starts around Oct.
thru Nov. and of course lasts through Christmas, and of course all is
well again about Jan 1st!! So it's RIGHT to accuse the Gbot, algos,
or Google of this because it's been proven time and time again to
happen, as so many others have noticed here and at other forums.
Let's see John Mu try and explain that. It is NOT coincidental that
so many site owners suddenly have the exact same unexplainable issue
and no one can convince anyone in their right mind with an IQ over 50
that it's a "server problem" and not google. It's the Gbot choking on
something, like yet another screwed up algo that's backfired. If you
really wanted to help John, you would ACCEPT that this is a google
problem (I'm sure you already realize it but won't admit it), look
into it, and FIX IT.
Over the past week I have seen fewer and fewer pages in that area of WMT;
the list is slowly shrinking daily, but NONE of my
long-standing top-ranking webpages have returned!!
-David-
Hi David
As webado mentioned, it's impossible to say much without knowing your
URL.
At any rate, this is an issue that can be resolved - so I think it
would definitely help your site most to get it resolved. While it is
always possible that there are issues on our side, so far I have not
run into anything on our side, with any site, that would result in the
symptoms that you describe.
Hello
My URL is www.mazarron.es (as I said before).
My hosting company (Arsys) swears that they are not blocking
Googlebot, and says again that the problem is "on Google's side."
Yes, well, when hosters swear they aren't doing it, my experience is
that they are doing it but either don't know it or can't be bothered
to try to find out. They probably don't know how.
However, you don't have a robots.txt file at all and I am getting a 404
for that, which is OK. If any of Googlebot's IPs are blocked, Googlebot
would not know whether or not you have a robots.txt file at all.
From what I can see, it looks like we have had a lot of trouble
reaching your site since the 4th of October. Perhaps something was
changed then? Do you see the Googlebot accesses in your server logs?
Feel free to point your hoster to this thread, if you like.
Prior to the 3rd of October, the server log files show
hundreds of Googlebot accesses from the IPs
66.249.71.42, 66.249.71.43 and 66.249.71.44.
Since the 4th of October there are none.
Google Webmaster Tools reports 65 unreachable URLs ("robots.txt
unreachable" and "Network unreachable"),
and for the Sitemap it says: "Network unreachable: robots.txt
unreachable.
We were unable to crawl your Sitemap because we could not
download the robots.txt file from the root of your site. Please ensure
that it is accessible or remove it completely."
At this point, if my hoster insists they are not blocking Googlebot,
my only option is to change hosting company, isn't it?
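For anyone who wants to show their hoster the same kind of evidence, here is a minimal sketch that counts Googlebot requests per day in an access log. It assumes the common Apache "combined" log format with the user agent in each line. The sample lines (including the year and times) are made up for illustration, and the function name is my own.

```python
import re
from collections import Counter

# Matches the date part of an Apache-style timestamp, e.g. [03/Oct/2008:10:15:32 +0200]
LOG_DATE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def googlebot_hits_per_day(lines):
    """Count log lines per day whose text mentions Googlebot."""
    hits = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        match = LOG_DATE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


GOOGLEBOT_UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
sample_log = [  # made-up lines, not taken from a real server
    '66.249.71.42 - - [02/Oct/2008:10:15:32 +0200] "GET /robots.txt HTTP/1.1" 404 209 "-" ' + GOOGLEBOT_UA,
    '66.249.71.43 - - [02/Oct/2008:10:15:40 +0200] "GET / HTTP/1.1" 200 5120 "-" ' + GOOGLEBOT_UA,
    '66.249.71.44 - - [03/Oct/2008:08:01:02 +0200] "GET /index.html HTTP/1.1" 200 5120 "-" ' + GOOGLEBOT_UA,
    '192.0.2.10 - - [04/Oct/2008:09:00:00 +0200] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for day, count in googlebot_hits_per_day(sample_log).items():
    print(day, count)
```

A day that drops to zero Googlebot lines while other visitors still appear points to requests being stopped before they reach the web server.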
Well, if you cannot convince your hosting company to take a real good
look at what they are doing, yes, unfortunately you have to change
hosting company.
I would suggest encouraging them to post here -- perhaps there is
something that we can figure out together. In general, if there is an
issue on the hoster's side, it will affect many sites, so this should
definitely be something they should be interested in resolving.
If you're absolutely sure that it's not on your site (are you sure
your site does not use "mod_security" or something similar?), then you
can also try to see if we are crawling and indexing other sites (on
the same server), by using the advanced search settings with a
timeframe of a week. Sometimes this encourages the web hoster to
resolve the issue quicker :-)
I'm new to this discussion, but we are having the same problem, and
everywhere I search for a solution I am getting different suggestions
(check the robots.txt, check .htaccess, it's your host blocking Google,
it's a problem with Google, etc.).
I am betting that this is a problem with our host blocking Google, but
as per usual our first attempt to communicate with the host resulted
in them saying that the problem is not at their end.
I want to convince them that the problem is most likely at their end.
I don't know enough about firewall setup or mod_security, but it
looks very much to me as if one of these areas has something
to do with the problem.
I just want to ask a simple question (I know I should be asking my
host this, but I already did and they either didn't read it or
couldn't understand what I was asking). If someone can answer this,
maybe we can start to get to the bottom of the most likely cause for
this blocking.
Until a few days ago Googlebot could access both sites.
Now Googlebot can only access the first one.
The errors are "robots.txt unreachable" and "Network unreachable".
We tried removing robots.txt altogether and resubmitting our sitemap,
and we received an error saying that Google found our robots.txt
(which no longer existed) but was unable to download it. This tells me
that Google most likely wasn't even able to connect to the site -
hence blocking is the most likely answer.
QUESTION:-
With two sites on the same server, and only one site being blocked, is
it still possible that this is related to the Firewall or Mod
Security? Or can you think of any other way that a server may block
Google from one site and not the other? We are not blocking using IP
Deny Manager, and .htaccess is not blocking.
# All robots will spider the domain
User-agent: *
Disallow: /cgi-bin/
Disallow: /ssl/
Disallow: /suspended.page/
In other words all items which are disallowed grouped together under
the user agent.
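If you want to double-check how a grouped file like that is read, here is a small sketch using Python's standard `urllib.robotparser`. It is only one parser's reading of the rules, not Googlebot's own. The `www.example.com` host and the `allowed` helper are placeholders for this example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# All robots will spider the domain
User-agent: *
Disallow: /cgi-bin/
Disallow: /ssl/
Disallow: /suspended.page/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Googlebot"):
    """Ask the parsed rules whether agent may fetch path on the placeholder host."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for path in ("/catalogue/", "/cgi-bin/test.cgi", "/ssl/"):
    print(path, allowed(path))
```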
The .htaccess file is protected by the server and should NEVER be
accessible to any http requests. If it is, disallowing it in
robots.txt is useless, your site is doomed anyway.
Moving on to how your pages are built, your navigation is quite likely
non-functional for a robot. Having links like this:
<a href="../../../../../../catalogue/">Catalogue</a>
means broken links. While the browser compensates for this essentially
malformed URI, a robot is quite unable to do that.
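A rough way to spot such links, as a sketch only: count how many directory levels a relative href climbs and flag it when it goes above the site root. The function name `climbs_above_root` and the example page URL are made up for this illustration.

```python
from urllib.parse import urlsplit


def climbs_above_root(page_url: str, href: str) -> bool:
    """True if a relative href uses more '..' steps than the page has directories."""
    target = urlsplit(href)
    if target.scheme or target.netloc or target.path.startswith("/"):
        return False  # absolute URL or root-relative path: not checked here
    # Directories the page sits in, e.g. /shop/parts/index.html -> 2
    page_dirs = [part for part in urlsplit(page_url).path.split("/")[:-1] if part]
    depth = len(page_dirs)
    for part in target.path.split("/"):
        if part == "..":
            depth -= 1
            if depth < 0:
                return True   # tried to go above the site root
        elif part and part != ".":
            depth += 1
    return False


page = "http://www.example.com/shop/parts/index.html"  # hypothetical page
print(climbs_above_root(page, "../../../../../../catalogue/"))
print(climbs_above_root(page, "../../catalogue/"))
```

Links that get flagged are safer rewritten as root-relative paths such as `/catalogue/`.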
You also have a single point of failure, because both nameservers and the actual
website server are all on the same subnet. If one is down, all 3 will
likely be down and so is the site. This may result in transient access
failures.
Of course in addition Googlebot may well be also blocked at the
firewall.
Thanks for that info. I'll look into all of that and see if we can get
it all fixed. The web site does get crawled (until this latest
problem), and the smallparts.com.au site has over 5000 pages on
Google, but maybe if we look into your suggestions we will get a lot
more indexed.
Anyway, I'm pleased to share with everyone that we have found the
fault to be with the firewall on the server.
It was very difficult to find much info about this, and our Internet
Host was useless. First they said it had nothing to do with them, and
when we pushed further they suggested "allowing" the Googlebot
IPs in the firewall setup. The firewall is called "csf", and I'm sure
many of you know the program. It's very good at what it does, but I
couldn't find any help files on how to allow batches of IP addresses.
Our web host couldn't help with this either.
Anyway I pieced together a lot of info from many sources and finally
came up with a solution which may be right or may be wrong, but
Googlebot can now access the Robots.txt file again and is once again
crawling our site.
Here are a couple of key tips that may be useful to others...
1. If your Internet Host says that the problem is not with them, it
probably is. My experience is that many people who work at Internet
Hosting companies get very annoyed when you break into their online
gaming time with (how dare you) a question. The simplest way for them
to get back to their game is to deny responsibility and hope you will
go away. They will then deal with the 5% of people who come back to
them and push the issue - when they finish their game.
2. If you or your host are using a firewall program called "csf",
check in the "Firewall Deny IP's" area. If any of Googlebot's IP
addresses have been added, that's where the error lies. Remove them
and restart the firewall.
3. It wouldn't hurt to add Googlebot's IP addresses to the
"Firewall Allow IP's" area too. Now "csf" comes with no instructions
about how to add groups of IP addresses except for a hint at the
top that you can use "quaded" IP addresses. You probably don't know
what "quaded" means and neither did I - it's not in any dictionary so
they must have made it up. However, their example "(e.g.
192.168.254.0/24)" and a reference to CIDR addressing led me to
believe that they meant a "/" at the end and that this somehow
represented multiple address ranges. It does, but I still have no idea
how the system works even after reading a couple of tutorials on it.
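For anyone equally puzzled by the "/24" notation, here is a short sketch using Python's standard `ipaddress` module. In CIDR notation the number after the slash says how many leading bits of the address are fixed, so "/24" leaves 8 bits free, which is 256 addresses. The entry list and the `matching_entries` helper are made up, and I am assuming a simplified list format of one IP or CIDR block per line with an optional "#" comment. The 66.249.71.0/24 block is only an illustration that happens to cover the three IPs quoted earlier in this thread. It is not an official Googlebot range.

```python
import ipaddress

# The example from the csf hint: first 24 bits fixed, last 8 bits free.
block = ipaddress.ip_network("192.168.254.0/24")
print(block.num_addresses)        # 2 ** (32 - 24) addresses
print(block[0], "-", block[-1])   # first and last address in the block


def matching_entries(ip, entries):
    """Return the list entries (single IPs or CIDR blocks) that cover ip."""
    address = ipaddress.ip_address(ip)
    matches = []
    for entry in entries:
        entry = entry.split("#", 1)[0].strip()   # drop comments and whitespace
        if not entry:
            continue
        if address in ipaddress.ip_network(entry, strict=False):
            matches.append(entry)
    return matches


entries = [  # hypothetical allow/deny list content
    "192.168.254.0/24 # example from the csf hint",
    "66.249.71.0/24   # illustrative only, not an official range",
    "10.0.0.5",
]
print(matching_entries("66.249.71.42", entries))
```

Running the same helper against a deny list shows whether a crawler IP from your logs is covered by any entry, including wide blocks you might not spot by eye.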
Anyway, I eventually came up with a list to add to the "Allow" area as
follows. It worked, but it could be incorrect. Hopefully a Googler
will see this list and edit it to correct it.