DavidAlan  
From: DavidAlan
Date: Fri, 19 Sep 2008 18:29:58 -0700 (PDT)
Local: Fri, Sep 19 2008 9:29 pm
Subject: URL timeout: robots.txt timeout
I'm getting this error in the Sitemaps section of Webmaster Tools, and
I'm at a complete loss to figure it out.  I've worked with my host
extensively on this issue, and it has been happening for almost a
month.  We've moved my site to a completely different server in case
there was a firewall blocking the Sitemap Googlebot, but I have 2
other sites with my host and I don't have this problem with those
sites, only this one.  We've looked into the DNS records and everything
checked out.  We can't figure out anything that would keep the Sitemap
Googlebot from accessing my site.

Here is the link to my sitemap:
http://www.fishingmoz.com/index.php?option=com_xmap&sitemap=1&view=xm...

If anyone has any ideas about what could cause this, that would be great.
I've posted this problem before and we've ruled out firewall, server,
and DNS.

Thanks,
Dave


 
webado
From: webado
Date: Fri, 19 Sep 2008 19:23:15 -0700 (PDT)
Local: Fri, Sep 19 2008 10:23 pm
Subject: Re: URL timeout: robots.txt timeout
Your sitemap is being served with some extra invisible characters at
the top.
Use http://www.rexswain.com/httpview.html (pick text) to view what is
being served.
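
For anyone who prefers to check this locally rather than through a web
viewer, here is a minimal sketch in Python. It only looks at the raw
bytes that come before the first "<" of the response body. The function
names are made up for this example, and whether a given kind of leading
byte actually upsets the Sitemap fetcher is not something this check
can tell you.

```python
import codecs

# Byte-order marks that commonly end up in front of an XML declaration.
KNOWN_BOMS = {
    codecs.BOM_UTF8: "UTF-8 BOM",
    codecs.BOM_UTF16_LE: "UTF-16 LE BOM",
    codecs.BOM_UTF16_BE: "UTF-16 BE BOM",
}


def leading_junk(body: bytes) -> bytes:
    """Return the bytes that precede the first '<' in the response body."""
    start = body.find(b"<")
    return body if start == -1 else body[:start]


def describe_leading_junk(body: bytes) -> str:
    """Give a short, human-readable label for whatever precedes the XML."""
    junk = leading_junk(body)
    if not junk:
        return "clean"
    for bom, name in KNOWN_BOMS.items():
        if junk.startswith(bom):
            return name
    if junk.strip() == b"":
        return "leading whitespace"
    return "unexpected bytes: " + junk.hex()
```

To use it, fetch the sitemap URL yourself (for example with
urllib.request.urlopen(url).read()) and pass the bytes to
describe_leading_junk().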

On Sep 19, 9:29 pm, DavidAlan wrote:


 
JohnMu (Google employee)
From: JohnMu
Date: Sat, 20 Sep 2008 08:42:24 -0700 (PDT)
Local: Sat, Sep 20 2008 11:42 am
Subject: Re: URL timeout: robots.txt timeout
Hi Dave
You should be able to tell if we're generally able to reach your site
if Webmaster Tools has accepted your Sitemap file and is able to
process it accordingly. It's possible that the errors that you see are
some older errors (you should be able to tell by the date next to
them). In that case, they'll disappear as we continue to recrawl your
site.

Hope it helps!
John


 
chris barcelona
From: chris barcelona
Date: Sun, 21 Sep 2008 04:30:55 -0700 (PDT)
Local: Sun, Sep 21 2008 7:30 am
Subject: Re: URL timeout: robots.txt timeout
Hi John, Dave
I have the same robots.txt timeout error despite having no robots.txt
file and I've noticed that this issue seems to crop up again and again
in forums and groups.
Isn't this a problem with Googlebot?  I've changed nothing on my
site (www.inglesmadrid.com) apart from adding a couple of new pages, and
the hosting company (Arsys, the largest in Spain) swears that they are
not blocking Googlebot.  But after a year with no problems I suddenly
have this timeout error for each access Googlebot has made in
September.
Sorry to accuse Googlebot in this way!  Might it be a good idea if
Google Webmaster service offered us a bit more information on the
timeout problem?
best regards
chris

On Sep 20, 5:42 pm, JohnMu wrote:


 
JohnMu (Google employee)
From: JohnMu
Date: Mon, 22 Sep 2008 03:08:38 -0700 (PDT)
Local: Mon, Sep 22 2008 6:08 am
Subject: Re: URL timeout: robots.txt timeout
Hi Chris
It looks like we started running into problems reaching your site
around the 3rd of September. Keep in mind that an "unreachable
robots.txt" error does not mean that your robots.txt has to exist --
it's just that we couldn't even get a proper response back from the
server when we tried to check the robots.txt (or to see if one
exists).

In general, this is not an issue of adding / changing a few pages,
it's almost always (at least all the times I've checked  :-)) been an
issue with either the server using a security module or a firewall/
router that is too strict. The best way to check this is to submit a
Sitemap file and to see how it's being processed (this usually goes
fairly quickly, so you should be able to see if it's working or not in
much less than an hour).
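
The distinction above (a missing robots.txt is fine, no response at all
is not) can be sketched in Python as follows. This is only an
illustration: the function names and the result labels are invented for
this example, and a check run from your own machine uses your own IP
address, so it cannot prove what Googlebot's addresses see through a
firewall.

```python
import socket
import urllib.error
import urllib.request


def classify_robots_result(status=None, error=None):
    """Map a fetch outcome onto the cases described in the post."""
    if error is not None:
        # No HTTP response came back at all (timeout, refused, DNS failure).
        return "unreachable"
    if status == 200:
        return "robots.txt found"
    if status in (404, 410):
        # The server answered and said there is no such file.
        return "no robots.txt (fine)"
    return "unexpected status %d" % status


def check_robots(site, timeout=10):
    """Fetch <site>/robots.txt and classify what happened."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_robots_result(status=resp.status)
    except urllib.error.HTTPError as exc:
        # HTTPError must be caught before URLError, its parent class.
        return classify_robots_result(status=exc.code)
    except (urllib.error.URLError, socket.timeout, ConnectionError) as exc:
        return classify_robots_result(error=exc)
```

Calling check_robots("http://www.example.com") returns one of the
labels above; "unreachable" is the case that corresponds to the
Webmaster Tools error discussed in this thread.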

Hope it helps!
John


 
chris barcelona
From: chris barcelona
Date: Tue, 23 Sep 2008 12:06:25 -0700 (PDT)
Local: Tues, Sep 23 2008 3:06 pm
Subject: Re: URL timeout: robots.txt timeout
Hi John,
Thanks very much for your advice, I will add the sitemap and report
back.
I always worked on the basis that the simpler the better, so no
sitemap, no robots.txt file. That way I figured there would be fewer
things to go wrong. But I'll try this!
best regards
chris

On 22 sep, 12:08, JohnMu wrote:


 
chris barcelona
From: chris barcelona
Date: Wed, 8 Oct 2008 11:49:47 -0700 (PDT)
Local: Wed, Oct 8 2008 2:49 pm
Subject: Re: URL timeout: robots.txt timeout
Thought I would update this thread in case anyone comes across it
while searching for a solution.
In the end I decided to do... er.... absolutely nothing.  And that
seems to have done the trick!  Since Sept 20 I've had no more
"time-out" errors and the web site (http://www.inglesmadrid.com) seems
to be slowly (very slowly) climbing back up the rankings.

Conclusion?
Really I'm not sure.  I'd say either this was a temporary problem with
Google, or my web hosting company had in fact changed the server
configuration (although they denied it) and then changed it back
(again, without saying anything).

Recommendation?
1.  If you have fancy things like robots.txt files / site maps / meta
no-follow tags, obviously you’d want to check these first.
2.  Talk to your web hosting company asap. The sooner they know
there's a problem the better.
3.  Wait at least 7 days to see if you stop getting new timeout errors
(listed on Google Webmaster tools)
4.  If after 10 days you’re still getting timeout errors…panic.
Consider replacing the website with alternative media : brochures,
letters-in-a-bottle, smoke signals, yodels, chinese whispers  etc.

If you are still stuck, feel free to contact me (details on the
website) as I might yet find a definitive answer!


 
murcianico
From: murcianico
Date: Fri, 10 Oct 2008 06:50:10 -0700 (PDT)
Local: Fri, Oct 10 2008 9:50 am
Subject: Re: URL timeout: robots.txt timeout
Hello,
I have had the same problem with www.mazaron.es since 3 October 2008:

"No se puede acceder a la red" ("Network unreachable")

How can I solve it?


 
murcianico
From: murcianico
Date: Fri, 10 Oct 2008 07:56:25 -0700 (PDT)
Local: Fri, Oct 10 2008 10:56 am
Subject: Re: URL timeout: robots.txt timeout
Sorry, the website is www.mazarron.es

On 10 oct, 15:50, murcianico wrote:


 
JohnMu (Google employee)
From: JohnMu
Date: Fri, 10 Oct 2008 14:52:19 -0700 (PDT)
Local: Fri, Oct 10 2008 5:52 pm
Subject: Re: URL timeout: robots.txt timeout
Hi murcianico and welcome to the groups!

It looks like this might be the same problem as Chris had. Regardless
of Chris' recommendation to wait, I would start asking the hoster
questions -- who knows, maybe then it will also disappear
automatically ;-). In the cases I've diagnosed, I have yet to see one
where the unreachable issue is not something on the hosting side of
things, so that's what I would look at first.

Hope it helps!
John


 
Andy65
From: Andy65
Date: Fri, 10 Oct 2008 20:56:40 -0700 (PDT)
Local: Fri, Oct 10 2008 11:56 pm
Subject: Re: URL timeout: robots.txt timeout

On Oct 10, 10:52 pm, JohnMu wrote:

> Hi murcianico and welcome to the groups!

> It looks like this might be the same problem as Chris had. Regardless
> of Chris' recommendation to wait, I would start asking the hoster
> questions -- who knows, maybe then it will also disappear
> automatically ;-). In the cases I've diagnosed, I have yet to see one
> where the unreachable issue is not something on the hosting side of
> things, so that's what I would look at first.

> Hope it  helps!
> John

Hmmm, seems I've started getting the error too since October 5 on one
of four domains I have listed in webmaster tools. The site's been
indexed fine by google for more than 2 years. Sitemap gets regularly
downloaded without errors, as does a robots.txt file I'd had in place.
I always had just a blank (ie empty) robots.txt uploaded just to stop
404 errors when bots went searching for one.

Having just logged in, I noticed some "URL unreachable" errors in the
report for the site, for the first time ever, from Googlebot visits over
the past week. There were about 70 errors out of around 300 pages for
the site, with different pages unreachable on different days, oddly
enough. I figured there'd be some error, so I resubmitted the sitemap.
There are no errors on the 3 other sites I have set up and indexed in
Webmaster Tools, all at the same host and same FTP.

Error came back this time after resubmittal of sitemap:

Network unreachable: robots.txt unreachable
We were unable to crawl your Sitemap because we found a robots.txt
file at the root of your site but were unable to download it. Please
ensure that it is accessible or remove it completely.

Advice is to fix or remove the robots.txt ... so I figured I'd delete
the robots.txt, since I essentially had nothing in it anyway. Two more
resubmits of the sitemap over the past hour or so came back with the
same message: you found a robots.txt file but were unable to download
it. Since there's no robots.txt file anymore, this surprises me. You
should be getting a 404 redirect to the homepage.

Are you sure this isn't something at Google's end going wrong? I've
not had these errors before in over two years, like others posting in
this thread. And I haven't changed anything in sitemap/robots.txt
configuration. Other domains with same host on same ftp aren't getting
any errors like it.


 
Andy65
From: Andy65
Date: Fri, 10 Oct 2008 23:35:06 -0700 (PDT)
Local: Sat, Oct 11 2008 2:35 am
Subject: Re: URL timeout: robots.txt timeout
Strike my previous post. Have done some deeper digging and looks like
it might be down to my host.

Anyone with similar problem might find some nice reading here:

http://www.homewithandrew.com/index.php/debugging-the-network-unreach...

Looks like my host had a similar, though not identical, problem to
that guy's. They don't have any kind of firewall on the webserver
network segment, but they do have something which blocks IPs that
have tried to brute-force logins. After contacting my host, they
said: "I can disable this for a few days to see if it might have
blocked Google, except we have them on a whitelist so they shouldn't
be blocked."

Googlebot IPs were whitelisted and shouldn't have been blocked, but
it looks increasingly like they might have been. On the first resubmit
of the sitemap after the host switched off the brute-force IP blocking,
I thankfully got an OK on submission and no more robots.txt unreachable
errors. I'll check in again tomorrow to see if all is still OK.

Incidentally, does the googlebot IP range always stay pretty static?
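
Since a whitelist of fixed IPs can go stale, one alternative that does
not depend on the range staying static is the reverse-then-forward DNS
check that Google documents for verifying Googlebot. A minimal sketch
in Python, assuming IPv4 and the googlebot.com / google.com host
suffixes (the function names are invented for this example):

```python
import socket

# Host suffixes a genuine Googlebot reverse lookup is expected to end in.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def looks_like_google_host(hostname: str) -> bool:
    """True if the hostname sits under one of the expected Google domains."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def is_verified_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except OSError:
        return False
    if not looks_like_google_host(hostname):
        return False
    try:
        _, _, forward_ips = socket.gethostbyname_ex(hostname)  # forward lookup
    except OSError:
        return False
    # The forward lookup must lead back to the address we started with.
    return ip in forward_ips
```

The forward confirmation matters because anyone can set a reverse DNS
record that claims to be a Google host; only the owner of the domain
controls where the name resolves to.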


 
murcianico
From: murcianico
Date: Sat, 11 Oct 2008 02:54:55 -0700 (PDT)
Local: Sat, Oct 11 2008 5:54 am
Subject: Re: URL timeout: robots.txt timeout
Hello,
the hosting company (Arsys) swears that they are not blocking
Googlebot, and says the problem is "on Google's side."
They recommend that we contact Google and ask them to review their
programming!

What can I do? How can I prove where the problem is?

On 10 oct, 23:52, JohnMu wrote:


 
none2
From: none2
Date: Sat, 11 Oct 2008 19:20:08 -0700 (PDT)
Local: Sat, Oct 11 2008 10:20 pm
Subject: Re: URL timeout: robots.txt timeout
David & Chris (and others who have the same issue on this thread),
the EXACT same thing happened to me, and yes, there are other threads
about this at these forums and elsewhere, and, as you alluded, Chris,
it IS A GOOGLE PROBLEM but they are not going to admit it (nor will
"webado").  It resulted in my being ruined.  Suddenly, with NO
DOWNTIME on my server nor any changes on my part; 95% of my pages were
"not reachable" like yours, either from "robots.txt timed out" or
"network unreachable", which resulted in ALL of my MAIN PAGES getting
removed from their index.  THANKS GOOGLE.  The unreachable list even
included long gone webpages that I 301'd to new webpages years ago.
My hosts long ago white-listed all google IP's.

My pages that were trashed were TEN YEARS OLD, all appeared on the
FIRST page of results for their respective search phrases, most of
them NUMBER ONE, (all PR3 and PR4 pages).  So G not only screws site
owners with this "yearly event", but screws their users as well by
doing them this disservice--a disservice that NO OTHER search engine
will do.

Notice too that if you check the PR of these pages, it's DROPPING.
They remove the pages from the index, and your PR will eventually drop
to ZERO.  That's another google screw up because *decent* PR CANNOT
DROP TO ZERO OVERNIGHT.  It's impossible to lose ALL IBL's overnight,
so obviously that is NOT what is happening.  (PR check at http://oyoy.eu/google/pr/
still not working).

Yes, it has happened before and will happen every year.  When?
Guess.....every Christmas!!  Every single year, year after year for
the last FIVE years, google has ruined millions of websites by
removing them from the index.  It's usually small mom & pop business
sites that get screwed so Amazon, Ebay, Buy.com and the like will get
all the Christmas business.  The attack usually starts around Oct.
thru Nov. and of course lasts through Christmas, and of course all is
well again about Jan 1st!!  So it's RIGHT to accuse the Gbot, algo's,
or google of this because it's been proven time and time again to
happen, as so many others have noticed here and at other forums.

Let's see John Mu try and explain that.  It is NOT coincidental that
so many site owners suddenly have the exact same unexplainable issue
and no one can convince anyone in their right mind with an IQ over 50
that it's a "server problem" and not google.  It's the Gbot choking on
something, like yet another screwed up algo that's backfired.  If you
really wanted to help John, you would ACCEPT that this is a google
problem (I'm sure you already realize it but won't admit it), look
into it, and FIX IT.

Over the past week I have seen fewer and fewer pages in that area of
WMT; the list is slowly shrinking daily, but NONE of my
long-standing top-ranking webpages have returned!!
-David-

On Sep 21, 7:30 am, chris barcelona wrote:


 
webado
From: webado
Date: Sat, 11 Oct 2008 20:14:51 -0700 (PDT)
Local: Sat, Oct 11 2008 11:14 pm
Subject: Re: URL timeout: robots.txt timeout
You still have not provided a url for anybody to run tests against.

You are just ranting with no tangible evidence and no effort on your
part to find any problems.

On Oct 11, 10:20 pm, none2 wrote:


 
JohnMu (Google employee)
From: JohnMu
Date: Sun, 12 Oct 2008 00:35:52 -0700 (PDT)
Local: Sun, Oct 12 2008 3:35 am
Subject: Re: URL timeout: robots.txt timeout
Hi David
As webado mentioned, it's impossible to say much without knowing your
URL.

At any rate, this is an issue that can be resolved - so I think it
would definitely help your site most to get it resolved. While it is
always possible that there are issues on our side, so far I have not
run into anything on our side, with any site, that would result in the
symptoms that you describe.

Here are some articles that deal with this issue:
http://www.homewithandrew.com/index.php/debugging-the-network-unreach...
http://www.huffingtonpost.com/adam-fendelman/why-my-site-lost-all-sea...

Hope to hear back from you soon!

John


 
murcianico
From: murcianico
Date: Sun, 12 Oct 2008 01:52:55 -0700 (PDT)
Local: Sun, Oct 12 2008 4:52 am
Subject: Re: URL timeout: robots.txt timeout
Hello,
my URL is www.mazarron.es (as I said before).
My hosting company (Arsys) swears that they are not blocking
Googlebot, and says again that the problem is "on Google's side."

:(

On 12 oct, 09:35, JohnMu wrote:


 
webado
From: webado
Date: Sun, 12 Oct 2008 05:21:41 -0700 (PDT)
Local: Sun, Oct 12 2008 8:21 am
Subject: Re: URL timeout: robots.txt timeout
Yes, well, when hosters swear they aren't doing it, my experience is
that they are doing it but either don't know it or can't be bothered
to try to find out. Probably don't know how.

However, you don't have a robots.txt file at all and I am getting a 404
for that, which is OK.  If any of Googlebot's IPs are blocked, Googlebot
would not know whether you have a robots.txt file or not.

On Oct 12, 4:52 am, murcianico wrote:


 
JohnMu (Google employee)
From: JohnMu
Date: Sun, 12 Oct 2008 05:27:30 -0700 (PDT)
Local: Sun, Oct 12 2008 8:27 am
Subject: Re: URL timeout: robots.txt timeout
Hi murcianico

From what I can see, it looks like we have had a lot of trouble
reaching your site since the 4th of October. Perhaps something was
changed then? Do you see the Googlebot accesses in your server logs?
Feel free to point your hoster to this thread, if you like.

John


 
murcianico
From: murcianico
Date: Sun, 12 Oct 2008 06:11:38 -0700 (PDT)
Local: Sun, Oct 12 2008 9:11 am
Subject: Re: URL timeout: robots.txt timeout
Hello,

Prior to the 3rd of October, the server log files show hundreds of
Googlebot accesses from the IPs
66.249.71.42, 66.249.71.43 and 66.249.71.44.

Since the 4th of October there are none.
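
For others who want to make the same comparison from their own logs,
here is a minimal sketch in Python that counts Googlebot requests per
day. It assumes the Apache "combined" log format, and the sample lines
are made up for illustration (they are not taken from the real server
log discussed here).

```python
import re
from collections import Counter

# Apache "combined" format: ip ident user [day:time zone] "request"
# status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample lines, for illustration only.
SAMPLE_LINES = [
    '66.249.71.42 - - [03/Oct/2008:10:15:32 +0200] "GET /robots.txt HTTP/1.1" '
    '404 208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '66.249.71.43 - - [03/Oct/2008:10:15:40 +0200] "GET / HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [04/Oct/2008:09:00:01 +0200] "GET / HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
]


def googlebot_hits_per_day(lines):
    """Count requests per day whose user-agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("day")] += 1
    return hits
```

A day on which the count drops to zero while other visitors still show
up in the log is the pattern described above: the requests never reach
the web server, which points at something in front of it.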

Google Webmaster Tools reports 65 unreachable URLs ("robots.txt
unreachable" and "Network unreachable"), and for the sitemap it says:
"Network unreachable: robots.txt unreachable.
We were unable to crawl your Sitemap because we were unable to
download the robots.txt file from the root of your site. Make sure
that it is accessible or remove it completely."

At this point, if my hoster insists they are not blocking Googlebot, my
only option is to change hosting company, isn't it?

Murcianico

On 12 oct, 14:27, JohnMu wrote:


 
webado
From: webado
Date: Sun, 12 Oct 2008 06:23:07 -0700 (PDT)
Local: Sun, Oct 12 2008 9:23 am
Subject: Re: URL timeout: robots.txt timeout
Well, if you cannot convince your hosting company to take a real good
look at what they are doing, then yes, unfortunately you have to change
hosting company.

On Oct 12, 9:11 am, murcianico wrote:


 
JohnMu (Google employee)
From: JohnMu
Date: Mon, 13 Oct 2008 19:57:42 -0700 (PDT)
Local: Mon, Oct 13 2008 10:57 pm
Subject: Re: URL timeout: robots.txt timeout
Hi Murcianico

I would suggest encouraging them to post here -- perhaps there is
something that we can figure out together. In general, if there is an
issue on the hoster's side, it will affect many sites, so this should
definitely be something they should be interested in resolving.

If you're absolutely sure that it's not on your site (are you sure
your site does not use "mod_security" or something similar?), then you
can also try to see if we are crawling and indexing other sites (on
the same server), by using the advanced search settings with a
timeframe of a week. Sometimes this encourages the web hoster to
resolve the issue quicker :-)

Hope it helps!
John


 
coldrick
From: coldrick
Date: Sat, 18 Oct 2008 15:33:38 -0700 (PDT)
Local: Sat, Oct 18 2008 6:33 pm
Subject: Re: URL timeout: robots.txt timeout
Hi everyone,

I'm new to this discussion, but we are having the same problem, and
everywhere I search for a solution I am getting different suggestions
(check the robots.txt, check htaccess, it's your host blocking Google,
it's a problem with Google etc).

I am betting that this is a problem with our host blocking Google, but
as per usual our first attempt to communicate with the host resulted
in them saying that the problem is not at their end.

I want to convince them that the problem is most likely at their end,
but I don't know enough about Firewall Setup or Mod Security, but it
looks very much to me that one of these areas probably has something
to do with the problem.

I just want to ask a simple question (I know I should be asking my
host this, but I already did and they either didn't read it or
couldn't understand what I was asking). If someone can answer this,
maybe we can start to get to the bottom of the most likely cause for
this blocking.

Here is the info for the setup:-
Two sites running on a dedicated server - no other sites on the server
http://www.smallparts.com.au/
http://www.hobbyparts.com.au/

Until a few days ago Googlebot could access both sites.

Now Googlebot can only access the first one.

The errors are "robots.txt unreachable" and "Network unreachable".

We tried removing robots.txt altogether and resubmitting our sitemap,
and we received an error saying that Google found our robots.txt
(which no longer existed) but was unable to download it. This tells me
that Google most likely wasn't even able to connect to the site -
hence blocking is the most likely answer.

QUESTION:-
With two sites on the same server, and only one site being blocked, is
it still possible that this is related to the Firewall or Mod
Security? Or can you think of any other way that a server may block
Google from one site and not the other? We are not blocking using IP
Deny Manager, and .htaccess is not blocking.

Any information would be appreciated.


 
webado
From: webado
Date: Sat, 18 Oct 2008 16:25:40 -0700 (PDT)
Local: Sat, Oct 18 2008 7:25 pm
Subject: Re: URL timeout: robots.txt timeout
Your robots.txt file is incorrect.

# For domain: http://www.hobbyparts.com.au/

# All robots will spider the domain
User-agent: *
Disallow:

# Disallow directory /cgi-bin/
User-agent: *
Disallow: /cgi-bin/

# Disallow directory /ssl/
User-agent: *
Disallow: /ssl/

# Disallow directory /suspended.page/
User-agent: *
Disallow: /suspended.page/

# Disallow htaccess
User-agent: *
Disallow: .htaccess

It should be just this:

# For domain: http://www.hobbyparts.com.au/

# All robots will spider the domain
User-agent: *
Disallow: /cgi-bin/
Disallow: /ssl/
Disallow: /suspended.page/

In other words all items which are disallowed grouped together under
the user agent.
The .htaccess file is protected by the server and should NEVER be
accessible to any http requests. If it is, disallowing it in
robots.txt is useless, your site is doomed anyway.
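
To see why the grouping matters, here is a minimal sketch using
Python's standard-library robots.txt parser on both versions of the
file. It shows only how this one parser behaves (it keeps the first
"User-agent: *" group and ignores the later ones); it is not a claim
about how Googlebot itself reads the file. The helper name allowed()
is invented for this example.

```python
from urllib.robotparser import RobotFileParser

# The file as originally posted: five separate "User-agent: *" groups.
ORIGINAL = """\
# For domain: http://www.hobbyparts.com.au/

# All robots will spider the domain
User-agent: *
Disallow:

# Disallow directory /cgi-bin/
User-agent: *
Disallow: /cgi-bin/

# Disallow directory /ssl/
User-agent: *
Disallow: /ssl/

# Disallow directory /suspended.page/
User-agent: *
Disallow: /suspended.page/

# Disallow htaccess
User-agent: *
Disallow: .htaccess
"""

# The suggested version: one group holding every Disallow line.
GROUPED = """\
# For domain: http://www.hobbyparts.com.au/

# All robots will spider the domain
User-agent: *
Disallow: /cgi-bin/
Disallow: /ssl/
Disallow: /suspended.page/
"""


def allowed(robots_txt, path, agent="Googlebot"):
    """Ask the stdlib parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://www.hobbyparts.com.au" + path)
```

With this parser, allowed(ORIGINAL, "/cgi-bin/test") comes out True,
because only the first group (an empty Disallow, meaning "allow
everything") is honoured, while allowed(GROUPED, "/cgi-bin/test") comes
out False. Other parsers may merge the groups instead, which is exactly
why a single group is the unambiguous way to write it.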

Moving on to how your pages are built, your navigation is quite likely
non-functional for a robot. Having links like this:
<a href="../../../../../../catalogue/">Catalogue</a>

means broken links. While the browser compensates for this essentially
malformed uri, a robot is quite unable to do that.

Run Xenu and see for yourself:
http://home.snafu.de/tilman/xenulink.html

Quite totally uncrawlable site.

Your other site is built the same way; it's a matter of time until you
see the same problems with it too.

While this doesn't address the original problem, it uncovers a much
bigger problem your sites both have.

What may explain the problem Googlebot encountered is your dns zone:
http://www.intodns.com/smallparts.com.au

This is a single point of failure because both nameservers and the
actual website server are all on the same subnet. If one is down, all 3
will likely be down and so is the site. This may result in transient
access failures.
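
A rough way to check for this yourself, sketched in Python: group the
addresses of the nameservers and the web server by network and see
whether they all land in the same one. Treating "same subnet" as "same
/24" is an assumption made for this example, and the addresses below
are placeholders from the documentation ranges, not the real ones for
these sites.

```python
import ipaddress


def shared_networks(addresses, prefix=24):
    """Group host names by the /prefix network their address falls in."""
    groups = {}
    for name, addr in addresses.items():
        # strict=False lets us derive the network from a host address.
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        groups.setdefault(net, []).append(name)
    return groups


def is_single_point_of_failure(addresses, prefix=24):
    """True when every listed host sits in one and the same network."""
    return len(shared_networks(addresses, prefix)) == 1


# Placeholder addresses (hypothetical), for illustration only.
ALL_TOGETHER = {"ns1": "192.0.2.10", "ns2": "192.0.2.11", "www": "192.0.2.20"}
SPREAD_OUT = {"ns1": "192.0.2.10", "ns2": "198.51.100.11", "www": "192.0.2.20"}
```

Here is_single_point_of_failure(ALL_TOGETHER) is True and
is_single_point_of_failure(SPREAD_OUT) is False. Sharing a /24 does not
prove the machines share hardware or a network link, so treat a True
result as a prompt to ask the host, not as a diagnosis.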

Of course in addition Googlebot may well be also blocked at the
firewall.

On Oct 18, 6:33 pm, coldrick wrote:


 
coldrick
From: coldrick
Date: Sun, 19 Oct 2008 00:35:04 -0700 (PDT)
Local: Sun, Oct 19 2008 3:35 am
Subject: Re: URL timeout: robots.txt timeout

Hi webado

Thanks for that info. I'll look into all of that and see if we can get
it all fixed. The web site does get crawled (until this latest
problem), and the smallparts.com.au site has over 5000 pages on
Google, but maybe if we look into your suggestions we will get a lot
more indexed.

Anyway, I'm pleased to share with everyone that we have found the
fault to be with the firewall on the server.

It was very difficult to find much info about this, and our Internet
Host was useless. First they said it was nothing to do with them, and
when we pushed further they suggested to try "allowing" the Googlebot
IP's in the firewall setup. The firewall is called "csf", and I'm sure
many of you know the program. It's very good at what it does, but I
couldn't find any help files on how to allow batches of IP addresses.
Our web host couldn't help with this either.

Anyway I pieced together a lot of info from many sources and finally
came up with a solution which may be right or may be wrong, but
Googlebot can now access the Robots.txt file again and is once again
crawling our site.

Here are a couple of key tips that may be useful to others...

1.  If your Internet Host says that the problem is not with them, it
probably is. My experience is that many people who work at Internet
Hosting companies get very annoyed when you break into their online
gaming time with (how dare you) a question. The simplest way for them
to get back to their game is to deny responsibility and hope you will
go away. They will then deal with the 5% of people who come back to
them and push the issue - when they finish their game.

2. If you or your host are using a firewall program called "csf",
check in the "Firewall Deny IP's" area. If any of Googlebot's IP
addresses have been added, that's where the error lies. Remove them
and restart the firewall.

3. It wouldn't hurt to also add Googlebot's IP addresses to the
"Firewall Allow IP's" area too. Now "csf" comes with no instructions
about how to add groups of IP addresses except with some hint at the
top that you can use "quaded" IP addresses. You probably don't know
what "quaded" means and neither did I - it's not in any dictionary so
they must have made it up. However their example "(e.g.
192.168.254.0/24)" and a reference to CIDR addressing led me to
believe that they meant a "/" at the end and that this somehow
represented multiple address ranges. It does, but I still have no idea
how the system works even after reading a couple of tutorials on it.
Anyway, I eventually came up with a list to add to the "Allow" area as
follows. It worked, but it could be incorrect. Hopefully a Googler
will see this list and edit it to correct it.

216.239.32.0/19 # Googlebot
64.233.160.0/19 # Googlebot
66.249.80.0/20 # Googlebot
72.14.192.0/18 # Googlebot
209.85.128.0/17 # Googlebot
66.102.0.0/20 # Googlebot
74.125.0.0/16 # Googlebot
64.18.0.0/20 # Googlebot
207.126.144.0/20 # Googlebot

Note! I added the # Googlebot notation for reference.
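
For anyone else puzzled by the "/" notation: the number after the slash
says how many leading bits of the address are fixed, and everything
after those bits is free, so a /24 covers 256 addresses and a /20
covers 4096. A minimal sketch in Python that expands a CIDR block and
checks whether an address is covered by the list above (describe() and
covered() are names invented for this example, and the list is copied
from this post as-is, not from any authoritative source):

```python
import ipaddress

# The allow list exactly as given in the post above.
ALLOW_LIST = [
    "216.239.32.0/19",
    "64.233.160.0/19",
    "66.249.80.0/20",
    "72.14.192.0/18",
    "209.85.128.0/17",
    "66.102.0.0/20",
    "74.125.0.0/16",
    "64.18.0.0/20",
    "207.126.144.0/20",
]


def describe(cidr):
    """Return (first address, last address, address count) for a block."""
    net = ipaddress.ip_network(cidr)
    return str(net[0]), str(net[-1]), net.num_addresses


def covered(ip, cidrs=ALLOW_LIST):
    """True if the address falls inside any block of the list."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)
```

describe("192.168.254.0/24") gives ('192.168.254.0',
'192.168.254.255', 256), and describe("66.249.80.0/20") gives
('66.249.80.0', '66.249.95.255', 4096). One thing worth noticing:
covered("66.249.71.42") is False, and that is one of the Googlebot
addresses murcianico found in his server logs earlier in this thread,
so the list above may well be incomplete.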

Hopefully this helps some others out there and saves them from the
hours of testing and searching and retesting I've had to do today.


 