> Thanks for that info. I'll look into all of that and see if we can get
> it all fixed. The web site does get crawled (until this latest
> problem), and the smallparts.com.au site has over 5000 pages on
> Google, but maybe if we look into your suggestions we will get a lot
> more indexed.
> Anyway, I'm pleased to share with everyone that we have found the
> fault to be with the firewall on the server.
> It was very difficult to find much info about this, and our Internet
> Host was useless. First they said it was nothing to do with them, and
> when we pushed further they suggested "allowing" the Googlebot
> IPs in the firewall setup. The firewall is called "csf", and I'm sure
> many of you know the program. It's very good at what it does, but I
> couldn't find any help files on how to allow batches of IP addresses.
> Our web host couldn't help with this either.
> Anyway I pieced together a lot of info from many sources and finally
> came up with a solution which may be right or may be wrong, but
> Googlebot can now access the Robots.txt file again and is once again
> crawling our site.
> Here are a couple of key tips that may be useful to others...
> 1. If your Internet Host says that the problem is not with them, it
> probably is. My experience is that many people who work at Internet
> Hosting companies get very annoyed when you break into their online
> gaming time with (how dare you) a question. The simplest way for them
> to get back to their game is to deny responsibility and hope you will
> go away. They will then deal with the 5% of people who come back to
> them and push the issue - when they finish their game.
> 2. If you or your host are using a firewall program called "csf",
> check in the "Firewall Deny IP's" area. If any of Googlebot's IP
> addresses have been added, that's where the error lies. Remove them
> and restart the firewall.
> 3. It wouldn't hurt to also add Googlebot's IP addresses to the
> "Firewall Allow IP's" area too. Now "csf" comes with no instructions
> about how to add groups of IP addresses except with some hint at the
> top that you can use "quaded" IP addresses. You probably don't know
> what "quaded" means and neither did I - it's not in any dictionary so
> they must have made it up. However their example "(e.g.
> 192.168.254.0/24)" and a reference to CIDR addressing led me to
> believe that they meant a "/" at the end and that this somehow
> represented multiple address ranges. It does, but I still have no idea
> how the system works even after reading a couple of tutorials on it.
> Anyway, I eventually came up with a list to add to the "Allow" area as
> follows. It worked, but it could be incorrect. Hopefully a Googler
> will see this list and edit it to correct it.
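On the CIDR question in the post above: the number after the "/" says how many leading bits of the address are fixed, so a /24 fixes the first 24 bits and leaves the last 8 free, which covers 256 addresses. A minimal sketch using Python's standard `ipaddress` module shows what the csf example range expands to. The helper names `describe_cidr` and `covers` are made up for illustration.

```python
import ipaddress


def describe_cidr(cidr: str) -> dict:
    """Summarise the block of addresses that a CIDR string stands for."""
    net = ipaddress.ip_network(cidr, strict=True)
    return {
        "first": str(net[0]),        # lowest address in the block
        "last": str(net[-1]),        # highest address in the block
        "count": net.num_addresses,  # 2 ** (32 - prefix length) for IPv4
    }


def covers(cidr: str, ip: str) -> bool:
    """Return True if the single address `ip` falls inside `cidr`."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


# The example range quoted from the csf help text.
print(describe_cidr("192.168.254.0/24"))
print(covers("192.168.254.0/24", "192.168.254.77"))
```

So one "/24" line in an allow list stands in for 256 separate single-address lines.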
It looks like things are working again, thanks to your perseverance!
One thing you might want to keep in mind with regard to your
whitelist is that the Googlebot IP addresses may change over time
(though as far as I know, it's not something that happens frequently).
Your best bet is to regularly check the blocked IP list and check for
valid Googlebot addresses using the reverse DNS lookup as described in
http://www.google.com/support/webmasters/bin/answer.py?answer=80553
Thanks for posting the details, it's good to have a post like yours
which we can point other users to!
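The reverse DNS check mentioned above can be sketched roughly as follows: look up the host name for the IP, confirm it sits under a Google crawler domain, then resolve that name forward and confirm it maps back to the same IP. This is only a sketch. The function name `is_googlebot` is made up, the two accepted suffixes are an assumption based on Google's verification guidance, and the resolver parameters exist so the logic can be tried without a live DNS lookup.

```python
import socket

# Assumption: host names of genuine Googlebot addresses end in one of these.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for a claimed Googlebot address."""
    try:
        host = reverse(ip)[0]            # primary host name for the IP
    except OSError:                      # no PTR record or lookup failure
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(host)[2]     # IPv4 addresses for that host name
    except OSError:
        return False
    return ip in addresses               # must map back to the same IP
```

Calling `is_googlebot("66.249.66.1")` with the default resolvers performs real DNS queries, so the result depends on the network the script runs from.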
I am now having the same problem with http://www.zp1.com. I've tested
the robots.txt and the sitemap file and don't know what could be
causing my error. I even checked with the host and, of course, they
said they are not blocking Googlebot or any crawler. Any ideas?
I read the post and checked with the host. I also checked my log and
don't even see where googlebot attempted. Any ideas? The url is
http://www.zp1.com. Thanks
> On Oct 20, 4:08 pm, coldrick wrote:
> > Yes. Read the post 4 back from this one.
> > Rgds
> > On Oct 21, 1:02 am, silentptnr wrote:
> > > I am now having the same problem with http://www.zp1.com. I've tested
> > > the robots.txt and the sitemap file and don't know what could be
> > > causing my error? I even checked with the host and, of course, they
> > > said they are not blocking googlebot or any crawler. Any ideas?
The server logs won't show an attempted connection once the firewall
starts to block it.
The firewall logs will show a connection attempt on the day that the
Googlebot was blocked and the Googlebot IP addresses will be listed in
the firewall's deny list.
If you have access to the firewall yourself, check the denied IP
lists.
If you don't have access yourself, go back to the Internet Host and
ask them to check the Firewall's denied IP lists for any of the Google
IP addresses listed earlier.
By the way webado. I see you belong to an internet hosting company. I
guess my point one about the online games doesn't apply to all IT
people. You are doing a great job on this forum. I got two of your
suggestions fixed on our sites. Not sure how to go about the Subnet
settings but I'll muddle through it I guess.
Just a note on the IP addresses I listed earlier - the last two don't
seem to be Google. Is there anyone who can remove them from the list?
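Since the web server log stops recording Googlebot once the firewall blocks it, the last Googlebot entry in that log gives a rough idea of when the block started. Below is a small sketch that scans access-log lines for it. It assumes the common Apache/nginx log layout with a bracketed timestamp, the function name `last_googlebot_hit` is made up, and the sample lines are invented for illustration. It matches on the user-agent text only, which anyone can fake.

```python
import re
from datetime import datetime

# Assumption: timestamps look like [18/Oct/2008:03:12:45 +0000].
LOG_TIME = re.compile(r"\[([^\]]+)\]")

# Invented sample lines, not taken from any real server.
SAMPLE_LOG = [
    '66.249.66.1 - - [17/Oct/2008:22:10:03 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [18/Oct/2008:03:12:45 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [19/Oct/2008:09:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
]


def last_googlebot_hit(lines):
    """Return the latest timestamp of a request whose line mentions Googlebot."""
    last = None
    for line in lines:
        if "googlebot" not in line.lower():
            continue
        match = LOG_TIME.search(line)
        if not match:
            continue
        try:
            stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # skip lines whose timestamp has a different layout
        if last is None or stamp > last:
            last = stamp
    return last


print(last_googlebot_hit(SAMPLE_LOG))
```

The date it reports is the day to look for in the firewall's own log and deny list.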
I have another site which I consult for and it is having a problem.
This site has always been an authority site for over ten years and is
a very seasoned site. The site is by far the best resource for
information in its category. And for some reason in the past week
the site has dropped from a number one ranking to between page 5 and
7??? Nothing has changed on the site, so I'm wondering if Google has
penalized the site for something. John, if you could give your insight
I would appreciate it. The url of the site is http://www.auditions.com.
Perhaps google has changed its algorithm or something. I can't
understand why the search position would drop so quickly and
dramatically. Now when I search for the term "auditions" the top site
listed is a shoe site????
> On Oct 22, 7:12 am, webado wrote:
> > On Oct 21, 13:27, silentptnr wrote:
> > > I read the post and checked with the host. I also checked my log and
> > > don't even see where googlebot attempted.
> > That is proof that the server's firewall blocks Googlebot.
Not sure why I'm getting this message, maybe someone could give me a
tip?
URL timeout: robots.txt timeout
We encountered an error while trying to access your Sitemap. Please
ensure your Sitemap follows our guidelines and can be accessed at the
location you provided and then resubmit.
You are seeing this error because, as you
mentioned before, you don't see Google in your
logs and, as coldrick and webado concur, something
is blocking Google from reaching your web site.
It could be a network firewall or a software firewall
in the server that manages the IP tables.
I did a check with the Googlebot user agent and can
access your site and robots.txt fine so I would
imagine if you are still having an issue with this
then the block is likely based on IP address.
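A user-agent check like the one described above can be reproduced with Python's standard `urllib`. The sketch below builds a request that identifies itself as Googlebot. The user-agent string is the commonly published Googlebot one, and the function names and the example URL in the comment are placeholders. If a request like this succeeds from your own machine while Google still reports a timeout, that points to a block by IP address and not by user agent.

```python
import urllib.request

# Commonly published Googlebot user-agent string, used here as an assumption.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def googlebot_request(url):
    """Build a request that announces itself with the Googlebot user agent."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_status(url, timeout=10):
    """Fetch `url` as 'Googlebot' and return the HTTP status code."""
    with urllib.request.urlopen(googlebot_request(url), timeout=timeout) as resp:
        return resp.status


# Example (makes a real network request):
# fetch_status("http://www.example.com/robots.txt")
```

Note that this only tests user-agent filtering. The request still comes from your own IP address, so it cannot reveal a firewall rule that targets Google's address ranges.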
> Do you think there could be anything with my htaccess file?
> On Oct 25, 12:35 am, Tim Abracadabra wrote:
> > You are seeing this error because as you
> > mentioned before, You don't see Google in your
> > logs and as coldrick and webado concur, something
> > is blocking Google from reaching your web site.
> > It could be a network firewall or a software firewall
> > in the server that manages the IP tables.
> > I did a check with the Googlebot user agent and can
> > access your site and robots.txt fine so I would
> > imagine if you are still having an issue with this
> > then the block is likely based on IP address.
and I've confirmed these using a whois search. However, as JohnMu
mentioned earlier, Google doesn't publish these addresses because they
may change over time. I only offer them as advice and suggest you do
your own whois search prior to implementing them. The list may not be
complete either.
What I've found out regarding this problem and getting it fixed at the
server is as follows...
This applies to the firewall program CSF (ConfigServer Security and
Firewall), but most likely applies to other firewall programs in some
manner.
1. You need to check through the list of FIREWALL DENY IP'S and remove
any of the Googlebot IP addresses that appear in the list. These are
what stop Googlebot from accessing your site.
2. You need to add the range of Googlebot IP addresses to both the
FIREWALL ALLOW IP'S and the LFD IGNORE IP'S in the firewall setup.
This should keep Googlebot from being blocked again.
If Googlebot appears to get blocked again at a future date and the
above ALLOWS and IGNORES are still in place, you need to go back and
check the log of denied IP's to find any that appear to be Googlebot,
like the one I got this morning which prompted me to dig a little
deeper and update the IP list above...
If you then do a WHOIS search on the above IP the whois search will
confirm that it is indeed Google and just under halfway down the list
of info you will see the following CIDR range. Follow steps 1 and 2
above for the IP address listed.
If you are on shared hosting and don't have access to your server
firewall configuration, you need to talk to your Internet Hosting
Company and try to convince them to take the above action in order to
allow Googlebot back onto your site.
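Step 1 above (finding crawler addresses in the deny list) can be partly automated. The sketch below assumes the deny list is available as text lines, each holding one address or CIDR range with an optional "#" comment, and flags every entry that overlaps a range you have confirmed yourself via whois. The range 66.249.64.0/19 in the usage lines is only an illustrative example. The function names are made up and the sample deny lines are invented.

```python
import ipaddress


def parse_entry(line):
    """Return the network on a deny-list line, or None if there is none."""
    text = line.split("#", 1)[0].strip()   # drop any trailing comment
    if not text:
        return None
    try:
        return ipaddress.ip_network(text, strict=False)
    except ValueError:
        return None                        # not a plain address or CIDR range


def blocked_crawler_entries(deny_lines, crawler_ranges):
    """List deny entries that overlap any of the given crawler ranges."""
    ranges = [ipaddress.ip_network(r) for r in crawler_ranges]
    hits = []
    for line in deny_lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        if any(entry.version == r.version and entry.overlaps(r) for r in ranges):
            hits.append(str(entry))
    return hits


# Invented deny-list lines; the crawler range is an example to verify via whois.
deny = ["# blocked hosts", "66.249.66.1 # temporary block", "203.0.113.9"]
print(blocked_crawler_entries(deny, ["66.249.64.0/19"]))
```

Anything the script reports is a candidate for removal from the deny list and for adding to the allow and ignore lists, after you have confirmed the owner of the address.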
> I hope this helps someone.
> On Oct 19, 5:35 pm, coldrick wrote:
> > Hi webado
> > Note! I added the # Googlebot notation for reference.
> > Hopefully this helps some others out there and saves them from the
> > hours of testing and searching and retesting I've had to do today.