My site disappeared from Google results (http://www.ekklesia360.com/)
I believe it's because we had 'powered by Ekklesia' in the footers of
our client sites, many of which are hosted on our same server. I have
now varied these footers to use 5 or so different tags so as not to
'spam' that term.
1. How do I know if this is enough to get reconsidered?
2. How do I even know if this is what caused the problem?
From what I can tell, it looks like we haven't been able to reach your
site at all for quite some time now. That makes it a bit hard for us
to learn what your site is about and to send visitors your way :-).
You should be able to see this in your Webmaster Tools account as
well; errors such as these are listed either with your Sitemap files
or in the Diagnostics / Web crawl section.
1. Under 'Web crawl errors' it lists 'robots.txt timeout', so I added a
robots.txt (previously I had none). Otherwise, under Diagnostics it
says: "We have no errors to report. We crawl regularly, so check back
later to see updates."
2. There was no Sitemap, so I uploaded one.
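Both fixes (adding a robots.txt and a Sitemap) can be scripted. The
sketch below is one minimal way to produce the two files with Python's
standard library, assuming placeholder `www.example.com` URLs; the
`PAGES` list and both function names are hypothetical. It uses the
sitemaps.org 0.9 namespace and the `Sitemap:` line that robots.txt
files can use to point crawlers at a sitemap.

```python
import xml.etree.ElementTree as ET

# Hypothetical page list; a real site would pull these from its CMS.
PAGES = [
    "http://www.example.com/",
    "http://www.example.com/about/",
]

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap.xml document (as bytes) listing each URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_robots_txt(sitemap_url):
    """Return an allow-all robots.txt that also points crawlers at the sitemap."""
    return (
        "User-agent: *\n"
        "Disallow:\n"  # empty Disallow = nothing is blocked
        f"Sitemap: {sitemap_url}\n"
    )
```

Writing `build_sitemap(PAGES)` to `sitemap.xml` and
`build_robots_txt("http://www.example.com/sitemap.xml")` to
`robots.txt` at the site root gives crawlers a fast, valid answer for
both files.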
> Would these two things cause the problem?
> On Jun 17, 6:44 am, JohnMu wrote:
> > Hi Drew and welcome to the groups!
"... I believe because we had 'powered by Ekklesia' on our client
site's footers, with many hosted on our same server. I have now
varied these footers to have a 5 or so diff't tags as not to 'spam'
that term. ..."
... so that isn't an issue and isn't causing penalties or problems for
the site(s)?
Hi everyone
I just wanted to post a quick update: it seems to be working with your
robots.txt :-)
I passed that information on to some engineers here, and it looks like
your series of redirects for invalid URLs could have been the cause of
us not being able to access your site. When we can't access the
robots.txt properly (when it times out like this or is generally
unreachable), we tend not to crawl the site at all just to be safe. In
this case, the robots.txt was missing and that was triggering a fairly
complex series of redirects:
One additional problem is that in order to access http://domain.com/,
we need to check its robots.txt as well. Here we have an additional
step in the redirects.
That's a lot of work & takes quite a bit of time :-).
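A chain like this is easy to inspect from the outside: request
`/robots.txt` without following redirects and log each hop. Below is a
minimal Python sketch using the standard library's `http.client`,
which never follows redirects by itself. The function name, the 10-hop
limit and the 10-second timeout are arbitrary choices for illustration,
not values Googlebot is known to use.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes that carry a Location header pointing somewhere else.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url, max_hops=10, timeout=10.0):
    """Follow a redirect chain by hand, one request per hop.

    Returns a list of (url, status) pairs. It stops at the first
    non-redirect response, or after max_hops redirects have been followed.
    """
    chain = []
    for _ in range(max_hops + 1):
        parts = urlsplit(url)
        conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                      else http.client.HTTPConnection)
        conn = conn_class(parts.netloc, timeout=timeout)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        try:
            # http.client does not follow redirects, so each hop is visible.
            conn.request("GET", target)
            resp = conn.getresponse()
            status, location = resp.status, resp.getheader("Location")
        finally:
            conn.close()
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative; resolve it against the current URL.
        url = urljoin(url, location)
    return chain
```

For example, `for url, status in
trace_redirects("http://www.example.com/robots.txt"): print(status,
url)` prints one line per hop; several 3xx lines before the final
answer is the pattern described above.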
I would really recommend that you change the behavior for invalid URLs
so that they just return an HTTP 404 result code (with a nice error
page for the user, of course).
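A minimal sketch of that recommendation, using Python's built-in
`http.server`: unknown paths, including a missing `/robots.txt`, get a
direct 404 with a friendly page instead of a redirect. The
`KNOWN_PAGES` table, the page text and the `route`/`SiteHandler` names
are hypothetical stand-ins for a real site's routing.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical set of valid paths; a real site would ask its CMS/router.
KNOWN_PAGES = {
    "/": "<h1>Home</h1>",
    "/about/": "<h1>About us</h1>",
}

NOT_FOUND_PAGE = (
    "<h1>Page not found</h1>"
    '<p>Sorry, that page does not exist. <a href="/">Go to the homepage</a>.</p>'
)


def route(path):
    """Return (status, html) for a request path: 200 if known, else 404."""
    if path in KNOWN_PAGES:
        return 200, KNOWN_PAGES[path]
    # No redirect: unknown URLs (including a missing /robots.txt) get a
    # 404 with a friendly page, so a crawler gets a fast, clear answer.
    return 404, NOT_FOUND_PAGE


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string when looking up the page.
        status, html = route(self.path.split("?", 1)[0])
        body = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, `HTTPServer(("127.0.0.1", 8000),
SiteHandler).serve_forever()` (with `HTTPServer` imported from
`http.server`) serves it on port 8000, and `curl -i
http://127.0.0.1:8000/robots.txt` should then show a 404 status line.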
Hope it helps. Keep up the good sleuthing, JLH!
John
> When we can't access the
> robots.txt properly (when it times out like this or is generally
> unreachable), we tend not to crawl the site at all just to be safe.
> That's actually a damned useful/important bit of information :D
> No robots and poor server response = shoddy crawling!
No, no.
It's not "no robots". Having no robots.txt file is just fine - it's a
clear message from the webmaster who controls the root that he/she has
no problem with robots crawling it.
Similarly a valid robots.txt - or at least one that can be sensibly
parsed even if it has errors - is no problem for a responsible bot.
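On the "responsible bot" side, Python's standard-library
`urllib.robotparser` can illustrate both cases: a sensible file is
obeyed, and an empty file imposes no restrictions. This is a generic
robots.txt parser, not Googlebot's, and the rules and URLs are made-up
examples. (When it fetches over the network, its `read()` method
likewise treats a 404 for robots.txt as "allow everything".)

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt contents for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(text):
    """Build a RobotFileParser from robots.txt text, without network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = make_parser(ROBOTS_TXT)
print(rules.can_fetch("Googlebot", "http://www.example.com/private/page.html"))  # False
print(rules.can_fetch("Googlebot", "http://www.example.com/index.html"))  # True

# An empty file has no rules at all, so everything is allowed.
print(make_parser("").can_fetch("Googlebot", "http://www.example.com/private/page.html"))  # True
```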
It's when a site delivers a non-null robots.txt file that is
unparsable that (as I have suspected for some weeks) Googlebot simply
turns its back on the site as a failsafe mechanism, possibly to avoid
potential legal problems. I've posted this informed speculation
several times recently, so it's great to see it confirmed.
Actually, I believe that when a site returns an unparsable robots.txt,
it's generally OK, since we were at least able to get something back from
the server. If we aren't able to get any kind of reply back from the
server, that's where we tend to back away. That generally includes
situations where the URL is just unreachable (perhaps a "security
update" that ended up blocking us in general) or situations like this
where we give up trying to access the URL (which in a way is
unreachable as well :-)).
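The rule described here comes down to whether the server answered at
all. The sketch below encodes that rule as read from this thread, not
from any Google code; the `RobotsFetch` enum and `may_crawl` function
are hypothetical names, and details such as retries, back-off duration
or other status codes are deliberately left out.

```python
from enum import Enum, auto


class RobotsFetch(Enum):
    """Simplified outcomes of fetching /robots.txt (hypothetical model)."""
    OK_PARSABLE = auto()    # 200 with a file that parses cleanly
    OK_UNPARSABLE = auto()  # 200, but the content isn't valid robots.txt
    NOT_FOUND = auto()      # 404: the site simply has no robots.txt
    NO_REPLY = auto()       # timeout, unreachable, or gave up on redirects


def may_crawl(outcome: RobotsFetch) -> bool:
    """Rough model of the behavior described in this thread."""
    if outcome is RobotsFetch.NO_REPLY:
        # Nothing came back from the server at all: back away to be safe.
        return False
    # A 200 (parsable or not) or a 404 means the server did answer, so
    # crawling can go ahead, obeying whatever rules could be parsed.
    return True
```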
There are a lot of sites that return the homepage for all invalid
URLs, which will include the robots.txt if it doesn't exist. We have
to be able to cope with that :-), even if it's not a good practice.
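To see why a homepage served as robots.txt is usually survivable, here
is a made-up HTML page fed to Python's `urllib.robotparser`. Python's
parser is not Googlebot's, but it illustrates the general idea: lines
that aren't recognized `field: value` directives are ignored, so no
rules result and nothing is blocked.

```python
from urllib.robotparser import RobotFileParser

# A made-up homepage that a misconfigured server returns for /robots.txt.
HOMEPAGE_HTML = """\
<!DOCTYPE html>
<html>
<head><title>Welcome</title></head>
<body><h1>Welcome to our site</h1></body>
</html>
"""

parser = RobotFileParser()
parser.parse(HOMEPAGE_HTML.splitlines())

# None of the HTML lines is a recognized directive, so the parser ends
# up with no rules and every URL is allowed.
print(parser.can_fetch("Googlebot", "http://www.example.com/any/page.html"))  # True
```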