We had a major issue over the weekend. Our sitemap was being blocked
by a proxy server that caused a 404 error every time Google tried to
crawl us. Our site was never down, but Googlebot was getting 404
errors when it tried to index from our .xml sitemap.
We have now fixed the error and our site is being indexed again, but
our homepage has not been indexed, and the reporting says that
the last time the homepage was accessed it gave a 404 error. Our
operations team set up an external computer that acted like
Googlebot and was able to access the homepage, so we don't know why it
is not being indexed.
How do I know if the 404 error is still occurring? Is there any way to
get our homepage crawled and indexed?
Our site's Toolbar PageRank has stayed at 7/10, and our European
sites are still showing up in the index for our branded terms, but not
our .com site. We are actually still ranking for all terms that
didn't direct traffic to our homepage. Is this some kind of penalty
for the 404 errors over the weekend?
Our site is a major ecommerce site that has been around for over 11
years without ever going down for extended periods of time. We
shouldn't be penalized for 2 days of 404 errors.
On Jan 15, 6:01 pm, ABE wrote:
I greatly appreciate your help!
Would it be duplicate content if our... AbeBooks.de, AbeBooks.fr and
AbeBooks.it homepages are all showing in search results? It is
only the .com homepage that isn't showing. But all the other European
sites are in languages other than English.
Is there any way to check for sure whether it is duplicate content?

On Jan 15, 6:35 pm, Robbo wrote:
As AbeBooks.de, AbeBooks.fr, AbeBooks.it are in German, French and
Italian (respectively!) it would not constitute duplicate content.
The meaning might be the same, but it is different content because it
is in a different language.
Man spricht Deutsch! On parle francais! We speak English! Habla
espanol! Parlo Italiano! Kein Problem! Pas de probleme! etc
A borderline issue might theoretically arise for SOME product pages
if the identical book was being offered with identical information
(i.e. if the English-language version of the book was being sold on
each of the different country-specific websites). I still think that
in the vast majority of cases there would be enough difference between
the EN, DE, FR, and IT pages that "duplicate content" would still
not be an issue.
Robbo

Also notice that you have a 302 temporary redirect from the non-www
version to the www version, i.e.
from abebooks.co.uk/ to www.abebooks.co.uk/ and
from abebooks.com/ to www.abebooks.com/
The 302 temporary redirect should be changed to a 301 permanent
redirect.
Robbo
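For anyone who wants to confirm the 302-versus-301 point raised above, here is a minimal Python sketch. It assumes the bare domain answers on plain HTTP port 80; the function names first_response and describe_redirect are made up for illustration and do not come from any tool mentioned in this thread.

```python
import http.client
from typing import Optional, Tuple


def first_response(host: str, path: str = "/",
                   timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request over plain HTTP; return (status, Location header).

    http.client never follows redirects, so this reports the first hop only.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe_redirect(status: int) -> str:
    """Classify an HTTP status code as a permanent or temporary redirect."""
    if status in (301, 308):
        return "permanent"
    if status in (302, 303, 307):
        return "temporary"
    return "other"  # anything that is not one of the common redirect codes
```

Calling first_response("abebooks.com") and passing the returned status to describe_redirect would show whether the non-www host currently answers with a temporary or a permanent redirect. Some servers treat HEAD differently from GET, so it may be worth repeating the check with "GET".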
Thanks for the advice on using a 301 instead of a 302. I have passed it
on and it is being implemented, but I don't think it is the cause of our
problem, and I don't think duplicate content is either.
I am fairly confident that the server that was blocking Google's
access to our site and giving Google a 404 for 2 whole days is the
problem. Webmaster Tools says the rest of my site is being
indexed, but the exact error I get for the homepage is:
"We can't currently access your home page because of an HTTP error
(404).:"
There is nothing preventing people from accessing our homepage as far
as we can see. We tried from external locations and mimicked
Googlebot as well.
Is there anything else that we can try?
Thanks again for all the help, guys. Your time is greatly appreciated.
It appears that the https version being in the results is not the
cause of our problem but one of the symptoms.
The http:// version of our homepage is not being indexed. The https
version has always been indexed by Google, but it never ranked above the
http version until this problem occurred and the http version stopped
being indexed. Basically, I can find every page on our site in the
Google SERPs except the http version of the homepage. 90% of our
sales-driving keywords go to our homepage, so our natural search
traffic is being drastically reduced.
I forwarded your original response to our operations team and they are
looking further into it as we speak.
The server log doesn't show any 404 errors, but it never did. The
errors were caused by an external server and never actually came from
our site. The only way we know there was a 404 error in the first
place is that Google reported it in Webmaster Tools when I viewed
the sitemaps. We fixed the problem with the server and now our
sitemaps are being consumed without error, but the Webmaster Tools
overview states that the homepage is giving a 404 now.
I don't know what is going on, but when I access the site with a
Googlebot user agent the homepage takes forever to load. It does
return a 200 response, but it takes over a minute to load. When I
switch back to a regular browser user agent, it loads quite quickly
again.
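In case it helps the operations team reproduce this, below is a rough Python sketch of the kind of comparison described above: fetch the same URL with two different User-Agent headers and time each request. This is not the tool used for the test in this thread. The browser User-Agent string, the 90-second timeout, the factor of 5 and the function names are all assumptions chosen for illustration, and a request sent this way does not come from a Google IP address.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Googlebot's published desktop User-Agent token string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Placeholder browser string; any ordinary browser User-Agent would do.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def timed_fetch(url: str, user_agent: str,
                timeout: float = 90.0) -> Tuple[Optional[int], float]:
    """Fetch url with the given User-Agent; return (status, seconds taken).

    urllib follows redirects, so the status is that of the final response.
    The status is None when the request times out or cannot connect.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    start = time.monotonic()
    status: Optional[int]
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # download the whole body, as a crawler would
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx and 5xx responses arrive as exceptions
    except OSError:
        status = None  # timeout, DNS failure, refused connection, etc.
    return status, time.monotonic() - start


def looks_agent_dependent(bot_seconds: float, browser_seconds: float,
                          factor: float = 5.0) -> bool:
    """True when the bot fetch took at least `factor` times the browser fetch."""
    return bot_seconds >= factor * max(browser_seconds, 0.001)
```

Running timed_fetch on the homepage once with GOOGLEBOT_UA and once with BROWSER_UA, then passing the two durations to looks_agent_dependent, gives a quick yes/no on whether the server treats the two user agents differently. Repeating the pair a few times helps rule out ordinary network noise.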
It looks like you have got us one step closer to the problem. It
looks like one of our servers may be causing an error for search
engine bot traffic only. The error is causing the bots to be redirected,
which is taking far too long and timing out.
If a page takes more than a minute to load for a bot, would the bot give
up after a few seconds and move on, making a note that the page
doesn't exist (and so reporting a 404 error)?
I'm not sure how they would report the error; there are specific errors
for timing out. I can only mimic the user agent. Only someone at Google
could mimic a request coming from their specific IP addresses, so my
test is less than complete, though I've tested the user-agent change
with three different methods and got the same response each time.
The behavior is quite repeatable on my end: switching user agents
changes the speed at which the page is downloaded, so you may be on to
something.
Thanks JLH. I reported this to my operations team and they can see
that the server isn't directing the traffic correctly, so they are
going to make some further changes.
Will you be able to check again with the Googlebot user agent once we
have made some changes to the servers?
Your site responds very quickly to most requests from the web sniffer
tool that I am using.
But when I select the Googlebot emulation, there is an indefinite
wait.
I suggest that you scrutinize your .htaccess file, and any other script
you can think of that attempts to detect Googlebot, and remove the
detection.
Robbo
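As a starting point for the .htaccess review suggested above, here is a small Python sketch that lists the non-comment lines of a config file that mention Googlebot. The function name find_bot_rules and the simple "googlebot" pattern are assumptions for illustration. Detection could equally live in application code, a load balancer or a proxy, which this would not catch.

```python
import re
from typing import List, Tuple

# Assumed pattern: any case-insensitive mention of "googlebot".
BOT_PATTERN = re.compile(r"googlebot", re.IGNORECASE)


def find_bot_rules(config_text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) for each non-comment line naming Googlebot."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip Apache-style comment lines
        if BOT_PATTERN.search(stripped):
            hits.append((number, stripped))
    return hits
```

Reading the file with open(".htaccess").read() and passing the text to find_bot_rules returns the lines worth a closer look, such as a RewriteCond that tests %{HTTP_USER_AGENT}.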
By the way, I see that your .com has about 825,000 pages in the Google
index, whereas your .co.uk has over 1.25 million.
Is it possible that this deindexing problem has been going on gradually
for some time and has only now come to the fore because the homepage is
involved? Just speculation.