Discussions > Crawling, indexing, and ranking > 10.379 not found pages in few days
From: Dynamical.Biz
Date: Fri, 9 May 2008 00:45:00 -0700 (PDT)
Subject: 10.379 not found pages in few days
For no technical reason that I know of, Webmaster Tools (Web
crawl errors > Not found) reports 10.379 errors for my site http://lamundial.net

I've checked the server error log but found no clues. These nonexistent
URLs have a strange pattern that makes me think of a glitch in
Googlebot, but the whole thing is quite odd.

PR has dropped from 6 to 4. Could this be the reason?

Has anyone had a similar experience?
Thanks


From: cristina
Date: Fri, 9 May 2008 04:57:49 -0700 (PDT)
Local: Fri, May 9 2008 7:57 am
Subject: Re: 10.379 not found pages in few days
It is difficult to form an opinion without seeing some of the
not-found URLs.
Did you look at the Links and
the 'What Googlebot sees' pages in Google Webmaster Tools,
to check that all is OK there?

Cristina.

On May 9, 8:45 am, Dynamical.Biz wrote:


From: Dynamical.Biz
Date: Fri, 9 May 2008 09:00:01 -0700 (PDT)
Local: Fri, May 9 2008 12:00 pm
Subject: Re: 10.379 not found pages in few days
Here are some:
http://lamundial.net/home.php/bandas/ser-raro/songs/mp3dwn.php?id=61&...
http://lamundial.net/home.php/cfg/songs/songs/mp3dwn.php?pgrow=18&id=650

They are redirected to the 404 error page, obviously.

Thanks, Cristina, for your interest.

On May 9, 13:57, cristina wrote:


From: cristina
Date: Fri, 9 May 2008 14:33:46 -0700 (PDT)
Local: Fri, May 9 2008 5:33 pm
Subject: Re: 10.379 not found pages in few days
Check whether your site was hacked, just in case.
If you can, block the URLs in the robots.txt file,
provided many of them start in the same way and
differently from good URLs.

For nonexistent URLs on your site I get HTTP status 200 (OK)
(after redirection to the not-found page);
you should be returning 404 (Not Found).

Cristina.
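
A minimal sketch of the check described above, in Python: request a URL without following redirects automatically and record the status code of each hop. The helper names status_chain and classify are made up for this illustration, and the labels returned are only a convenient summary, not anything Webmaster Tools itself reports.

```python
import http.client
from urllib.parse import urlsplit, urljoin

REDIRECTS = (301, 302, 303, 307, 308)

def status_chain(url: str, max_hops: int = 5) -> list[int]:
    """Fetch url hop by hop (no automatic redirects) and return each status code."""
    chain = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request("GET", path)
            resp = conn.getresponse()
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        chain.append(status)
        if status in REDIRECTS and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            break
    return chain

def classify(chain: list[int]) -> str:
    """Summarise what a crawler sees for a URL that should not exist."""
    if not chain:
        return "no response"
    if chain[0] in (404, 410):
        return "not found"
    if chain[-1] == 200:
        return "looks like a real page"
    return "other"
```

With the behaviour described in this thread, a chain of [302, 200] would be labelled "looks like a real page", while a 404 served directly would be labelled "not found".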

On May 9, 5:00 pm, Dynamical.Biz wrote:


Berghausen Google employee  
From: Berghausen
Date: Fri, 9 May 2008 14:58:36 -0700 (PDT)
Local: Fri, May 9 2008 5:58 pm
Subject: Re: 10.379 not found pages in few days
Hi Dynamical-

There are two possible reasons I can think of for so many not-found
URLs happening:

1- Your site used to have a different URL structure where those URLs
were valid (as a result of a redesign or a hack), and there are broken
links on your site or other sites to these pages.  These might be
worth correcting. (see B and C below)

2- Sometimes authors type URL's wrong, spammers generate buggy URLs,
or any number of other things which produce dead-end links that never
did have any good content.  These you can usually safely ignore.

Here are some things to do:

A- It looks like Cristina just beat me to this (and point C,
congrats!), but you should redo your 404's.  Currently, when I go to
those "broken" links that "redirect to a 404" I don't actually get a
'404 Not Found'.  I am getting a 302 redirect to 404.php, which
returns a '200 OK'.  Perhaps one of the .htaccess wizards in the group
can advise you on how to configure your server to return a proper 404
header instead, while serving the body of 404.php as the message
content.

B- Take a good look around your site for broken links to bad pages.
Other folks around here have suggested Xenu as a useful tool for this,
but I cannot personally (or officially, as a Googler) endorse it, as
I've never used it.

C- If you notice that most of the nonexistent URL's can be matched to
a few simple patterns, you may want to add those patterns to your
robots.txt file.  That way, Googlebot won't even ask for them--you can
save both your server and our crawler precious time and bandwidth. :-)

As for your PR, I would advise you not to worry much about the green
pixels.  They're a rough approximation of one of hundreds of factors
we use when ranking sites.  Not to mention that the number is usually
only updated every few weeks or months, so it's not guaranteed to be
up to date.  If you've experienced a sudden drop in green-pixel PR, it
may be worth going over the Google Webmaster Guidelines to make sure
you're not in violation.  Just remember that PR changes over time just
like the web changes, so all sites can expect some fluctuation.

Let us know how it goes--if you'd like advice or explanation of
anything I've said (or for any other responses that show up here),
keep on asking.  We like questions around here.  Especially good, hard
ones.
-Bergy

On May 9, 2:33 pm, cristina wrote:


From: Dynamical.Biz
Date: Sun, 11 May 2008 15:59:45 -0700 (PDT)
Local: Sun, May 11 2008 6:59 pm
Subject: Re: 10.379 not found pages in few days
Hi Bergy, thanks a lot for your help.
Point by point:

On May 9, 23:58, Berghausen wrote:

> Hi Dynamical-

> There are two possible reasons I can think of for so many not-found
> URLs happening:

> 1- Your site used to have a different URL structure where those URLs
> were valid (as a result of a redesign or a hack), and there are broken
> links on your site or other sites to these pages.  These might be
> worth correcting. (see B and C below)

Neither of these two options applies; I checked as soon as they appeared.

> 2- Sometimes authors type URL's wrong, spammers generate buggy URLs,
> or any number of other things which produce dead-end links that never
> did have any good content.  These you can usually safely ignore.

Yes, in the error log I can see some wrong URLs, but not the kind that
Webmaster Tools is showing, and that is strange.

> Here are some things to do:

> A- It looks like Cristina just beat me to this (and point C,
> congrats!), but you should redo your 404's.  Currently, when I go to
> those "broken" links that "redirect to a 404" I don't actually get a
> '404 Not Found'.  I am getting a 302 redirect to 404.php, which
> returns a '200 OK'.

What .htaccess does now is this: instead of showing the typical Apache
404 Not Found page, it redirects visitors to a custom error page that
tries to keep as much traffic on the site as possible by offering some
other navigation options.

> Perhaps one of the .htaccess wizards in the group
> can advise you on how to configure your server to return a proper 404
> header instead, while serving the body of 404.php as the message
> content.

> B- Take a good look around your site for broken links to bad pages.
> Other folks around here have suggested Xenu as a useful tool for this,
> but I cannot personally (or officially, as a Googler) endorse it, as
> I've never used it.

The Xenu report says everything is OK; I checked.

> C- If you notice that most of the nonexistent URL's can be matched to
> a few simple patterns, you may want to add those patterns to your
> robots.txt file.  That way, Googlebot won't even ask for them--you can
> save both your server and our crawler precious time and bandwidth. :-)

OK, I'll add "Disallow: /home*$" and see how it goes.

> As for your PR, I would advise you not to worry much about the green
> pixels.  They're a rough approximation of one of hundreds of factors
> we use when ranking sites.  Not to mention that the number is usually
> only updated every few weeks or months, so it's not guaranteed to be
> up to date.  If you've experienced a sudden drop in green-pixel PR, it
> may be worth going over the Google Webmaster Guidelines to make sure
> you're not in violation.  Just remember that PR changes over time just
> like the web changes, so all sites can expect some fluctuation.

Yes, I know, but it is a strange coincidence, isn't it? The 10.379 broken
URLs appear and PR goes down 2 points.

> Let us know how it goes--if you'd like advice or explanation of
> anything I've said (or for any other responses that show up here),
> keep on asking.  We like questions around here.  Especially good, hard
> ones.

Thanks, Bergy. This is a personal website, so it is not a
'dead or alive' matter, but strange problems like this are very
interesting for a professional SEO.

I'll keep you updated.

Regards


From: webado
Date: Sun, 11 May 2008 17:35:40 -0700 (PDT)
Local: Sun, May 11 2008 8:35 pm
Subject: Re: 10.379 not found pages in few days
Your .htaccess file likely has this directive in it:

ErrorDocument 404 http://lamundial.net/404.php

What the above does, in case of a 404, is issue a 302 redirect to
the specified URL (because it is a fully qualified URL).

This should be replaced by:

ErrorDocument 404 /404.php

Since the above is a URL path relative to the root, there is no redirection
involved; the content of the error page will be shown with the 404
response code.

On May 11, 6:59 pm, Dynamical.Biz wrote:


From: webado
Date: Sun, 11 May 2008 17:59:11 -0700 (PDT)
Local: Sun, May 11 2008 8:59 pm
Subject: Re: 10.379 not found pages in few days

Put this in the .htaccess file of your website root folder:

Options +Indexes +FollowSymlinks
RewriteEngine on
RewriteBase /

### re-direct index.php  to root / ###
RewriteCond %{THE_REQUEST} ^.*\/index\.php\ HTTP/
RewriteRule ^(.*)index\.php$ /$1 [R=301,L]

The above will 301 redirect /index.php to the root. You should do that
to keep the site streamlined and avoid duplication.

Remove any redirection you are currently doing from home.php to root.
Delete the file home.php if you actually have it.

Create a folder in the root of the website, named home.php  - yes, a
folder not a file.

Upload an .htaccess to the folder /home.php/ which contains just this:

Options +Indexes +FollowSymlinks
RewriteEngine on
RewriteBase /
RewriteRule ^(.*)$ http://lamundial.net/ [R=301,nc]

This will 301 redirect all those bad urls to your homepage.

On May 11, 6:59 pm, Dynamical.Biz wrote:


From: Dynamical.Biz
Date: Mon, 12 May 2008 01:59:22 -0700 (PDT)
Local: Mon, May 12 2008 4:59 am
Subject: Re: 10.379 not found pages in few days

On May 12, 02:59, webado wrote:

> Put this in the .htaccess file of your website root folder:

> Options +Indexes +FollowSymlinks
> RewriteEngine on
> RewriteBase /

> ### re-direct index.php  to root / ###
> RewriteCond %{THE_REQUEST} ^.*\/index\.php\ HTTP/
> RewriteRule ^(.*)index\.php$ /$1 [R=301,L]

> The above will 301 redirect /index.php to the root. You should do that
> to keep the site streamlined and avoid duplication.

Thanks, I forgot this one.

> Remove any redirection you are currently doing from home.php to root.
> Delete the file home.php if you actually have it.

home.php has not existed for some time.

> Create a folder in the root of the website, named home.php  - yes, a
> folder not a file.

> Upload an .htaccess to the folder /home.php/ which contains just this:

> Options +Indexes +FollowSymlinks
> RewriteEngine on
> RewriteBase /
> RewriteRule ^(.*)$ http://lamundial.net/ [R=301,nc]

> This will 301 redirect all those bad urls to your homepage.

I'll test that one.

In any case, what I want you all to keep in mind is that these
10.379 URLs do not exist anywhere except somewhere in "Google's
indexing system memory".

I'm beginning to feel quite sure this is a Google indexing problem. Why?

The wrong URLs do not come from broken internal/external links.
- I'm not a PR neurotic, but I try to ensure a good user experience;
this is why I set up some .htaccess redirections, just in case.
- 10.379 wrong URLs appear in Webmaster Tools and PR goes down 2
points: not a coincidence?

They do not appear in the server error log.
- If they were really broken links, the server error log would have
shown them long before Google could discover them. None of
these 10.379 URLs are in the server error log!
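
The log check described above could be sketched in Python as follows. This assumes the server writes Apache's "combined" access log format (user agent as the last quoted field); the function name agents_requesting and the default prefix are choices made for this illustration only.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the request, status, size, referer and user-agent fields of a
# combined-format access log line.
REQUEST = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def agents_requesting(lines: Iterable[str], prefix: str = "/home.php/") -> Counter:
    """Count requests for paths under prefix, grouped by user-agent string."""
    hits = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match and match.group("path").startswith(prefix):
            hits[match.group("agent")] += 1
    return hits
```

Feeding it the lines of an access log (for example via open() on whatever path the host uses) shows which clients, if any, ever asked for URLs under that prefix. An empty result before the Webmaster Tools report appeared would be consistent with the observation that nothing else requested these URLs.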

Not long ago Webmaster Tools was not working well for this account;
now it works better, but maybe there is still some glitch.

As I said, I'll wait several days until the website is reindexed
and see if the robots.txt exclusion has some positive effect.
Otherwise I'll ask for reconsideration of the PR thing and whether there
is any way to delete these URLs from Webmaster Tools from Google's side.

Regards


From: Phil Payne
Date: Mon, 12 May 2008 02:26:04 -0700 (PDT)
Local: Mon, May 12 2008 5:26 am
Subject: Re: 10.379 not found pages in few days

Yup.  I first observed this in 2006 and I've seen it several times
since.  Search Usenet on "googlebot active imagination".

I warn you - you're ploughing a lone furrow.  Everyone seems to
believe Google is infallible but if you've the kind of experience I
have (forty years now) you'll recognise that Google is not very good
at coding and even worse at testing.

Anyway, the basic test is blindingly obvious.  If such links existed
ANYWHERE the other bots would come looking for them and they don't.

It usually looks like the Googlebot has concatenated your domain name
with a bunch of relative URLs from some other random site.  One
characteristic is that all these URLs appear in the reports in one pass
- another is that they'll never appear again - which, of course, they
would if they were the result of some misconfiguration either on your
site or somewhere else.
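
To illustrate the kind of concatenation described above, here is a small Python sketch using the standard library's urljoin. The base URL is a hypothetical path chosen for the example, and nothing here claims to reproduce what Googlebot actually does; it only shows how resolving a relative link against an unexpected base produces URLs of the reported shape.

```python
from urllib.parse import urljoin

def resolve_all(base: str, relative_links: list[str]) -> list[str]:
    """Resolve each relative link against base using standard URL rules."""
    return [urljoin(base, link) for link in relative_links]

# Hypothetical base: a directory-like path under home.php (an assumption).
base = "http://lamundial.net/home.php/cfg/songs/"
# A relative link shaped like the tail of one of the reported URLs.
links = ["songs/mp3dwn.php?pgrow=18&id=650"]

for url in resolve_all(base, links):
    print(url)
```

The printed URL has the same form as the second example posted earlier in the thread, with the "songs" segment doubled.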

You wouldn't BELIEVE how reluctant otherwise rational people are to
accept this is a Googlebug.  The infallibility of the Googlebot is
part of their catechism.


From: Dynamical.Biz
Date: Mon, 12 May 2008 03:14:52 -0700 (PDT)
Local: Mon, May 12 2008 6:14 am
Subject: Re: 10.379 not found pages in few days
On May 12, 11:26, Phil Payne wrote:

> Yup.  I first observed this in 2006 and I've seen it several times
> since.  Search Usenet on "googlebot active imagination".

> I warn you - you're ploughing a lone furrow.  Everyone seems to
> believe Google is infallible but if you've the kind of experience I
> have (forty years now) you'll recognise that Google is not very good
> at coding and even worse at testing.

I don't believe Google is infallible (nothing is), but that is hard to
demonstrate.

> Anyway, the basic test is blindingly obvious.  If such links existed
> ANYWHERE the other bots would come looking for them and they don't.

thanks

> It usually looks like the Googlebot has concatenated your domain name
> with a bunch of relative URLs from some other random site.  One
> characteristic is that all these URLs appear in the reports in one pass
> - another is that they'll never appear again - which, of course, they
> would if they were the result of some misconfiguration either on your
> site or somewhere else.

Completely right: the pattern looks like some crazy concatenation of
pieces of URLs.
If this were a site misconfiguration I would have noticed it long ago,
but I did not.

> You wouldn't BELIEVE how reluctant otherwise rational people are to
> accept this is a Googlebug.  The infallibility of the Googlebot is
> part of their catechism.

I have no religion of any kind.

From: cristina
Date: Mon, 12 May 2008 04:08:34 -0700 (PDT)
Local: Mon, May 12 2008 7:08 am
Subject: Re: 10.379 not found pages in few days
Another way to return HTTP status response
404 (Not Found) for non-existent URLs of the
site http://lamundial.net is to use the PHP header function
in the 404.php error page
http://lamundial.net/ 404.php
(I added a space in the URL not to be followed as a link)

Nonexistent URLs are redirected to 404.php,
which returns HTTP status 200 (OK).
Change that so 404.php returns HTTP status
404 (Not Found).

At the start of the 404.php source code add

<?php
header("HTTP/1.0 404 Not Found");
?>


From: Dynamical.Biz
Date: Mon, 12 May 2008 05:41:37 -0700 (PDT)
Local: Mon, May 12 2008 8:41 am
Subject: Re: 10.379 not found pages in few days
Thanks, Cristina, but let's forget the error handling.

This is what I've got in my .htaccess:

 ErrorDocument 404 /404.php
 RewriteRule ^home.php/(.*)$ http://lamundial.net/404.php

The first line is a regular customized 404 error page.
The second one is there to capture the strange URLs, if any, so I'll
delete it as soon as I reach a conclusion.

On May 12, 13:08, cristina wrote:


From: cristina
Date: Mon, 12 May 2008 05:56:09 -0700 (PDT)
Local: Mon, May 12 2008 8:56 am
Subject: Re: 10.379 not found pages in few days
I now get the correct response 404 (Not Found)
for nonexistent URLs on your site,
including the spammy URLs, so it looks OK.

I think you should keep checking whether there was
some hacking, just in case:
look at the 'What Googlebot sees'
page in Google Webmaster Tools, and at
the cached text-only copy from the search results.

Cristina.

On May 12, 1:41 pm, Dynamical.Biz wrote:


From: Dynamical.Biz
Date: Mon, 12 May 2008 08:27:53 -0700 (PDT)
Local: Mon, May 12 2008 11:27 am
Subject: Re: 10.379 not found pages in few days
thanks Cristina

On May 12, 14:56, cristina wrote:


From: Dynamical.Biz
Date: Tue, 13 May 2008 08:29:46 -0700 (PDT)
Local: Tues, May 13 2008 11:29 am
Subject: Re: 10.379 not found pages in few days
Today the error pages reported in Webmaster Tools went down to 7.942.

I think this "Disallow: /home*$" in robots.txt is helping a bit.

Let's see over the next few days.


From: Dynamical.Biz
Date: Wed, 28 May 2008 00:57:04 -0700 (PDT)
Local: Wed, May 28 2008 3:57 am
Subject: Re: 10.379 not found pages in few days
Googlebot came to lamundial.net on 26/05/2008.
Now there are only 8 "404 errors" and
10346 URLs restricted by robots.txt, using this not-so-standard but working
"Disallow: /home*$"

Not a single comma has changed on the website.
I may remove the "Disallow: /home*$" to see if this
Googlebot error comes back.
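
For readers wondering what that rule covers, here is a rough Python sketch of the wildcard matching, assuming the semantics Google documents for Googlebot ('*' matches any run of characters, a trailing '$' anchors the end of the URL). The function names are invented for this example and it is not a full robots.txt parser.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with '*' and '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then let every '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path: str, pattern: str) -> bool:
    """True if path (including any query string) matches the Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None
```

Under these assumptions "/home*$" matches every path that starts with /home, so it behaves the same as a plain "Disallow: /home" prefix rule and would cover the /home.php/... URLs posted earlier.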

PR matters:
in the meantime, the PR downgrade from 6 to 4 is making my website lose
organic traffic, and the sites gaining positions are not so relevant. The
first result for "musica copyleft" is just a bunch of copy-and-paste
articles, while we write every single line of our posts. That's why we
are a reference.

On the other hand, we want to get sponsors to finance our next copyleft
musical project, and the loss of visibility is not a good thing right
now.

On May 13, 17:29, Dynamical.Biz wrote:


From: Dynamical.Biz
Date: Mon, 2 Jun 2008 04:54:53 -0700 (PDT)
Local: Mon, Jun 2 2008 7:54 am
Subject: Re: 10.379 not found pages in few days
I removed the "Disallow: /home*$" from robots.txt to see if these
strange 10.000 wrong URLs could happen again.
Since then Googlebot has come 2 more times to my site and I have not
changed a single "," in the site, so for me the conclusion is clear
right now:

Googlebot confused itself and generated these 10.000 unreal wrong
URLs.
What is real for me is the consequence: worse sites than mine
ranking better, but that could be another chapter.

That's all, folks!
Regards


From: Dynamical.Biz
Date: Mon, 2 Jun 2008 07:47:28 -0700 (PDT)
Local: Mon, Jun 2 2008 10:47 am
Subject: Re: 10.379 not found pages in few days
Sorry, I missed some words; it should be:

Since then Googlebot has come 2 more times to my site, I have not changed a
single "," in it, and the errors have not appeared again, so for me the
conclusion is clear.

