I told you folks a while back about the disappearance of my best performing page (it was a page about how to get out of the supplemental index) from the Google index. It's now happened to my SECOND best performing page -
The loss of the supps page halved my SE traffic. The loss of this page has QUARTERED that again... so because of the complete 'REMOVAL' of these two pages I'm down to barely 10% of the SE traffic I once had.
WTF is going on??! Why does Google keep removing my pages? This latest removal occurred on August 1, according to my traffic logs.
I am getting this 'WARNING' in my webmaster tools console -
"URLs not followed When we tested a sample of the URLs from your Sitemap, we found that some URLs were not accessible to Googlebot because they contained too many redirects. Please change the URLs in your Sitemap that redirect and replace them with the destination URL (the redirect target). All valid URLs will still be submitted."
Google seems to be getting more and more wacky with their results. Several search terms I track clearly put Yahoo in front in terms of best results for those terms. How are your Yahoo/MSN rankings?
Anyway, perhaps a long shot, but someone smart has just posted a detailed review of one way that can be used to get a site penalized in Google (yes, that's right, Google isn't perfect - I know a few people will come along and shoot me down for this one!) that apparently G knows about but hasn't been able to fix:
Those redirects are a known problem (I wonder if anyone ever mentioned it to Google?). The problem comes from having URLs in your sitemap file that are redirected ("too many" times being at least one time :-)).
Looking at your sitemap file, I see that ALL your URLs are without the trailing slash!!
Wait a second. Could it be that you're knocking yourself out of the index?
Not only is your sitemap without trailing slashes, but ALL internal links are also without them. Not that good...
I imagine that you're doing the redirect in your .htaccess file, right? You need to make sure that your permalink structure also has the trailing slash (that puts it into your sitemap file and also changes the internal links). Then regenerate your sitemap file + ping it with the engines.
Your sitemap should contain the FINAL URLs, i.e. those with trailing slashes, so no redirection will happen for those URLs.
Also, your navigation must use the URLs with trailing slashes, so robots don't get redirected at all when crawling the site. The redirection is there to take care of previously indexed URLs that didn't have trailing slashes, and any inbound or bookmarked links floating around that don't have trailing slashes.
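If you want to check this yourself, here is a rough sketch of how you might scan a sitemap for URLs that are missing the trailing slash. It assumes a standard sitemaps.org-format file and that your final URLs all end in a slash; the function name and the example.com URLs are made up for illustration, and anything with a dot in the last path segment is simply guessed to be a file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by sitemaps.org-format sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_missing_trailing_slash(sitemap_xml):
    """Return <loc> URLs whose path looks like a directory but lacks a trailing slash."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        path = urlsplit(url).path
        last_segment = path.rsplit("/", 1)[-1]
        # Assumption: a dot in the last segment (e.g. feed.xml) means a file, not a directory.
        if path and not path.endswith("/") and "." not in last_segment:
            flagged.append(url)
    return flagged


# Hypothetical sitemap content, not taken from the site discussed in this thread.
EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/some-post</loc></url>
  <url><loc>http://www.example.com/other-post/</loc></url>
  <url><loc>http://www.example.com/feed.xml</loc></url>
</urlset>"""

print(urls_missing_trailing_slash(EXAMPLE))
```

Anything this flags is a candidate for the "URLs not followed" warning, since the server will answer it with a redirect rather than the page itself.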
I'm starting to think that 'something is rotten in the state of Denmark' as Shakespeare used to say when he ran a website. This problem with Google being unable to follow even one redirect has been voiced a lot lately.
My guess is that they are trying to selectively follow 301's based on ANOTHER anti-spam algorithm due to people buying domains and 301'ing them to gain backlinks, and as usual good sites get caught up in their dragnet. I of course can offer no proof of this.
I even read graywolf talking about it. Why you'd want to have the redirected URLs in your sitemap "so they will find them" is beyond me. I thought he was much brighter than that; I always thought the point was to get the new URLs indexed and the old ones dropped out.
I didn't say to put the old URLs that get redirected in the sitemap, but rather the exact opposite. Use the new URLs to which the old ones get redirected.
> On Aug 17, 8:16 am, webado wrote:
> > Your sitemap should contain the FINAL URLs, i.e. those with trailing slashes, so no redirection will happen for those URLs.
> > Also, your navigation must use the URLs with trailing slashes, so robots don't get redirected at all when crawling the site. The redirection is there to take care of previously indexed URLs that didn't have trailing slashes, and any inbound or bookmarked links floating around that don't have trailing slashes.
> > On Aug 17, 6:09 am, dockarl wrote:
> > > the url's it's reporting this problem with -
Webado, I know you didn't!!!!!!! I was referring to Graywolf's post, I'm sorry, I should have been clearer on that. He said, "I want to keep feeding them the sitemap at least until they pick up that most of the files have moved and adjust them in the index" which doesn't make any sense to me.
> On Aug 17, 10:02 am, JLH wrote:
> > I'm starting to think that 'something is rotten in the state of Denmark' as Shakespeare used to say when he ran a website. This problem with Google being unable to follow even one redirect has been voiced a lot lately.
> > My guess is that they are trying to selectively follow 301's based on ANOTHER anti-spam algorithm due to people buying domains and 301'ing them to gain backlinks, and as usual good sites get caught up in their dragnet. I of course can offer no proof of this.
> > I even read graywolf talking about it. Why you'd want to have the redirected URLs in your sitemap "so they will find them" is beyond me. I thought he was much brighter than that; I always thought the point was to get the new URLs indexed and the old ones dropped out.
The lack of a trailing slash on a URL ending in a directory name is going to cause an additional redirect every time, as well as being bad practice. It's not likely to completely solve the issues you're facing, but adding the trailing slash in all such links would seem a pretty painless step in the right direction.
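To make the "additional redirect" point concrete, here is a small simulation that counts how many hops a crawler would take before reaching the final URL. It is only a sketch: the redirect map is hypothetical (a non-www to www rule followed by a trailing-slash rule), and the max_hops limit of 5 is an arbitrary number for illustration, not a figure Google has published.

```python
def count_redirect_hops(url, redirects, max_hops=5):
    """Follow url through a {source: target} redirect map and return (final_url, hops)."""
    hops = 0
    while url in redirects:
        if hops >= max_hops:
            # Arbitrary cut-off for this sketch, also protects against redirect loops.
            raise RuntimeError("too many redirects")
        url = redirects[url]
        hops += 1
    return url, hops


# Hypothetical rules: non-www goes to www, then the trailing slash gets added.
REDIRECTS = {
    "http://example.com/some-post": "http://www.example.com/some-post",
    "http://www.example.com/some-post": "http://www.example.com/some-post/",
}

for start in (
    "http://www.example.com/some-post/",  # final URL, no redirect
    "http://www.example.com/some-post",   # missing slash, one hop
    "http://example.com/some-post",       # missing www and slash, two hops
):
    print(start, "->", count_redirect_hops(start, REDIRECTS))
```

Linking to the final form everywhere (sitemap and internal links) keeps every crawl at zero hops, and leaves the redirects to handle only old inbound links.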
>Any truth in the rumour that you take the studio photos too?
No, I actually tended to ping off when we had the studio shots - I'd feel like a pervert :)
>You'll just have to spend more time on the coding and less time on the >beers and the BLs (backless lingerie, not backlinks).
The way WordPress implements trailing slashes (and to an extent, rewrites) is a total debacle. I've now changed my rewrite to remove the trailing slash instead of adding it. We'll see how that goes.
I must admit though, I'm extremely sceptical that this is the CAUSE of the 'disappearing pages'. After all, the only pages that are disappearing are my high traffic pages. I've had pages indexed perfectly well since I added the trailing slash redirect, and the Supplementals page disappeared well before I ever had that redirect in place.
I'm thinking this is some kind of "We'll knock off your best performing pages" penalty - that's something I've never heard of before. John's suggested that it may be because my site benefits from a large number of incoming links from footer links in my relatively successful WordPress theme - i.e. that they are considered paid links. Ok, so I have strong opinions about that - I think that if a site chooses to use a free theme it's well and truly acceptable that that site should vote for the theme - also, being open source, people can feel free to remove the link if they want to.
Maybe Google thinks differently, but, if so - again - I think they should enunciate that - because with 10K copies of my theme 'in the wild' it's sure as hell going to be difficult to go back and get all those links nofollowed. That kind of thing would be a perfect candidate for an automated message in webmaster tools.
> On Aug 18, 10:01 am, Robbo wrote:
> > Matt
> > You'll just have to spend more time on the coding and less time on the beers and the BLs (backless lingerie, not backlinks).
> > Any truth in the rumour that you take the studio photos too? :-)
I agree, normally the trailing slash shouldn't be a problem. However, high-value pages are crawled more often and will get treated with a "higher priority"... If -- assuming the worst -- the "too many redirects" were actually a real problem instead of just a warning, it would affect your high-priority pages first.
I understand your thinking about the theme, that's what I would want in your case as well. However, I could see that Google might not like that thinking as much (eg "paying" for the theme through a link, in other words, it's a paid link). Assuming they were doing something like that, I imagine we'd see a lot more of it in the forums (assuming they're paying as much attention as you are, lol). Wouldn't it hit "wordpress.org" first? Which WP blog doesn't have a clean link to wordpress.org?
The way I see it, when a quality resource is dropped from Google, is doing nothing wrong according to their terms and conditions, and is still fine in all the other search engines, there are 4 options.
a. Something Google changed in its algo is starting to affect quality sites/pages in undesired ways. No one at Google is realizing this, or they can't seem to fix the problem.
b. Something within the algo's storing mechanism is broken, perhaps because past stored information that affects a site's quality measure has been lost and no one has realized it, or it is taking a long time to recalibrate. This would explain why it's not affecting more sites, but slowly having a knock-on effect. I read somewhere that Eric Schmidt quoted their servers as being very full, so this is not as much of a stretch as it seems. Perhaps some intern wiped out a few hard drives while trying to add more space for gmail :)
c. They changed something and haven't notified anyone anywhere that x is now considered bad. If it's this option they will get their n***s sued off.
d. You are being targeted by some negative form of SEO that Google isn't able to filter out. The above link is a really good example of something that Google knows about but which they don't seem willing/able to stop (a disgrace!).
I know of at least one quality resource that has posted here about a similar problem and has had a (well known) googler actually looking at the issue, but the googler was just as stumped as the site owner.... not a good sign!! Any way you look at this, it blows, and for users it means that they are no longer getting the best results through Google.
The thing that gets my knickers in a knot is this - one of the major aims of my site (when I'm not going through a posting doldrum as I am at present) is to find and write about answers to problems that I've personally had (usually IT related), and not been able to find any answer on Google... That particular page is one example.
The other one that got dropped (about supps) is another example, although that information can be found elsewhere, prob not as well 'summarised'.
But another one that was dropped was about a problem where I couldn't get any sound when I was watching videos on the CNN website.
Of those, prob the two that had the most traffic and 'oh my God - I'm so glad you wrote about this - I couldn't find the answer ANYWHERE' type comments are this one and the CNN video page.
So, it would seem that these pages offer unique information, the information can't be found elsewhere - and yet, because this kind of stuff doesn't get a lot of backlinks, it's virtually doomed to slip into supps (or, the newest trick, get delisted) after a few weeks never to be found again, whereas airy fairy discussion type posts or recaps get link love and survive.
I DO find it irritating - and it's prob one of the major deficiencies of a system reliant on backlinks as its major measure of popularity.
> On Aug 18, 9:14 am, JohnMu wrote:
> > I agree, normally the trailing slash shouldn't be a problem. However, high-value pages are crawled more often and will get treated with a "higher priority"... If -- assuming the worst -- the "too many redirects" were actually a real problem instead of just a warning, it would affect your high-priority pages first.
> > I understand your thinking about the theme, that's what I would want in your case as well. However, I could see that Google might not like that thinking as much (eg "paying" for the theme through a link, in other words, it's a paid link). Assuming they were doing something like that, I imagine we'd see a lot more of it in the forums (assuming they're paying as much attention as you are, lol). Wouldn't it hit "wordpress.org" first? Which WP blog doesn't have a clean link to wordpress.org?
But Google running low on storage capacity? That's an interesting theory from Eric. I tend to doubt it, but it's interesting just the same.
I wonder where Google buys its hard drives? I'd love to have that contract.
I have thought though that it would be interesting to see how their processor requirements scale with exponential growth in network size. One would tend to think that the processing requirements would tend to increase non-linearly with network size - that's gotta be their biggest challenge, and it's my hunch that's the reason for the supps.
> On Aug 18, 8:11 pm, Sam I Am wrote:
> > The way I see it, when a quality resource is dropped from Google, is doing nothing wrong according to their terms and conditions, and is still fine in all the other search engines, there are 4 options.
> > a. Something Google changed in its algo is starting to affect quality sites/pages in undesired ways. No one at Google is realizing this, or they can't seem to fix the problem.
> > b. Something within the algo's storing mechanism is broken, perhaps because past stored information that affects a site's quality measure has been lost and no one has realized it, or it is taking a long time to recalibrate. This would explain why it's not affecting more sites, but slowly having a knock-on effect. I read somewhere that Eric Schmidt quoted their servers as being very full, so this is not as much of a stretch as it seems. Perhaps some intern wiped out a few hard drives while trying to add more space for gmail :)
> > c. They changed something and haven't notified anyone anywhere that x is now considered bad. If it's this option they will get their n***s sued off.
> > d. You are being targeted by some negative form of SEO that Google isn't able to filter out. The above link is a really good example of something that Google knows about but which they don't seem willing/able to stop (a disgrace!).
> > I know of at least one quality resource that has posted here about a similar problem and has had a (well known) googler actually looking at the issue, but the googler was just as stumped as the site owner.... not a good sign!! Any way you look at this, it blows, and for users it means that they are no longer getting the best results through Google.
> But Google running low on storage capacity? That's an interesting theory from Eric. I tend to doubt it, but it's interesting just the same.
Hey, he's only the CEO of Google; what could he possibly know :)
> I have thought though that it would be interesting to see how their processor requirements scale with exponential growth in network size. One would tend to think that the processing requirements would tend to increase non-linearly with network size - that's gotta be their biggest challenge, and it's my hunch that's the reason for the supps.
This is along the lines of what I'm thinking too. I'm wondering if, in their constant attempts to get the absolute freshest results, older sites with more established content are losing out as those older backlinks devalue in some way (i.e. you have to have more fresh backlinks to be considered). If that's the case, it's a bad move of course, but you know they have to be struggling with the amount of data they are trying to parse as millions (billions??) of new pages are added each day!!
> On Aug 18, 8:28 pm, dockarl wrote:
> > The thing that gets my knickers in a knot is this - one of the major aims of my site (when I'm not going through a posting doldrum as I am at present) is to find and write about answers to problems that I've personally had (usually IT related), and not been able to find any answer on Google... That particular page is one example.
> > The other one that got dropped (about supps) is another example, although that information can be found elsewhere, prob not as well 'summarised'.
> > But another one that was dropped was about a problem where I couldn't get any sound when I was watching videos on the CNN website.
> > Of those, prob the two that had the most traffic and 'oh my God - I'm so glad you wrote about this - I couldn't find the answer ANYWHERE' type comments are this one and the CNN video page.
> > So, it would seem that these pages offer unique information, the information can't be found elsewhere - and yet, because this kind of stuff doesn't get a lot of backlinks, it's virtually doomed to slip into supps (or, the newest trick, get delisted) after a few weeks never to be found again, whereas airy fairy discussion type posts or recaps get link love and survive.
> > I DO find it irritating - and it's prob one of the major deficiencies of a system reliant on backlinks as its major measure of popularity.
> > doc
> > On Aug 18, 8:11 pm, Sam I Am wrote:
> > > The way I see it, when a quality resource is dropped from Google, is > > > doing nothing wrong according to their terms and conditions and still > > > fine in all the other search engines, there are 4 options.
> > > a. Something g changed with its algo is starting to affect quality > > > sites/pages in undesired ways. No one at Google is realizing this or > > > they can't seem to fix the problem. > > > b. Something within the algo's storing mechanism is broken, perhaps > > > based on past stored information that affects a sites quality measure > > > being lost and no one realizing this or it taking a long time to > > > recalibrate this. This would explain why it's not affecting more > > > sites, but slowly having a knock on effect. I read somewhere that Eric > > > Schmidt quoted their servers as being very full, so this is not as > > > much of a stretch as it seems. Perhaps some intern wiped out a few > > > harddrives while trying to add more space for gmail :) > > > c. They changed something and haven't notified anyone anywhere that x > > > is now considered bad. If this option they will get their n***s sued > > > off. > > > d. You are being targetted by some negative form of SEO that Google > > > isn't able to filter out. The above link is a really good example of > > > something that Google knows about but which they don't seem willing/ > > > able to stop (a disgrace!).
> > > I know of at least one quality resource that has posted here about a > > > similar problem problem and has had a (well known) googler actually > > > looking at the issue but the googler was just as stumped as the site > > > owner.... not a good sign!! Any way you look at this, it blows and for > > > users means that they are no longer getting the best results through > > > Google.
> > > On Aug 18, 9:14 am, JohnMu wrote:
> > > > I agree, normally the trailing slash shouldn't be a problem. However, > > > > high-value pages are crawled more often and will get treated with a > > > > "higher priority"... If -- assuming the worst -- the "too many > > > > redirects" were actually a real problem instead of just a warning, it > > > > would affect your high-priority pages first.
> > > > I understand your thinking about the theme, that's what I would want > > > > in your case as well. However, I could see that Google might not like > > > > that thinking as much (eg "paying" for the theme through a link, in > > > > other words, it's a paid link). Assuming they were doing something > > > > like that, I imagine we'd see a lot more of it in the forums (assuming > > > > they're paying as much attention as you are, lol). Wouldn't it hit > > > > "wordpress.org" first? Which WP blog doesn't have a clean link to > > > > wordpress.org?