In all likelihood the cure for this will be to wait a little while - a week is not a long time.
But if you'd like to post the URL of one of the new pages (preferably the oldest one) that hasn't been indexed, someone might have a look and check for anything obvious that might be wrong.
> I have a site that is well indexed (last indexed yesterday), but about a week ago google stopped indexing any NEW pages that I add.
> Normally, I'd add a new page and it'd appear in google web searches within 2 days.
> I've done nothing different, and google is still regularly indexing the home page.
> I uploaded sitemaps yesterday thinking that might help, but so far, nothing. I tried manually submitting a new page or two to google, and that didn't work either.
> Yahoo is still indexing everything normally. It's just google. Anyone any idea what gives or experiencing the same issues? Has google done anything differently within the past week or so or are they just giving priority to blogs now?!?!
> I uploaded sitemaps yesterday thinking that might help, but so far, nothing. I tried manually submitting a new page or two to google, and that didn't work either.
And setting every lastmod to today and every changefreq to daily is going to have what effect exactly?
Have you read and at least tried to understand the concept of sitemaps?
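To make that point concrete, here is a minimal sketch of a sitemap where lastmod reflects when each page really changed, rather than today's date on everything. The URLs, dates and the `build_sitemap` helper are made up for illustration; only the element names (urlset, url, loc, lastmod, changefreq) come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified, changefreq) tuples.

    last_modified should be the date the page content really changed,
    not the date the sitemap was generated.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified, changefreq in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = modified.strftime("%Y-%m-%d")
        if changefreq:  # changefreq is optional in the protocol
            ET.SubElement(entry, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages: a busy front page and an archive page that rarely changes.
pages = [
    ("http://www.example.com/", datetime(2007, 7, 7), "daily"),
    ("http://www.example.com/articles2007/old-story.html", datetime(2007, 3, 2), "yearly"),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The idea is simply that an archive page keeps its old lastmod and a slow changefreq, so the values carry some information instead of all saying the same thing.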
It could be, again, that your archives are too deep in the directory structure.
There certainly isn't anything I can see (obviously - not excluded in robots.txt, no meta noindex tag) that would stop gbot from accessing those pages.
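If you want to repeat those two checks yourself, something along these lines would do it. It is only a rough sketch using Python's standard library; the robots.txt content, the HTML snippets and the example.com URLs are invented, and the helper names are my own.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

class RobotsMetaFinder(HTMLParser):
    """Collect the content of <meta name="robots"> / <meta name="googlebot"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            self.directives.append((attrs.get("content") or "").lower())

def has_noindex(html):
    """Return True if any robots meta tag on the page contains noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in directive for directive in finder.directives)

# Hypothetical inputs, not taken from the site being discussed.
robots_txt = "User-agent: *\nDisallow: /private/\n"
open_page = allowed_by_robots(robots_txt, "http://www.example.com/scans/page.html")
blocked_page = allowed_by_robots(robots_txt, "http://www.example.com/private/page.html")
tagged = has_noindex('<head><meta name="robots" content="noindex,follow"></head>')
untagged = has_noindex("<head><title>Scans</title></head>")
print(open_page, blocked_page, tagged, untagged)
```

Passing both checks only rules out the obvious blockers; it says nothing about how often the bot chooses to come by.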
Sorry I can't be of more specific assistance - but good luck!
> I've looked into it further, and although it started happening about a week ago with new pages, I've now discovered google has dropped at least a hundred or so pages that were previously indexed too - going back months. All are in two directories, http://www.clooneystudio.com/scans and http://www.clooneystudio.com/articles2007. Both of these directories are linked to from the main page (pagerank of 5).
> When I add a new page, a link goes on my main page for 3-4 days before it moves to another page, which has always been plenty of time to index it, as it seems to index daily or every second day at least.
> I'll wait a little longer as you say. I just find it very strange that everything was working ok until just recently and I've done nothing to affect this change.
> On Jul 8, 2:23 am, dockarl wrote:
> > Hi CS!
> > In all likelihood the cure to this will probably be to wait a little while - a week is not a long time.
> > But if you'd like to post the url of one of the new pages (preferably the oldest one) that hasn't been indexed, someone might have a look and check for anything obvious that might be wrong.
Ah yes.. but you need to remember that those pages (under your old structure) are probably quite deeply buried. Gbot is pretty clever when it comes to working out how often to crawl pages.. if it sees a page virtually never changes, it will virtually never crawl that page.
Until gbot crawls those pages, it won't know they've moved, they won't be indexed under the new site and hence they won't show up in a site search either.
If you really want to speed up the process, try putting some links to the OLD URLs on your front page for a couple of days. It MAY help, as the front page is generally crawled pretty regularly, and it should send the bots winging their way to the old content to discover the redirects.
> One other point that is now concerning me with new pages not being added, is that google adsense for search on my site doesn't work for people wanting to search recent material, when it previously did.
> I added a new page on June 27 that was accepted but my next new page on July 1, and *all* of them since have still to go through. New pages have always been added within 1-2 days for my site, so I'm more doubtful of the "week isn't very long" reason. To have a bunch outstanding eight days later, and no idea when or if they'll actually get added is frustrating.
Sorry CS - I was inadvertently getting you mixed up with another person who has been asking similar questions. In their case they had recently completed some 301 redirects (moved their site across to another domain), so my answer probably looked about as clear as mud :) Sorry!
As I said I personally can't see anything wrong with your site (after looking quite deeply) that would be an obvious impediment to search engines.. My hunch is that it is a matter of waiting a bit longer, even though I know it is incredibly frustrating (having suffered the same frustration myself on numerous occasions).
Something that might help you bide your time is to keep an eye on your server logs - do a search of your logs on a daily basis for the term 'googlebot' and you'll be able to see which pages have been crawled.
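A rough sketch of that daily log search, assuming your host gives you access logs in the common Apache "combined" format (that is an assumption; adjust the pattern if yours differ). The sample lines, IP addresses and the `googlebot_hits` name are invented for illustration.

```python
import re

# Apache "combined" log format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Return (timestamp, path, status) for every request whose user agent mentions Googlebot."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "googlebot" in match.group("agent").lower():
            hits.append((match.group("when"), match.group("path"), int(match.group("status"))))
    return hits

# Made-up sample lines standing in for a real access log.
sample_log = [
    '66.249.66.1 - - [05/Jul/2007:10:12:01 +0000] "GET /news/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.7 - - [05/Jul/2007:10:13:44 +0000] "GET /scans/ HTTP/1.1" 200 2048 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0)"',
    '66.249.66.1 - - [06/Jul/2007:02:00:09 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
hits = googlebot_hits(sample_log)
for when, path, status in hits:
    print(when, path, status)
```

To run it against a real file, pass the open file object instead of `sample_log`. Bear in mind the user agent string can be faked, so this shows what claims to be Googlebot.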
Gbot indeed sometimes seems quite random about the way it indexes, but still, by far the best way I know of accelerating a crawl is to try your best to get external links to the content you'd like crawled.
> It's the same site structure that I've always had. These new pages that are being missed were linked on my front page for at least 4 days, during which time the front page has indexed twice and missed them. Adding them back there again would be hoping for third time lucky lol! It *always* picked them up before and added them quickly. Even the archived location where the URLs are now on is linked on a regularly indexed page. Gbot crawled my news archive page (http://www.clooneystudio.com/news/) on July 5 and missed a link I added July 1. A link where most of the new pages are.
> What I don't get is that there's no change in what I'm doing, the new pages are linked to regularly indexed pages, yet somehow gbot is now missing new links. I even manually submitted a couple of them without success.
> I've now made minor text changes to the new pages to republish them all again thinking that might help and updated my sitemap.
> Yahoo search picked them up no problem days ago. I thought Google was supposed to be smarter than Yahoo? ;)
> On Jul 10, 1:23 am, dockarl wrote:
> > Ah yes.. but you need to remember that those pages (under your old structure) are probably quite deeply buried. Gbot is pretty clever when it comes to working out how often to crawl pages.. if it sees a page virtually never changes, it will virtually never crawl that page.
> > Until gbot crawls those pages, it won't know they've moved, they won't be indexed under the new site and hence they won't show up in a site search either.
> > If you really want to speed up the process try getting putting some links to the OLD url's on your front page for a couple days - MAY help as the front page is generally crawled pretty regularly, and should send the bots winging their way to the old content for them to discover the redirects.
Good point Aaron (+ welcome back from your hiatus) - C S there is a thread that might at first seem only tangentially related - but another of the 'regular posters' is talking a bit about pagerank, site structure and how modifying the flow of pagerank through a site and/or modifying your site structure can help with indexing - it's an interesting read.
I've also written a plugin recently (it is for wordpress though) which led me to think and write a little bit about many of the same issues - you might like to read that, as the general concepts can probably be applied to most any site.
Whether Craig, Aaron or I are on the right track is a matter of personal opinion though - but from personal experience getting 'deeper' pages in a site indexed can be pretty difficult unless you make efforts to guide the bots and pagerank to them. Aaron's advice might be right on track - a little bit more thought about your site structure could improve it both for your visitors and the robots.
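One way to put a number on how 'deep' a page is: count the fewest clicks needed to reach it from the home page. Below is a minimal sketch using a breadth-first walk over a link map. The link map is entirely hypothetical (I have not crawled anyone's site to build it), and `click_depths` is just a name I picked.

```python
from collections import deque

def click_depths(links, start):
    """Breadth-first search: fewest clicks from `start` to every reachable page.

    `links` maps each page to the list of pages it links to.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link map: the new page is only reachable via the news archive.
links = {
    "/": ["/news/", "/scans/"],
    "/news/": ["/news/archive/"],
    "/news/archive/": ["/articles2007/new-page.html"],
    "/scans/": [],
}
depths = click_depths(links, "/")
for page, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, page)
```

Pages that come out three or more clicks down are the ones I'd look at first when thinking about restructuring or adding a more direct link.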
> Whether Craig, Aaron or I are on the right track is a matter of personal opinion though - but from personal experience getting 'deeper' pages in a site indexed can be pretty difficult unless you make efforts to guide the bots and pagerank to them.
Network Unreachable usually indicates just that - a server or DNS problem - it's probably just a temporary glitch.
The 'temporary glitch' could be related to your delayed indexing. If I were you, I'd check your site logs (search for the user agent "googlebot") and see whether Googlebot has actually visited in the intervening period.
More than likely (since it's only three days) that 'problem' will self-correct. Generally you see the bot come by and the index updated a day or so later.
I wouldn't delete the sitemap. I'd be 99.99% certain you are right - that's a coincidence.
> Thank you Matt. You've been most helpful throughout.
> Okay... new problem. My homepage has stopped indexing regularly. It hasn't been indexed since July 7, which is very, very unusual for me and means the cached page is totally different. That's the same date I started with a sitemap, so I've deleted it for now... coincidence maybe, but it was crawling my site fine before without it.
> I've gone into webmaster tools. Google is accessing my robots.txt file daily (200). I've got 5 reports of unreachable URLs: 1 is a 502 bad gateway and the other 4 (one of which was July 8) are "network unreachable". Never had those messages before, but the URLs are fine now, so I'm assuming that's temporary.
> So, why does it access these yet not crawl the site?
> I've looked into it further, and although it started happening about a week ago with new pages, I've now discovered google has dropped at least a hundred or so pages that were previously indexed too - going back months. All are in two directories, http://www.clooneystudio.com/scans and http://www.clooneystudio.com/articles2007. Both of these directories are linked to from the main page (pagerank of 5).
Well, right off the bat I see that the NORMAL page header is missing on all of those pages, as in ...
Home | Search Clooney Studio | Calendar | Movies | News | Pictures | Articles | Scans | Video | Audio | Biography | Charities | Smoke House | Fan Encounters | Links | About / Disclaimer | F.A.Q | Contact
Most importantly there is no link to your home page!!!
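If you want to check this across a whole directory rather than eyeballing each page, a quick sketch like the following would flag pages with no link back to the home page. The HTML snippets and example.com URLs are invented, and the helper names are mine; it only recognises links that resolve to the bare home URL, so a link to something like index.html would not count.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def links_to_home(html, page_url, home_url):
    """Return True if any link on the page resolves to the home page URL."""
    collector = LinkCollector()
    collector.feed(html)
    home = home_url.rstrip("/")
    # Resolve relative links against the page's own URL before comparing.
    return any(urljoin(page_url, href).rstrip("/") == home for href in collector.hrefs)

# Hypothetical pages: one with the normal header link, one without.
home_url = "http://www.example.com/"
page_url = "http://www.example.com/scans/page1.html"
with_header = links_to_home('<a href="/">Home</a> <a href="page2.html">Next</a>', page_url, home_url)
without_header = links_to_home('<a href="page2.html">Next</a>', page_url, home_url)
print(with_header, without_header)
```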
> When I add a new page, a link goes on my main page for 3-4 days before it moves to another page, which has always been plenty of time to index it, as it seems to index daily or every second day at least.
That is a problem right there. Constantly changing the link structure is not a very good idea. Further, if Google has stopped crawling your home page as often, it obviously won't see any new pages added there.
I don't care how long you have been using the above defective linking structure. Perhaps Google has simply decided to crawl your site less often? Their algo does change constantly, after all. And your ever-changing link structure is not helping any.