Firstly, I am very let down that you can no longer contact Google for help. Even their contact page is only links to self-help pages.
I am trying to remove pages and directories from Google's index. These pages and directories have been removed and return a 404 error. Google has denied my requests... why???
Actually, that is not an anomaly. My robots.txt file says that only the listed bots are okay and all others are not. And the "everybody and their uncle" on that list are friendly (and very common) bots... there are other search engines besides Google.
Back to my question: I have complied with Google's requirement by having one of the three listed options, mine being a 404 error and yet I am still denied. I do understand I can also control this with my robots.txt file and by using a meta tag, but I have too many to list. Any help would be appreciated.
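For readers following along, the allow-list arrangement described above (named bots allowed, everyone else blocked by a final catch-all group) can be checked with Python's standard urllib.robotparser. This is only a minimal sketch: the robots.txt text below is a simplified, hypothetical stand-in (the real file is not shown in this thread), "SomeOtherBot" is a made-up name, and Python's parser is not Googlebot's, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, simplified robots.txt: one named bot is allowed everywhere,
# and the catch-all group at the end blocks every robot without its own group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A robot that has its own group follows that group, not the catch-all.
googlebot_allowed = parser.can_fetch("Googlebot", "/index.html")

# "SomeOtherBot" is a made-up name with no group of its own,
# so it falls through to "User-agent: *".
other_allowed = parser.can_fetch("SomeOtherBot", "/index.html")

print(googlebot_allowed, other_allowed)  # prints: True False
```

Under these assumptions the catch-all group only applies to robots that are not named elsewhere in the file, which is the behaviour being described.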
> Your robots.txt file is a mess. After allowing everybody and their uncle everywhere, at the end you have this:
> User-agent: *
> Disallow: /
> So all robots are disallowed from the whole site.
> You had better fix that anomaly first because your whole site will end up being dropped from the index, not just the folder you want to remove.
> You could use this (get rid of all the other robots from robots.txt or at least figure out exactly what you want to do with them):
> User-agent: *
> Disallow: /GameFilter/
> This will disallow that folder - and then you can get it removed.
> On Sep 24, 7:32 pm, JAC wrote:
> > Firstly, I am very let down that you can no longer contact Google for help. Even their contact page is only links to self-help pages.
> > I am trying to remove pages and directories from Google's index. These pages and directories have been removed and return a 404 error. Google has denied my requests... why???
Hi JAC, I cannot find URLs from www.gamersunderground.net/GameFilter/ in search results. Can you give the full URL of a Google cache for one of your URLs?
I can't find a cache either. I see the links in Google's Webmaster Tools within the Web crawl Not Found area... the problem is A: they never disappear from that list, and B: because there are thousands of them, more and more are added to the list every week or so. This is why I want to remove the entire directory.
There is no referer in the list of not-found URLs in Google Webmaster Tools, so there is no way to know where Googlebot is following these URLs from; maybe from some out-of-date links (?)
If these URLs do not appear in search results then the removal tool does not apply to them, since it removes URLs from the search results.
I suggest you block these URLs in your robots.txt file, as Webado already wrote.
If you disallow these URLs in your robots.txt file then Googlebot will not follow them, so maybe in time it will stop looking for them.
> On Sep 25, 8:19 am, cristina wrote:
> > Hi JAC, I cannot find URLs from www.gamersunderground.net/GameFilter/ in search results. Can you give the full URL of a Google cache for one of your URLs?
It looks like there are two separate issues going on here. First, webado is correct in saying that in order to request a removal of http://www.gamersunderground.net/GameFilter/ you would need to block /GameFilter/ using your robots.txt file. Check out this help topic for details: http://google.com/support/webmasters/bin/answer.py?answer=59819
In particular:
"To remove a directory and its contents, you must ensure that the pages you want to remove have been blocked using a robots.txt file. Returning a 404 isn't enough, because it's possible for a directory to return a 404 status code, but still serve out files underneath it. Using robots.txt to block a directory ensures that all of its children are disallowed as well."
However, as Cristina points out, you don't seem to have any pages from http://www.gamersunderground.net/GameFilter/ currently indexed. The purpose of a URL removal request is to request that URLs get removed from our index; and since this directory is already not in our index, a URL removal request would have no effect.
Susan, as I stated to webado (with whom I disagreed only about the incorrect statement that my robots.txt file is a "MESS"), and this is the last time I'm stating this: I have too many to list in a robots.txt file... that is why I am here! The directory /GameFilter is not on my server (AT ALL), thus there are no files "underneath" it. I read the FAQ and I do feel much better now.
FOR THOSE SEEKING AN ANSWER:
In a very odd way, the Crawl index is not a list of indexed pages, but rather a list of pages Googlebot tried to follow (likely from other sites linking to dead pages) and failed to fetch. Without Google providing the source of the link, the list is mostly not useful, but it can tell you which pages, since they cannot be removed, could use a 301 redirect, which I guess is the answer for me!!!
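For anyone wondering what the 301 approach might look like in practice, here is a minimal sketch using only Python's standard library. Everything specific in it is an assumption for illustration: the REDIRECTS mapping, both paths in it, and the resolve helper are hypothetical, and a real site would more likely configure redirects in its web server than in a script like this.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping from removed URLs to the pages that replace them.
REDIRECTS = {
    "/GameFilter/old-page.html": "/games/",
}


def resolve(path):
    """Return (status, location) for a requested path."""
    target = REDIRECTS.get(path)
    if target is not None:
        # Permanent redirect: tells crawlers the page has moved for good.
        return 301, target
    # Anything else in this sketch is simply reported as not found.
    return 404, None


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers GET requests with a 301 for mapped paths, 404 otherwise."""

    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.end_headers()
```

To try it locally, the handler could be served with http.server.HTTPServer(("", 8000), RedirectHandler).serve_forever(); the port number is arbitrary.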
JAC, your robots.txt is still a tangled mess LOL. It might sort of work at the moment, but only just. It's high maintenance, I feel.
Rogue robots do not read and obey robots.txt in any case, so trying to disallow all but the ones you listed as good robots is the same thing as not disallowing any.
> On Sep 25, 3:52 pm, Susan Moskwa wrote:
> > Hi JAC--
> > It looks like there are two separate issues going on here. First, webado is correct in saying that in order to request a removal of http://www.gamersunderground.net/GameFilter/ you would need to block /GameFilter/ using your robots.txt file. Check out this help topic for details: http://google.com/support/webmasters/bin/answer.py?answer=59819
> > In particular:
> > "To remove a directory and its contents, you must ensure that the pages you want to remove have been blocked using a robots.txt file. Returning a 404 isn't enough, because it's possible for a directory to return a 404 status code, but still serve out files underneath it. Using robots.txt to block a directory ensures that all of its children are disallowed as well."
> > However, as Cristina points out, you don't seem to have any pages from http://www.gamersunderground.net/GameFilter/ currently indexed. The purpose of a URL removal request is to request that URLs get removed from our index; and since this directory is already not in our index, a URL removal request would have no effect.
> In a very odd way the Crawl index is not a list of indexed pages, but rather a list of pages the googlebot tried to follow (likely from other sites linking to dead pages), but failed. Without Google providing the source of the link the list is mostly not useful, but could be informative to tell you which pages, which cannot be removed, could use a 301 redirect which I guess is the answer for me!!!
This is not really odd at all. If it was a list of indexed pages it would probably be called "indexed pages", not "web crawl". To see pages that link to you go to Links > Pages that link to you - makes sense to me :)
The "web crawl" can be very useful to identify pages that you might have forgotten about and are still being followed so that you can take action (301/re-implement etc.). Also note that it appears that sometimes Googlebot follows imaginary links, so the links that you see do not necessarily ever have to have existed on your site or be linked to from anywhere. This is a recent thing though and most likely just a googlebot bug.
You mention having too many to list, so just to avoid confusion I'll repeat what the others said. You do not have to add every single file to robots.txt. Just a simple
User-agent: *
Disallow: /GameFilter/
will block the entire directory and also stop those "web crawl" errors from appearing (since Googlebot then won't try to follow those links anymore). If those are the errors you want to stop seeing, this is one way to go, although I'd personally 301 any pages that are being linked to before doing this.
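To illustrate that a single Disallow line covers everything beneath the directory, here is a small sketch with Python's standard urllib.robotparser. The page names are made up for illustration, and Python's parser only approximates how Googlebot itself reads robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /GameFilter/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(path, agent="Googlebot"):
    """True if the rule above tells this user agent not to fetch the path."""
    return not parser.can_fetch(agent, path)


# Hypothetical paths: two under the blocked directory, one outside it.
for path in ("/GameFilter/", "/GameFilter/any/deep/page.html", "/index.html"):
    print(path, "blocked" if is_blocked(path) else "allowed")
```

The rule is a simple prefix match, so no per-file entries are needed, however many URLs sit under /GameFilter/.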
It can take months and months of returning a 404 before Google stops crawling pages without any links pointing to them that might once upon a time have had a link. I still have Google trying to find a couple of pages that were only linked to in that format for 24 hours last December. In that 24 hours, it managed to crawl and keep in memory all those links and it's been trying to find those pages ever since (admittedly, now it's down to the last 3 or 4, so looks like it's finally cleaning out that memory!).