...Avoid infinite crawls. For instance, if your site has an infinite calendar, add a nofollow attribute to links to dynamically-created future calendar pages...
....Meta tags can exclude all outgoing links on a page, but you can also instruct Googlebot not to crawl individual links by adding rel="nofollow" to a hyperlink. When Google sees the attribute rel="nofollow" on hyperlinks, those links won't get any credit when we rank websites in our search results....
"instruct Googlebot not to crawl" and "those links won't get any credit when we rank websites" are two different terms.
Does Google crawl a rel="NOFOLLOW" tagged link and not give it credit, or does it just stop at the link and not visit that page unless it's found elsewhere? Marking your own links to infinitely looping pages, like a calendar, with NOFOLLOW may stop Googlebot from crawling them internally, but if someone were to link to that page from outside, Googlebot would get caught in the trap again. Shouldn't the META nofollow tag then be used to keep all well-behaved bots off a page you don't want crawled?
We lodged a petition on a UK gov site (for a legitimate complaint), and that site gave our site a link with rel=nofollow. In the links tab of my Sitemaps console I see that Google has listed the link from the UK gov website. So from that I assume Googlebot still follows the link but does not give credit for ranking purposes. The blog should simply state: avoid linking to infinite loops altogether.
"Does Google crawl a rel="NOFOLLOW" tagged link and not give it credit, or does it just stop at the link and not visit that page unless it's found elsewhere?"
Unless it is linked elsewhere, Google will not crawl a nofollowed link. I tested this with a link to Webmaster Central the other day.
I agree with you both. If the only link to a page has nofollow, the page won't be indexed; however, if any other link in the universe points to it, there's a chance it will be. And let's not even get into Yahoo; I think they look for nofollow links to follow!
In other words, by using internal nofollow links to pages you don't want indexed, you have not eliminated the possibility; you have only stopped your own links from getting them indexed. Someone external could still link to your calendar for 2053 and start the whole mess up again.
Using <meta name="robots" content="noindex,nofollow"> is the only way to stop the bots even if there is an external link.
It looks like she was speaking of one particular instance, where crawling/indexing is done for *future* calendar dates when it really is not necessary. I asked myself why anyone would link to a future date anyway (from outside). Then I thought of a future date in my calendar that I would like to see published: an upcoming event.
So how does one go about preventing infinite future calendars from being crawled/indexed while leaving future dates with planned events (content) unscathed? I guess the answer is to create a script that determines whether an event has been posted and then dynamically writes the URLs with or without a nofollow... Too complex.
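The script described above need not be very long. Here is a minimal Python sketch, assuming a hypothetical `/calendar/YYYY/MM/` URL scheme and a set of event dates supplied by the calendar software; neither comes from the blog post, and it only shows the "write the nofollow or not" decision:

```python
from datetime import date
from html import escape


def month_link(year: int, month: int, event_days: set[date]) -> str:
    """Return an <a> tag for a calendar month, nofollowed when it has no events."""
    has_event = any(d.year == year and d.month == month for d in event_days)
    href = f"/calendar/{year:04d}/{month:02d}/"  # hypothetical URL scheme
    rel = "" if has_event else ' rel="nofollow"'
    label = f"{year:04d}-{month:02d}"
    return f'<a href="{escape(href)}"{rel}>{label}</a>'


events = {date(2053, 6, 14)}  # made-up event date for illustration
print(month_link(2053, 6, events))  # month with an event: plain link
print(month_link(2053, 7, events))  # empty month: link carries rel="nofollow"
```

This only covers links the site writes itself; a link from outside to an empty month is not affected by it.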
Not everyone who links to you is friendly either. If webmaster B sees that webmaster A is trying to stop Googlebot from getting caught in webmaster A's perpetual calendar by using rel="nofollow", perhaps webmaster B throws out a link to get the party rolling. A few weeks later webmaster A is on this forum saying, "I did everything Vanessa said, but Googlebot just keeps crawling my calendar pages." I just think it should be clarified a bit: rel="nofollow" doesn't stop a linked page from being crawled, it just stops that link from passing PageRank.
Once again, whoever named rel="nofollow" in the first place should have their hands slapped.
Most calendar scripts are deadly crawler traps. I seriously doubt that 99% of the webmasters out there can add rel=nofollow to the appropriate places in calendar scripts -- and those who can might as well just make them crawlable :-). The same goes for too many of the scripts used by so many sites (e.g. galleries and forums) .... why, oh why can't they just make things crawlable.... Some day I am going to rewrite them all just to get it done :-).
Using rel=nofollow to restrict crawling is, however, not really such a good plan (in my opinion): you never know which other links are pointing to a URL, from your site or from others. If just a single scraper takes your site and removes the nofollows, or a single search engine ignores them, then the URL behind the link can still get indexed. It is much safer to restrict crawling with a robots.txt and a robots meta-tag.
The rel=nofollow was originally not meant to restrict crawling - it was just a signal showing "that another person placed a link on your site". Even if the Googlebot were to use it as such now, it might not in the future. It's just not a secure method of crawler control.
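As a rough illustration of the robots.txt approach, here is a sketch that checks a rule with Python's standard `urllib.robotparser`. The `/calendar/` path and the sample URLs are assumptions made up for the example, not anything from the site being discussed:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all compliant crawlers out of the calendar.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
"""


def is_crawlable(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the sample robots.txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_crawlable("/calendar/2053/06/"))                 # blocked by the Disallow rule
print(is_crawlable("/articles/nofollow-explained.html"))  # not covered by any rule
```

Unlike rel=nofollow, the rule applies to the URL itself, so it holds no matter who links to the page. A robots meta tag, by contrast, can only be seen once the page has been fetched.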
I had a calendar script running on a site previously that did cause the infinite loop condition. But the good news is that Google indexed about 400 pages of calendars (hehe), gave most of them pagerank (better still), which I then hooked back to the homepage.
And if you read this John, I was talking to both Matt Cutts and Vanessa today and mentioned you. They both spoke highly of you :)
> Does Google crawl a rel="NOFOLLOW" tagged link and not give it credit, > or does it just stop at the link and not visit that page unless it's > found elsewhere?
As Aaron correctly noted, the answer is the latter :)
I thought I read somewhere that rel="nofollow" is admittedly used only by Google. Whether other search engine robots honor it, and whether they treat it the same way, is not clear. So I take no chances.
I disallow whatever I can in robots.txt, and I also add rel="nofollow" to links pointing to those pages and a noindex robots meta tag on them, whenever possible. I'm motivated less by whether credit is given than by the desire not to have those links crawled and those pages indexed in the first place.
Thanks Adam, that's good to know. Though I still won't draw any conclusions from Blogspot's "linked here" behavior. While technically owned by Google, it's a pretty buggy system.
Webado does bring up a good point too: while Google may be the big dog in search, it is not the only one, and you must consider all sources of traffic when designing your site. Yahoo will follow and visit the targets of "nofollow" links; I've caught them many times in my own experiments. Google has not been caught in any of those bot traps so far.
Your example is actually not that straightforward -- blogs track links through "trackbacks". When you blog about something and include a link, your blog software can send a trackback to the site that you linked to. Most blog software does this automatically, but you can override it, and perhaps if you use rel=nofollow for the link, your blog software might not do it for that link. When a trackback is sent, your blog software calls a special URL on the blog software of the site that you're linking to. In other words, the other blog does not have to find the link; the linking party notifies it instead.
I wouldn't use rel=nofollow as a crawler directive meaning "don't fetch the destination". At least I wouldn't rely on it. Better to code your calendar so that pointless links answer "not implemented" to search engine crawlers and display a warning to users as well, or just suppress pointless links when the user agent is a crawler. Rel=nofollow does no harm, and in some cases it may work as desired (*when* the link is the only one on the Web and the crawler is Googlebot), but robots.txt (disallowing partial URLs or URL fragments), the robots meta tag, and search-engine-friendly cloaking are your real friends ;)
Sebastian
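One way to read the "pointless links should not answer normally" idea is sketched below in Python. It makes several assumptions that are not in the post: a 12-month horizon, a set of (year, month) pairs that have events, and a 404 status in place of the "not implemented" answer mentioned above. It also treats every visitor the same way rather than looking at the user agent:

```python
from datetime import date

MAX_MONTHS_AHEAD = 12  # hypothetical horizon; choose whatever suits the site


def calendar_status(year: int, month: int, today: date,
                    months_with_events: set[tuple[int, int]]) -> int:
    """Return the HTTP status to send for a calendar month request."""
    if (year, month) in months_with_events:
        return 200  # real content: always serve it
    months_ahead = (year - today.year) * 12 + (month - today.month)
    if months_ahead > MAX_MONTHS_AHEAD:
        return 404  # empty month far in the future: nothing to crawl
    return 200


today = date(2050, 1, 1)  # fixed date so the example is reproducible
planned = {(2053, 6)}     # made-up month that has a planned event
print(calendar_status(2053, 6, today, planned))  # has an event
print(calendar_status(2053, 7, today, planned))  # empty and beyond the horizon
print(calendar_status(2050, 9, today, planned))  # empty but within the horizon
```

Because the decision is made on the server, it also covers links from other sites, which rel=nofollow on your own links cannot do.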
This is a bit different from what has been discussed, but it is also a question about nofollow behaviour.
With a nofollow link you don't pass weight to the destination page, but are nofollow links counted in the total number of links on the origin page?
So if one of the PageRank factors is the number of links on the page the link comes from, would nofollow links help pass more PR to the pages pointed to by standard links?
For instance, I have pages where both the photo and the caption link to the same article (same URL). I need to do that because the user might click on either the image or the title to get to the article, but in effect I'm duplicating the links on the page. Would a nofollow attribute on, for instance, the images help?
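For reference, the PageRank formula as originally published by Brin and Page shows where the link count enters. It is given here only to frame the question; how Google's current system treats nofollowed links in that count is not something the formula answers:

```latex
PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}
```

Here $T_1, \dots, T_n$ are the pages linking to $A$, $C(T_i)$ is the number of outgoing links on page $T_i$, and $d$ is the damping factor. The question above comes down to whether a nofollowed link is still counted in $C(T_i)$.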
> I mentioned to Matt Cutts that we all missed you :)
> On Feb 14, 2:47 am, Adam Lasnik wrote:
> > > Does Google crawl a rel="NOFOLLOW" tagged link and not give it credit, > > > or does it just stop at the link and not visit that page unless it's > > > found elsewhere?
> > As Aaron correctly noted, the answer is the latter :)