subdomain, etc.
> Ahhh ... I see a great debate coming :)
> Let's concentrate on the binding statement in the webmaster guidelines: http://www.google.com/support/webmasters/bin/answer.py?answer=35769
> "Use robots.txt to prevent crawling of search results pages or other
> auto-generated pages that don't add much value for users coming from
> search engines"
> (robots meta tags and nofollow rel values possibly in combination with
> robots.txt will do the trick too, telling the crawler not to index
> particular content is important)
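
For illustration, here is a minimal sketch of what such a robots.txt rule could look like and how to check it with Python's standard `urllib.robotparser`. The `/search` path and the `shop.example` host are assumptions made up for this example; they do not come from the guidelines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a shop whose internal search script
# lives under /search (path chosen for illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(url, agent="Googlebot"):
    """Return True if the rules above let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://shop.example/search?q=rock+shoes",  # internal search result
        "https://shop.example/shoes/rock-n-roll",    # editorial list page
    ):
        print(url, crawl_allowed(url))
```

Note that a plain prefix rule like this blocks every URL whose path starts with `/search`, so an editorial list served from the same script would need its own URL to stay crawlable.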
> I don't read this as "prevent Googlebot from crawling dynamic content
> when this content contains links".
> Let's define a "search results page" as a result set matching a query
> submitted in a search box. Using GET instead of POST makes the result
> sets linkable spider fodder, and there are many other ways to feed
> crawlers with SERPs.
> The new rule covers scrapers, MFA sites, every META SE out there and
> site internal search facilities to some extent as well, but in no way
> directories, tagged links lists or editorial use of a search script to
> produce a list of links to related products or a list of products with
> similar properties or usages.
> Well, there's a fine and vaguely defined borderline ("don't add much
> value for users"), so let's just say that a script looping through a
> complete shop, iterating over all possible keyword combos to output
> these as GET links to the search facility on a ton of links pages
> created for SE crawlers, not users, would be abuse.
> In between these extremes I'd say that common sense is a good enough
> criterion to judge whether editorial or navigational links to
> predefined search results make sense for users or not. It's Google's
> job to drill their algos, I just hope that'll work with as little
> collateral damage as possible.
> Defining "auto-generated pages that don't add much value for users"
> is done here: http://www.google.com/support/webmasters/bin/answer.py?answer=35291&q...
> "Another illicit practice is to place 'doorway' pages loaded with
> keywords on the client's site somewhere." My abuse example above falls
> under this definition. http://www.google.com/support/webmasters/bin/answer.py?answer=35769&q...
> "Avoid 'doorway' pages created just for search engines." http://www.google.com/support/webmasters/bin/answer.py?answer=40349&q...
> "Keep in mind that our algorithms can distinguish natural links from
> unnatural links ... Only natural links are useful for the indexing and
> ranking of your site." I do know that Google can spot artificial
> internal linkage, for example an unnaturally high number of links to
> thin product pages or systematic link patterns involving machine
> generated hallways, doorways and similar attempts.
> My take is that "don't add much value for users coming from search
> engines" is the core message. Again that's a question of judging
> intent, not a positive or negative statement with regard to particular
> techniques. The whole Wikipedia is autogenerated, and fully indexed.
> Google doesn't care how (in the sense of which technology gets used)
> contents get presented to the searcher, Google cares about valuable
> content presentations without machine generated noise (e.g. doorways
> and unnecessary duplication) generated for machines (crawlers).
> There's no need to explain the technical aspects more clearly, and
> that's probably impossible anyway. Technology is not the issue. One
> could produce doorway spam with vi.
> I'd say that your list of rock 'n roll shoes stays "legit". Delicious/
> digg/... and other sites won't suffer less or more from the new policy
> (which is not that new BTW) than your user tagged shoes. They offer
> their auto generated links lists and feeds ordered by tags and users,
> you offer your autogenerated links lists ordered by rock 'n roll and
> black leather. A searcher seeking rock 'n roll shoes should find your
> list of shoes with thumbs, price, description and a link to the
> product page on Google's SERPs. As long as you don't overload your
> site with static GET links to search scripts outputting shoes ordered
> alphabetically by SKU or by reversed size, neither of which makes
> sense for users, you're safe. If you think particular SERPs could be seen as
> noise, just insert a "noindex,follow" robots meta tag. Approving
> stored SERPs for these purposes should be a suitable procedure.
> Sebastian
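
As a rough sketch of the "approve stored SERPs" idea above: a page template could emit an indexable robots meta tag only for queries an editor has approved, and fall back to "noindex,follow" for everything else. The `APPROVED_QUERIES` set and the `robots_meta` helper are hypothetical names invented for this example, not part of any Google tool or shop software.

```python
# Hypothetical list of stored queries that an editor has reviewed and
# approved as useful landing pages for searchers.
APPROVED_QUERIES = {"rock n roll shoes", "black leather shoes"}


def robots_meta(query):
    """Return the robots meta tag for a search results page.

    Approved queries stay indexable; any other query is marked
    noindex,follow so crawlers still follow the product links.
    """
    # Normalize case and whitespace before looking the query up.
    normalized = " ".join(query.lower().split())
    if normalized in APPROVED_QUERIES:
        content = "index,follow"
    else:
        content = "noindex,follow"
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    print(robots_meta("Rock N Roll  Shoes"))
    print(robots_meta("shoes ordered by sku"))
```

The lookup is deliberately an allow-list: a result page nobody has reviewed defaults to "noindex,follow" rather than being indexed by accident.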
> On Mar 13, 2:33 pm, rumblepup wrote:
> > Sebastian
> > Thank you for your input. My problem with this scenario is that in a
> > typical e-commerce application, no matter what the application, php,
> > asp, cfm, or asp.net, a website is linking to queries against a
> > database. And this is not just e-commerce, but any dynamic site. You
> > create a link to a category of content, which is really a link to a
> > search result. In other words, ALL LINKS are search results, filtered
> > to a particular category.
> > Now, maybe what Vanessa and Matt and others are referring to are sites
> > using their own user generated search results to create links to user
> > search result pages. In all honesty, I don't see this as harmful, in
> > the sense that a website is creating relevant content to a search
> > term, both on the site and as an indexable search result. What about
> > tagging? Let's say I create a way for my customers to "tag" products
> > for terms they think are relevant. Now, in technical terms, I'm
> > creating a "search result" for tags that my customers have generated,
> > and thus the user is actually creating new content that is relevant to
> > a particular term. There is no way I can make a product detail page
> > or a category search result page relevant to every single word that
> > someone "might" use to define that set of results, but my customers
> > can, and I don't understand why it's wrong to serve up those "tag"
> > results to any of the SEs as relevant content for a particular term.
> > Here's my for instance.
> > Let's say I'm an online shoe sales site, and I have a bunch of shoes
> > that fall into a set of categories and groupings, i.e., Dress Shoes,
> > Dining, Tennis Shoes, Sneakers, etc. I can't optimize for every
> > single term that someone might use to search for specific products or
> > product categories, otherwise I'd be keyword stuffing my pages, and
> > they'd look dumb as well. But now, let's say I have a Black leather
> > shoe with brass buttons and leather engraving, and my customer tags
> > this shoe as "cool rock and roll shoes," something that I can't
> > optimize for or something I didn't think to optimize for. Now, let's
> > say I have a few shoes that users would tag as "cool rock and roll
> > shoes" and I can't optimize them all for the same thing, because that
> > would be duplicating content, but I do serve up my tags as links, so
> > that my site can address people searching for "cool rock and roll
> > shoes." Now I have a page that users think is relevant to a search
> > term that might not be very competitive, but is used maybe 1000 times
> > a month on Google. What am I doing wrong by doing that? What if
> > with that search result I get a first page SERP? I'm not trying to
> > spam; my content, I think, is bona fide, and the Google algo thinks
> > that the page I'm serving up is relevant.
> > Doesn't the algo make this decision?
> > Now back to my original query. Now in the Google Webmaster
> > Guidelines, Vanessa has added "Use robots.txt to prevent crawling of
> > search results pages or other auto-generated pages that don't add much
> > value for users coming from search engines." How does my dynamic
> > content NOT give value to customers, and how do the tagged pages
> > that customers have created NOT give value?
> > If I'm wrong in my assumptions, and Google is fine with my
> > navigational search results, I apologize, but the statement in the
> > guidelines is way too broad, I think. If you search for '22" couch
> > cushion' and Google's SERPs come up with a page from a site that is a
> > SEARCH for the same thing, I'm treated to a page full of 22" couch
> > cushions, just what I was looking for.