From: JLH
Date: Tue, 13 Mar 2007 10:40:13 -0700
Local: Tues, Mar 13 2007 1:40 pm
Subject: Re: Indexing of Search Results
Being paranoid, of course, I'm robots.txt'ing out search results, even
though a brief look shows that none are indexed.
<<looks over shoulder>>
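
For anyone doing the same, a robots.txt rule of this kind can be checked offline with Python's standard `urllib.robotparser`. This is a minimal sketch: the `/search` path and the `example.com` host are assumptions for illustration. Note that this parser only does simple prefix matching and ignores the wildcard extensions some crawlers support.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of the site search script.
# "/search" is an assumed path; use whatever URL your search results live under.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Search result URLs match the "/search" prefix and are blocked...
    print(rp.can_fetch("Googlebot", "http://example.com/search?q=rock+n+roll+shoes"))
    # ...while ordinary product pages stay crawlable.
    print(rp.can_fetch("Googlebot", "http://example.com/products/123"))
```

Because matching is by prefix, a page such as `/searchable-guide` would be blocked too, so it is worth choosing the disallowed path with that in mind.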

On Mar 13, 12:09 pm, JLH wrote:

> Here's an example of said autogenerated search results, except that
> each search result created a new subdomain, which spawned another
> subdomain, etc.

> http://forums.digitalpoint.com/showthread.php?t=97090

> The net effect was an infinitely large website built with one page on
> each subdomain.

> I'd imagine that's what they are shooting at: a site that is in
> effect a crawler trap, generating an infinite number of pages as it
> is crawled.
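
The trap described here can be pictured from the crawler's side. Below is a minimal sketch of one possible defense, assuming a crawler that caps how many distinct hostnames it accepts under one base domain. `SubdomainBudget`, the cap of 10, and the `trap.example` URLs are hypothetical, and the base-domain logic is deliberately naive (a real crawler would consult the Public Suffix List).

```python
from collections import defaultdict
from urllib.parse import urlsplit


def base_domain(host: str) -> str:
    # Naive assumption: the last two labels form the registrable domain.
    # This is wrong for suffixes like .co.uk; real crawlers use the Public Suffix List.
    return ".".join(host.lower().split(".")[-2:])


class SubdomainBudget:
    """Caps how many distinct hosts a crawler accepts per base domain."""

    def __init__(self, max_hosts_per_domain: int = 50):
        self.max_hosts = max_hosts_per_domain
        self.hosts = defaultdict(set)  # base domain -> set of hostnames seen

    def allow(self, url: str) -> bool:
        host = (urlsplit(url).hostname or "").lower()
        seen = self.hosts[base_domain(host)]
        if host in seen:
            return True   # a host we already accepted stays crawlable
        if len(seen) >= self.max_hosts:
            return False  # budget spent: probably a generated subdomain trap
        seen.add(host)
        return True


if __name__ == "__main__":
    budget = SubdomainBudget(max_hosts_per_domain=10)
    # Simulated trap: every "search result" lives on a fresh subdomain.
    trap = [f"http://result{i}.trap.example/" for i in range(1000)]
    print(sum(budget.allow(u) for u in trap))  # 10
```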

> On Mar 13, 11:53 am, Sebastian wrote:

> > Ahhh ... I see a great debate coming :)

> > Let's concentrate on the binding statement in the webmaster guidelines: http://www.google.com/support/webmasters/bin/answer.py?answer=35769
> > "Use robots.txt to prevent crawling of search results pages or other
> > auto-generated pages that don't add much value for users coming from
> > search engines"
> > (robots meta tags and rel="nofollow" values, possibly in combination
> > with robots.txt, will do the trick too; the important thing is telling
> > the crawler not to index particular content)

> > I don't read this as "prevent Googlebot from crawling dynamic content
> > when this content contains links".

> > Let's define "search results pages" as a result set matching a query
> > submitted in a search box. Using GET instead of POST makes the result
> > sets linkable spider fodder, and there are many other ways to feed
> > crawlers with SERPs.

> > The new rule covers scrapers, MFA sites, every META SE out there and
> > site internal search facilities to some extent as well, but in no way
> > directories, tagged links lists or editorial use of a search script to
> > produce a list of links to related products or a list of products with
> > similar properties or usages.

> > Well, there's a fine and vaguely defined borderline ("don't add much
> > value for users"), so let's just say that a script that loops through
> > a complete shop, iterating over all possible keyword combos and
> > outputting them as GET links to the search facility on a ton of link
> > pages created for SE crawlers rather than users, would be abuse.
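
To show why such a keyword-looping script produces "a ton of links pages", here is a sketch, for illustration only, of how quickly the generated links multiply. The 20 placeholder keywords and the `/search?q=` URL shape are hypothetical. With combinations of up to three terms the count is 20 + 190 + 1140 = 1350 links from a tiny vocabulary.

```python
from itertools import combinations
from math import comb
from urllib.parse import urlencode


def doorway_links(keywords, max_terms):
    """Yield a GET search link for every combination of up to max_terms keywords.

    This is the abusive pattern described above, shown only to make its size visible.
    """
    for n in range(1, max_terms + 1):
        for combo in combinations(keywords, n):
            yield "/search?" + urlencode({"q": " ".join(combo)})


if __name__ == "__main__":
    keywords = [f"kw{i}" for i in range(20)]  # hypothetical 20-term shop vocabulary
    links = list(doorway_links(keywords, 3))
    print(len(links))                            # 1350
    print(sum(comb(20, n) for n in (1, 2, 3)))   # 1350
```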

> > In between these extremes I'd say that common sense is a good enough
> > criterion to judge whether editorial or navigational links to
> > predefined search results make sense for users or not. It's Google's
> > job to drill their algos; I just hope that will work with as little
> > collateral damage as possible.

> > Defining "auto-generated pages that don't add much value for users"
> > is done here: http://www.google.com/support/webmasters/bin/answer.py?answer=35291&q...
> > "Another illicit practice is to place 'doorway' pages loaded with
> > keywords on the client's site somewhere." My abuse example above falls
> > under this definition.
> > http://www.google.com/support/webmasters/bin/answer.py?answer=35769&q...
> > "Avoid 'doorway' pages created just for search engines."
> > http://www.google.com/support/webmasters/bin/answer.py?answer=40349&q...
> > "Keep in mind that our algorithms can distinguish natural links from
> > unnatural links ... Only natural links are useful for the indexing and
> > ranking of your site." I do know that Google can spot artificial
> > internal linkage, for example an unnaturally high number of links to
> > thin product pages or systematic link patterns involving
> > machine-generated hallways, doorways and similar attempts.

> > My take is that "don't add much value for users coming from search
> > engines" is the core message. Again, that's a question of judging
> > intent, not a positive or negative statement with regard to particular
> > techniques. All of Wikipedia is auto-generated, and fully indexed.
> > Google doesn't care how (in the sense of which technology gets used)
> > content gets presented to the searcher; Google cares about valuable
> > content presentations without machine-generated noise (e.g. doorways
> > and unnecessary duplication) produced for machines (crawlers).
> > There's no need to explain the technical aspects more clearly;
> > that's probably impossible anyway. Technology is not the issue. One
> > could produce doorway spam with vi.

> > I'd say that your list of rock 'n roll shoes stays "legit".
> > Delicious/digg/... and other sites won't suffer more or less from the
> > new policy (which is not that new BTW) than your user-tagged shoes.
> > They offer their auto-generated link lists and feeds ordered by tags
> > and users; you offer your auto-generated link lists ordered by rock 'n
> > roll and black leather. A searcher seeking rock 'n roll shoes should
> > find your list of shoes with thumbs, price, description and a link to
> > the product page on Google's SERPs. As long as you don't overload your
> > site with static GET links to search scripts outputting shoes sorted
> > alphabetically by SKU or by reversed size, neither of which makes
> > sense for users, you're safe. If you think particular SERPs could be
> > seen as noise, just insert a "noindex,follow" robots meta tag.
> > Approving stored SERPs for these purposes should be a suitable
> > procedure.
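
The "noindex,follow" suggestion, combined with an approval step for stored SERPs, might look like the minimal sketch below. `APPROVED_QUERIES` and `robots_meta` are hypothetical names, and how the approved list gets stored and reviewed is left open.

```python
# Hypothetical store of search result pages an editor has reviewed and approved.
APPROVED_QUERIES = {"rock n roll shoes", "black leather shoes"}


def robots_meta(query: str) -> str:
    """Return the robots meta tag for a stored search results page.

    Approved result sets stay indexable; everything else gets noindex,follow,
    so crawlers still follow the product links but keep the results page
    itself out of the index.
    """
    normalized = " ".join(query.lower().split())  # case- and whitespace-insensitive
    content = "index,follow" if normalized in APPROVED_QUERIES else "noindex,follow"
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    print(robots_meta("Rock  N Roll Shoes"))
    print(robots_meta("shoes by reversed size"))
```

Since `index,follow` is what crawlers assume when no robots meta tag is present, a page could equally omit the tag for approved queries.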

> > Sebastian

> > On Mar 13, 2:33 pm, rumblepup wrote:

> > > Sebastian

> > > Thank you for your input.  My problem with this scenario is that in a
> > > typical e-commerce application, no matter what the application, php,
> > > asp, cfm, or asp.net, a website is linking to queries against a
> > > database.  And this is not just e-commerce, but any dynamic site.  You
> > > create a link to a category of content, which is really a link to a
> > > search result.  In other words, ALL LINKS are search results, filtered
> > > to a particular category.
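
The point that a category link is really a filtered query can be seen in a few lines using Python's built-in `sqlite3` module with an in-memory database. The `products` table, its columns, and its rows are made up for illustration.

```python
import sqlite3


def make_demo_db():
    # Hypothetical schema: one row per product with its category.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, category TEXT)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [
            ("Oxford Black", "Dress Shoes"),
            ("Court Ace", "Tennis Shoes"),
            ("Derby Brown", "Dress Shoes"),
        ],
    )
    return conn


def category_page(conn, category):
    # A "category page" is just a parameterized query filtered to one category.
    rows = conn.execute(
        "SELECT name FROM products WHERE category = ? ORDER BY name", (category,)
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = make_demo_db()
    print(category_page(conn, "Dress Shoes"))  # ['Derby Brown', 'Oxford Black']
```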

> > > Now, maybe what Vanessa and Matt and others are referring to are sites
> > > using their own user generated search results to create links to user
> > > search result pages.  In all honesty, I don't see this as harmful, in
> > > the sense that a website is creating relevant content to a search
> > > term, both on the site and as an indexable search result.  What about
> > > tagging?  Let's say I create a way for my customers to "tag" products
> > > for terms they think are relevant.  Now, in technical terms, I'm
> > > creating a "search result" for tags that my customers have generated,
> > > and thus the user is actually creating new content that is relevant to
> > > a particular term.  There is no way I can make a product detail page
> > > or a category search result page relevant to every single word that
> > > someone "might" use to define that set of results, but my customers
> > > can, and I don't understand why it's wrong to serve up those "tag"
> > > results to any of the SEs as relevant content for a particular term.
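
A sketch of serving customer tags as linkable pages, assuming tags arrive as `(product, tag)` pairs. `slugify`, `build_tag_pages`, the `/tags/` URL prefix, and the `min_products=2` threshold (a guard against one-item thin pages) are all hypothetical choices, not anything the thread prescribes.

```python
import re
from collections import defaultdict


def slugify(tag: str) -> str:
    # "Cool Rock and Roll Shoes" -> "cool-rock-and-roll-shoes"
    return re.sub(r"[^a-z0-9]+", "-", tag.lower()).strip("-")


def build_tag_pages(taggings, min_products=2):
    """Group customer (product, tag) pairs into tag pages.

    min_products is an assumed quality threshold: tags attached to fewer
    distinct products are not published as standalone pages.
    """
    by_tag = defaultdict(set)
    for product, tag in taggings:
        by_tag[slugify(tag)].add(product)
    return {
        f"/tags/{slug}": sorted(products)
        for slug, products in by_tag.items()
        if len(products) >= min_products
    }


if __name__ == "__main__":
    taggings = [
        ("black-leather-brass-buttons", "cool rock and roll shoes"),
        ("studded-boot", "Cool Rock and Roll Shoes"),
        ("white-sneaker", "gym"),
    ]
    print(build_tag_pages(taggings))
```

Normalizing tags to slugs means differently capitalized customer tags land on the same page instead of producing near-duplicate URLs.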

> > > Here's my for instance.

> > > Let's say I'm an online shoe sales site, and I have a bunch of shoes
> > > that fall into a set of categories and groupings, e.g., Dress Shoes,
> > > Dining, Tennis Shoes, Sneakers, etc.  I can't optimize for every
> > > single term that someone might use to search for specific products or
> > > product categories, otherwise I'd be keyword stuffing my pages, and
> > > they'd look dumb as well.  But now, let's say I have a Black leather
> > > shoe with brass buttons and leather engraving, and my customer tags
> > > this shoe as "cool rock and roll shoes," something that I can't
> > > optimize for or something I didn't think to optimize for.  Now, let's
> > > say I have a few shoes that users would tag as "cool rock and roll
> > > shoes" and I can't optimize them all for the same thing, because that
> > > would be duplicating content, but I do serve up my tags as links, so
> > > that my site can address people searching for "cool rock and roll
> > > shoes."  Now I have a page that users think is relevant to a search
> > > term that might not be very competitive, but is used maybe 1000 times
> > > a month on Google.   What am I doing wrong by doing that?  What if
> > > with that search result I get a first-page SERP?  I'm not trying to
> > > spam; my content, I think, is bona fide, and the Google algo thinks
> > > that the page I'm serving up is relevant.

> > > Doesn't the algo make this decision?

> > > Now back to my original query.  Now in the Google Webmaster
> > > Guidelines, Vanessa has added "Use robots.txt to prevent crawling of
> > > search results pages or other auto-generated pages that don't add much
> > > value for users coming from search engines."  How does my dynamic
> > > content, or the tagged pages that customers have created, NOT give
> > > value to customers?

> > > If I'm wrong in my assumptions, and Google is fine with my
> > > navigational search results, I apologize, but the statement in the
> > > guidelines is way too broad, I think.  If you search for "22" couch
> > > cushion" and Google's SERPs come up with a page from a site that is a
> > > SEARCH for the same thing, I'm treated to a page full of 22" couch
> > > cushions, just what I was looking for.
