Discussions > Crawling, indexing, and ranking > Indexing of Search Results
Philipp Lenssen  
From: Philipp Lenssen
Date: Sun, 18 Feb 2007 11:54:21 -0000
Local: Sun, Feb 18 2007 6:54 am
Subject: Indexing of Search Results
Is there any official Google statement on whether search result pages
on one's own site ought to be disallowed from indexing (e.g. via
robots.txt)?

It's a trick many spammers use, after all, and sometimes, it happens
inadvertently. And it creates a lot of redundancy in search results.
Sites like YouTube do it too:
http://www.google.com/search?hl=en&safe=off&q=site%3Awww.youtube.com%...


 
VanessaFox Google employee  
From: VanessaFox
Date: Mon, 19 Feb 2007 14:24:40 -0000
Local: Mon, Feb 19 2007 9:24 am
Subject: Re: Indexing of Search Results
Typically, web search results don't add value to users, and since our
core goal is to provide the best search results possible, we generally
exclude search results from our web search index. (Not all URLs that
contain things like "/results" or "/search" are search results, of
course.)

I'll take a look at the YouTube example. Thanks.

rumblepup  
From: rumblepup
Date: Mon, 12 Mar 2007 21:43:44 -0000
Local: Mon, Mar 12 2007 5:43 pm
Subject: Re: Indexing of Search Results
Vanessa,
I'm curious because this little message sparked a couple of blog
posts, and I need a little clarification.
For instance, my site is a dynamic e-commerce site that uses search
results to display dynamic content.  After reading Matt Cutts, it
seems that this is out of favor.  Can you clarify a little more?  Is
my search result page in danger?

 
Sebastian  
From: Sebastian
Date: Mon, 12 Mar 2007 22:21:26 -0000
Local: Mon, Mar 12 2007 6:21 pm
Subject: Re: Indexing of Search Results
Probably. Too many ecommerce sites have used their search facilities
to "produce" relevant content.
http://searchengineland.com/070312-104201.php
http://www.seroundtable.com/archives/012671.html
http://webmastershelp.iblogget.com/2007/03/12/vanishing-ecommerce-sites/
http://sebastianx.blogspot.com/2007/03/why-ecommerce-systems-suck.html
...
Consider disallowing your search result pages in robots.txt. I'm
curious about Vanessa's answer though.
Sebastian
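
For anyone wondering what "disallowing your search result pages in robots.txt" could look like in practice, here is a minimal sketch using Python's standard urllib.robotparser. The /search path, the example.com URLs and the is_crawlable helper are assumptions made up for illustration; they are not taken from any site in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The "/search" path is an assumption; use
# whatever path your own site search actually lives under.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules in ROBOTS_TXT allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_crawlable("http://www.example.com/search?q=shoes"))
    print(is_crawlable("http://www.example.com/category/dress-shoes"))
```

Note that Disallow rules match by path prefix, so a rule like this also blocks any other URL whose path merely starts with /search. As Vanessa says, not every such URL is a search result, so the prefix needs to be chosen with care.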

rumblepup  
From: rumblepup
Date: Tue, 13 Mar 2007 13:33:01 -0000
Local: Tues, Mar 13 2007 9:33 am
Subject: Re: Indexing of Search Results
Sebastian

Thank you for your input.  My problem with this scenario is that in a
typical e-commerce application, no matter what the application, php,
asp, cfm, or asp.net, a website is linking to queries against a
database.  And this is not just e-commerce, but any dynamic site.  You
create a link to a category of content, which is really a link to a
search result.  In other words, ALL LINKS are search results, filtered
to a particular category.

Now, maybe what Vanessa and Matt and others are referring to are sites
using their own user generated search results to create links to user
search result pages.  In all honesty, I don't see this as harmful, in
the sense that a website is creating relevant content to a search
term, both on the site and as an indexable search result.  What about
tagging?  Let's say I create a way for my customers to "tag" products
for terms they think are relevant.  Now, in technical terms, I'm
creating a "search result" for tags that my customers have generated,
and thus the user is actually creating new content that is relevant to
a particular term.  There is no way I can make a product detail page
or a category search result page relevant to every single word that
someone "might" use to define that set of results, but my customers
can, and I don't understand why it's wrong to serve up those "tag"
results to any of the SE's as relevant content for a particular term.

Here's my for instance.

Let's say I'm an online shoe sales site, and I have a bunch of shoes
that fall into a set of categories and groupings, i.e., Dress Shoes,
Dining, Tennis Shoes, Sneakers, etc.  I can't optimize for every
single term that someone might use to search for specific products or
product categories, otherwise I'd be keyword stuffing my pages, and
they'd look dumb as well.  But now, let's say I have a Black leather
shoe with brass buttons and leather engraving, and my customer tags
this shoe as "cool rock and roll shoes," something that I can't
optimize for or something I didn't think to optimize for.  Now, let's
say I have a few shoes that users would tag as "cool rock and roll
shoes" and I can't optimize them all for the same thing, because that
would be duplicating content, but I do serve up my tags as links, so
that my site can address people searching for "cool rock and roll
shoes."  Now I have a page that users think is relevant to a search
term that might not be very competitive, but is used maybe 1000 times
a month on Google.   What am I doing wrong by doing that?  What if
with that search result I get a first page SERP?  I'm not trying to
spam.  My content, I think, is bona fide, and the Google algo thinks
that the page I'm serving up is relevant.

Doesn't the algo make this decision?

Now back to my original query.  Now in the Google Webmaster
Guidelines, Vanessa has added "Use robots.txt to prevent crawling of
search results pages or other auto-generated pages that don't add much
value for users coming from search engines."  How does my dynamic
content NOT give value to customers, and how do the tagged pages that
customers have created NOT give value?

If I'm wrong in my assumptions, and Google is fine with my
navigational search results, I apologize, but the statement in the
guidelines is way too broad, I think.  If you search for "22" couch
cushion" and Google SERP's come up with a page from a site that is a
SEARCH for the same thing, I'm treated to a page full of 22" couch
cushions, just what I was looking for.


 
Sebastian  
From: Sebastian
Date: Tue, 13 Mar 2007 16:53:59 -0000
Local: Tues, Mar 13 2007 12:53 pm
Subject: Re: Indexing of Search Results
Ahhh ... I see a great debate coming :)

Let's concentrate on the binding statement in the webmaster guidelines:
http://www.google.com/support/webmasters/bin/answer.py?answer=35769
"Use robots.txt to prevent crawling of search results pages or other
auto-generated pages that don't add much value for users coming from
search engines"
(Robots meta tags and nofollow rel values, possibly in combination
with robots.txt, will do the trick too. What matters is telling the
crawler not to index particular content.)

I don't read this as "prevent Googlebot from crawling dynamic content
when this content contains links".

Let's define "search results pages" as a result set matching a query
submitted in a search box. Using GET instead of POST makes the result
sets linkable spider fodder, and there are many other ways to feed
crawlers with SERPs.
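
To illustrate why GET makes result sets linkable: with GET the whole query lives in the URL, so any page can carry a static link to a stored result set. A minimal sketch, where the search_link helper, the base URL and the "q" parameter name are all assumptions for illustration only:

```python
from urllib.parse import urlencode


def search_link(base_url: str, query: str, param: str = "q") -> str:
    """Build a plain GET link to a site search result page.

    The parameter name "q" is an assumption; sites use all sorts of names.
    """
    return f"{base_url}?{urlencode({param: query})}"


if __name__ == "__main__":
    # A crawler can follow this like any other link. A POST form
    # submission has no such URL to link to.
    print(search_link("http://www.example.com/search", "rock and roll shoes"))
```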

The new rule covers scrapers, MFA sites, every META SE out there and
site internal search facilities to some extent as well, but in no way
directories, tagged links lists or editorial use of a search script to
produce a list of links to related products or a list of products with
similar properties or usages.

Well, there's a fine and vaguely defined borderline ("don't add much
value for users"), so let's just say that a script looping over a
complete shop, iterating through all possible keyword combos and
outputting them as GET links to the search facility on a ton of links
pages created for SE crawlers, not users, would be abuse.

In between these extremes I'd say that common sense is a good enough
criterion to judge whether editorial or navigational links to
predefined search results make sense for users or not. It's Google's
job to drill their algos, I just hope that'll work with as less
collateral damage as possible.

Defining "auto-generated pages that don't add much value for users"
is done here:
http://www.google.com/support/webmasters/bin/answer.py?answer=35291&q...
"Another illicit practice is to place 'doorway' pages loaded with
keywords on the client's site somewhere." My abuse example above falls
under this definition.
http://www.google.com/support/webmasters/bin/answer.py?answer=35769&q...
"Avoid 'doorway' pages created just for search engines."
http://www.google.com/support/webmasters/bin/answer.py?answer=40349&q...
"Keep in mind that our algorithms can distinguish natural links from
unnatural links ... Only natural links are useful for the indexing and
ranking of your site." I do know that Google can spot artificial
internal linkage, for example an unnaturally high number of links to
thin product pages or systematic link patterns involving machine
generated hallways, doorways and similar attempts.

My take is that "don't add much value for users coming from search
engines" is the core message. Again that's a question of judging
intent, not a positive or negative statement with regard to particular
techniques. The whole Wikipedia is autogenerated, and fully indexed.
Google doesn't care how (in the sense of which technology gets used)
contents get presented to the searcher, Google cares about valuable
content presentations without machine generated noise (e.g. doorways
and unnecessary duplication) generated for machines (crawlers).
There's no need to explain the technical aspects more clearly; that's
probably impossible anyway. Technology is not the issue. One
could produce doorway spam with vi.

I'd say that your list of rock 'n roll shoes stays "legit". Delicious/
digg/... and other sites won't suffer less or more from the new policy
(which is not that new BTW) than your user tagged shoes. They offer
their auto generated links lists and feeds ordered by tags and users,
you offer your autogenerated links lists ordered by rock 'n roll and
black leather. A searcher seeking rock 'n roll shoes should find your
list of shoes with thumbs, price, description and a link to the
product page on Google's SERPs. As long as you don't overload your
site with static GET links to search scripts outputting shoes ordered
alphabetically by SKU or by reversed size, neither of which makes
sense for users, you're safe. If you think particular SERPs could be
seen as
noise, just insert a "noindex,follow" robots meta tag. Approving
stored SERPs for these purposes should be a suitable procedure.

Sebastian
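
A rough sketch of the "noindex,follow" suggestion above, assuming a site that keeps some kind of flag for which stored search result pages have been approved for indexing. The robots_meta helper and both of its parameters are hypothetical names invented for this example:

```python
def robots_meta(is_search_result: bool, approved: bool = False) -> str:
    """Return a robots meta tag for a page.

    Search result pages that nobody has approved get "noindex,follow",
    so crawlers still follow the product links but do not index the
    list itself. Everything else gets "index,follow".
    """
    if is_search_result and not approved:
        content = "noindex,follow"
    else:
        content = "index,follow"
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    print(robots_meta(is_search_result=True))
    print(robots_meta(is_search_result=True, approved=True))
```

The tag belongs in the page's head section. Whether a given result page counts as "approved" remains an editorial decision; the code only records it.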

JLH  
From: JLH
Date: Tue, 13 Mar 2007 10:09:27 -0700
Local: Tues, Mar 13 2007 1:09 pm
Subject: Re: Indexing of Search Results
Here's an example of said autogenerated search results, except that
each search result resulted in a new subdomain, which spawned another
subdomain, etc.

http://forums.digitalpoint.com/showthread.php?t=97090

The net effect was an infinitely large website built with one page on
each subdomain.

I'd imagine that's what they are aiming at: a site that's in effect a
crawler trap, generating an infinite number of pages as it gets
crawled.

JLH  
From: JLH
Date: Tue, 13 Mar 2007 10:40:13 -0700
Local: Tues, Mar 13 2007 1:40 pm
Subject: Re: Indexing of Search Results
Being paranoid, of course, I'm robots.txt'ing out search results, even
though from a brief look none of them are indexed.
<<looks over shoulder>>

rumblepup  
From: rumblepup
Date: Tue, 13 Mar 2007 21:58:57 -0000
Local: Tues, Mar 13 2007 5:58 pm
Subject: Re: Indexing of Search Results
"I just hope that'll work with as less collateral damage as possible."

That is my hope exactly.

In this particular case, I agree with your assessment, and I'm not
arguing the case.  I just got scared out of my mind, because we, and other
sites, depend on this so much.

However, let me make another case here for s's and g's.

For some of these SERP's that have "search result" pages listed, I
think those pages ARE relevant and helpful.  I mean, for my
theoretical shoes site, let's say I'm tracking the most popular terms
a particular product comes under, and create a link to those search
results.  Now, I have a page that is a search result, but still highly
relevant, because it's all about items that directly relate to the
term.

For instance, searching for "fender precision bass pickguard" gives me
this result

http://www.google.com/search?sourceid=navclient&ie=UTF-8&rlz=1T4GFRC_...

which is a SEARCH RESULT from the Fender website.

(upon further looking, this might be a page that's in danger, because
fender uses search queries as navigation)

And, it's exactly what I was looking for.  How is that not relevant,
or helpful?  I liked that result.  In fact, if I had gone to the
website and searched for that term, I would have gotten the same
results.  Why not allow it as a Google search result?

I completely agree that FLOODING the SERP's with thin result sets
might not be the best thing in the world, but I don't see the harm in
creating a link to a popular on site search term, that gives a result
set directly related to the term, that would be a great resource for
the searcher.

"Here's an example of said autogenerated search results, except that
each search result resulted in a new subdomain, which spawned another
subdomain, etc. "

I know and loathe that particular spammer, and have been in the debate
of how to work something like that out.  (By the way, that spammer is
back with a new spamming technique)
http://rumblepup.blogspot.com/2006/12/infamous-spammer-using-blogger-...

So it's a tossup.

I agree with both of your assessments, but it's always...scary, when an
algo change occurs.


 
Sebastian  
From: Sebastian
Date: Tue, 13 Mar 2007 22:50:17 -0000
Local: Tues, Mar 13 2007 6:50 pm
Subject: Re: Indexing of Search Results
Thanks for the heads up, I'm sure Google's Web spam team will take
care of this nasty stuff asap.

As for your other example, I guess aggregating atomized information in
the right context can make sense on the SERPs, well it indeed makes
sense. Sure Google can gather the information itself, but searchers
looking for the aggregated view don't get the expected result with
simple queries. Probably that's a case we'll discuss when Google has
invented more technologies to scan the searcher's brain for the intent
behind search queries. What's the point of Google disallowing such
"helpers" whilst Google Web search is not yet able to fulfill the
searcher's expectations? Nothing's set in stone ;)

Algo changes *are* scary from a site owner's perspective, but somewhat
predictable. Often I tell clients to redesign this and revamp that, to
get rid of the shortcuts because they're not going to work forever ...
guess what happens? It works today, and when pinned down on particular
issues I refuse to predict how long it can slip through the filters.
That's human nature.

Sebastian

rumblepup  
From: rumblepup
Date: Tue, 13 Mar 2007 23:46:37 -0000
Local: Tues, Mar 13 2007 7:46 pm
Subject: Re: Indexing of Search Results
"As for your other example, I guess aggregating atomized information in
the right context can make sense on the SERPs, well it indeed makes
sense. "

I think so too.  And I think the technological giant that is Google
might come up with a neat filter for the abusers, but I see value in
an aggregated SERP site search...thing (I just don't know what to call
it anymore), because it directly responds to a user's search request.

Let's go back to my example.  Say a web searcher does a search on
Google for "cool rock and roll shoes" which is an opinion, not a
fact.  It's hard to categorize products or content for that, but I'm
sure there are sites that might (I did a search on it for s's and g's,
and found nothing even like my response).  Say my users "tagged" an
item that way, but for whatever reason, say for lack of programming
acumen, I can't supply a list of tagged results.  I can, however,
supply a searchresults.aspx?tags=cool+rock+and+roll+shoes, which I'd
like to, because here are all my shoes that USERS think are cool rock
and roll shoes.  It seems this is in danger, when all I'm trying to do
is give you a page all about rock and roll shoes, i.e., relevant
content.

Very interested on how this plays out.  Thanks for the discussion.
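
To make the tagging scenario concrete, here is a minimal sketch of how a tag "search result" can be nothing more than a lookup in an index built from customer tags. The build_tag_index helper and the product and tag names are invented for illustration:

```python
from collections import defaultdict


def build_tag_index(products: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each lower-cased customer tag to the products carrying it."""
    index: dict[str, list[str]] = defaultdict(list)
    for name, tags in products.items():
        for tag in tags:
            index[tag.lower()].append(name)
    return dict(index)


if __name__ == "__main__":
    # Hypothetical catalogue: product name -> tags supplied by customers.
    catalogue = {
        "Black leather shoe with brass buttons": ["cool rock and roll shoes"],
        "Studded ankle boot": ["Cool Rock and Roll Shoes", "boots"],
    }
    index = build_tag_index(catalogue)
    # The page behind ?tags=cool+rock+and+roll+shoes would list these.
    print(index["cool rock and roll shoes"])
```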


 
Sebastian  
From: Sebastian
Date: Wed, 14 Mar 2007 00:08:48 -0000
Local: Tues, Mar 13 2007 8:08 pm
Subject: Re: Indexing of Search Results
Think of it as applying properties and opinions to bare facts, which
is considered "adding value for users coming from SERPs". You're still
safe IMO. I'd add "rock 'n roll approved" to the product descriptions
though and avoid terms like "cool" and "best of breed" in the query
string of preserved SERPs ;)  Apply those attributes via anchor text,
preferably found in external links.
Sebastian
