Discussions > Crawling, indexing, and ranking > Using Robots.txt to block "infinite spaces" good? bad?
From: SmokeDog
Date: Wed, 29 Oct 2008 10:50:16 -0700 (PDT)
Local: Wed, Oct 29 2008 1:50 pm
Subject: Using Robots.txt to block "infinite spaces" good? bad?
Hey everyone...

Our site uses query string variables in our URLs to track traffic and
internal visit patterns. As an example, our email offers and external
ads use a URL similar to: www.website.com/shop/scarfs.asp?cpn=bbl2345

And then upon entering the website they might click on internal links
like:
www.website.com/shop/scarfs.asp?sc=2332

Those variables can also be combined:
www.website.com/shop/scarfs.asp?cpn=bbl2345&sc=2332

Over time this has created millions of URL variations in Google's
index. Of course this is undesirable, but Google has been reluctant to
give us a straight answer. I received a webmaster tools email that said
we can use our robots.txt file to filter out all the variable URLs by
doing:

User-Agent: Googlebot
Disallow: *CPN=*
Disallow: *SC=*

So the questions are, does this work?
Will it have any negative effects?
Will it kill some of my link popularity?
Will I lose any rankings?

In Yahoo's dynamic parameters tool they simply redirect the pages to
the original so no link popularity is lost, and rankings actually got
better for us while killing millions of erroneous results. It's a
shame that Google doesn't have something like this, as we are pretty
leery about killing pages without understanding what might happen.

Thanks for your replies!

JJ
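
A minimal sketch for checking which URLs the Disallow rules above would cover. It assumes Googlebot-style wildcard matching, where `*` matches any run of characters and a trailing `$` anchors the end of the URL; the helper names `rule_to_regex` and `is_blocked` are hypothetical. robots.txt path matching is case-sensitive, so it is worth checking whether `*CPN=*` actually matches the lowercase `cpn=` used in the example URLs.

```python
import re

def rule_to_regex(rule):
    """Translate a Disallow pattern: '*' matches any run of characters,
    and a trailing '$' anchors the match to the end of the URL."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path_and_query, disallow_rules):
    """True if any non-empty rule matches from the start of the path.
    Matching is case-sensitive; an empty Disallow blocks nothing."""
    return any(rule_to_regex(r).match(path_and_query)
               for r in disallow_rules if r)

# Lowercase rules, to match the parameter names used in the example URLs.
rules = ["*cpn=*", "*sc=*"]
for url in ["/shop/scarfs.asp",
            "/shop/scarfs.asp?cpn=bbl2345",
            "/shop/scarfs.asp?sc=2332"]:
    print(url, is_blocked(url, rules))
```

Running the same check with the uppercase rules from the email is a quick way to see whether the rules need to be rewritten in lowercase before going live.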


 
From: luzie
Date: Wed, 29 Oct 2008 11:09:44 -0700 (PDT)
Local: Wed, Oct 29 2008 2:09 pm
Subject: Re: Using Robots.txt to block "infinite spaces" good? bad?
Hello JJ,

>>> I received a webmaster tools email that said we can
>>> use our robots.txt file to filter out all the variable URLs by doing:
>>> User-Agent: Googlebot
>>> Disallow: *CPN=*
>>> Disallow: *SC=*

This is more than I would ever have expected to hear from Google ...

>>> So the questions are, does this work?

It will.

>>> Will it have any negative effects?

No. It will have the positive effect of avoiding a negative one in the
future.

>>> Will it kill some of my link popularity?

If you think of "link-popularity" in terms of the amount of internal
links (labyrinthine as they may be), yes. In terms of real
link-popularity from outside, of course not (hoping you don't have
inbound links pointing to all the 'parameterized' pages).

>>> Will I lose any rankings?

Can't predict that ... but in the end it will help save your rankings.

-luzie-


 
From: SmokeDog
Date: Wed, 29 Oct 2008 13:16:53 -0700 (PDT)
Local: Wed, Oct 29 2008 4:16 pm
Subject: Re: Using Robots.txt to block "infinite spaces" good? bad?
Thanks for the answer Luzie,

There are two major problems that we see...

The general public is linking to our site using the variables, as are
coupon sites and other crap sites trying to redirect traffic. This is
one of the reasons our CPN variable pages rank.

While we do appreciate the links from these sites, the unfortunate
part is that there are thousands of these links out there and wiping
them out (I think) could hurt our rankings.

Secondly, Googlebot goes through the site like a person and picks up
our SC codes. It gets confused and winds up ranking the pages with our
SC codes because of the internal linking.

Of course my dream would be to take these codes out, but our VPs are
unwilling to create an entirely new traffic monitoring system.

JJ


 
From: Autocrat
Date: Wed, 29 Oct 2008 14:06:04 -0700 (PDT)
Local: Wed, Oct 29 2008 5:06 pm
Subject: Re: Using Robots.txt to block "infinite spaces" good? bad?
Well ... so long as you are not altering the content...
then maybe (please get this verified) ... you could detect the
useragent and remove the string?

That way, if it's a GBot (or a Ybot, LiveBot etc.) you could possibly
301 redirect to the 'main url' for that page, without the silly string
inclusion?

BE WARNED - that may count as manipulation and could cause problems.
Again, please get it verified first.



 
From: luzie
Date: Wed, 29 Oct 2008 14:15:53 -0700 (PDT)
Local: Wed, Oct 29 2008 5:15 pm
Subject: Re: Using Robots.txt to block "infinite spaces" good? bad?

>>> you could detect the useragent and remove the string?

gosh ... this is cloaking ... :-/
(i.e.: it _could_ be)

>>> Secondly Googlebot goes through the site like a person and picks up
>>> our SC codes. It gets confused and winds up ranking the pages with our
>>> SC codes because the internal linking.

And that's exactly why you have to get rid of all these weird
parameter-addresses soon. If they send you a kind warning detailing
what to do ... you better do it. There must be a way to redirect links
coming in to 'now unwanted addresses'?

-luzie-


 
From: webado
Date: Wed, 29 Oct 2008 14:19:41 -0700 (PDT)
Local: Wed, Oct 29 2008 5:19 pm
Subject: Re: Using Robots.txt to block "infinite spaces" good? bad?
I would introduce a script to test the query string for the presence
and order of parameters and, if needed, 301 redirect to a URL with the
query string built in the correct form.

I wouldn't rely on robots.txt for getting things straightened out.
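
A rough sketch of that kind of check, assuming the two parameters from the original post (`cpn`, `sc`) and an arbitrarily chosen canonical order. `PARAM_ORDER`, `canonical_url` and `redirect_target` are hypothetical names, and the real site runs on ASP, so this only illustrates the logic.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed canonical order for the known tracking parameters.
PARAM_ORDER = ["cpn", "sc"]

def canonical_url(url):
    """Rebuild the query string: known parameters first, in a fixed order
    (first value wins), followed by any other parameters as they came."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    known, others = {}, []
    for key, value in pairs:
        name = key.lower()
        if name in PARAM_ORDER:
            known.setdefault(name, value)
        else:
            others.append((key, value))
    ordered = [(name, known[name]) for name in PARAM_ORDER if name in known]
    return urlunsplit(parts._replace(query=urlencode(ordered + others)))

def redirect_target(url):
    """Return the URL to 301 to, or None if the URL is already canonical."""
    canon = canonical_url(url)
    return canon if canon != url else None

print(redirect_target("http://www.website.com/shop/scarfs.asp?sc=2332&cpn=bbl2345"))
print(redirect_target("http://www.website.com/shop/scarfs.asp?cpn=bbl2345&sc=2332"))
```

With a fixed order, the two orderings of the same parameters collapse to a single URL, which at least halves the number of duplicates even before the parameters themselves are dealt with.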



 
From: luzie
Date: Wed, 29 Oct 2008 14:24:05 -0700 (PDT)
Local: Wed, Oct 29 2008 5:24 pm
Subject: Re: Using Robots.txt to block "infinite spaces" good? bad?
@JJ:

"infinite spaces" ...

Is that something that you said yourself - or is it in any way
something THEY said?

-luzie-


 
From: JohnMu (Google employee)
Date: Wed, 29 Oct 2008 15:59:35 -0700 (PDT)
Local: Wed, Oct 29 2008 6:59 pm
Subject: Re: Using Robots.txt to block "infinite spaces" good? bad?
Hi everyone

Here's what I would do:

Assuming you want to keep track of those numbers, move them to a
cookie and out of the URL. If you can do that, you could 301 redirect
from the tagged URL (with the numbers) to a clean URL while setting a
cookie on the user's side. In other words, everyone is redirected to
the clean URLs and users can still be tracked appropriately. Of
course, this involves changing a bit on the server side -- and
depending on how much time you have it might be hard to get done
anytime soon...

The quick and easy alternative is to use the robots.txt disallows you
mentioned in your original post. That will help you to clean things up
a bit and it will certainly help us with crawling your site better (we
won't waste time on all of these duplicates). It might be a bit
sub-optimal with regards to links because your users will still end up
linking to the URL they see (which would be the one with additional
parameters), but it's an easy way to get things improved.

Personally, if the website is important to you, I'd bite the bullet
and move to cookies.

Hope it helps!
John
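
A minimal sketch of the cookie approach described above, assuming the tracking parameters are `cpn` and `sc` as in the original post. The function name `clean_redirect` and the cookie attributes are hypothetical; a real implementation would live in the site's own server code (ASP here) and would need its own cookie lifetime and security settings.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, quote

# Assumed set of tracking parameters to move out of the URL.
TRACKING_PARAMS = {"cpn", "sc"}

def clean_redirect(url):
    """For a tagged URL, return (301, headers) that redirect to the clean
    URL and carry the tracking values in cookies. Return None if the URL
    has no tracking parameters and needs no redirect."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    tracked = [(k.lower(), v) for k, v in pairs if k.lower() in TRACKING_PARAMS]
    if not tracked:
        return None
    kept = [(k, v) for k, v in pairs if k.lower() not in TRACKING_PARAMS]
    location = urlunsplit(parts._replace(query=urlencode(kept)))
    headers = [("Location", location)]
    for name, value in tracked:
        # Percent-encode the value so it is safe inside a cookie.
        headers.append(("Set-Cookie", f"{name}={quote(value)}; Path=/"))
    return 301, headers

print(clean_redirect("http://www.website.com/shop/scarfs.asp?cpn=bbl2345&sc=2332"))
```

One thing to check when trying this: browsers may cache a 301, so a repeat visit to the same tagged URL might not reach the server and the cookie would not be set again.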


 
From: SmokeDog
Date: Fri, 31 Oct 2008 08:01:21 -0700 (PDT)
Local: Fri, Oct 31 2008 11:01 am
Subject: Re: Using Robots.txt to block "infinite spaces" good? bad?
Thanks John,

It helps to get an answer from the higher power....

I think this might be the news that everyone will love to hate.
Hopefully they go for it.

JJ


 
From: SmokeDog
Date: Wed, 5 Nov 2008 11:15:50 -0800 (PST)
Local: Wed, Nov 5 2008 2:15 pm
Subject: Re: Using Robots.txt to block "infinite spaces" good? bad?
John,

The company actually wants to move forward with the redirect/cookie
process! That's the good news.

The bad news is that they don't know how to do it exactly. Is there a
spot on the web that explains the technical process? Can you explain
it?

Their questions were:

Does it need to be a 301?

Is it a URL rewrite or forwarding?

I can't thank you enough for the answers!

JJ


 