Our site uses query string variables in our URLs to track traffic and
internal visit patterns. For example, our email offers and our external
ads use a URL similar to: www.website.com/shop/scarfs.asp?cpn=bbl2345
Over time this has created millions of variable URLs in Google's index.
Of course this is undesirable, but Google has been reluctant to give us
a straight answer. I received a webmaster tools email that said we can
use our robots.txt file to filter out all the variable URLs by doing:
User-Agent: Googlebot
Disallow: *CPN=*
Disallow: *SC=*
So the questions are, does this work?
Will it have any negative effects?
Will it kill some of my link popularity?
Will I lose any rankings?
In Yahoo's dynamic parameters tool they simply redirect the pages to
the original, so no link popularity is lost, and rankings actually got
better for us while killing millions of erroneous results. It's a
shame that Google doesn't have something like this, as we are pretty
leery about killing pages without understanding what might happen.
>>> I received a webmaster tools email that said we can
>>> use our robots.txt file to filter out all the variable URLs by doing:
>>> User-Agent: Googlebot
>>> Disallow: *CPN=*
>>> Disallow: *SC=*
This is more than I would ever have expected to hear from Google ...
>>> So the questions are, does this work?
It will.
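For anyone who wants to check what those two Disallow lines would
actually catch, here is a rough Python sketch of the wildcard matching
Google documents for robots.txt: `*` matches any run of characters, a
trailing `$` anchors the end, and the comparison is case-sensitive. The
helper name `rule_matches` is made up for illustration, and this is not
Googlebot's real parser. Under those assumptions, `*CPN=*` would not
match the lower-case `cpn=` in the example URL from the first post, so
the rule may need to use the same case as the live URLs.

```python
import re


def rule_matches(pattern: str, path_and_query: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path + query.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, matching starts at the beginning of the
    path and is case-sensitive.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path_and_query) is not None


print(rule_matches("*CPN=*", "/shop/scarfs.asp?CPN=bbl2345"))  # True
print(rule_matches("*CPN=*", "/shop/scarfs.asp?cpn=bbl2345"))  # False
print(rule_matches("*cpn=*", "/shop/scarfs.asp?cpn=bbl2345"))  # True
print(rule_matches("*CPN=*", "/shop/scarfs.asp"))              # False
```

The clean URL in the last line is never blocked, which is the point of
the rule: only the parameterized copies drop out of the crawl.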
>>> Will it have any negative effects?
No. It will have the positive effect of avoiding a negative one in the
future.
>>> Will it kill some of my link popularity?
If you think of "link popularity" in terms of the number of internal
links (labyrinthine as they may be), yes. In terms of real link
popularity from outside, of course not (hoping you don't have inbound
links pointing to all the 'parameterized' pages).
>>> Will I lose any rankings?
Can't predict that ... but in the end it will help save your rankings.
People are linking to our site using the variables (coupon sites and
other crap sites trying to redirect traffic, for example). This is
one of the reasons our CPN variable pages rank.
While we do appreciate the links from these sites, the unfortunate
part is that there are thousands of these links out there, and wiping
them out (I think) could hurt our rankings.
Secondly, Googlebot goes through the site like a person and picks up
our SC codes. It gets confused and winds up ranking the pages with our
SC codes because of the internal linking.
Of course my dream would be to take these codes out, but our VPs are
unwilling to create an entirely new traffic monitoring system.
Well ... so long as you are not altering the content...
then maybe (please get this verified) ... you could detect the
useragent and remove the string?
That way, if it's a GBot (or a Ybot, LiveBot etc.) you could possibly
301 redirect to the 'main URL' for that page, without the silly string
inclusion?
BE WARNED - that may count as manipulation and could cause problems.
Again, please get it verified first.
>>> you could detect the useragent and remove the string?
gosh ... this is cloaking ... :-/
(i.e.: it _could_ be)
>>> Secondly Googlebot goes through the site like a person and picks up
>>> our SC codes. It gets confused and winds up ranking the pages with our
>>> SC codes because the internal linking.
And that's exactly why you have to get rid of all these weird
parameter-addresses soon. If they send you a kind warning detailing
what to do ... you better do it. There must be a way to redirect links
coming in to 'now unwanted addresses'?
I would introduce a script that tests for the query string and for the
presence and order of its parameters and, if needed, 301 redirects to
a URL with the query string built in the correct form.
I wouldn't rely on robots.txt for getting things straightened out.
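A minimal sketch of that idea in Python (the site itself runs ASP, so
read this as an outline of the logic rather than drop-in code). It
assumes the "correct form" means lower-case parameter names, sorted
alphabetically, with the first value winning when a name repeats. The
helper `canonical_redirect` is hypothetical, the http:// scheme is added
for the example, and the real rules depend on what the pages accept.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_redirect(url: str):
    """Return the URL to 301 to, or None if the URL is already canonical.

    Assumed canonical form: parameter names lower-cased, sorted
    alphabetically, duplicates dropped (first value kept).
    """
    parts = urlsplit(url)
    seen = {}
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        seen.setdefault(name.lower(), value)
    ordered = sorted(seen.items())
    canonical = urlunsplit(parts._replace(query=urlencode(ordered)))
    return None if canonical == url else canonical


# Parameters in the "wrong" order and case get one canonical target.
print(canonical_redirect("http://www.website.com/shop/scarfs.asp?SC=12&cpn=bbl2345"))
# An already canonical URL needs no redirect.
print(canonical_redirect("http://www.website.com/shop/scarfs.asp?cpn=bbl2345&sc=12"))
```

Returning None for an already canonical URL matters: it keeps the
script from redirecting a page to itself in a loop.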
> On Oct 29, 9:16 pm, SmokeDog wrote:
> > Thanks for the answer Luzie,
> > There are two major problems that we see...
Assuming you want to keep track of those numbers, move them to a
cookie and out of the URL. If you can do that, you could 301 redirect
from the tagged URL (with the numbers) to a clean URL while setting a
cookie on the user's side. In other words, everyone is redirected to
the clean URLs and users can still be tracked appropriately. Of
course, this involves changing a bit on the server side -- and
depending on how much time you have it might be hard to get done
anytime soon...
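Roughly, the cookie approach could look like the Python sketch below.
It is an illustration under assumptions, not a finished implementation:
`strip_tracking` is a hypothetical helper, the parameter names `cpn` and
`sc` are taken from this thread, and the http:// scheme and the cookie
path are made up for the example. It only computes the status and
headers; wiring it into the actual server is left out.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Tracking parameters mentioned in the thread (assumed to be the full list).
TRACKING_PARAMS = {"cpn", "sc"}


def strip_tracking(url: str):
    """Return (301, headers) redirecting to the clean URL, or None.

    Tracking values move into Set-Cookie headers so visits can still be
    attributed after the redirect. None means there is nothing to strip.
    """
    parts = urlsplit(url)
    kept, tracked = [], {}
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name.lower() in TRACKING_PARAMS:
            tracked[name.lower()] = value
        else:
            kept.append((name, value))
    if not tracked:
        return None
    clean = urlunsplit(parts._replace(query=urlencode(kept)))
    headers = [("Location", clean)]
    for name, value in tracked.items():
        cookie = SimpleCookie()
        cookie[name] = value
        cookie[name]["path"] = "/"
        headers.append(("Set-Cookie", cookie[name].OutputString()))
    return 301, headers


print(strip_tracking("http://www.website.com/shop/scarfs.asp?cpn=bbl2345"))
```

Because every visitor, crawler or human, gets the same redirect, this
does not depend on detecting the user agent.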
The quick and easy alternative is to use the robots.txt disallows you
mentioned in your original post. That will help you to clean things up
a bit and it will certainly help us with crawling your site better (we
won't waste time on all of these duplicates). It might be a bit
suboptimal with regards to links because your users will still end up
linking to the URL they see (which would be the one with additional
parameters), but it's an easy way to get things improved.
Personally, if the website is important to you, I'd bite the bullet
and move to cookies.