Discussions > Crawling, indexing, and ranking > Dynamic urls growing like Topsy
From: enoon
Date: Sun, 09 Sep 2007 12:41:33 -0700
Local: Sun, Sep 9 2007 3:41 pm
Subject: Dynamic urls growing like Topsy
I have a website that up until a few months ago showed products on
static pages. Around June this year I added dynamic asp pages
with fairly ordinary readable urls, but because each page could be
sorted by price (high or low) and product it generated about 150
'difficult' urls with the usual '?' and '%20' in them. In addition,
because a pre-filled quote request form could be generated for each
product, the number of these urls multiplied.

GSite Crawler put all these pages in the sitemap.xml (it
excluded any duplicates that could occur by the same quote form url
being generated from different pages on the site).

A fairly large number of these urls seem to have been indexed,
including a number of quote form urls. That surprised me because
the tail end of such a url - for example - 'asp?lease-contract=120' will quite
likely change every time the product database changes.

None of this seemed to be presenting any problems. However, as the
database grows some individual pages have become too long. So it
occurred to me to use pagination to restrict pages to about 10
products, sorted in one of 4 ways. I worked on the code for that,
including navigation and sorting, and it works well, but I have not
implemented it yet.

The reason for this is that it will generate many more complex urls.
So it occurred to me to pre make asp pages with straightforward urls
where the number would be approximately proportionate to the size of
the database for each make and a small routine would work out the
pagination. It is difficult but possible to do.
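
For illustration, the "small routine" for pagination could look roughly like the sketch below. It is written in Python rather than ASP, and the names `page_count` and `page_slice` and the fixed page size of 10 are assumptions made for the example, not the site's actual code.

```python
import math

PAGE_SIZE = 10  # "about 10 products" per page, as mentioned in the post


def page_count(total_products, page_size=PAGE_SIZE):
    """Number of pages needed to show every product (never less than 1)."""
    return max(1, math.ceil(total_products / page_size))


def page_slice(products, page, page_size=PAGE_SIZE):
    """Return the products shown on a 1-based page number.

    Out-of-range page numbers are clamped to the first or last page,
    so a stale link to a page that no longer exists still shows something.
    """
    last = page_count(len(products), page_size)
    page = min(max(page, 1), last)
    start = (page - 1) * page_size
    return products[start:start + page_size]
```

Because the page count follows from the size of the product list, the number of pre-made pages per make would only need to change when the list crosses a multiple of the page size.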

If I succeed in this I have the problem of re-directing the old
'difficult' urls. There are so many of them I am not sure that it is
even possible.

There is always ISAPI but I don't have server access or the expertise
to use regular expressions.

Does anyone here REALLY know if just switching will do damage?

The url is http://www.newlease.co.uk/.


From: webado
Date: Mon, 10 Sep 2007 14:19:49 -0000
Local: Mon, Sep 10 2007 10:19 am
Subject: Re: Dynamic urls growing like Topsy
It seems to me you basically don't want to let url's containing query
strings like:

?sort-by=vehicle_leasing_offers.businessTerm

get indexed at all.

So one thing to do is only offer links to those url's in javascript so
robots don't pick them up in the first place.

Then I think you can disallow such url structures in robots.txt
(Google says it can understand wild cards in a Disallow directive).
Then you don't have to change your url structure.

User-agent: *
Disallow: /*?sort-by
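
To see which urls a wildcard rule like this would cover, here is a minimal sketch in Python. It assumes the commonly documented Google-style matching ('*' matches any run of characters, a trailing '$' anchors the end, everything else is literal and matched from the start of the path). The helper names and the example paths, including the `page-no` parameter, are hypothetical.

```python
import re


def rule_to_regex(pattern):
    """Translate a wildcard Disallow pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    All other characters are literal, and matching starts at the
    beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(pattern, path_and_query):
    """True if the path (including any query string) matches the rule."""
    return rule_to_regex(pattern).match(path_and_query) is not None


RULE = "/*?sort-by"
# Hypothetical paths, used only to exercise the rule.
print(is_disallowed(RULE, "/some-page.asp?sort-by=price"))            # blocked
print(is_disallowed(RULE, "/some-page.asp?page-no=2"))                # allowed
print(is_disallowed(RULE, "/some-page.asp?page-no=2&sort-by=price"))  # allowed
```

Under these assumptions the rule only catches urls where `sort-by` is the first parameter, directly after the '?'. A url that carries it as a later parameter (`&sort-by=`) would not match and would need its own rule.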

But (in another lifetime)  I'd have gone further and created a folder
based url structure for each kind of sorting you offer so I can
disallow those scripts based on prefix.

For instance a url like:
http://www.newlease.co.uk/tipper-dropside-lease-business-contract-hir...

I'd turn it into something like:

http://www.newlease.co.uk/sort-by/vehicle_leasing_offers.businessTerm...

Then I can disallow in robots.txt all urls starting with /sort-by/
and be done.

Then what's left is to remove the query string from any of the old
url's so as to redirect to the url without any query string.
So that
http://www.newlease.co.uk/tipper-dropside-lease-business-contract-hir...
gets redirected to
http://www.newlease.co.uk/tipper-dropside-lease-business-contract-hir...
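
The query-string removal described above can be sketched as follows. This is Python for illustration only; on the site itself the same logic would have to live in the ASP page, which would then answer with a 301 status and a Location header. The function name and the example.com urls are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url):
    """Return the query-free url to redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    if not parts.query:
        return None  # already the clean url, serve the page as normal
    # Keep scheme, host and path; drop the query string and any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical urls, for illustration only.
print(redirect_target("http://www.example.com/some-page.asp?sort-by=price"))
print(redirect_target("http://www.example.com/some-page.asp"))
```

Note that this strips every parameter. If some parameters (for paging, say) should survive, the routine would need to filter the query rather than drop it whole.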

No idea if this is easy to implement, especially at this point.



From: Chris Gunn
Date: Mon, 10 Sep 2007 08:51:57 -0700
Local: Mon, Sep 10 2007 11:51 am
Subject: Re: Dynamic urls growing like Topsy
On Sep 9, 1:41 pm, enoon wrote:

> A fairly large number of these urls seem to have been indexed - most
> surprisingly a number of quote form urls which surprised me because
> the tail end of it - for example - 'asp?lease-contract=120' will quite
> likely change every time the product database changes.

Howdy,

Open the http://bizynet.biz shopping cart demonstration and use
View>Source to see how the Robots meta tag is used to avoid the
crawlers following anything other than the main content pages.  You
should never let the crawlers index search pages or pages with Order
buttons.
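
As a rough illustration of the Robots meta tag approach, the sketch below picks a tag per kind of page. It is not taken from the demonstration site mentioned above; the page-kind labels and the function name are hypothetical, and Python stands in for whatever the site's templates are written in.

```python
# Page kinds that should stay out of the index (hypothetical labels).
NOINDEX_KINDS = {"search", "sorted-listing", "quote-form", "order"}


def robots_meta(page_kind):
    """Return the robots meta tag to place in the <head> of a page."""
    if page_kind in NOINDEX_KINDS:
        content = "noindex, nofollow"
    else:
        content = "index, follow"
    return '<meta name="robots" content="%s">' % content


print(robots_meta("quote-form"))
print(robots_meta("product"))
```

Unlike a robots.txt rule, the tag only takes effect once a crawler has fetched the page, so the page must stay crawlable for the tag to be seen.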

Google still has serious problems establishing pages that can have
different URL's due to tracking customers through a web site.  Because
their Googlebot can't do cookies, ecommerce sites have to treat it
as a new customer every time it arrives.

They do eventually sort it out and don't duplicate too many pages.

Chris


From: webado
Date: Mon, 10 Sep 2007 15:58:53 -0000
Local: Mon, Sep 10 2007 11:58 am
Subject: Re: Dynamic urls growing like Topsy
No robots do cookies, you should know that.

It's not Google's fault, it's the site not functioning logically.



From: Chris Gunn
Date: Mon, 10 Sep 2007 09:43:37 -0700
Local: Mon, Sep 10 2007 12:43 pm
Subject: Re: Dynamic urls growing like Topsy
On Sep 10, 9:58 am, webado wrote:

> No robots do cookies, you should know that.

Howdy Webado,

Considering the massive number of dynamic sites that put cookies to
effective use, it's overdue for the robots to get up-to-date.  Arpanet
faded away a long time ago.

> It's not Google's fault, it's the site not functioning logically.

There are a number of things that can be done to help on that count.
However, if the site is doing a good job of getting Order buttons
clicked, it should not be penalized by inadequate programming by
Google.  Not to mention, Google could reduce their storage
requirements significantly.

Chris


From: enoon
Date: Mon, 10 Sep 2007 12:26:03 -0700
Local: Mon, Sep 10 2007 3:26 pm
Subject: Re: Dynamic urls growing like Topsy
Thanks both for your thoughts. Not sure I want to stop Google or any
other search engine from indexing dynamically generated pages. For
example, if I use pagination to make the pages more user friendly the
introductory text above the table will give a mention to a higher
percentage of the products and none of it will be duplication. Even
the sorted pages aren't duplicates because the intro text changes (it
grabs the first 6 products).

Is this robots.txt rule quite specific

User-agent: *
Disallow: /*?sort-by

in so far as it would still allow -
http://www.newlease.co.uk/audi-lease-business-contract-hire.asp?page-...

The folder based structure you mention Webado is like what I was
planning but I was concerned about the impact of Google not being able
to find the dynamic urls it has already indexed.

This bit -

'Then what's left is to remove the query string from any of the old
url's so as to redirect to the url without any query string.'

I have absolutely NO idea how to do that and get the feeling I have
missed a point. Nevertheless, what you seem to be saying is that if I
do the robots.txt, then folders it won't cause me problems so far as
SEs are concerned. It's the redirect bit that's worrying me. No access
to server.



From: webado
Date: Mon, 10 Sep 2007 21:09:14 -0000
Local: Mon, Sep 10 2007 5:09 pm
Subject: Re: Dynamic urls growing like Topsy
Sorry, I can't help at all with IIS and asp.

Looking at the indexed pages, I think the only ones I saw with query strings
were for those situations - for sorting. So it makes sense to block them,
as they  seem to be irrelevant and there's a perfectly good url
without any query string giving the same information.


