From: Jay Is The Boss
Date: Thu, 17 Jul 2008 23:35:43 -0700 (PDT)
Local: Fri, Jul 18 2008 2:35 am
Subject: Upper Case and Lower Case = Duplicate Content?
Hi everyone,

I am concerned about being penalized (or having page rank slip) due to
possible duplicate content when it comes to upper case and lower case
versions of a URL.

What I mean is that I went into Webmaster Tools and opened:

Diagnostics -> Content Analysis -> Duplicate meta descriptions

There I noticed that Google had one URL listed twice:

/page/siam1/PROD/Hindu-Statues/hindu-statues-lst18

and

/page/siam1/PROD/hindu-statues/hindu-statues-lst18

In the first one, Hindu-Statues is capitalized, while in the second,
it is lower case.

My site is a shopping cart (Miva Merchant), and even though the URL
SHOULD be capitalized, the all-lowercase version resolves to the same
page.

In fact, you can use any mixture of lowercase and uppercase letters
and it will still resolve to the same page.

Is this something to worry about?

In case a complete URL would help, here is one:

http://www.siamese-dream.com/page/siam1/PROD/Hindu-Statues/hindu-stat...

Thanks in advance for any suggestions (even if they don't pertain to
this particular point).

Mark


 
From: Autocrat
Date: Fri, 18 Jul 2008 02:08:18 -0700 (PDT)
Local: Fri, Jul 18 2008 5:08 am
Subject: Re: Upper Case and Lower Case = Duplicate Content?
In many cases, URLs are not case-sensitive, so
thatsite.com / HereIsAnItem
and
thatsite.com / hereisanITEM
are the same page. (In some cases, on some servers, it is possible to
have them as different pages, but URLs are best thought of as
case-insensitive.)

Solutions...
If you are using an Apache server, I believe you can use a tiny bit of
code in the .htaccess file.
This will take any URL request and redirect it to a lower-case only
version.
That way any request ends up on the lowercase page...
:D



 
From: webado
Date: Fri, 18 Jul 2008 03:42:58 -0700 (PDT)
Local: Fri, Jul 18 2008 6:42 am
Subject: Re: Upper Case and Lower Case = Duplicate Content?
Microsoft IIS is usually case insensitive, which is why those urls all
return a 200 (success).

Apache is usually set up as case sensitive, so those urls are
considered different. Google is case sensitive.

If the server is case insensitive, then you have duplication, because
those urls all work and Google assumes them to be different urls
serving the same content.
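
To see which URLs are affected, one option is to group a list of known paths (for example, exported from a crawl or from server logs) by their lowercase form. The following is only a minimal sketch in Python; the function name `case_variant_groups` and the input list are hypothetical and not part of any tool mentioned in this thread.

```python
from collections import defaultdict


def case_variant_groups(paths):
    """Group paths that differ only by letter case.

    Returns a dict mapping the lowercase form to the sorted list of
    distinct spellings, keeping only groups with more than one spelling.
    """
    groups = defaultdict(set)
    for path in paths:
        groups[path.lower()].add(path)
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


if __name__ == "__main__":
    crawled = [
        "/page/siam1/PROD/Hindu-Statues/hindu-statues-lst18",
        "/page/siam1/PROD/hindu-statues/hindu-statues-lst18",
        "/page/siam1/CTGY/about",
    ]
    for key, variants in case_variant_groups(crawled).items():
        print(key, "->", variants)
```

Each group reported this way is a set of URLs that a case-insensitive server would answer with the same page.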

The best method to build a site is to always use lower case in all
urls. As Autocrat said, if you can set up the server to 301 redirect
urls to their lower case version, you will at least eliminate
duplication due to that.

Not sure if it can be done through server settings on an IIS server.
Might need a script added to each page that compares the uri used to
the lowercase transformation of the same uri and if not equal then it
performs a 301 redirection. Probably difficult to do when using canned
software like Miva.
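
As an illustration of the script approach just described (compare the requested URI with its lowercase form and 301 redirect when they differ), here is a rough sketch written as Python WSGI middleware. It is an assumption made purely for illustration: neither IIS nor Miva runs this code, and the name `lowercase_redirect` is hypothetical. The query string is deliberately left untouched.

```python
def lowercase_redirect(app):
    """Wrap a WSGI app so that paths containing uppercase letters
    are answered with a 301 redirect to the all-lowercase path."""

    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        lowered = path.lower()
        if path != lowered:
            # Keep the query string exactly as it was requested.
            query = environ.get("QUERY_STRING", "")
            location = lowered + ("?" + query if query else "")
            start_response("301 Moved Permanently",
                           [("Location", location),
                            ("Content-Length", "0")])
            return [b""]
        # Already lowercase: hand the request to the wrapped app.
        return app(environ, start_response)

    return middleware
```

The same comparison could be written in whatever scripting language the server supports; the point is only that the check happens before any page content is produced.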

But luckily your server is Apache, so the .htaccess solution should
work. It is probably set up to be case-insensitive, however, so the
first thing to do is turn that off. After that, add the redirection
directives; otherwise the mixed-case URLs will return 404s.

I found these .htaccess directives, but I have not tested this:

~~~~~~~~~~~~~~~~~~~~

# Skip this entire section if no uppercase letters in requested URL
RewriteRule ![A-Z] - [S=28]
# Else rewrite one of each uppercase letter to lowercase
RewriteRule ^([^A]*)A(.*)$ /$1a$2
RewriteRule ^([^B]*)B(.*)$ /$1b$2
RewriteRule ^([^C]*)C(.*)$ /$1c$2
...
...
RewriteRule ^([^Z]*)Z(.*)$ /$1z$2
# If more uppercase letters remain, re-invoke .htaccess and start over
RewriteRule [A-Z] - [N]
# Else do a 301 redirect to the all-lowercase URL
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

~~~~~~~~~~~~~~~~~

The lines shown as ... need to contain directives for the missing
letters.

It's bound to be a long process, so it's best to hurry up and clean up
the site navigation to avoid this lengthy transformation, and to
eliminate all the redirections found during navigation due to the case
adjustment.



 
From: Jay Is The Boss
Date: Fri, 18 Jul 2008 09:47:04 -0700 (PDT)
Local: Fri, Jul 18 2008 12:47 pm
Subject: Re: Upper Case and Lower Case = Duplicate Content?
Thank you, both Autocrat and webado.

I am on an apache server, so I will see about doing the 301 redirect
(will probably need my hosting company's help).

Webado, you said:

> It's bound to be a long process so it's best to hurry up and clean up
> the site navigation to avoid this lengthy transformation, and to
> eliminate all the redirections found during navigation due to the case
> adjustment

By that, I assume you mean I should change my items (and category
names) so they are all lower case letters first and then do the 301
redirect in the .htaccess files, right?

Or at least, you mean go ahead and change any upper case letters down
to lower case, right?

Also, you said:

> Google is case sensitive...

Does that mean that if I change my URLs to lower case, any existing
results in the Google search results WON'T resolve until the next
crawl? (Or do you just mean that Google thinks they are two different
URLs?)

Thanks again,

Mark



 
From: Autocrat
Date: Fri, 18 Jul 2008 09:58:14 -0700 (PDT)
Local: Fri, Jul 18 2008 12:58 pm
Subject: Re: Upper Case and Lower Case = Duplicate Content?
That's about right.

Any links on your site (including in a sitemap) should go to the right
place, first time... no redirect etc.

The redirects are there to pick up 'old' problems.
You should not have any links/navigation that require redirection.



 
From: Jay Is The Boss
Date: Fri, 18 Jul 2008 12:40:33 -0700 (PDT)
Local: Fri, Jul 18 2008 3:40 pm
Subject: Re: Upper Case and Lower Case = Duplicate Content?
Thank you again, Autocrat.



 
From: augean
Date: Mon, 25 Aug 2008 15:29:55 -0700 (PDT)
Local: Mon, Aug 25 2008 6:29 pm
Subject: Re: Upper Case and Lower Case = Duplicate Content?
"Google is case sensitive."

I have noticed that, and unfortunately Google is case sensitive. But I
don't understand why Google should be case sensitive when it returns
the same results for the search strings "abc" and "ABC".
Being case sensitive has kept my site's rank low. I am using an IIS
server, and it returns the same content for /page123.asp and
/Page123.asp.
It does not make sense to me why Google sees them as different pages.
Don't you see the same page when you type GOOGLE.COM and google.com?

In my opinion this is a bug in Google, not an advantage.

Thanks,
~augean



 
From: JohnMu (Google employee)
Date: Mon, 25 Aug 2008 16:35:54 -0700 (PDT)
Local: Mon, Aug 25 2008 7:35 pm
Subject: Re: Upper Case and Lower Case = Duplicate Content?
Hi Augean and welcome to the groups!

In the URL, the path, file name and query string are by definition
case-sensitive. In this case, Google is just following the standards
as defined in the "RFCs" 1738 and 1808. The host name, as you noticed,
is not case-sensitive, neither is the choice of protocol (http:// is
the same as HTTP://).

So technically speaking, the following URLs are all identical:
http://domain.com/path/file.htm
HTTP://domain.com/path/file.htm
http://DOMAIN.com/path/file.htm

However, these are not:
http://domain.com/PATH/file.htm (a different path)
http://domain.com/path/FILE.htm (a different file name)
http://www.domain.com/path/file.htm (a different host name)
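
The comparison above can be sketched in Python with the standard `urllib.parse` module: lowercase only the scheme and the host, and leave the path and query string exactly as written. The helper name `normalize_url` is hypothetical, and the sketch assumes the URL carries no user name or password in the host part (lowercasing would alter those too).

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase the scheme and host; keep path, query and fragment as-is."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),   # protocol is not case-sensitive
        parts.netloc.lower(),   # host name is not case-sensitive
        parts.path,             # path and file name are case-sensitive
        parts.query,            # query string is case-sensitive
        parts.fragment,
    ))


if __name__ == "__main__":
    base = normalize_url("http://domain.com/path/file.htm")
    for candidate in [
        "HTTP://domain.com/path/file.htm",
        "http://DOMAIN.com/path/file.htm",
        "http://domain.com/PATH/file.htm",
        "http://domain.com/path/FILE.htm",
        "http://www.domain.com/path/file.htm",
    ]:
        same = normalize_url(candidate) == base
        print(candidate, "identical" if same else "different")
```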

IIS does treat things a bit differently, so it's important that you
make sure that all of your internal links point to the same version of
the URL, otherwise search engines (including Google) may recognize two
(or more) distinct URLs. Once we crawl them, we'll notice that they're
identical, but until then we may treat them separately.

Hope it helps!
John


 
From: augean
Date: Mon, 25 Aug 2008 17:25:50 -0700 (PDT)
Local: Mon, Aug 25 2008 8:25 pm
Subject: Re: Upper Case and Lower Case = Duplicate Content?
Hello John and thank you for the prompt reply,

I reviewed these documents a few years ago and checked them again,
but I still cannot find anywhere that they mention the path being
case-sensitive.
They do talk about the base being case-insensitive, as you mentioned,
but I didn't find anything about the path.

I know it comes down from Unix-based operating systems, but on the net
it doesn't make sense to have case sensitivity in the path (my opinion).

Thanks and regards,
~augean



 