Fixing bad Camel-Case URLS


Robert A. Rosenberg

Aug 23, 2009, 8:43:01 PM
to bbe...@googlegroups.com
I have just inherited responsibility for maintaining a Windows/IIS
Site that is due to be moved to a Linux/Apache Server Environment.

I have run into the problem that the current environment is
case-insensitive while the new one will be case-sensitive. All the
target file names are pure lower-case, but the URLs in the HREF and
SRC attributes are mixed case. (At least I do not have the problem of
going from Linux to Windows, where multiple file names can map to the
same case-insensitive string.)

Are there any commands that I can run to lower-case all the URLs in
bulk, or am I going to have to find all the bad names and do
find/replace on them one at a time? (I have already bulk-fixed the
GIF, JPG, HTML, HTM, and CGI suffixes, but that leaves me with the
mixed-case directories and file prefixes to correct.)

While I would rather have a utility that compares the URL contents to
the target locations (and then corrects the HTML), I can live with
just going the "Make Lower Case" route.

If anyone can offer any advice, it will be welcomed.
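
To make the preferred approach concrete, a utility of that kind could
look roughly like the Python sketch below. It is only an illustration
under stated assumptions: the function names index_files and fix_urls
are hypothetical, URLs are treated as relative to the document root
(relative links inside subdirectories are not resolved), and anything
that does not match a file on disk is left untouched.

```python
import os
import re

# Matches href="..." or src="..." in either case (an assumption; the
# attribute names in the real pages may be written differently).
ATTR_RE = re.compile(r'(href|src)="([^"]+)"', re.IGNORECASE)

def index_files(docroot):
    """Map each lowercased site-relative path to its real on-disk spelling."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(docroot):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), docroot)
            rel = rel.replace(os.sep, "/")
            index[rel.lower()] = rel
    return index

def fix_urls(html, index):
    """Rewrite URLs whose case differs from the matching file on disk."""
    def repl(match):
        attr, url = match.group(1), match.group(2)
        real = index.get(url.lstrip("/").lower())
        if real is None:
            return match.group(0)  # external or unknown: leave untouched
        prefix = "/" if url.startswith("/") else ""
        return f'{attr}="{prefix}{real}"'
    return ATTR_RE.sub(repl, html)
```

Unlike a blanket "Make Lower Case" pass, this only changes a URL when a
file with the same name (ignoring case) actually exists on the server.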

Lewis@Gmail

Aug 24, 2009, 12:10:26 AM
to bbe...@googlegroups.com
On Aug 23, 2009, at 18:43, "Robert A. Rosenberg" <rar...@banet.net>
wrote:

> Are there any commands that I can run to lower case all the URLs in
> bulk or am I going to have to find all the bad names and do
> find/replace on them one at a time

You can set apache to be case insensitive, as I recall. This might be
the best thing as you don't want to break external links to you.

Charlie Garrison

Aug 24, 2009, 3:40:28 AM
to bbe...@googlegroups.com
Good afternoon,

On 23/08/09 at 10:10 PM -0600, Lewis@Gmail <gkr...@gmail.com> wrote:

>You can set apache to be case insensitive, as I recall. This
>might be the best thing as you don't want to break external
>links to you.

I wouldn't do that just for the external links; that's what
mod_rewrite is for. Requests with the wrong case can be fixed by
mod_rewrite and then you're not having to 'cripple' the whole
site for the sake of a few incorrect links.


Charlie

--
Charlie Garrison <garr...@zeta.org.au>
PO Box 141, Windsor, NSW 2756, Australia

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
http://www.ietf.org/rfc/rfc1855.txt

Patrick Woolsey

Aug 24, 2009, 8:28:30 AM
to bbe...@googlegroups.com
"Robert A. Rosenberg" <rar...@banet.net> sez:
[...]

>I have run into the problem that the current Environment is
>Case-Insensitive while the new one will be Case-Sensitive and while
>all the target file names are pure lower-case, the URL HREF and SRC
>tags are Mixed Case URLs (at least I do not have the problem of going
>from Linux to Windows where I can run into multiple file names
>mapping to the same Case-Insensitive string).
>
>Are there any commands that I can run to lower case all the URLs in
>bulk or am I going to have to find all the bad names and do
>find/replace on them one at a time (I have already bulk fixed the
>GIF, JPG, HTML, HTM, and CGI Suffixes but that leaves me with the
>Mixed Case Directories and file prefixes to correct).


You can perform a search & replace with grep to locate these mixed-case
paths and transform them to lowercase.

The "Case Transformations" section in Chapter 8 of the PDF manual describes
the relevant modifiers and gives some examples; here's a basic cut at the
task:

search for: (href|src)="(.+?)"
replace with: \1="\L\2\E"

i.e. find any HREF or SRC attribute, collect its path, and transform the
latter by forcing it to lowercase.
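
For anyone who wants to run the same transformation outside BBEdit, here
is a rough Python equivalent. It is a sketch, not a drop-in tool: the
function name lowercase_urls is made up for illustration, and the
case-insensitive flag is an added assumption so that HREF and href are
both caught. Python's re module has no \L...\E modifier in replacement
strings, so the lowercasing is done in a replacement function instead.

```python
import re

# Same pattern as above; IGNORECASE is an assumption added here so that
# upper-case attribute names (HREF, SRC) are matched as well.
ATTR_RE = re.compile(r'(href|src)="(.+?)"', re.IGNORECASE)

def lowercase_urls(html):
    """Lowercase the value of every href="..." or src="..." attribute."""
    def repl(match):
        # Keep the attribute name as written; lowercase only its value.
        return f'{match.group(1)}="{match.group(2).lower()}"'
    return ATTR_RE.sub(repl, html)

if __name__ == "__main__":
    print(lowercase_urls('<a href="DirOne/File.html">link</a>'))
```

Like the search pattern itself, this lowercases every matched value,
including external links and query strings, so it is worth reviewing the
results before saving.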


Regards,

Patrick Woolsey
==
Bare Bones Software, Inc. <http://www.barebones.com>
P.O. Box 1048, Bedford, MA 01730-1048

Lewis@Gmail

Aug 24, 2009, 10:35:47 AM
to bbe...@googlegroups.com
On 24-Aug-2009, at 01:40, Charlie Garrison wrote:
> On 23/08/09 at 10:10 PM -0600, Lewis@Gmail <gkr...@gmail.com> wrote:
>> You can set apache to be case insensitive, as I recall. This
>> might be the best thing as you don't want to break external
>> links to you.
>
> I wouldn't do that just for the external links; that's what
> mod_rewrite is for. Requests with the wrong case can be fixed by
> mod_rewrite and then you're not having to 'cripple' the whole
> site for the sake of a few incorrect links.

Er... I don't think that is possible. As I recall, it took
configuring the spelling module (mod_speling). mod_rewrite cannot
rewrite arbitrarily long URLs to change the case, as I recall.

…and I don't know about 'crippling'. I've never needed both an
index.html and an Index.html.

--
The only good thing ever to come out of religion was the music.

Charlie Garrison

Aug 24, 2009, 11:40:25 AM
to bbe...@googlegroups.com
Good morning,

On 24/08/09 at 8:35 AM -0600, Lewis@Gmail <gkr...@gmail.com> wrote:

>On 24-Aug-2009, at 01:40, Charlie Garrison wrote:
>> On 23/08/09 at 10:10 PM -0600, Lewis@Gmail <gkr...@gmail.com> wrote:
>>> You can set apache to be case insensitive, as I recall. This
>>> might be the best thing as you don't want to break external
>>> links to you.
>>
>> I wouldn't do that just for the external links; that's what
>> mod_rewrite is for. Requests with the wrong case can be fixed by
>> mod_rewrite and then you're not having to 'cripple' the whole
>> site for the sake of a few incorrect links.
>
>Er... I don't think that is possible. As I recall, it took
>manipulating the spelling module. Mod-rewrite cannot rewrite
>arbitrarily long URLS to change the case as I recall.

I'm not sure about the "arbitrarily long" part; I'm not aware of
length limitations with mod_rewrite. For the substitution I was
thinking about a RewriteMap; eg:

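RewriteEngine on
## setup the mapping for CamelCase to lowercase for known file names
## (RewriteMap must go in the server or virtual host config, not .htaccess)
RewriteMap fix-case txt:/path/to/fix-case.txt
## only apply the RewriteRule if the requested file doesn't exist
RewriteCond /your/docroot/%{REQUEST_FILENAME} !-f
## ...and only if the map has an entry for the request; otherwise the
## lookup is empty and every unknown URL would be redirected to /
RewriteCond ${fix-case:$1} !^$
## apply the map we defined above, make the external redirect PERMANENT
RewriteRule ^/(.+) /${fix-case:$1} [R=permanent,L]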


##
## fix-case.txt -- Fix CamelCasedFiles to be lowercase
##

FileOne.html fileone.html
FileTwo.html filetwo.html
DirOne/File.html dirone/file.html

##EOF##

Using the RewriteMap would also allow you to flexibly rename
files; e.g. add an underscore in place of CamelCase. And if there
are LOTS of files, then RewriteMap can use a dbm rather than txt
file for improved performance (see the Apache manual for mod_rewrite).
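
The map file does not have to be typed by hand. As a rough sketch, a few
lines of Python could produce the entries from a list of the mixed-case
paths found in the old pages. The function name build_fix_case_map is
hypothetical, and how the list of paths is collected is left open here.

```python
def build_fix_case_map(paths):
    """Return "Key value" lines for a txt RewriteMap.

    Only paths that contain upper-case letters get an entry. Paths with
    whitespace are skipped, because a txt map separates key and value
    with whitespace.
    """
    lines = []
    seen = set()
    for path in paths:
        lowered = path.lower()
        if path == lowered or path in seen:
            continue  # already lowercase, or a duplicate
        if any(ch.isspace() for ch in path):
            continue  # cannot be expressed in a plain txt map
        seen.add(path)
        lines.append(f"{path} {lowered}")
    return lines

if __name__ == "__main__":
    for line in build_fix_case_map(["FileOne.html", "DirOne/File.html"]):
        print(line)
```

The keys of a txt map are matched exactly, so each mixed-case spelling
that appears in real links needs its own line.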

>…and I don't know about 'crippling'. I've never needed both
>an index.html and an Index.html.

I meant crippling in that the original issue hasn't really been
solved, and the problem is just perpetuated. And while I've not
used it myself, I've known sites that use first-letter caps for
directories and regular files are all lowercase; that sometimes
would result in files with the 'same' name.

For me, I just don't like the idea of a rare config option (I've
never even heard of the option to make apache case insensitive)
to fix something that can be fixed 'better' in other ways. I
call case-sensitive file systems a feature; telling apache to
ignore that is, to me, crippling it.

Robert A. Rosenberg

Aug 24, 2009, 3:46:45 PM
to bbe...@googlegroups.com
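At 22:10 -0600 on 08/23/2009, Lewis@Gmail wrote about Re: Fixing bad
Camel-Case URLS:

>You can set apache to be case insensitive, as I recall. This might be
>the best thing as you don't want to break external links to you.

Is this a setting that I can make for MY site without affecting all
the other sites that my Hosting Service hosts? If so, what file do I
alter/create to make this setting?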

Thank You.

Robert A. Rosenberg

Aug 24, 2009, 3:56:58 PM
to bbe...@googlegroups.com
At 08:28 -0400 on 08/24/2009, Patrick Woolsey wrote about Re: Fixing
bad Camel-Case URLS:

>You can perform a search & replace with grep to locate these mixed-case
>paths and transform them to lowercase.
>
>The "Case Transformations" section in Chapter 8 of the PDF manual describes
>the relevant modifiers and gives some examples; here's a basic cut at the
>task:
>
>search for: (href|src)="(.+?)"
>replace with: \1="\L\2\E"
>
>i.e. find any HREF or SRC attribute, collect its path, and transform the
>latter by forcing it to lowercase.

Thanks. I will try this. Hopefully it will solve some of my problem
(which has gotten worse, since there are also scripts for rollovers
that have mixed case). I will attack the HREF/SRC first and then
see about the scripts.

Doug McNutt

Aug 24, 2009, 8:30:41 PM
to bbe...@googlegroups.com
I'm not so sure it's a complete solution, but remember that you can
make links: the hard variety for files and the symbolic variety for
directories.

man ln # for more on a Linux box.
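
If there are many directories, the links can be scripted. The Python
sketch below is only an illustration: make_dir_alias is a made-up helper
name, and it assumes the lowercase directory already exists and that the
web server is allowed to follow symbolic links (in Apache that is
governed by Options FollowSymLinks, which a hosting service may or may
not permit).

```python
import os

def make_dir_alias(root, real_name, alias_name):
    """Create root/alias_name as a symlink to the directory root/real_name.

    Returns True if a link was created, or False if something already
    answers to the alias name (as on a case-insensitive file system).
    """
    link_path = os.path.join(root, alias_name)
    if os.path.lexists(link_path):
        return False
    # A relative target keeps the link valid if the whole tree is moved.
    os.symlink(real_name, link_path, target_is_directory=True)
    return True

if __name__ == "__main__":
    # Hypothetical example: make "DirOne" resolve to the real "dirone".
    make_dir_alias(".", "dirone", "DirOne")
```

This covers each mixed-case spelling only once it has its own link, so
it suits a handful of known directory names better than arbitrary case
variations.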
--

--> From the U S of A, the only socialist country that refuses to admit it. <--
