Quick mod_rewrite syntax question


Handsome Prints

Sep 6, 2008, 11:04:37 PM
to Only Validation + Navigation = Crawlability
Hi Webado,

I notice you add the string "HTTP/" at the end of the following line:
RewriteCond %{THE_REQUEST} ^.*\/index\.html\ HTTP/

in your example on fixing canonical issues. I was wondering what impact
that has? I don't always see that added in other examples.

Thank you in advance.

HP

webado

Sep 7, 2008, 2:56:56 AM
to Only Validation + Navigation = Crawlability
Good question. I don't know, I found it that way somewhere and monkey
see, monkey do :)

I was happy enough that it worked, so mine's not to question ;)

Duncan Hill

Sep 7, 2008, 5:50:43 PM
to only-va...@googlegroups.com
I don't know the specifics but it is used to ensure that various page
types are all fed to a common file.
The obvious is to feed a .html file to the .htm page from a typed or
linked address. It can also achieve the same for a .php page type.
On a site that might use a variety of pages, for example a mix of HTM and
PHP, it produces the correct redirect for the page.

I've probably explained that about as clearly as mud, but you get the idea.

Duncan
--
Duncan Hill
(DHadmin)

webado

Sep 7, 2008, 9:42:34 PM
to Only Validation + Navigation = Crawlability
Yep, mud's right LOL

Duncan Hill

Sep 8, 2008, 3:06:29 AM
to only-va...@googlegroups.com
Clarify or confuse!

A webserver normally has a default file name and type, commonly index.htm
or index.html but just about anything else can be configured as well,
default.htm or home.htm for example.
The file types the server accepts are given in its config, let's say
.htm .html .php. Through mod_rewrite and .htaccess you can either add
file extensions or ensure that they all reach a common file type and
display as such in the address bar.
More than anything I suspect it is there to make up for deficiencies in
the configuration of the server. We can't all get to a deep enough level
to set the base config, so we have to rely on .htaccess-type rules.
I think (but am not sure) that it shows a 301 permanent redirect to search
engines as well, which helps to keep your files correctly indexed.
I haven't seen the .html one used but I am just setting a similar one for
php files that are only partially used through a site.

Guess the mud is a bit deeper now, give me 5 minutes and I'll sink beneath
it altogether :D

Duncan
--
Duncan Hill
(DHadmin)

webado

Sep 8, 2008, 7:13:35 AM
to Only Validation + Navigation = Crawlability
I have to talk in terms of an Apache server, since that's all I'm
familiar enough with.

Actually that is the default index page name, which the server fetches
automatically for HTTP accesses when you give the root address with no
actual file name, or the address of any subfolder, again with no file
name. The server will look for the file with that name and serve its
contents if it finds it, or produce an index listing unless that is
suppressed in the server configuration (the httpd.conf file) or by
directives in an .htaccess file at the root level or in any folder
along the path from the root down to the current folder location.

And in fact it's usually a set of names, given in the httpd.conf file
and tried in the order given, left to right.

The server admin would have decided what the set of index page names
and extensions should be at installation time.

The httpd.conf file is very similar to the .htaccess file, and applies by
default to all accounts. Many of the default settings can be changed
through the .htaccess file at the account level.

So let's say on your server the httpd.conf file sets this:


DirectoryIndex index.php index.html index.htm index.shtml


That means for a directory path, find the index.php file and display
that. If that file doesn't exist, get index.html. If that doesn't
exist, get index.htm, and if that doesn't exist either, get
index.shtml.

If none of those files exist in that directory, show the listing of
all files ... unless some other httpd.conf or .htaccess directive
suppressed it.
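
As a minimal sketch of the fallback order and the listing suppression
described above: this assumes an Apache server where the host lets
.htaccess override these settings (that permission is an assumption, it
is not stated anywhere in this thread).

```apache
# Try these names in order, left to right, when a directory is requested
DirectoryIndex index.php index.html index.htm index.shtml

# If none of them exist, refuse to show an automatic file listing
# (the server answers 403 Forbidden instead)
Options -Indexes
```

Whether these lines are honored in .htaccess depends on the AllowOverride
setting in httpd.conf, so on shared hosting it is worth testing on a
throwaway folder first.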

You the user might decide that you don't like pages to be called
index.ext, and much prefer default.ext or perhaps home.ext .

That's when you override the server's default value with your
own .htaccess file placed in the root of your site:

DirectoryIndex default.php default.html default.htm default.shtml

It could be any sequence of possible file names and extensions as
well:

DirectoryIndex home.php home.html welcome.htm default.xyz
etc.

The .xyz extension would have to be defined elsewhere, mapped to a MIME
type or handler so the server knows how to serve it.
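
For illustration only, here is one way such an extension might be
defined, assuming mod_mime is loaded and that .xyz files are really
plain HTML. The text/html mapping is an assumption made up for this
sketch; the right type depends on what the files actually contain.

```apache
# Hypothetical mapping: serve files ending in .xyz as HTML
AddType text/html .xyz

# With the type defined, default.xyz is a usable index page name
DirectoryIndex home.php home.html welcome.htm default.xyz
```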

Of course it's best to keep it short, focused and vanilla, or it becomes
unmanageable.
You'd need to use DirectoryIndex directives if your site got
transferred from a different server where the defaults were different
from the current one, and you don't want to rename your homepage, for
instance.


Most users don't have access to the httpd.conf file, certainly not in
a shared hosting environment, which is the prevalent kind of hosting.
But they can use the .htaccess file to override many of the default
server settings. Not all of them, or it could spell disaster very quickly.


Setting the DirectoryIndex directive does not result in any
redirection. It's a static directive. It's just a set of options.

The mod_rewrite module serves to change the URL as accessed by HTTP
requests to a possibly different URL, either internal or external,
unconditionally or conditionally.
When it's used for modifying the external URL, a redirection takes
place, and the type of redirection is either the usual default
(302) or whatever code is specified (e.g. 301).
When it's used to modify the internal URL, here we get into voodoo.
From the outside there's no detectable response code other than 200
(OK), no outward indication that any URL change has taken place, no
redirection that a browser or robot will know about. But the server
will quietly crunch away and serve the content of the file or script
at the internal URL and keep the external URL the same as what was
entered.
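
To make the external versus internal distinction concrete, here is a
rough sketch for an .htaccess file in the site root. The file names
(old-page.html, new-page.html, product.php) are made up for the example
and are not from anyone's real site.

```apache
RewriteEngine On

# External rewrite: the R flag sends a redirect to the browser or robot,
# and the address bar changes. A bare [R] would mean 302; here 301 is explicit.
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]

# Internal rewrite: no R flag, so no redirect is sent. The visitor keeps
# seeing /products/42 while the server quietly runs product.php?id=42.
RewriteRule ^products/([0-9]+)$ /product.php?id=$1 [L]
```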

So yes, maybe the mud will clear a bit here. For internal URL rewriting
we won't use the HTTP/ at the end of the RewriteCond line.
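
For what it's worth, in Apache %{THE_REQUEST} holds the full request
line exactly as the browser sent it, for example
"GET /index.html HTTP/1.1". So the trailing "\ HTTP/" pins the match to
the very end of the requested path: the condition is true only when the
visitor literally asked for a URL ending in /index.html with nothing
(such as a query string) after it. Because THE_REQUEST never changes
during internal processing, a request for "/" that the server maps to
index.html through DirectoryIndex does not match, which is what keeps
the redirect from looping. A sketch of a complete rule follows; the
RewriteCond is the one quoted in the first post, while the RewriteRule
line is an assumption about what would normally accompany it.

```apache
RewriteEngine On

# True only if the original request line ends the path with /index.html
RewriteCond %{THE_REQUEST} ^.*\/index\.html\ HTTP/

# Assumed companion rule: 301 to the same folder without index.html
# ($1 is the folder part, e.g. "blog/", or empty for the site root)
RewriteRule ^(.*/)?index\.html$ /$1 [R=301,L]
```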

Duncan Hill

Sep 8, 2008, 8:00:03 AM
to only-va...@googlegroups.com
That reply is about to be stored, thanks for digging so much deeper than I
ever did.

I know where to come when (not IF) I get stuck LOL

Thanks Webado, I haven't taken it all in yet but I'll work at it. :)

Duncan
--
Duncan Hill
(DHadmin)