where to store web files in a dir tree

Helmut Richter

Jan 2, 2023, 9:35:23 AM
The following question pertains to a web server where there is an underlying
file system that is used in a way that path elements in the URL designate
directories and files in that file system. I am aware that in many CMSs this
is not so but this is not the context of my question.

On my website, I make a habit of avoiding file extensions in URLs where the
user of the page need not know them. For instance, it makes no difference
whether the file is .html, .php, or anything else delivering HTML code. Also,
I do not mark .txt files as such so that I can convert them to .html
without changing the URL. It is enough that the web server deliver all
content with the correct content type in the headers.

Now I have two ways to store web contents in the file system:

classical: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c/index.ext
with a suitable file extension (as a means to determine the content type),
typically .html

abbreviated: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c.ext
provided that there is no directory /a/b/c
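
To make the two layouts concrete, here is a minimal Python sketch of the lookup each one implies. It is only a model of the file-system mapping, not of any particular web server; the function names and the EXTENSIONS tuple are assumptions chosen for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumed set of extensions to try, in order (illustrative only).
EXTENSIONS = (".html", ".php", ".txt")


def classical(doc_root: Path, url_path: str) -> Optional[Path]:
    """URL /a/b/c is served from <doc-root>/a/b/c/index.ext."""
    directory = doc_root / url_path.strip("/")
    for ext in EXTENSIONS:
        candidate = directory / f"index{ext}"
        if candidate.is_file():
            return candidate
    return None


def abbreviated(doc_root: Path, url_path: str) -> Optional[Path]:
    """URL /a/b/c is served from <doc-root>/a/b/c.ext,
    provided that there is no directory /a/b/c."""
    target = doc_root / url_path.strip("/")
    if target.is_dir():
        return None
    for ext in EXTENSIONS:
        candidate = target.with_name(target.name + ext)
        if candidate.is_file():
            return candidate
    return None
```

For a tree containing a/b/c/index.html and a/b/d.txt, classical(root, "/a/b/c") finds the index file, while abbreviated(root, "/a/b/d") finds d.txt; each function returns None for the other URL.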

The abbreviated method saves quite a few directories that have no other use
than holding one single file each, all of them with the same file name. With
the classical method, copying a file index.html from one directory to another
can be error-prone, because it is easy to confuse which of the hundreds of
files with that name is which.

The classical method allows treating each directory the same, no matter how
many files of what file type it contains. This kind of internal consistency
may well compensate for the aforementioned inconveniences.

Are there any other reasons to prefer one of the methods over the other?

--
Helmut Richter

Arno Welzel

Jan 5, 2023, 12:18:07 PM
Helmut Richter, 2023-01-02 15:35:

[...]
> Now I have two ways to store web contents in the file system:
>
> classical: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c/index.ext
> with a suitable file extension (as a means to determine the content type),
> typically .html
>
> abbreviated: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c.ext
> provided that there is no directory /a/b/c
[...]
> Are there any other reasons to prefer one of the methods over the other?

I would stick with <doc-root>/a/b/c/index.ext and not allow the
alternative <doc-root>/a/b/c.ext for the same URL, to avoid confusion
about what <doc-root>/a/b/c means in your filesystem.

--
Arno Welzel
https://arnowelzel.de

Helmut Richter

Jan 7, 2023, 11:23:28 AM
Thank you. After doing some testing with different configurations, I have come
to the same conclusion: this feature is really confusing.

E.g.: if <doc-root>/a/b/c.html is called by a path that continues past c,
such as /a/b/c/ with a trailing slash, relative URLs (href="xyz") are
resolved against the fictitious directory /a/b/c. If, however,
<doc-root>/a/b/c.html is called as /a/b/c.html, relative URLs are, of course,
relative to the real parent directory /a/b. Same file contents, same file
name, same place in the dir tree, different semantics. Moreover, you can even
call the same file as /a/b/c/index.html even though /a/b/c does not exist;
the spurious file name is then treated as PATH_INFO.
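
How a client resolves href="xyz" for each way of calling the file can be checked with a short Python sketch. It uses urllib.parse.urljoin, which follows the RFC 3986 reference-resolution rules; the host example.org and the helper name resolve are placeholders, not anything from my setup. Since resolution drops the last segment of the base path, the fictitious directory /a/b/c only becomes the base when the request path has something after c (a trailing slash or a spurious file name).

```python
from urllib.parse import urljoin

BASE = "https://example.org"  # hypothetical host, for illustration only


def resolve(request_path: str, href: str) -> str:
    """Resolve a relative href against the URL the page was requested as."""
    return urljoin(BASE + request_path, href)


if __name__ == "__main__":
    for request_path in ("/a/b/c", "/a/b/c.html", "/a/b/c/", "/a/b/c/index.html"):
        print(f"{request_path:20} -> {resolve(request_path, 'xyz')}")
```

The first two request paths yield https://example.org/a/b/xyz, the last two yield https://example.org/a/b/c/xyz.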

So it should indeed not be allowed, as you suggest. It took me a while to
find out why weird things like paths to nonexistent directories and files
were allowed in the first place, and what to do to disallow them. The main
culprit was the MultiViews option, which brings about most of the mess.

I will now use the following options as defaults:

Options All {disallows MultiViews}
Options -Indexes
DirectoryIndex index.html index.txt {and perhaps others, as needed}
AcceptPathInfo Off {unless needed for good reasons}

Then the path in the URL must be a valid file path that resolves either to a
directory (in which case one of the index.* files must exist there) or to an
existing file. In all other cases, a 404 or 403 status is returned. Simple
rules, fewer problems.
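
The resulting rules can be modelled in a few lines of Python. This is only a sketch of the intended behaviour, not of how the server implements it: the function name resolve_strict is hypothetical, DIRECTORY_INDEX mirrors the DirectoryIndex line above, and details such as the redirect a server may issue for a directory URL without a trailing slash, or the handling of ".." segments, are deliberately left out.

```python
from pathlib import Path
from typing import Optional, Tuple

# Mirrors "DirectoryIndex index.html index.txt" from the settings above.
DIRECTORY_INDEX = ("index.html", "index.txt")


def resolve_strict(doc_root: Path, url_path: str) -> Tuple[int, Optional[Path]]:
    """Return (status, file) for a URL path under the strict rules."""
    target = doc_root / url_path.strip("/")
    if target.is_file():
        # A trailing slash after a file would be PATH_INFO, which is off.
        if url_path.endswith("/"):
            return 404, None
        return 200, target
    if target.is_dir():
        for name in DIRECTORY_INDEX:
            if (target / name).is_file():
                return 200, target / name
        return 403, None  # no index file, and directory listings are disabled
    return 404, None  # neither an existing file nor a directory
```

With a tree holding a/index.html, an empty directory a/b and a file a/page.txt, the paths /a and /a/page.txt give 200, /a/b gives 403, and /a/page as well as /a/page.txt/extra give 404.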

--
Helmut Richter