Request match "precision" in match_request()

Remy Blank

Aug 31, 2008, 7:35:05 PM

to trac...@googlegroups.com

While working on #4878 (Weird URL handling of additional characters on
the end), which asks that requests to e.g. "/timeline/bogus" either
return a 404 or a permanent redirect to "/timeline", I've been looking
at match patterns in match_request(). This has raised a few questions
for which I would like to get some feedback:

* Currently, many request handlers match very liberally. For example,
any request that matches "/admin.*" will be handled by the admin module,
including "/adminlskjdfkljsd". Is this intentional? If not, shouldn't
this be fixed?

* Many handlers allow a trailing /, for example "/newticket" and
"/newticket/". What is the reason for this? Noah has mentioned this
being a convenience for people entering URLs that don't know what they
are doing. Are there any other reasons?

* The "matching precision" is very variable between handlers. Some
handlers match only precise URLs (e.g. "/tickets/([0-9]+)$"), others are
more graceful (e.g. "/report(?:/([0-9]+))?", which will show the list of
available reports for "/report/bogus").
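Python's `re.match()` anchors a pattern only at the start of the string, so a pattern without a trailing `$` silently accepts any suffix. The sketch below uses hypothetical patterns modeled on the examples above (they are not copied from Trac's source). It shows how `/admin.*` accepts junk, and how adding `$` tightens a match.

```python
import re

# Hypothetical patterns modeled on the examples in the message above.
# re.match() anchors only at the start, so a missing "$" lets any suffix through.
LIBERAL_ADMIN = re.compile(r"/admin.*")
STRICT_ADMIN = re.compile(r"/admin(?:/[^/]+)*$")  # "/admin" plus optional /segments
LIBERAL_REPORT = re.compile(r"/report(?:/([0-9]+))?")
STRICT_REPORT = re.compile(r"/report(?:/([0-9]+))?$")


def matches(pattern, path):
    """Return True if the handler pattern accepts the request path."""
    return pattern.match(path) is not None


if __name__ == "__main__":
    print(matches(LIBERAL_ADMIN, "/adminlskjdfkljsd"))     # True
    print(matches(STRICT_ADMIN, "/adminlskjdfkljsd"))      # False
    print(matches(STRICT_ADMIN, "/admin/general/basics"))  # True
    m = LIBERAL_REPORT.match("/report/bogus")
    print(m.group(1))  # None: the handler falls back to the report list
    print(matches(STRICT_REPORT, "/report/bogus"))         # False
```

On Python 3.4 and later, `re.fullmatch()` is an alternative to appending `$` by hand.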

Is there a rule for how precise a match_request() should be? I see two
possibilities:

* Be strict in what a handler accepts. This has the advantage of
giving a more precise answer for malformed URLs (usually a "No handler
matched request to ..."). But it is less forgiving for user-entered
URLs, and might therefore have usability issues. If this option should
be followed, then a trailing / should probably not be accepted.

* Be forgiving, and return a reasonable result for any URL matching at
least the handler's root component. For example, in the case of the
timeline, show the timeline for any request matching "/timeline/.*". The
advantage is that users are less likely to be shown an error message.
But this is technically less correct.
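The two options can be compared with a toy dispatcher. This is a sketch only. The `dispatch()` function and both patterns are hypothetical, and the 404 text merely imitates the "No handler matched request to ..." message mentioned above.

```python
import re

# Hypothetical patterns for the two policies discussed above.
STRICT_TIMELINE = re.compile(r"/timeline$")
FORGIVING_TIMELINE = re.compile(r"/timeline(?:/.*)?$")  # root component plus anything below it


def dispatch(path, pattern):
    """Return an (HTTP status, body) pair for the path under one policy."""
    if pattern.match(path):
        return 200, "timeline"
    return 404, "No handler matched request to %s" % path


if __name__ == "__main__":
    print(dispatch("/timeline/bogus", STRICT_TIMELINE)[0])     # 404
    print(dispatch("/timeline/bogus", FORGIVING_TIMELINE)[0])  # 200
    print(dispatch("/timelinebogus", FORGIVING_TIMELINE)[0])   # 404
```

Even the forgiving pattern requires a "/" right after the root component, so it does not reproduce the "/adminlskjdfkljsd" problem.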

Opinions? Which rule should I follow?

-- Remy

Christopher Lenz

Sep 2, 2008, 4:52:12 AM

to trac...@googlegroups.com

On 01.09.2008, at 01:35, Remy Blank wrote:
> While working on #4878 (Weird URL handling of additional characters
> on the end), which asks that requests to e.g. "/timeline/bogus"
> either return a 404 or a permanent redirect to "/timeline", I've
> been looking at match patterns in match_request(). This has raised a
> few questions for which I would like to get some feedback:
>
> * Currently, many request handlers match very liberally. For
> example, any request that matches "/admin.*" will be handled by the
> admin module, including "/adminlskjdfkljsd". Is this intentional? If
> not, shouldn't this be fixed?

Not intentional AFAIK, should be fixed.

> * Many handlers allow a trailing /, for example "/newticket" and
> "/newticket/". What is the reason for this? Noah has mentioned this
> being a convenience for people entering URLs that don't know what
> they are doing. Are there any other reasons?

The problem here is that if you don't allow a trailing slash, you
should at least automatically redirect from e.g. /foo/ to /foo (most
of the web does this the other way around, but hey).

So if we get more strict here, we need to add some kind of
slash-stripping redirector thing.

> * The "matching precision" is very variable between handlers. Some
> handlers match only precise URLs (e.g. "/tickets/([0-9]+)$"), others
> are more graceful (e.g. "/report(?:/([0-9]+))?", which will show the
> list of available reports for "/report/bogus").
>
> Is there a rule for how precise a match_request() should be? I see
> two possibilities:
>
> * Be strict in what a handler accepts. This has the advantage of
> giving a more precise answer for malformed URLs (usually a "No
> handler matched request to ..."). But it is less forgiving for user-
> entered URLs, and might therefore have usability issues. If this
> option should be followed, then a trailing / should probably not be
> accepted.
>
> * Be forgiving, and return a reasonable result for any URL matching
> at least the handler's root component. For example, in the case of
> the timeline, show the timeline for any request matching
> "/timeline/.*". The advantage is that users are less likely to be
> shown an error message. But this is technically less correct.
>
> Opinions? Which rule should I follow?

I very much prefer strict. The rule should be not to expose the same
resource/representation under multiple different URIs. So even if
there are valid convenience features such as allowing both /foo/ and
/foo, one needs to redirect (as in 301) to the other.
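One way to follow this rule without giving up the convenience is a small piece of middleware in front of the application that answers any path ending in a slash with a 301 to the slash-less form. The sketch below is generic WSGI (PEP 3333), not Trac code, and the names `strip_slash_middleware` and `demo_app` are hypothetical. It deliberately applies to every path; as the attachment discussion further down shows, that can be too broad.

```python
from urllib.parse import quote


def strip_slash_middleware(app):
    """Wrap a WSGI app so that "/foo/" gets a 301 redirect to "/foo"."""

    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if len(path) > 1 and path.endswith("/"):
            # Rebuild the URL without the trailing slash(es), keeping the
            # mount point and the query string.
            stripped = path.rstrip("/") or "/"
            target = quote(environ.get("SCRIPT_NAME", "") + stripped)
            query = environ.get("QUERY_STRING", "")
            if query:
                target += "?" + query
            start_response("301 Moved Permanently", [("Location", target)])
            return [b""]
        return app(environ, start_response)

    return wrapper


def demo_app(environ, start_response):
    """Hypothetical application standing in for the real request dispatcher."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["PATH_INFO"].encode("utf-8")]
```

Usage: wrap the application once, e.g. `application = strip_slash_middleware(demo_app)`, and serve `application` with any WSGI server.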

One issue here is that changes in this space may break URIs out there
on the web (bookmarks, search indices, links etc). That's something we
need to be very careful about: we're not just "breaking" the Trac
site, but the sites of all the Trac users out there. :P

Cheers,
Chris
--
Christopher Lenz
cmlenz at gmx.de
http://www.cmlenz.net/

Remy Blank

Sep 2, 2008, 5:48:55 AM

to trac...@googlegroups.com

>> * Currently, many request handlers match very liberally. For
>> example, any request that matches "/admin.*" will be handled by the
>> admin module, including "/adminlskjdfkljsd". Is this intentional? If
>> not, shouldn't this be fixed?
>
> Not intentional AFAIK, should be fixed.

Ok, I'll do that.

>> * Many handlers allow a trailing /, for example "/newticket" and
>> "/newticket/". What is the reason for this? Noah has mentioned this
>> being a convenience for people entering URLs that don't know what
>> they are doing. Are there any other reasons?
>
> The problem here is that if you don't allow a trailing slash, you
> should at least automatically redirect from e.g. /foo/ to /foo (most
> of the web does this the other way around, but hey).

I think some web servers give a slightly different meaning to each.
/foo/ means "foo" is a directory, and /foo that "foo" is a "file". IIRC
Apache does it that way. But in the era of dynamic web services, this
doesn't make sense anymore, and /foo seems more natural.

> So if we get more strict here, we need to add some kind of
> slash-stripping redirector thing.

Ok.

>> Opinions? Which rule should I follow?
>
> I very much prefer strict. The rule should be not to expose the same
> resource/representation under multiple different URIs. So even if
> there are valid convenience features such as allowing both /foo/ and
> /foo, one needs to redirect (as in 301) to the other.

This makes perfect sense.

> One issue here is that changes in this space may break URIs out there
> on the web (bookmarks, search indices, links etc). That's something we
> need to be very careful about: we're not just "breaking" the Trac
> site, but the sites of all the Trac users out there. :P

Breaking invalid URLs like "/adminlsdkjflkjsd" should not be a problem.
Breaking e.g. "/timeline/bogus/postfix" might or might not be a problem,
I don't know. The only breakage we really care about is bookmarks and
links, as search indices tend to be rebuilt quite often.

I'll try to do a search for links to Trac sites, and see if any invalid
URLs pop up.

Thanks for your comments.
-- Remy

Noah Kantrowitz

Sep 2, 2008, 12:58:17 PM

to trac...@googlegroups.com

I thought about adding this as a core system, but there are places where a
trailing / is probably legal. The biggest one is attachment URLs, since / is
a legal filename character on some OSes. Granted, none of the major ones allow it, so
it would probably be fine to add this at the level of the URL dispatcher.

--Noah

Remy Blank

Sep 2, 2008, 5:51:41 PM

to trac...@googlegroups.com

Noah Kantrowitz wrote:
> I thought about adding this as a core system, but there are places where a
> trailing / is probably legal. The biggest one is attachment URLs, since / is
> a legal filename character on some OSes. Granted, none of the major ones allow it, so
> it would probably be fine to add this at the level of the URL dispatcher.

The attachment module treats e.g. "/attachment/wiki/Projects/Trac/" and
"/attachment/wiki/Projects/Trac" differently. The former shows the list
of attachments for the Projects/Trac page, and the second shows the
attachment with the name "Trac" attached to the page Projects.
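The two cases can be expressed as a small parser. This is a hypothetical sketch of the behavior described above, not the attachment module's actual code; the function name and the tuple layout are assumptions. It shows why stripping the trailing slash blindly would turn a request for the attachment list into a request for a single file.

```python
def parse_attachment_path(path):
    """Split an /attachment/<realm>/... path as described above.

    Returns ("list", realm, parent_id) when the path ends in a slash,
    otherwise ("file", realm, parent_id, filename), where the filename is
    the last path segment. Returns None for paths outside /attachment/.
    """
    prefix = "/attachment/"
    if not path.startswith(prefix):
        return None
    realm, _, rest = path[len(prefix):].partition("/")
    if rest.endswith("/"):
        # Trailing slash: list the attachments of the parent resource.
        return ("list", realm, rest[:-1])
    # No trailing slash: the last segment names a single attachment.
    parent_id, _, filename = rest.rpartition("/")
    return ("file", realm, parent_id, filename)


if __name__ == "__main__":
    print(parse_attachment_path("/attachment/wiki/Projects/Trac/"))
    # ('list', 'wiki', 'Projects/Trac')
    print(parse_attachment_path("/attachment/wiki/Projects/Trac"))
    # ('file', 'wiki', 'Projects', 'Trac')
```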

Maybe the file name and the path should be reversed:

/attachment/wiki/MyFile/Projects/Trac

would open the attachment named "MyFile" for page "Projects/Trac".

-- Remy

Noah Kantrowitz

Sep 2, 2008, 6:02:52 PM

to trac...@googlegroups.com

> -----Original Message-----
> From: trac...@googlegroups.com [mailto:trac...@googlegroups.com] On
> Behalf Of Remy Blank
> Sent: Tuesday, September 02, 2008 2:52 PM
> To: trac...@googlegroups.com
> Subject: [Trac-dev] Re: Request match "precision" in match_request()
>

This breaks hierarchical semantics and grouping though.

--Noah

Ted Gifford

Sep 2, 2008, 10:48:59 PM

to trac...@googlegroups.com

On Tue, Sep 2, 2008 at 6:02 PM, Noah Kantrowitz <no...@coderanger.net> wrote:

>> The attachment module treats e.g. "/attachment/wiki/Projects/Trac/" and
>> "/attachment/wiki/Projects/Trac" differently. The former shows the list
>> of attachments for the Projects/Trac page, and the second shows the
>> attachment with the name "Trac" attached to the page Projects.
>>
>> Maybe the file name and the path should be reversed:
>>
>> /attachment/wiki/MyFile/Projects/Trac
>>
>> would open the attachment named "MyFile" for page "Projects/Trac".
>
> This breaks hierarchical semantics and grouping though.

Another option would be to have a separate handler for the attachment list and individual attachments:

/attachmentlist/wiki/Projects/Trac/ --> redir to ..../Trac
/attachment/wiki/Projects/Trac/file/ --> redir to ..../file (hmmm)
/attachment/wiki/Projects/Trac/file --> single attachment: "file"

Do people deep link to the attachment list much? Also, the second redirection above has semantic troubles as well.

Ted

Remy Blank

Sep 3, 2008, 1:33:07 AM

to trac...@googlegroups.com

Noah Kantrowitz wrote:
>> Maybe the file name and the path should be reversed:
>>
>> /attachment/wiki/MyFile/Projects/Trac
>>
>> would open the attachment named "MyFile" for page "Projects/Trac".
>
> This breaks hierarchical semantics and grouping though.

Yes, and it doesn't work anyway, as there's no way to specify that you
want the list of attachments. Note to self: don't post stupid ideas.

What would work is to have two "roots": /attachment for showing
individual attachments, and /attachments for showing lists of attachments.

Oh, I see Ted has already made this suggestion. Take this as a +1, then.

-- Remy

Remy Blank

Sep 3, 2008, 1:35:10 AM

to trac...@googlegroups.com

Ted Gifford wrote:
> Another option would be to have a separate handler for the attachment
> list and individual attachments:
>
> /attachmentlist/wiki/Projects/Trac/ --> redir to ..../Trac
> /attachment/wiki/Projects/Trac/file/ --> redir to ..../file (hmmm)
> /attachment/wiki/Projects/Trac/file --> single attachment: "file"
>
> Do people deep link to the attachment list much? Also, the second
> redirection above has semantic troubles as well.

Could you elaborate why the second redirection could be problematic?
IIRC, wiki pages with a / at the end have it stripped, so there should
not be any in the DB.

-- Remy

Noah Kantrowitz

Sep 3, 2008, 1:45:18 AM

to trac...@googlegroups.com

Given /attachment/wiki/Page/Foo/Bar, which of these do you have?

wiki page Page with attachment Foo/Bar
wiki page Page/Foo with attachment Bar
list of attachments on wiki page Page/Foo/Bar

The first option is probably safe to make illegal since / isn't
generally used in file names, but the ambiguity of the last two is the
problem. The best option is probably to use a different character as
the separator between parent and attachment or move the list. Both
have bad URL semantics so I am all ears if someone has a better option.
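The enumeration above can be reproduced mechanically: every "/" after the realm is a possible boundary between parent and filename, and the whole remainder could also name a parent whose attachment list is wanted. The helper below is a hypothetical illustration; its `allow_slash_in_filename` flag models the suggestion to make the first reading illegal.

```python
def interpretations(path, allow_slash_in_filename=True):
    """List the possible readings of /attachment/<realm>/<segments...>."""
    parts = path.strip("/").split("/")
    realm, segments = parts[1], parts[2:]
    readings = []
    # Each split point gives one (parent, filename) reading.
    for i in range(1, len(segments)):
        parent, filename = "/".join(segments[:i]), "/".join(segments[i:])
        if allow_slash_in_filename or "/" not in filename:
            readings.append(("file", realm, parent, filename))
    # The whole remainder can also be a parent whose list is requested.
    readings.append(("list", realm, "/".join(segments)))
    return readings


if __name__ == "__main__":
    for reading in interpretations("/attachment/wiki/Page/Foo/Bar"):
        print(reading)
    # ('file', 'wiki', 'Page', 'Foo/Bar')
    # ('file', 'wiki', 'Page/Foo', 'Bar')
    # ('list', 'wiki', 'Page/Foo/Bar')
```

Even with `allow_slash_in_filename=False`, two readings remain, which is exactly the ambiguity between the last two options.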

--Noah

Ted Gifford

Sep 3, 2008, 10:01:35 AM

to trac...@googlegroups.com

I think the conflict is resolved if we use two handlers
(/attachment{,s}/), and then start a new convention:
/attachment/wiki/....../file always ends in the attachment. If there
happens to be a wiki page of the same name/path, it will be ignored. Make
the "no such attachment" error message helpful, and people won't even
have to email the list...one can always hope.
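A sketch of this two-handler convention: `/attachments/...` always lists, and the last segment under `/attachment/...` is always the filename, so a wiki page with the same path is never consulted. The `route()` function, its return values, and the redirect of trailing slashes are assumptions for illustration, not an agreed design.

```python
def route(path):
    """Map a path onto a (kind, ...) tuple under the two-root convention."""
    if len(path) > 1 and path.endswith("/"):
        # With separate roots a trailing slash is never significant.
        return ("redirect", path.rstrip("/") or "/")
    parts = path.strip("/").split("/")
    root = parts[0]
    if root == "attachments" and len(parts) >= 3:
        # /attachments/<realm>/<parent...>: list the parent's attachments.
        return ("list", parts[1], "/".join(parts[2:]))
    if root == "attachment" and len(parts) >= 4:
        # /attachment/<realm>/<parent...>/<filename>: last segment is the file.
        return ("file", parts[1], "/".join(parts[2:-1]), parts[-1])
    return ("not_found", path)


if __name__ == "__main__":
    print(route("/attachments/wiki/Page/Foo/Bar"))  # ('list', 'wiki', 'Page/Foo/Bar')
    print(route("/attachment/wiki/Page/Foo/Bar"))   # ('file', 'wiki', 'Page/Foo', 'Bar')
```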

Ted

Remy Blank

Sep 5, 2008, 7:55:02 AM

to trac...@googlegroups.com

Noah Kantrowitz wrote:
> I thought about adding this as a core system, but there are places where a
> trailing / is probably legal. The biggest one is attachment URLs, since / is
> a legal filename character on some OSes. Granted, none of the major ones allow it, so
> it would probably be fine to add this at the level of the URL dispatcher.

I have attached a patch to http://trac.edgewall.org/ticket/4878 that
does exactly that: it redirects URLs *for which no handler has been
found* by stripping the / at the end. This ensures that handlers that do
manage trailing slashes differently (like attachments) continue working
as before.
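The ordering that makes this safe can be sketched as follows: the trailing slash is only stripped after every handler has had a chance to claim the original path. The `Handler` class, its `match_request(path)` method taking a plain string, and the return values are simplifications invented for this sketch. Trac's real handlers receive a request object, and the attached patch may differ in its details.

```python
import re


class Handler:
    """Hypothetical handler that claims paths matching one regex."""

    def __init__(self, name, pattern):
        self.name = name
        self.pattern = re.compile(pattern)

    def match_request(self, path):
        return self.pattern.match(path) is not None


def dispatch(handlers, path):
    """Return ("handled", name), ("redirect", target) or ("not_found", path)."""
    for handler in handlers:
        if handler.match_request(path):
            return ("handled", handler.name)
    # Only when no handler claims the path is the trailing slash stripped.
    if len(path) > 1 and path.endswith("/"):
        return ("redirect", path.rstrip("/") or "/")
    return ("not_found", path)


HANDLERS = [
    Handler("attachment", r"/attachment/.+$"),  # treats a trailing slash itself
    Handler("newticket", r"/newticket$"),       # strict: no trailing slash
]

if __name__ == "__main__":
    print(dispatch(HANDLERS, "/newticket/"))  # ('redirect', '/newticket')
    print(dispatch(HANDLERS, "/attachment/wiki/Projects/Trac/"))  # ('handled', 'attachment')
    print(dispatch(HANDLERS, "/bogus"))  # ('not_found', '/bogus')
```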