PHP processing steps to apply to a URL to make it safe

James Harris

Feb 17, 2016, 4:38:05 AM

In line with the principle of a program vetting its input, this is about
vetting the supplied URL.

In order to process a URL or parts of a URL in PHP what processing ought
to be applied to it? I have been using some steps which I will write
below but I am becoming increasingly uncertain that I have covered all
the bases.

The main concern is safety - ensuring that the PHP code is protected
against anything that might otherwise trip it up. The second concern is
correctness in even corner cases such as odd browsers or extended
character sets.

I have been working with $_SERVER["REQUEST_URI"], in case that is relevant.

Steps so far:

urldecode to convert %nn etc

trim "/" from the ends in order to normalise

check each character is from an allowed set

check there are no // parts

check there are no .. parts
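
A minimal sketch of the steps listed above, written as one function. The name vetRequestPath and the allowed character set (ASCII letters, digits, "/", "_", "." and "-") are assumptions made for illustration only; the real allowed set depends on the application.

```php
<?php
// Hypothetical helper, not from the thread: returns the normalised path,
// or null as soon as one of the listed checks fails.
function vetRequestPath(string $requestUri): ?string
{
    // Step 1: convert %nn sequences (urldecode() also turns "+" into a space).
    $path = urldecode($requestUri);

    // Step 2: trim "/" from both ends.
    $path = trim($path, '/');

    // Step 3: every character must come from the (assumed) allowed set.
    if (preg_match('/\A[A-Za-z0-9\/_.-]*\z/', $path) !== 1) {
        return null;
    }

    // Step 4: no empty path components.
    if (strpos($path, '//') !== false) {
        return null;
    }

    // Step 5: no ".." components.
    if (in_array('..', explode('/', $path), true)) {
        return null;
    }

    return $path;
}
```

Because "?" and ";" are not in the assumed set, a request such as /a/b?x=1 is rejected by step 3 rather than needing a separate check.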

All needed? Any not needed? The latest Firefox strips out any .. entries
before sending the URL, but I am not sure that all earlier browsers would.

In addition to the above, could the URL be in Unicode form? I was
thinking to change to index through it with

$uri[n]

but I gather that will index bytes and not characters. Could the URL
string that PHP receives be stored as Unicode and should something like
mb_substr be used instead?
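
To make the byte-versus-character difference concrete, here is a small sketch. It assumes the mbstring extension is loaded and that the client percent-encoded the path as UTF-8; the sample path is made up.

```php
<?php
// Hypothetical request path: "é" percent-encoded as the UTF-8 bytes C3 A9.
$encoded = '/caf%C3%A9';
$decoded = rawurldecode($encoded);             // "/café", stored as 6 bytes

$byteLength = strlen($decoded);                // counts bytes
$charLength = mb_strlen($decoded, 'UTF-8');    // counts characters

$byteAt4 = $decoded[4];                        // only the first byte of "é"
$charAt4 = mb_substr($decoded, 4, 1, 'UTF-8'); // the whole character "é"

// Worth checking before trusting any character-wise processing.
$isValidUtf8 = mb_check_encoding($decoded, 'UTF-8');
```

PHP strings are byte sequences, so $decoded[4] returns half of a two-byte character here, while mb_substr returns the character itself.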

That's all the steps I have come up with so far. It seems a lot.
Presumably you guys have steps you use yourselves. Are all the things I
have listed necessary? Is there anything else that should be done to
process URLs (or components thereof) safely and correctly?

James

Arno Welzel

Feb 17, 2016, 5:05:53 AM

James Harris wrote on 2016-02-17 at 10:37:

> In line with the principle of a program vetting its input, this is about
> vetting the supplied URL.
>
> In order to process a URL or parts of a URL in PHP what processing ought
> to be applied to it? I have been using some steps which I will write
> below but I am becoming increasingly uncertain that I have covered all
> the bases.

Well - it depends on why you want to process the URL.

> The main concern is safety - ensuring that the PHP code is protected
> against anything that might otherwise trip it up. The second concern is
> correctness in even corner cases such as odd browsers or extended
> character sets.

One of the main reasons for unsafe code in PHP is using eval() and SQL
statements with values you don't control on your own. You should also
never process XML without some sanity checks.

Further reading:
<http://phpsecurity.readthedocs.org/en/latest/Injection-Attacks.html>

> I have been working with $_SERVER["REQUEST_URI"], in case that is relevant.
>
> Steps so far:
>
> urldecode to convert %nn etc
>
> trim "/" from the ends in order to normalise

There is no "normal" URL - in fact the "/" at the end is part of the URL.

> check each character is from an allowed set

Why?

> check there are no // parts

Why?

> check there are no .. parts

Why?

> All needed? Any not needed? Latest firefox strips out any .. entries
> before sending the URL but I am not sure that all earlier browsers would.

And what would happen if your script gets a URL with ".." and "//"
inside?

> In addition to the above, could the URL be in Unicode form? I was
> thinking to change to index through it with

See RFC 1808:

<https://tools.ietf.org/html/rfc1808>

[...]

> That's all the steps I have come up with so far. It seems a lot.
> Presumably you guys have steps you use yourselves. Are all the things I
> have listed necessary? Is there anything else that should be done to
> process URLs (or components thereof) safely and correctly?

As I said: It depends on what you want to achieve - do you make sure the
URL you get is valid? Do you want to parse the URL in any way?

--
Arno Welzel
http://arnowelzel.de
http://de-rec-fahrrad.de
http://fahrradzukunft.de

Jerry Stuckle

Feb 17, 2016, 8:20:00 AM

What are you trying to "make safe"? A url is either good or bad. If
it's bad, it won't bring up a site. If it's good, it will bring up a
site, but that site may not be safe.

What are you actually trying to accomplish here? I'm not sure how a
supplied URL will affect your PHP code.

--
==================
Remove the "x" from my email address
Jerry Stuckle
jstu...@attglobal.net
==================

R.Wieser

Feb 17, 2016, 8:43:55 AM

Jerry,

> What are you trying to "make safe"? A url is either good or bad.

I've got a name for you: "Bobby Tables". Google it. And XKCD has got a page
about him: https://xkcd.com/327/

Regards,
Rudy Wieser

-- Origional message:
Jerry Stuckle <jstu...@attglobal.net> wrote in news message
na1rs4$9tp$1...@jstuckle.eternal-september.org...

Arno Welzel

Feb 17, 2016, 9:05:05 AM

R.Wieser wrote on 2016-02-17 at 14:43:

> Jerry,
>
>> What are you trying to "make safe"? A url is either good or bad.
>
> I've got a name for you: "Bobby Tables". Google it. And XKCD has got a page
> about him: https://xkcd.com/327/

And what do SQL injections have to do with "safe URL"?

R.Wieser

Feb 17, 2016, 9:17:46 AM

Arno,

> And what do SQL injections have to do with "safe URL"?

Please step away from the computer *now*. Bring it back to where you bought
it and ask for your money back.

In other words: Either you are trolling, or you should not be doing anything
with PHP.

Regards,
Rudy Wieser

-- Origional message:

Arno Welzel <use...@arnowelzel.de> wrote in news message
56C47E0...@arnowelzel.de...

Arno Welzel

Feb 17, 2016, 9:47:34 AM

R.Wieser wrote on 2016-02-17 at 15:17:

> Arno,
>
>> And what do SQL injections have to do with "safe URL"?
>
> Please step away from the computer *now*. Bring it back to where you bought
> it and ask your money back.
>
> In other words: Either you are trolling, or you should not be doing anything
> with PHP.

How does a *URL* itself cause an SQL injection? A PHP script at least
has to take the provided values and use them within an SQL statement.

And maybe you should also see my other post
<56C445F6...@arnowelzel.de> which I just sent a couple hours ago
(before your post) where I refer to possible security risks - including
SQL injection.

Jerry Stuckle

Feb 17, 2016, 9:55:09 AM

Which has absolutely nothing to do with my question.

And don't top post.

Jerry Stuckle

Feb 17, 2016, 9:56:09 AM

On 2/17/2016 9:17 AM, R.Wieser wrote:
> Arno,
>
>> And what do SQL injections have to do with "safe URL"?
>
> Please step away from the computer *now*. Bring it back to where you bought
> it and ask your money back.
>
> In other words: Either you are trolling, or you should not be doing anything
> with PHP.
>
> Regards,
> Rudy Wieser
>

IOW, you have absolutely no idea what you're talking about. I suggest
YOU get away from computers. You are dangerous to the entire internet.

R.Wieser

Feb 17, 2016, 9:57:13 AM

Arno,

> How does an *URL* itself cause an SQL injection?

How does a bug in a PHP script *cause* a malfunction ? And no, if you do
not allow it to reach execution it won't. Your point ?

> And maybe you should also see my other post

... and maybe you should have noticed I was replying to what *Jerry* said.

Regards,
Rudy Wieser

-- Origional message:
Arno Welzel <use...@arnowelzel.de> wrote in news message

56C487FA...@arnowelzel.de...

James Harris

Feb 17, 2016, 10:04:43 AM

On 17/02/2016 10:05, Arno Welzel wrote:
> James Harris schrieb am 2016-02-17 um 10:37:
>
>> In line with the principle of a program vetting its input, this is about
>> vetting the supplied URL.
>>
>> In order to process a URL or parts of a URL in PHP what processing ought
>> to be applied to it? I have been using some steps which I will write
>> below but I am becoming increasingly uncertain that I have covered all
>> the bases.
>
> Well - it depends for which reasons you want to process the URL.

It will go through a series of steps to change it and it will end up
identifying something in the file system or something in a database.

>> The main concern is safety - ensuring that the PHP code is protected
>> against anything that might otherwise trip it up. The second concern is
>> correctness in even corner cases such as odd browsers or extended
>> character sets.
>
> One of the main reasons for unsafe code in PHP is using eval() and SQL
> statements with values you don't control on your own. You should also
> never process XML without some sanity checks.

Acknowledged that they are big issues but IMO all inputs should be checked.

> Further reading:
> <http://phpsecurity.readthedocs.org/en/latest/Injection-Attacks.html>
>
>> I have been working with $_SERVER["REQUEST_URI"], in case that is relevant.
>>
>> Steps so far:
>>
>> urldecode to convert %nn etc
>>
>> trim "/" from the ends in order to normalise
>
> There is no "normal" URL - in fact the "/" at the end is part of the URL.

The docs for request_uri (which I'll use as shorthand for the full
expression) did not specify whether leading or trailing slashes would be
present/maintained.

>> check each character is from an allowed set
>
> Why?

My own insecurity, perhaps!

>> check there are no // parts
>
> Why?

Because such would designate an empty path component as between "the"
and "bad" in http://site.com/this/is/the//bad/part.

>> check there are no .. parts
>
> Why?

Because that indicates the parent directory and can be used to move 'up'
a level in the path hierarchy. It is thus insecure.

>> All needed? Any not needed? Latest firefox strips out any .. entries
>> before sending the URL but I am not sure that all earlier browsers would.
>
> And what would happen, if your script get's an URL with ".." and "//"
> inside?

At the moment I report them as errors.

>> In addition to the above, could the URL be in Unicode form? I was
>> thinking to change to index through it with
>
> See RFC 1808:
>
> <https://tools.ietf.org/html/rfc1808>

I looked through it but could not see anything about Unicode. The info
on relative/partial URLs was useful, however.

I since found that PHP has functions such as parse_url but that would
not help me because while it can extract the piece I am interested in I
also want to vet the whole string so extracting a part is not useful.
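
For reference, a short sketch of what parse_url returns for a request-style string. The sample URIs are made up. It shows that splitting and whole-string vetting can be combined, and that a leading "//" changes how the string is parsed.

```php
<?php
// Hypothetical value of $_SERVER["REQUEST_URI"].
$requestUri = '/this/is/a/page?id=7#top';

$parts = parse_url($requestUri);   // returns false for seriously malformed input
$path = $parts['path'] ?? '';
$query = $parts['query'] ?? null;
$fragment = $parts['fragment'] ?? null;

// A string that starts with "//" is read as a network-path reference:
// the first component becomes the host, not part of the path.
$odd = parse_url('//evil/part');
$oddHost = $odd['host'] ?? null;
$oddPath = $odd['path'] ?? null;
```

The presence of a 'query' or 'fragment' key can serve as the "those parts are absent" check, while the path component is vetted separately.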

> [...]
>> That's all the steps I have come up with so far. It seems a lot.
>> Presumably you guys have steps you use yourselves. Are all the things I
>> have listed necessary? Is there anything else that should be done to
>> process URLs (or components thereof) safely and correctly?
>
> As I said: It depends on what you want to achieve - do you make sure the
> URL you get is valid? Do you want to parse the URL in any way?

The URL component (request_uri) will be mapped to a file or a database
entry in this case.

James

James Harris

Feb 17, 2016, 10:15:36 AM

On 17/02/2016 13:19, Jerry Stuckle wrote:

...

> What are you trying to "make safe"? A url is either good or bad. If
> it's bad, it won't bring up a site. If it's good, it will bring up a
> site, but that site may not be safe.

In this case I use url rewriting to force requests to a PHP script. The
script then has to process the rest of the URL - basically all of the
URL after site:port.

> What are you actually trying to accomplish here? I'm not sure how a
> supplied URL will affect your PHP code.

At the moment I pick up $_SERVER["REQUEST_URI"]. That gives me the rest
of the URL after the site:port part and does not strip off any ; or ?
parts - which allows me to vet the URL including to ensure those parts
are absent.

Does that make more sense now?

James

Christoph M. Becker

Feb 17, 2016, 10:38:13 AM

Arno Welzel wrote:

> How does an *URL* itself cause an SQL injection? At least a PHP script
> has to use the provided values and use it within an SQL statement.

Consider that *some* URL is supplied and expected as input *parameter*.
Unless the parameter is validated or sanitized, there could be a
security issue.

--
Christoph M. Becker

R.Wieser

Feb 17, 2016, 10:39:21 AM

Jerry,

> Which has absolutely nothing to do with my question.

Really? Then I suggest you re-read the line I started my reply to you with.

Also:

> > I'm not sure how a supplied URL will affect your PHP code.

> And don't top post.

Lolz. You do not even know what "top posting" actually means, do you?

And also, don't bottom-post (hey, if you may lay out rules then so may I
:-) )

Regards,
Rudy Wieser

-- Origional message:
Jerry Stuckle <jstu...@attglobal.net> wrote in news message

na21en$utt$1...@jstuckle.eternal-september.org...

R.Wieser

Feb 17, 2016, 10:44:41 AM

Jerry,

> IOW, you have absolutely no idea what you're talking about.

Of course not. It was only by sheer luck that I happened to pick a link to a
small strip joking about the most common problem with non-sanitized input
....

Yeah, that must be it. Just luck. Riiiight...

Regards,
Rudy Wieser

-- Origional message:

Jerry Stuckle <jstu...@attglobal.net> wrote in news message

na21gj$utt$2...@jstuckle.eternal-september.org...

Jerry Stuckle

Feb 17, 2016, 10:54:18 AM

On 2/17/2016 10:04 AM, James Harris wrote:
> On 17/02/2016 10:05, Arno Welzel wrote:
>> James Harris schrieb am 2016-02-17 um 10:37:
>>
>>> In line with the principle of a program vetting its input, this is about
>>> vetting the supplied URL.
>>>
>>> In order to process a URL or parts of a URL in PHP what processing ought
>>> to be applied to it? I have been using some steps which I will write
>>> below but I am becoming increasingly uncertain that I have covered all
>>> the bases.
>>
>> Well - it depends for which reasons you want to process the URL.
>
> It will go through a series of steps to change it and it will end up
> identifying something in the file system or something in a database.
>

That can be a very complicated process and prone to errors.

>>> The main concern is safety - ensuring that the PHP code is protected
>>> against anything that might otherwise trip it up. The second concern is
>>> correctness in even corner cases such as odd browsers or extended
>>> character sets.
>>
>> One of the main reasons for unsafe code in PHP is using eval() and SQL
>> statements with values you don't control on your own. You should also
>> never process XML without some sanity checks.
>
> Acknowledged that they are big issues but IMO all inputs should be checked.
>

Yes, you need to check all input from the user. However, WHAT you check
and HOW you check it depend largely on how the data will be used.

>> Further reading:
>> <http://phpsecurity.readthedocs.org/en/latest/Injection-Attacks.html>
>>
>>> I have been working with $_SERVER["REQUEST_URI"], in case that is
>>> relevant.
>>>
>>> Steps so far:
>>>
>>> urldecode to convert %nn etc
>>>
>>> trim "/" from the ends in order to normalise
>>
>> There is no "normal" URL - in fact the "/" at the end is part of the URL.
>
> The docs for request_uri (which I'll use as shorthand for the full
> expression) did not specify whether leading or trailing slashes would be
> present/maintained.
>

It has whatever was passed from the web server. PHP does not change it.

>>> check each character is from an allowed set
>>
>> Why?
>
> My own insecurity, perhaps!
>

It can only be in the charset allowed by URLs - or it wouldn't have
gotten this far.

>>> check there are no // parts
>>
>> Why?
>
> Because such would designate an empty path component as between "the"
> and "bad" in http://site.com/this/is/the//bad/part.
>

See above.

>>> check there are no .. parts
>>
>> Why?
>
> Because that indicates the parent directory and can be used to move 'up'
> a level in the path hierarchy. It is thus insecure.
>

Not past the DOCUMENT_ROOT. The web server will not allow it.

>>> All needed? Any not needed? Latest firefox strips out any .. entries
>>> before sending the URL but I am not sure that all earlier browsers
>>> would.
>>
>> And what would happen, if your script get's an URL with ".." and "//"
>> inside?
>
> At the moment I report them as errors.
>

They are perfectly valid.

>>> In addition to the above, could the URL be in Unicode form? I was
>>> thinking to change to index through it with
>>
>> See RFC 1808:
>>
>> <https://tools.ietf.org/html/rfc1808>
>
> I looked through it but could not see anything about Unicode. The info
> on relative/partial URLs was useful, however.
>

That's because only ASCII characters (and not all of them) are allowed
in a URL.

> I since found that PHP has functions such as parse_url but that would
> not help me because while it can extract the piece I am interested in I
> also want to vet the whole string so extracting a part is not useful.
>
>> [...]
>>> That's all the steps I have come up with so far. It seems a lot.
>>> Presumably you guys have steps you use yourselves. Are all the things I
>>> have listed necessary? Is there anything else that should be done to
>>> process URLs (or components thereof) safely and correctly?
>>
>> As I said: It depends on what you want to achieve - do you make sure the
>> URL you get is valid? Do you want to parse the URL in any way?
>
> The URL component (request_uri) will be mapped to a file or a database
> entry in this case.
>
> James
>

I still don't understand exactly what you're trying to do. Are all page
requests to your website to be redirected to this script? If so, you'll
have to validate each part of the path - not just the entire uri.

Jerry Stuckle

Feb 17, 2016, 10:56:51 AM

But it would not cause SQL tables to disappear, unless it was used as
input to a SQL statement - in which case sanitization would be different.

Jerry Stuckle

Feb 17, 2016, 10:57:31 AM

On 2/17/2016 9:57 AM, R.Wieser wrote:
> Arno,
>
>> How does an *URL* itself cause an SQL injection?
>
> How does a bug in a PHP script *cause* a malfunction ? And no, if you do
> not allow it to reach execution it won't. Your point ?
>
>> And maybe you should also see my other post
>
> ... and maybe you should have noticed I was replying to what *Jerry* said.
>

Yes, and showing your ignorance in doing so.

Jerry Stuckle

Feb 17, 2016, 11:00:32 AM

On 2/17/2016 10:39 AM, R.Wieser wrote:
> Jerry,
>
>> Which has absolutely nothing to do with my question.
>
> Really ? Than I suggest you re-read the line I started my reply to you with
>

Really.

> Also:
>>> I'm not sure how a supplied URL will affect your PHP code.
>
>> And don't top post.
>
> Lolz. You do not even know what "top posting" actually means, don't you.
>
> And also, don't bottom-post (hey, if you may lay out rules than so may I
> :-) )
>
> Regards,
> Rudy Wieser
>

Oh, I know what top posting is. And I know idiots and trolls don't
follow general usenet conventions.

And in one post you've proven yourself to be both. But then that's
pretty common for you, isn't it?

Jerry Stuckle

Feb 17, 2016, 11:05:49 AM

James,

Yes, it helps. But see my previous notes. You don't need to worry
about non-ASCII characters because they won't be in the URI. You know
that because you had to have a valid URL to get here. But it is also
perfectly valid to have things like .. in a URL. You just have to
ensure it doesn't allow going above the DOCUMENT_ROOT.

But you also have to ensure it doesn't go other places it shouldn't -
for instance, you may have protected directories in your DOCUMENT_ROOT
which would not normally be accessible.

You really have to separate the URL into its individual components and
check each one individually.
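
One possible way to check each component individually is sketched below. The function name splitAndCheckSegments and the per-segment pattern (lowercase letters, digits, "_" and "-", starting with a letter or digit) are assumptions for illustration, not a rule taken from any standard.

```php
<?php
// Hypothetical helper: splits a decoded path into segments and checks each
// one against a whitelist pattern. Returns the segments, or null on failure.
function splitAndCheckSegments(
    string $path,
    string $segmentPattern = '/\A[a-z0-9][a-z0-9_-]*\z/'
): ?array {
    $segments = explode('/', trim($path, '/'));

    foreach ($segments as $segment) {
        // Empty segments (from "//"), "." and ".." all fail this pattern.
        if (preg_match($segmentPattern, $segment) !== 1) {
            return null;
        }
    }

    return $segments;
}
```

With this pattern the bare root path "/" is also rejected, because it yields a single empty segment; an application that serves a root page would need to handle that case before calling the helper.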

R.Wieser

Feb 17, 2016, 11:10:49 AM

Jerry,

> Yes, and showing your ignorance in doing so.

And you seem to revel in saying "you're wrong" without even *trying* to
explain why.

... that makes me think of a puffer fish: pumping himself up trying to bluff
his way around problems . :-)

And by all means, "put your money where your mouth is" and at least *try* to
explain how all of what I say is ignorant. I don't think you're up to it
(prove me wrong).

Regards,
Rudy Wieser

-- Origional message:

Jerry Stuckle <jstu...@attglobal.net> wrote in news message

na253l$nkl$3...@jstuckle.eternal-september.org...

Jerry Stuckle

Feb 17, 2016, 12:11:24 PM

On 2/17/2016 11:10 AM, R.Wieser wrote:
> Jerry,
>
>> Yes, and showing your ignorance in doing so.
>
> And you seem to revell in saying "you're wrong" without even *trying* to
> explain why.
>

If you had a brain in your head, you would understand that a URL has
nothing to do with deleting tables in a SQL database.

> ... that makes me think of a puffer fish: pumping himself up trying to bluff
> his way around problems . :-)
>
> And by all means, "put your money where your mouth is" and at least *try* to
> explain how all of what I say is ignorant. I don't think you're up to it
> (prove me wrong).
>
> Regards,
> Rudy Wieser
>
>

You obviously don't even understand SQL injection. You just know how to
paste the URL to a cartoon (which is completely unrelated to the OP's
question) in a message.

But then you've done similar things in other newsgroups and are well
known as a troll there.

Jerry Stuckle

Feb 17, 2016, 12:12:31 PM

On 2/17/2016 10:44 AM, R.Wieser wrote:
> Jerry,
>
>> IOW, you have absolutely no idea what you're talking about.
>
> Ofcourse not. It was only by sheer luck that I happened to pick a link to a
> small strip joking about the most common problem with non-sanitized input
> ....
>
> Yeah, that must be it. Just luck. Riiiight...
>
> Regards,
> Rudy Wieser
>
>

And now you're trying to backpedal. You don't even know what SQL
injection is - or that it's completely unrelated to the OP's question.

But then you're well known for trolling in multiple newsgroups.

R.Wieser

Feb 17, 2016, 12:51:27 PM

Jerry,

> If you had a brain in your head, you would understand that a URL has
> nothing to do with deleting tables in a SQL database.

If that's the case then I count myself lucky *not* to have a brain
(especially not one like yours), as pretty much everyone other than you seems
to understand the problem with a URL, or any kind of unsanitized input that
could be fed to a SQL database. Hence that XKCD strip.

Regards,
Rudy Wieser

-- Origional message:
Jerry Stuckle <jstu...@attglobal.net> wrote in news message

na29e0$a65$1...@jstuckle.eternal-september.org...

Jerry Stuckle

Feb 17, 2016, 12:55:47 PM

On 2/17/2016 12:51 PM, R.Wieser wrote:
> Jerry,
>
>> If you had a brain in your head, you would understand that a URL has
>> nothing to do with deleting tables in a SQL database.
>
> If trhats the case than I count myself lucky *not* to have a brain
> (especially not one like yours), as pretty much everyone else than you seems
> to understand the problem with an URL, or any kind of unsanitized input that
> could be fed to a SQL database. Hence that XKCD strip.
>
> Regards,
> Rudy Wieser
>
>

Except the OP said NOTHING about feeding it to a SQL database. But
top-posting idiot trolls can't understand simple questions.

Now go crawl back into your Windows newsgroups. Or don't they talk to
trolls, either?

Because I'm done feeding the troll.

R.Wieser

Feb 17, 2016, 1:04:25 PM

Jerry,

> And now you're trying to backpedal.

Of course. That must be it.

A suggestion though: look up "sarcasm" in the dictionary. You might be in
for a surprise.

> You don't even know what SQL injection is -

How would you know #1

> or that it's completely unrelated to the OP's question.

How would you know #2

... And I've *still* not seen any hint to you trying to support your own
position, *nor* anything tearing mine down.

My guess ? You do not *have* anything in that regard. You just keep on
bluffing away, like a broken record.

This game has gone on long enough though. Goodbye.

Regards,
Rudy Wieser

-- Origional message:
Jerry Stuckle <jstu...@attglobal.net> wrote in news message

na29g8$a65$2...@jstuckle.eternal-september.org...

Arno Welzel

Feb 17, 2016, 1:06:46 PM

R.Wieser wrote on 2016-02-17 at 15:57:

> Arno,
>
>> How does an *URL* itself cause an SQL injection?
>
> How does a bug in a PHP script *cause* a malfunction ? And no, if you do
> not allow it to reach execution it won't. Your point ?

My point is that a URL itself is just a URL - and your reference to
SQL injection in this context doesn't make any sense at all.

BTW: at least you should think about your way to quote posts.

Arno Welzel

Feb 17, 2016, 1:09:00 PM

Sure - but the OP's question was not "how can I validate/sanitize
input parameters" but "how can I make a URL safe" - and since a URL
itself is not "safe" or "unsafe", the OP has to explain what he wants to
achieve. Otherwise one can only recommend general guidelines on how to
avoid security problems - but that has nothing to do with URLs themselves
at all.

Arno Welzel

Feb 17, 2016, 1:14:28 PM

James Harris wrote on 2016-02-17 at 16:15:

> On 17/02/2016 13:19, Jerry Stuckle wrote:
>
> ...
>
>> What are you trying to "make safe"? A url is either good or bad. If
>> it's bad, it won't bring up a site. If it's good, it will bring up a
>> site, but that site may not be safe.
>
> In this case I use url rewriting to force requests to a PHP script. The
> script then has to process the rest of the URL - basically all of the
> URL after site:port.

For URL rewriting it is better to use the functions your web server
provides, e.g. Apache mod_rewrite.

>> What are you actually trying to accomplish here? I'm not sure how a
>> supplied URL will affect your PHP code.
>
> At the moment I pick up $_SERVER["REQUEST_URI"]. That gives me the rest
> of the URL after the site:port part and does not strip off any ; or ?
> parts - which allows me to vet the URL including to ensure those parts
> are absent.
>
> Does that make more sense now?

So - you want to take the REQUEST_URI, modify its contents and then use
the result to do another HTTP request or redirect?

In this case you should really think about using Apache rewriting. Or
does the "modify the requested URI" involve some local lookups to a
database etc.?

R.Wieser

Feb 17, 2016, 2:19:56 PM

Arno,

> My point is, that an URL itself is just an URL

No, it isn't. It's rather easy to combine the address itself with some data
into a URL, just as any run-of-the-mill HTTP GET can do. You do not even
need to know how to program or write HTML; you can do that in the address bar
of any browser.

> and your reference to SQL injection in this context doesn't
> make any sense at all.

Are you sure ?

From the OP's first post:

[quote]

All needed? Any not needed? Latest firefox strips out any .. entries
before sending the URL but I am not sure that all earlier browsers would.

[/quote]

AFAICS that means he will be receiving URLs from untrusted sources ...

> BTW: at least you should think about your way to quote posts.

Why? What's *wrong* with it? And no, I do not consider anyone's
*preference* in this matter to be more important than mine. Explain
yourself and I will consider the arguments. That's all I can promise.

By the way: I quite dislike top, smack-in-the-middle and bottom posting
alike, only to be topped by the ones where absolutely nothing is quoted (not
even the name of the person to whom the response is directed).

I seldom (if ever) make a remark about it though, as everyone has got his
own preferences.

As for my style ? Remove anything from the "-- Origional message:"* line
down, and my post is *still* readable. Compare that to any of the above.

*which should have been a "bit" of a hint ....

And for the record: you can consider the part following the "-- Origional
message:" line as an addendum. You mostly will not need it (especially not
in a well-functioning newsgroup like this one), but it can be used to
check whether anything is quoted out of context, or, when finding the message
somewhere in the future and all on its own, to get a feeling for what the post
is actually responding to.

So yes, I've been thinking about how I quote posts.

Regards,
Rudy Wieser

-- Origional message:
Arno Welzel <use...@arnowelzel.de> wrote in news message

56C4B6AD...@arnowelzel.de...

Thomas 'PointedEars' Lahn

Feb 17, 2016, 5:26:36 PM

James Harris wrote:

> In line with the principle of a program vetting its input, this is about
> vetting the supplied URL.
>
> In order to process a URL or parts of a URL in PHP what processing ought
> to be applied to it? I have been using some steps which I will write
> below but I am becoming increasingly uncertain that I have covered all
> the bases.
>

> The main concern is safety - ensuring that the PHP code is protected
> against anything that might otherwise trip it up. The second concern is
> correctness in even corner cases such as odd browsers or extended
> character sets.
>

> I have been working with $_SERVER["REQUEST_URI"], in case that is
> relevant.

It is relevant: Do not do that. Request parameter values are provided in
properly decoded form through specific superglobal arrays such as $_GET,
and the $_SERVER['PATH_INFO'] value.

<http://php.net/manual/en/reserved.variables.php>

> Steps so far:
>
> urldecode to convert %nn etc

Use rawurldecode(), if that.
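
The difference between the two decoders shows up with a literal "+" in a path. A small sketch with a made-up path:

```php
<?php
// Hypothetical raw path containing a literal "+", an encoded "+" (%2B)
// and an encoded space (%20).
$raw = '/a+b/c%2Bd%20e';

// urldecode() follows form encoding: "+" becomes a space.
$viaUrldecode = urldecode($raw);        // "/a b/c+d e"

// rawurldecode() only converts %nn sequences and keeps "+" as it is.
$viaRawurldecode = rawurldecode($raw);  // "/a+b/c+d e"
```

In a path component "+" carries no special meaning, which is why rawurldecode() is the better fit there; the "+"-as-space convention belongs to form-encoded query strings.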

> trim "/" from the ends in order to normalise

Nonsense.

> check each character is from an allowed set

Why?

> check there are no // parts

Why?

> check there are no .. parts

Why? That is _not_ a proper measure to make sure that code does not perform
filesystem access above the DOCUMENT_ROOT. It smells of unsafe
include/require. Fix the problem, not the symptom.
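
One common way to address the underlying problem, rather than scanning for "..", is to resolve the final path and confirm it still lies inside the intended directory. The sketch below assumes the target file already exists (realpath() returns false otherwise); the function name resolveInsideBase is made up for illustration.

```php
<?php
// Hypothetical helper: returns the canonical path of $relativePath if it
// resolves to an existing entry strictly inside $baseDir, otherwise null.
function resolveInsideBase(string $baseDir, string $relativePath): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null;                       // base directory does not exist
    }

    // realpath() collapses "." and ".." and follows symbolic links.
    $candidate = realpath($base . DIRECTORY_SEPARATOR . $relativePath);
    if ($candidate === false) {
        return null;                       // target does not exist
    }

    // Compare against the base plus a trailing separator, so that a sibling
    // such as "/var/pages-old" does not pass for "/var/pages".
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($candidate, $prefix, strlen($prefix)) !== 0) {
        return null;                       // escaped the base directory
    }

    return $candidate;
}
```

Because the comparison is made on the resolved path, it also covers symbolic links that point outside the base directory, which a textual ".." check would miss.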

> All needed? Any not needed? Latest firefox strips out any .. entries
> before sending the URL but I am not sure that all earlier browsers would.

Please read the PHP Manual on security, and visit <https://owasp.org/> to
get yourself a minimum clue.

--
PointedEars
Zend Certified PHP Engineer
<http://www.zend.com/en/yellow-pages/ZEND024953> | Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

Thomas 'PointedEars' Lahn

Feb 17, 2016, 5:33:40 PM

James Harris wrote:

> On 17/02/2016 10:05, Arno Welzel wrote:
>> James Harris schrieb am 2016-02-17 um 10:37:
>>> check there are no // parts
>> Why?
>
> Because such would designate an empty path component as between "the"
> and "bad" in http://site.com/this/is/the//bad/part.

How did you get the idea that this would be a problem?

>>> check there are no .. parts
>>
>> Why?
>
> Because that indicates the parent directory and can be used to move 'up'
> a level in the path hierarchy. It is thus insecure.

Why do you use user input verbatim in your code in the first place?

>>> In addition to the above, could the URL be in Unicode form? I was
>>> thinking to change to index through it with
>>
>> See RFC 1808:
>>
>> <https://tools.ietf.org/html/rfc1808>
>
> I looked through it but could not see anything about Unicode.

It has been obsoleted by RFC 3986 eleven(!) years ago now, which contains
something about Unicode, in particular it defines UTF-8 percent-encoding.

> The info on relative/partial URLs was useful, however.

Do not rely on obsolete RFCs.

> The URL component (request_uri) will be mapped to a file or a database
> entry in this case.

BAD. Broken as designed.

Thomas 'PointedEars' Lahn

Feb 17, 2016, 5:35:37 PM

Then you sanitize that particular parameter and escape it for output, using
the built-in functions, respectively; you do not futilely attempt to make
the request URI safe.
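
As a rough illustration of "sanitize that particular parameter and escape it for output" with built-in functions: the array below stands in for $_GET so the sketch can run on its own, and the parameter names are made up.

```php
<?php
// Hypothetical stand-in for $_GET.
$input = [
    'name' => '<script>alert(1)</script>',
    'page' => '12abc',
];

// Validate: FILTER_VALIDATE_INT yields an int, or false if the value is
// not an integer within the given range.
$page = filter_var(
    $input['page'],
    FILTER_VALIDATE_INT,
    ['options' => ['min_range' => 1]]
);

// Escape for HTML output at the point where the value is printed.
$safeName = htmlspecialchars($input['name'], ENT_QUOTES, 'UTF-8');
```

Validation decides whether a value is acceptable at all; escaping depends on the output context (HTML here), so a value headed for an SQL statement or a file name needs different handling.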

Thomas 'PointedEars' Lahn

Feb 17, 2016, 5:36:37 PM

No, your approach is simply wrong.

James Harris

Feb 17, 2016, 7:19:38 PM

On 17/02/2016 18:14, Arno Welzel wrote:
> James Harris schrieb am 2016-02-17 um 16:15:
>
>> On 17/02/2016 13:19, Jerry Stuckle wrote:
>>
>> ...
>>
>>> What are you trying to "make safe"? A url is either good or bad. If
>>> it's bad, it won't bring up a site. If it's good, it will bring up a
>>> site, but that site may not be safe.
>>
>> In this case I use url rewriting to force requests to a PHP script. The
>> script then has to process the rest of the URL - basically all of the
>> URL after site:port.
>
> For URL rewriting it is better to use the provided functions of your
> webserver, e.g. Apache mod_rewrite etc..

Why is that better?

I currently use URL rewriting but only to invoke the PHP script. In PHP
code the remainder of the URL is picked up and processed.

>>> What are you actually trying to accomplish here? I'm not sure how a
>>> supplied URL will affect your PHP code.
>>
>> At the moment I pick up $_SERVER["REQUEST_URI"]. That gives me the rest
>> of the URL after the site:port part and does not strip off any ; or ?
>> parts - which allows me to vet the URL including to ensure those parts
>> are absent.
>>
>> Does that make more sense now?
>
> So - you want to take the REQUEST_URI, modify its contents and then use
> the result to do another HTTP request or redirect?

I currently take the request URI, validate it, and use it to form the
prefix of a file name. That prefix is to be used to select a file. The
contents of the file will be converted to HTML.

> In this case you should really think about using Apache rewriting. Or
> does the "modify the requested URI" involve some local lookups to a
> database etc.?

I can make the modifications in code just now but I may later make them
based on a specification read from a configuration file (so it seems to me
best to do the modification in PHP rather than with URL rewriting).

James

James Harris

Feb 17, 2016, 7:36:39 PM

On 17/02/2016 22:26, Thomas 'PointedEars' Lahn wrote:
> James Harris wrote:

...

>> I have been working with $_SERVER["REQUEST_URI"], in case that is
>> relevant.
>
> It is relevant: Do not do that. Request parameter values are provided in
> properly decoded form through specific superglobal arrays such as $_GET,
> and the $_SERVER['PATH_INFO'] value.

No good. PATH_INFO does not contain the query string, AIUI, whereas
REQUEST_URI does.
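
For illustration, one way to see which parts REQUEST_URI carries is to split it with parse_url(). This is only a sketch: the sample string stands in for $_SERVER["REQUEST_URI"], and the "reject if a query is present" rule is the one described in this thread, not a general recommendation.

```php
<?php
// Stand-in for $_SERVER["REQUEST_URI"]; the value is made up.
$requestUri = '/folder/file?x=1&y=2';

// With a component constant, parse_url() returns that component as a
// string, or null when the component is absent.
$path  = parse_url($requestUri, PHP_URL_PATH);
$query = parse_url($requestUri, PHP_URL_QUERY);

// A script that wants no query string at all can test for its presence
// rather than search the raw string for "?".
$hasQuery = $query !== null;

var_dump($path, $query, $hasQuery);
```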

...

>> Steps so far:
>>
>> urldecode to convert %nn etc
>
> Use rawurldecode(), if that.

I'll think about it. I may want the + decode.
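
The difference between the two functions is only in how "+" is treated, which a short sketch shows (the sample string is invented):

```php
<?php
$raw = 'a+b%20c%2Bd';

// urldecode() follows form encoding: "+" becomes a space.
$formDecoded = urldecode($raw);

// rawurldecode() leaves a literal "+" alone and decodes only %nn sequences.
$rawDecoded = rawurldecode($raw);

var_dump($formDecoded, $rawDecoded);
```

Both turn %20 into a space and %2B into "+"; only the bare "+" differs.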

>> trim "/" from the ends in order to normalise
>
> Nonsense.

Is it *guaranteed* that $_SERVER["REQUEST_URI"] will return a string
with a leading slash?

...

>> check there are no .. parts
>
> Why? That is _not_ a proper measure to make sure that code does not perform
> filesystem access above the DOCUMENT_ROOT. It smells of unsafe
> include/require. Fix the problem, not the symptom.

No, that's not the reason.

>> All needed? Any not needed? Latest firefox strips out any .. entries
>> before sending the URL but I am not sure that all earlier browsers would.
>
> Please read the PHP Manual on security, and visit <https://owasp.org/> to
> get yourself a minimum clue.

Thanks for the link.

James

James Harris

Feb 17, 2016, 8:10:04 PM

On 17/02/2016 15:54, Jerry Stuckle wrote:
> On 2/17/2016 10:04 AM, James Harris wrote:
>> On 17/02/2016 10:05, Arno Welzel wrote:
>>> James Harris schrieb am 2016-02-17 um 10:37:

...

>>>> I have been working with $_SERVER["REQUEST_URI"], in case that is
>>>> relevant.
>>>>
>>>> Steps so far:
>>>>
>>>> urldecode to convert %nn etc
>>>>
>>>> trim "/" from the ends in order to normalise
>>>
>>> There is no "normal" URL - in fact the "/" at the end is part of the URL.
>>
>> The docs for request_uri (which I'll use as shorthand for the full
>> expression) did not specify whether leading or trailing slashes would be
>> present/maintained.
>>
>
> It has whatever was passed from the web server. PHP does not change it.

It seems right, then, to remove the slash from the front, if there is
one. I will, however, give the potential trailing slash some more
thought. It may be best to leave that in place, if there is one, and
report it as an error. Thanks for pointing that out.

>>>> check each character is from an allowed set
>>>
>>> Why?
>>
>> My own insecurity, perhaps!
>>
>
> It can only be in the charset allowed by URLs - or it wouldn't have
> gotten this far.

My bad. I should have said that I do the charset checking *after* using
urldecode(). The idea is as far as possible to vet the string that the
user sees in his browser's location box. I guess browsers differ but
from tests on recent Firefox and IE they display the glyphs in the
location bar but pass them encoded in the request URI. I need to check
the characters after converting them back to the glyphs.
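
A minimal sketch of such a check on the decoded string. The allowed set used here (Unicode letters and digits plus "/", ".", "_" and "-") is an assumption made for the example, and hasOnlyAllowedChars() is a hypothetical helper name.

```php
<?php
/**
 * Return true when every character of the decoded path belongs to the
 * allowed set. The set itself is an assumption made for this sketch.
 */
function hasOnlyAllowedChars(string $decoded): bool
{
    // \p{L} matches any Unicode letter and \p{N} any Unicode number.
    // With /u the subject must be valid UTF-8, otherwise preg_match()
    // returns false and the string is rejected as well.
    return preg_match('#\A[\p{L}\p{N}/._-]+\z#u', $decoded) === 1;
}

var_dump(hasOnlyAllowedChars(rawurldecode('/caf%C3%A9/menu-1_a.txt')));
var_dump(hasOnlyAllowedChars(rawurldecode('/a%3Cb')));
```

Anchoring with \A and \z instead of ^ and $ avoids accepting a trailing newline.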

>>>> check there are no // parts
>>>
>>> Why?
>>
>> Because such would designate an empty path component as between "the"
>> and "bad" in http://site.com/this/is/the//bad/part.
>>
>
> See above.

Sorry, not sure which bit above, but I don't want a URL with a double
slash in the path to work. Perhaps N slashes could be merged into a
single slash but that would make two URLs identify the same resource on
the site. That is not how this is intended to work. It's a bit
complicated but to try to explain

http://site.com/folder/file
http://site.com/folder//file

Those two URLs would identify a virtual folder and a virtual file on the
site. The PHP code maps them to a real resource - which could be in a
database or it could be a file. The URL structure is intended to be
independent of and to outlive any changes to how the files or resources
are stored. I therefore want to make sure that users use the single
correct URL from day 1.

>>>> check there are no .. parts
>>>
>>> Why?
>>
>> Because that indicates the parent directory and can be used to move 'up'
>> a level in the path hierarchy. It is thus insecure.
>>
>
> Not past the DOCUMENT_ROOT. The web server will not allow it.

Nevertheless, for the same reasons as above, the URL structure does not
necessarily mean a part of a filesystem. To extend the earlier example,

http://site.com/folder/file
http://site.com/folder/dummy/../file

If both of those specified a part of the file system then the two URLs
could refer to the same file. But it is not intended to be a file path.
The .. entry, therefore, needs to be reported as an error and not used.
It cannot be used to backspace over the dummy folder name that precedes it.

When I tested this, browsers automatically removed .. entries from their
location bars but I did get at least one browser to send .. by using the
percent code for a dot (%2e) twice.

>>>> All needed? Any not needed? Latest firefox strips out any .. entries
>>>> before sending the URL but I am not sure that all earlier browsers
>>>> would.
>>>
>>> And what would happen, if your script get's an URL with ".." and "//"
>>> inside?
>>
>> At the moment I report them as errors.
>>
>
> They are perfectly valid.

I prohibit them for the reasons mentioned above.

...

>>>> That's all the steps I have come up with so far. It seems a lot.
>>>> Presumably you guys have steps you use yourselves. Are all the things I
>>>> have listed necessary? Is there anything else that should be done to
>>>> process URLs (or components thereof) safely and correctly?
>>>
>>> As I said: It depends on what you want to achieve - do you make sure the
>>> URL you get is valid? Do you want to parse the URL in any way?
>>
>> The URL component (request_uri) will be mapped to a file or a database
>> entry in this case.
>>
>> James
>>
>
> I still don't understand exactly what you're trying to do. Are all page
> requests to your website to be redirected to this script? If so, you'll
> have to validate each part of the path - not just the entire uri.

Yes, at least all the relevant ones will go to this script.

What do you mean, "validate each part of the path - not just the entire
uri"? To make an example,

http://site.com:port/folder1/folder2/file

In that, my script will only be activated if http://site.com:port is
correct. So AFAICS, I only need to validate what follows it.

James

James Harris

Feb 17, 2016, 8:12:53 PM

On 17/02/2016 22:33, Thomas 'PointedEars' Lahn wrote:

...

> Why do you use user input verbatim in your code in the first place?

I don't. The whole point of this query is about how to vet user input
and I did explain that I use the user input to locate a resource.

James

James Harris

Feb 17, 2016, 8:18:52 PM

On 17/02/2016 16:05, Jerry Stuckle wrote:

... [discussion about /../ snipped]

> But you also have to ensure it doesn't go other places it shouldn't -
> for instance, you may have protected directories in your DOCUMENT_ROOT
> which would not normally be accessible.

Yes, that is a good reason to outlaw .. entries - one of a few.

> You really have to separate the URL into it's individual components and
> check each one individually.

OK. I do check the path against the file system at the moment (as this
is currently identifying a file). I use is_dir() once the request URI
string has been vetted and changed. I'll think about whether I should do
more checking, though.

James

Jerry Stuckle

Feb 17, 2016, 9:12:10 PM

On 2/17/2016 8:09 PM, James Harris wrote:
> On 17/02/2016 15:54, Jerry Stuckle wrote:
>> On 2/17/2016 10:04 AM, James Harris wrote:
>>> On 17/02/2016 10:05, Arno Welzel wrote:
>>>> James Harris schrieb am 2016-02-17 um 10:37:
>
> ...
>
>>>>> I have been working with $_SERVER["REQUEST_URI"], in case that is
>>>>> relevant.
>>>>>
>>>>> Steps so far:
>>>>>
>>>>> urldecode to convert %nn etc
>>>>>
>>>>> trim "/" from the ends in order to normalise
>>>>
>>>> There is no "normal" URL - in fact the "/" at the end is part of the
>>>> URL.
>>>
>>> The docs for request_uri (which I'll use as shorthand for the full
>>> expression) did not specify whether leading or trailing slashes would be
>>> present/maintained.
>>>
>>
>> It has whatever was passed from the web server. PHP does not change it.
>
> It seems right, then, to remove the slash from the front, if there is
> one. I will, however, give the potential trailing slash some more
> thought. I may be best to leave that in place, if there is one, and
> report it as an error. Thanks for pointing that out.
>

It is NOT an error to have a trailing slash in a URL.

You need to be careful here. You shouldn't be changing the rules as to
what is a valid or invalid URL.

>>>>> check there are no .. parts
>>>>
>>>> Why?
>>>
>>> Because that indicates the parent directory and can be used to move 'up'
>>> a level in the path hierarchy. It is thus insecure.
>>>
>>
>> Not past the DOCUMENT_ROOT. The web server will not allow it.
>
> Nevertheless, for the same reasons as above, the URL structure does not
> necessarily mean a part of a filesystem. To extend the earlier example,
>
> http://site.com/folder/file
> http://site.com/folder/dummy/../file
>
> If both of those specified a part of the file system then the two URLs
> could refer to the same file. But it is not intended to be a file path.
> The .. entry, therefore, needs to be reported as an error and not used.
> It cannot be used to backspace over the dummy folder name that precedes it.
>
> When I tested this, browsers automatically removed .. entries from their
> location bars but I did get at least one browser to send .. by using the
> percent code for a dot (%2e) twice.
>

Yes, and you can get it with embedded objects, also, such as an image
which resides in a higher level directory (or a subdirectory therein).
Remember that embedded objects can be referenced relative to the
directory the page is loaded from.

>>>>> All needed? Any not needed? Latest firefox strips out any .. entries
>>>>> before sending the URL but I am not sure that all earlier browsers
>>>>> would.
>>>>
>>>> And what would happen, if your script get's an URL with ".." and "//"
>>>> inside?
>>>
>>> At the moment I report them as errors.
>>>
>>
>> They are perfectly valid.
>
> I prohibit them for the reasons mentioned above.
>
> ...
>

And you should not, for reasons mentioned above.

>>>>> That's all the steps I have come up with so far. It seems a lot.
>>>>> Presumably you guys have steps you use yourselves. Are all the
>>>>> things I
>>>>> have listed necessary? Is there anything else that should be done to
>>>>> process URLs (or components thereof) safely and correctly?
>>>>
>>>> As I said: It depends on what you want to achieve - do you make sure
>>>> the
>>>> URL you get is valid? Do you want to parse the URL in any way?
>>>
>>> The URL component (request_uri) will be mapped to a file or a database
>>> entry in this case.
>>>
>>> James
>>>
>>
>> I still don't understand exactly what you're trying to do. Are all page
>> requests to your website to be redirected to this script? If so, you'll
>> have to validate each part of the path - not just the entire uri.
>
> Yes, at least all the relevant ones will go to this script.
>
> What do you mean, "validate each part of the path - not just the entire
> uri"? To make an example,
>
> http://site.com:port/folder1/folder2/file
>
> In that, my script will only be activated if http://site.com:port is
> correct. So AFAICS, I only need to validate what follows it.
>
> James
>

Yes, and you need to validate each part following the :port, separately
and together.

What you are doing is NOT simple, and you need to be very careful.
Allowing all legitimate URLs while rejecting invalid ones can be very
difficult.

Arno Welzel

Feb 18, 2016, 2:03:13 AM

James Harris wrote on 2016-02-17 at 16:04:

> On 17/02/2016 10:05, Arno Welzel wrote:
>> James Harris schrieb am 2016-02-17 um 10:37:
>>

>>> In line with the principle of a program vetting its input, this is about
>>> vetting the supplied URL.
>>>
>>> In order to process a URL or parts of a URL in PHP what processing ought
>>> to be applied to it? I have been using some steps which I will write
>>> below but I am becoming increasingly uncertain that I have covered all
>>> the bases.
>>

>> Well - it depends for which reasons you want to process the URL.
>
> It will go through a series of steps to change it and it will end up
> identifying something in the file system or something in a database.

Ok - but in either case you end up looking up some information
based on what you get from the URL.

This means: It is not the URL that is unsafe, but the code which does the
information lookup that may be unsafe.

[...]

>> There is no "normal" URL - in fact the "/" at the end is part of the URL.
>
> The docs for request_uri (which I'll use as shorthand for the full
> expression) did not specify whether leading or trailing slashes would be
> present/maintained.

Yes - because it does not matter whether there is a trailing slash or
not. Some URLs have a trailing slash, others don't. But both forms are
totally valid and there is no "normal" form with or without trailing slash.

>>> check each character is from an allowed set
>>
>> Why?
>
> My own insecurity, perhaps!

What do you expect to happen if an "invalid" character is passed?
Again: The URL itself is not "unsafe" - just the code which uses the
information may be unsafe.

>>> check there are no // parts
>>
>> Why?
>
> Because such would designate an empty path component as between "the"
> and "bad" in http://site.com/this/is/the//bad/part.

So what? This is no problem at all, except when you write code which
relies on the assumption that there will always be something between slashes.

In fact, using "//" doesn't even change the result in many cases:

<http://arnowelzel.de/wp/en/a-week-with-the-pebble>
<http://arnowelzel.de/wp/en//a-week-with-the-pebble>
<http://arnowelzel.de//wp//en//a-week-with-the-pebble>

>>> check there are no .. parts
>>
>> Why?
>
> Because that indicates the parent directory and can be used to move 'up'
> a level in the path hierarchy. It is thus insecure.

It is only insecure, if the webserver does not sanitize the path. Apache
and nginx won't let you get outside the root directory this way.

However - if YOUR code uses that information to access a file using
the given "path" of the URL this may be a problem. But in this case it
is better to use realpath() to check that the path is still within its
allowed limits.
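
A hedged sketch of such a realpath() check. The function name resolveInside() and the directory layout in the usage lines are invented for the example; only realpath() itself is the built-in mentioned above.

```php
<?php
/**
 * Resolve $relative below $baseDir and return the canonical path, or null
 * when the target does not exist or lies outside $baseDir.
 * The base directory itself is deliberately not accepted as a target.
 */
function resolveInside(string $baseDir, string $relative): ?string
{
    $base   = realpath($baseDir);
    $target = realpath($baseDir . DIRECTORY_SEPARATOR . $relative);

    // realpath() returns false for paths that do not exist.
    if ($base === false || $target === false) {
        return null;
    }

    // Compare against the base plus a trailing separator so that
    // /var/data does not count as a prefix of /var/database.
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;

    return strncmp($target, $prefix, strlen($prefix)) === 0 ? $target : null;
}

// Usage with made-up names:
//   resolveInside('/srv/pages', 'folder/file.txt')  -> canonical path or null
//   resolveInside('/srv/pages', '../secret.txt')    -> null
```

Because realpath() resolves ".." and symbolic links before the comparison, the check looks at where the path really ends up rather than at how it is spelled.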

[...]

>> As I said: It depends on what you want to achieve - do you make sure the
>> URL you get is valid? Do you want to parse the URL in any way?
>
> The URL component (request_uri) will be mapped to a file or a database
> entry in this case.

So why don't you just use the whole URI and use it in a prepared SELECT
statement? In that case it's just not important, what the URI contains.
Either there is an entry for that URI in the database or not. But it
doesn't matter what the URI itself contains if it's just treated as text
and used as parameter in a prepared statement.

Of course you should NOT just add the URI as text to the
SELECT statement - use a prepared statement.
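
As a sketch only: the lookup could be written with PDO roughly like this. An in-memory SQLite database stands in for the real one (this assumes the pdo_sqlite extension is loaded), and the table and column names (pages, uri, body) as well as findPage() are invented for the example.

```php
<?php
// In-memory SQLite stands in for the real database in this sketch.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE pages (uri TEXT PRIMARY KEY, body TEXT)');
$pdo->exec("INSERT INTO pages VALUES ('/folder/file', 'Hello')");

/**
 * Look up a page by the URI exactly as received. The URI is bound as a
 * parameter, so it is treated as data and never parsed as SQL.
 */
function findPage(PDO $pdo, string $uri): ?string
{
    $stmt = $pdo->prepare('SELECT body FROM pages WHERE uri = :uri');
    $stmt->execute([':uri' => $uri]);

    // fetchColumn() returns false when no row matched.
    $body = $stmt->fetchColumn();

    return $body === false ? null : $body;
}

var_dump(findPage($pdo, '/folder/file'));
var_dump(findPage($pdo, "/x' OR '1'='1"));
```

The second call simply finds no row: the quote characters are part of the value being compared.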

Arno Welzel

Feb 18, 2016, 2:05:34 AM

Thomas 'PointedEars' Lahn wrote on 2016-02-17 at 23:33:
> James Harris wrote:
>
>> On 17/02/2016 10:05, Arno Welzel wrote:
>>> James Harris schrieb am 2016-02-17 um 10:37:

[...]

>>>> In addition to the above, could the URL be in Unicode form? I was
>>>> thinking to change to index through it with
>>>
>>> See RFC 1808:
>>>
>>> <https://tools.ietf.org/html/rfc1808>
>>
>> I looked through it but could not see anything about Unicode.
>
> It has been obsoleted by RFC 3986 eleven(!) years ago now, which contains
> something about Unicode, in particular it defines UTF-8 percent-encoding.

That's the problem with RFCs - nothing in the old RFC indicates the
replacement ;-)

But thanks for the hint.

Arno Welzel

Feb 18, 2016, 2:18:14 AM

R.Wieser wrote on 2016-02-17 at 20:19:

> Arno,
>
>> My point is, that an URL itself is just an URL
>
> No, it isn't. Its rather easy to combine the addres itself with some data
> into an URL, just as any run-of-the-mill HTTP can GET do. You do not even
> need to know how to program or write HTML, you can do that in the adres bar
> of any browser.

And? It is then still just an URL.

>> and your reference to SQL injection in this context doesn't
>> make any sense at all.
>
> Are you sure ?

Yes. Because without a database connection there is no "SQL injection"
at all.

> From the OP's first post:
>
> [quote]
> All needed? Any not needed? Latest firefox strips out any .. entries
> before sending the URL but I am not sure that all earlier browsers would.
> [/quote]
>
> AFAICS that means he will be receiving URLs from untrusted sources ...

And? I still don't see "database" which is in any case required to
produce a thing like "SQL injection".

And even if a database is involved - sanitizing values or using prepared
statements to avoid SQL injections is *always* necessary and not only
for values which are built from URLs.

>> BTW: at least you should think about your way to quote posts.
>
> Why ? Whats *WRONG* with it. And no, I do not consider anyones
> *preference* in this matter to be more important than mine. Explain
> yourself and I will consider the arguments. Thats all I can promise.

The fact that you leave nearly everything of the quoted post in place
even if you don't refer to the quoted text. It's also not useful to
quote the *whole* posting below your reply again since people read from
top to bottom and it's really confusing reading a reply first and then
the text to which the reply refers. This is a bad habit coming from
e-mail clients using this "original message:" below the reply.

In addition: Signatures are never quoted - that's why signatures
start with "-- " (look carefully and you will notice the space after the
"--"). This is the indication for properly working newsreaders where to
stop quoting at all.

> By the way: I quite dislike top, smack-in-the-middle and bottom posting
> alike, only to be topped by the ones where absolutily nothing is quoted (not
> even the name of the person who the response is directed towards).

That's why there are references and every newsreader is able to show you
which posts a reply refers to and show a threaded message list as well.

[...]

> So yes, I've been thinking about how I quote posts.

Thanks.

R.Wieser

Feb 18, 2016, 3:20:20 AM

Arno,

> And? It is then still just an URL.

And flawed code is then still just code.

... as long as you do not use it in any way nothing will go wrong there too.

Also, this is the third time you put that forward. I'm not willing to
participate in that merry-go-round. You can stay on if you like though.
Enjoy yourself.

> And? I still don't see "database" which is in any case
> required to produce a thing like "SQL injection".

Arno, you're behaving like an idiot. You have, as you said yourself, no
clue to what the OP is up to, but at the same time you are *very* sure (at
least to me) that it definitely can't be anything related to SQL injection.
That simply does not compute.

> That's why there a references and every newsreader is
> able to show you which posts a reply refers to and show
> a threaded message list as well.

Yeah, that really works well with one of those bloody Google-groups posters
of that last (no quote, not even a name) group, and where the replied-to
post has long been removed (as in years ago) from the real newsgroup
servers.

Think a bit further than your own immediate needs / setup /environment
please. :-\

Regards,
Rudy Wieser

-- Original message:
Arno Welzel <use...@arnowelzel.de> wrote in news message

56C5702C...@arnowelzel.de...

James Harris

Feb 18, 2016, 5:29:30 AM

On 18/02/2016 09:59, Tim Streater wrote:
> In article <56C56D36...@arnowelzel.de>, Arno Welzel
> <use...@arnowelzel.de> wrote:

...

>> That's the problem with RFCs - nothing in the old RFC indicates the
>> replacement ;-)
>

> Yes it does. Look at the top of the old RFC (assuming you didn't
> download it on stone tablets in AD 54) and it will have an "Obsoleted
> by:" line indicating which RFC replaces it.

AIUI, Arno is right in that RFCs may not be modified once published. I
think you mean that the "obsoleted by" header appears in certain
presentations of an RFC - e.g. an HTML presentation.

Because of the immutable nature of RFCs, such a line cannot be added to
the original. Compare these two.

https://www.ietf.org/rfc/rfc1808.txt
https://tools.ietf.org/html/rfc1808

James

Arno Welzel

Feb 18, 2016, 9:44:33 AM

Tim Streater wrote on 2016-02-18 at 10:59:

> In article <56C56D36...@arnowelzel.de>, Arno Welzel
> <use...@arnowelzel.de> wrote:
>

>> Thomas 'PointedEars' Lahn schrieb am 2016-02-17 um 23:33:
>>> James Harris wrote:
>>>
>>>> On 17/02/2016 10:05, Arno Welzel wrote:
>>>>> James Harris schrieb am 2016-02-17 um 10:37:
>> [...]
>>>>>> In addition to the above, could the URL be in Unicode form? I was
>>>>>> thinking to change to index through it with
>>>>>
>>>>> See RFC 1808:
>>>>>
>>>>> <https://tools.ietf.org/html/rfc1808>
>>>>
>>>> I looked through it but could not see anything about Unicode.
>>>
>>> It has been obsoleted by RFC 3986 eleven(!) years ago now, which contains
>>> something about Unicode, in particular it defines UTF-8 percent-encoding.
>>
>> That's the problem with RFCs - nothing in the old RFC indicates the
>> replacement ;-)
>

> Yes it does. Look at the top of the old RFC (assuming you didn't
> download it on stone tablets in AD 54) and it will have an "Obsoleted
> by:" line indicating which RFC replaces it.

I stand corrected - of course there is an indication.

Arno Welzel

Feb 18, 2016, 9:49:06 AM

R.Wieser wrote on 2016-02-18 at 09:20:

> Arno,
[...]

>> And? I still don't see "database" which is in any case
>> required to produce a thing like "SQL injection".
>

> Arno, you're behaving like an idiot. You have, as you said yourself, no
> clue to what the OP is up to, but at the same time you are *very* sure (at
> least to me) that it definitily can't be anything related to SQL injection.
> That simply does not compute.

I give up - you don't get it.

[Quote style]

> Think a bit further than your own immediate needs / setup /environment
> please. :-\

Your environment needs the FULL(!) old post including the signature to
be quoted below your reply completely? Why?

[Useless fullquote deleted]

R.Wieser

Feb 18, 2016, 10:25:13 AM

Arno,

> I give up - you don't get it.

What *is* there to get for me ?

You've put that "its just an URL" stance forward three times, not trying to
explain anything about it, and not responding to my counter-example to it.
You give me *zero* chance to understand your position.

And as you are not responding to my counter-example, should I just assume
you have not got the slightest idea either ? I mean, if you do you would
be able to counter my counter example (which you don't) ... Pot, meet
kettle ?

> Your environment needs the FULL(!) old post including the
> signature to be quoted below your reply completely? Why?

I already explained that. Can't you even *read* ? Or do you simply refuse
to acknowledge anything that does not conform to your own ideas of how stuff
ought to work ?

And by the way, you also did not explain in any way how you on one hand do
not know what the OP is busy with, but on the other hand know for certain
what it definitely isn't. Yes, that did not pass me by unnoticed. The
absence of any reaction from you to it does tell me enough though.

Goodbye.

Regards,
Rudy Wieser

-- Original message:
Arno Welzel <use...@arnowelzel.de> wrote in news message

56C5D9D4...@arnowelzel.de...

Jerry Stuckle

Feb 18, 2016, 10:48:05 AM

On 2/18/2016 9:48 AM, Arno Welzel wrote:
> R.Wieser schrieb am 2016-02-18 um 09:20:
>
>> Arno,
> [...]
>>> And? I still don't see "database" which is in any case
>>> required to produce a thing like "SQL injection".
>>
>> Arno, you're behaving like an idiot. You have, as you said yourself, no
>> clue to what the OP is up to, but at the same time you are *very* sure (at
>> least to me) that it definitily can't be anything related to SQL injection.
>> That simply does not compute.
>
> I give up - you don't get it.
>

Arno,

You're arguing with an idiot. He's been in similar arguments with
people more knowledgeable than he in other newsgroups, also.

Arno Welzel

Feb 18, 2016, 11:00:30 AM

R.Wieser wrote on 2016-02-18 at 16:25:

> Arno,
[...]

>> Your environment needs the FULL(!) old post including the
>> signature to be quoted below your reply completely? Why?
>
> I already explained that. Can't you even *read* ? Or do you simply refuse
> to acknowledge anything that does not conform to your own ideas of how stuff
> ought to work ?

It's not *my* idea how to quote in usenet postings!

Further reading:

<https://www.netmeister.org/news/learn2quote.html>
<http://tools.ietf.org/html/rfc1849#section-4.3.2>
<http://tools.ietf.org/html/rfc3676>

> And by the way, you also did not explain in any way how you on one hand do
> not know what the OP is busy with, but on the other hand know for certain
> what it definitily isn't. Yes, that did not pass me by unnoticed. The
> absense of any reaction from you to it does tell me enough though.

Well - I assumed only what is known. And at the time the OP asked he
asked for "processing an URL to make it safe" and he did NOT mention
"creating SQL queries based on anything passed with the URL".

Why you think of "SQL injection" if someone talks about how to process
URLs is your own problem.

Thomas 'PointedEars' Lahn

Feb 18, 2016, 3:47:35 PM

Arno Welzel wrote:

> Thomas 'PointedEars' Lahn schrieb am 2016-02-17 um 23:33:
>> James Harris wrote:
>>> On 17/02/2016 10:05, Arno Welzel wrote:
>>>> James Harris schrieb am 2016-02-17 um 10:37:
> [...]
>>>>> In addition to the above, could the URL be in Unicode form? I was
>>>>> thinking to change to index through it with
>>>>
>>>> See RFC 1808:
>>>>
>>>> <https://tools.ietf.org/html/rfc1808>
>>> I looked through it but could not see anything about Unicode.
>> It has been obsoleted by RFC 3986 eleven(!) years ago now, which contains
>> something about Unicode, in particular it defines UTF-8 percent-encoding.
>
> That's the problem with RFCs - nothing in the old RFC indicates the
> replacement ;-)

No, you have overlooked the “Obsoleted by:” line that is provided
*specifically* by HTML versions of obsolete RFCs (which is why I also prefer
to refer to the HTML versions).

Besides, the PHP Manual refers to RFC 3986 explicitly in the documentation
of the corresponding functions: <http://php.net/url>

> But thanks for the hint.

You’re welcome.

Thomas 'PointedEars' Lahn

Feb 18, 2016, 3:48:45 PM

But ISTM that you are using and sanitizing the wrong kind of user input.

Thomas 'PointedEars' Lahn

Feb 18, 2016, 4:03:13 PM

James Harris wrote:

> On 17/02/2016 22:26, Thomas 'PointedEars' Lahn wrote:
>> James Harris wrote:
>>> I have been working with $_SERVER["REQUEST_URI"], in case that is
>>> relevant.
>> It is relevant: Do not do that. Request parameter values are provided in
>> properly decoded form through specific superglobal arrays such as $_GET,
>> and the $_SERVER['PATH_INFO'] value.
>
> No good. PATH_INFO does not contain the query string, AIUI, whereas
> REQUEST_URI does.
>
> ...

There is no need for a "query string" if you use PATH_INFO. Whether the
latter is feasible with your server setup and use-case I do not know yet.

>>> trim "/" from the ends in order to normalise
>> Nonsense.
>
> Is it *guaranteed* that $_SERVER["REQUEST_URI"] will return a string
> with a leading slash?

Yes, see RFCs 1945 (HTTP/1.0), 2616 (HTTP/1.1), and 7540 (HTTP/2). Your
server-side PHP script would not be executed if the request URI did not
refer to it in some way.

But trimming “/” from the end*s* is a different issue as that includes
*trailing* slashes.

>>> check there are no .. parts
>>
>> Why? That is _not_ a proper measure to make sure that code does not
>> perform
>> filesystem access above the DOCUMENT_ROOT. It smells of unsafe
>> include/require. Fix the problem, not the symptom.
>
> No, that's not the reason.

OK, you need to explain your use-case then if you would like an informed
opinion from me. I do not have time to look for it in the rest of the
thread and perhaps not find it there.

>>> All needed? Any not needed? Latest firefox strips out any .. entries
>>> before sending the URL but I am not sure that all earlier browsers
>>> would.
>> Please read the PHP Manual on security, and visit <https://owasp.org/> to
>> get yourself a minimum clue.
>
> Thanks for the link.

You’re welcome.

Thomas 'PointedEars' Lahn

Feb 18, 2016, 4:26:58 PM

Arno Welzel wrote:

> R.Wieser schrieb am 2016-02-18 um 16:25:
>> Arno,
> [...]
>>> Your environment needs the FULL(!) old post including the
>>> signature to be quoted below your reply completely? Why?
>> I already explained that. Can't you even *read* ? Or do you simply
>> refuse to acknowledge anything that does not conform to your own ideas of
>> how stuff ought to work ?
>
> It's not *my* idea how to quote in usenet postings!

> […]

Wasted effort. It’s an anti-social pseudo-anonymous address munger posting
via a provider whose terms of use imply in Article 5, § 3 that address
munging is not allowed. Because of that, I would not have seen those
postings had you not quoted them.

James Harris

Feb 20, 2016, 2:07:49 PM

On 18/02/2016 02:12, Jerry Stuckle wrote:
> On 2/17/2016 8:09 PM, James Harris wrote:
>> On 17/02/2016 15:54, Jerry Stuckle wrote:
>>> On 2/17/2016 10:04 AM, James Harris wrote:
>>>> On 17/02/2016 10:05, Arno Welzel wrote:
>>>>> James Harris wrote on 2016-02-17 at 10:37:

...

>>>>>> check there are no // parts
>>>>>
>>>>> Why?
>>>>
>>>> Because such would designate an empty path component as between "the"
>>>> and "bad" in http://site.com/this/is/the//bad/part.
>>>>
>>>
>>> See above.
>>
>> Sorry, not sure which bit above, but I don't want a URL with a double
>> slash in the path to work. Perhaps N slashes could be merged into a
>> single slash but that would make two URLs identify the same resource on
>> the site. That is not how this is intended to work. It's a bit
>> complicated but to try to explain
>>
>> http://site.com/folder/file
>> http://site.com/folder//file
>>
>> Those two URLs would identify a virtual folder and a virtual file on the
>> site. The PHP code maps them to a real resource - which could be in a
>> database or it could be a file. The URL structure is intended to be
>> independent of and to outlive any changes to how the files or resources
>> are stored. I therefore want to make sure that users use the single
>> correct URL from day 1.
>>
>
> You need to be careful here. You shouldn't be changing the rules as to
> what is a valid or invalid URL.

Sorry. I think I have been at fault in not being clear that although I
am getting data from a URL my application is really using the URL path
as a unique key. What the URL contains after the site name is taken as a
hierarchical sequence of key values. In

http://site.com/folder/subfolder/file

the "folder" is a key within the site; the "subfolder" is a key within
the first key, etc.

As such, any // element is invalid because it specifies a null key.
Also, /../ is invalid. It does not indicate a parent folder but the ".."
key. And a trailing slash is (or at least could be seen as) invalid
because it falsely indicates that another key is to follow.

So I am applying rules on top of those that apply to URLs, not taking or
having to vet all URL rules as they stand.
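
These rules can be sketched in PHP roughly as follows. It is only an illustration under assumptions: the function name path_key_errors and the wording of the messages are made up here, the input is taken to be the already-decoded path without any query string, and the scalar type hints need PHP 7 or later.

```php
<?php
// Hypothetical helper: lists the ways a decoded URL path breaks the key rules.
// An empty result means every path element is usable as a key.
function path_key_errors(string $path): array
{
    if ($path === '' || $path[0] !== '/') {
        return ['path must start with "/"'];
    }
    if ($path === '/') {
        return [];   // site root: no keys at all
    }
    $errors = [];
    // Drop the leading slash, then split into keys.
    $keys = explode('/', substr($path, 1));
    $last = count($keys) - 1;
    foreach ($keys as $i => $key) {
        if ($key === '') {
            // An empty element comes from "//" or from a trailing slash.
            $errors[] = ($i === $last) ? 'trailing slash' : 'null key (//)';
        } elseif ($key === '.' || $key === '..') {
            // Treated as literal keys, and rejected as such.
            $errors[] = "dot key ($key)";
        }
    }
    return $errors;
}
```

With this sketch, "/this/is/the//bad/part" is reported as having a null key, and "/folder/file/" as having a trailing slash.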

James

James Harris

Feb 20, 2016, 2:29:21 PM

On 18/02/2016 07:02, Arno Welzel wrote:
> James Harris wrote on 2016-02-17 at 16:04:
>
>> On 17/02/2016 10:05, Arno Welzel wrote:
>>> James Harris wrote on 2016-02-17 at 10:37:
>>>
>>>> In line with the principle of a program vetting its input, this is about
>>>> vetting the supplied URL.
>>>>
>>>> In order to process a URL or parts of a URL in PHP what processing ought
>>>> to be applied to it? I have been using some steps which I will write
>>>> below but I am becoming increasingly uncertain that I have covered all
>>>> the bases.
>>>
>>> Well - it depends for which reasons you want to process the URL.
>>
>> It will go through a series of steps to change it and it will end up
>> identifying something in the file system or something in a database.
>
> Ok - but in either case you end up in looking up for some information
> based on what you get from the URL.
>
> This means: Not the URL is unsafe, but the code which does the
> information lookup may be unsafe.

Yes, though in those terms I don't think there is ever any unsafe input,
only unsafe code which fails to handle the input properly.

...

>>>> check each character is from an allowed set
>>>
>>> Why?
>>
>> My own insecurity, perhaps!
>
> What do you suspect to happen if an "invalid" character is passed?
> Again: Not URL itself is "unsafe" - just code which uses the information
> may be unsafe.

Well, as was pointed out to me, after passing the request_uri through
(raw)urldecode the resulting string might not have ASCII coding and so
would not be suitable for byte-wise processing. That is a bit of a gotcha.

...

>>> As I said: It depends on what you want to achieve - do you make sure the
>>> URL you get is valid? Do you want to parse the URL in any way?
>>
>> The URL component (request_uri) will be mapped to a file or a database
>> entry in this case.
>
> So why don't you just use the whole URI and use it in a prepared SELECT
> statement? In that case it's just not important, what the URI contains.

Yes, I suppose I could do that for file access too. I was/am just wary
of presenting an input string to any function without vetting that
string first, even with a prepared statement as you mentioned.

I *can* see the prepared-statement option should work. I am less sure
what would happen if I presented an unvetted string to a file- or
directory-access function.

Thanks to everyone for the advice. The best option seems to be along the
lines of:

1. rawurldecode

2. check that the result is ASCII coded

3. add a leading slash if there is not one

4. split by slash characters

5. check that each part is nonblank and has a suitable form for a key

I might add a check that all letters are lower case. That would help
generate a consistent response irrespective of the OS the script were to
run on.
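
As a rough sketch of how those steps might fit together: the function name vet_path, the key pattern, and the choice to return null on rejection are all assumptions made for illustration, and the nullable return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical sketch of the five steps. Returns the list of keys,
// or null if the request path is rejected.
function vet_path(string $rawPath): ?array
{
    // 1. Undo %nn escapes ("+" is left alone, unlike urldecode()).
    $path = rawurldecode($rawPath);

    // 2. Accept printable ASCII only, so byte-wise processing is safe.
    if (preg_match('/[^\x20-\x7E]/', $path)) {
        return null;
    }

    // 3. Make sure there is a leading slash.
    if ($path === '' || $path[0] !== '/') {
        $path = '/' . $path;
    }

    // 4. Split on slashes, skipping the empty piece before the first one.
    $keys = explode('/', substr($path, 1));
    if ($keys === ['']) {
        return [];   // a bare "/" addresses the site root
    }

    // 5. Every key must be non-blank, lower case and of a simple form.
    //    The pattern is an assumed one; it also rejects "." and "..".
    foreach ($keys as $key) {
        if (!preg_match('/^[a-z0-9][a-z0-9_-]*$/D', $key)) {
            return null;
        }
    }
    return $keys;
}
```

Checking after decoding means that an escaped form such as "%2e%2e" is rejected in the same way as a literal "..".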

James

James Harris

Feb 20, 2016, 2:38:24 PM

On 18/02/2016 21:03, Thomas 'PointedEars' Lahn wrote:
> James Harris wrote:
>
>> On 17/02/2016 22:26, Thomas 'PointedEars' Lahn wrote:
>>> James Harris wrote:
>>>> I have been working with $_SERVER["REQUEST_URI"], in case that is
>>>> relevant.
>>> It is relevant: Do not do that. Request parameter values are provided in
>>> properly decoded form through specific superglobal arrays such as $_GET,
>>> and the $_SERVER['PATH_INFO'] value.
>>
>> No good. PATH_INFO does not contain the query string, AIUI, whereas
>> REQUEST_URI does.
>>
>> ...
>
> There is no need for a "query string" if you use PATH_INFO. Whether the
> latter is feasible with your server setup and use-case I do not know yet.

I need to ensure that there is no query string.

>>>> trim "/" from the ends in order to normalise
>>> Nonsense.
>>
>> Is it *guaranteed* that $_SERVER["REQUEST_URI"] will return a string
>> with a leading slash?
>
> Yes, see RFCs 1945 (HTTP/1.0), 2616 (HTTP/1.1), and 7540 (HTTP/2). Your
> server-side PHP script would not be executed if the request URI would not
> refer to it in some way.

Thanks for the links but IIRC I found in at least one test that a
request for the home page as in

http://site.com

will not have a leading slash on the $_SERVER["REQUEST_URI"].

On balance, I think I will keep the check for a leading slash.

In fact, I found that adding a leading slash if PHP did not pass me one
made for easier later processing - especially on PHP functions which did
not behave well given an empty string.

James

Thomas 'PointedEars' Lahn

Feb 20, 2016, 3:08:03 PM

James Harris wrote:

> Sorry. I think I have been at fault in not being clear that although I
> am getting data from a URL my application is really using the URL path
> as a unique key. What the URL contains after the site name is taken as a
> hierarchical sequence of key values. In
>
> http://site.com/folder/subfolder/file

Please use the “example” TLD for examples instead:

http://site.example/folder/subfolder/file

See also <http://tools.ietf.org/html/rfc2606>.

> the "folder" is a key within the site; the "subfolder" is a key within
> the first key, etc.
>
> As such, any // element is invalid because it specifies a null key.
> Also, /../ is invalid. It does not indicate a parent folder but the ".."
> key. And a trailing slash is (or at least could be seen as) invalid
> because it falsely indicates that another key is to follow.
>
> So I am applying rules on top of those that apply to URLs, not taking or
> having to vet all URL rules as they stand.

I stand corrected with regard to my earlier statement that it would be
guaranteed that $_SERVER['REQUEST_URI'] would always start with a “/”.
I have confirmed that it can also be an absolute URI instead:

----------------------------------------------------------------------
$ echo '<?= $_SERVER["REQUEST_URI"] . "\n" ?>' > tmp/server.php
$ php -S localhost:1337 -t tmp/ &
[1] 25339
$ PHP 5.6.14-0+deb8u1 Development Server started at Sat Feb 20 21:03:50 2016
Listening on http://localhost:1337
Document root is /home/pelinux/tmp
Press Ctrl-C to quit.

$ telnet localhost 1337
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET http://localhost:1337/server.php HTTP/1.0

HTTP/1.0 200 OK
Connection: close
X-Powered-By: PHP/5.6.14-0+deb8u1
Content-type: text/html; charset=UTF-8

http://localhost:1337/server.php
Connection closed by foreign host.
----------------------------------------------------------------------

See also <http://tools.ietf.org/html/rfc2616#section-5.1.2>.
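
Code that only wants the path could reduce either form of request target to it first. A hedged sketch follows; the function name request_path is invented for illustration, and it uses a plain regular expression instead of parse_url() because parse_url() reads a target beginning with "//" as containing a host.

```php
<?php
// Hypothetical helper: reduce a request target to its path component.
function request_path(string $target): string
{
    // Absolute form ("http://host:port/path"): drop scheme and authority.
    $target = preg_replace('~^[a-z][a-z0-9+.-]*://[^/?#]*~i', '', $target);

    // Drop the query part, if any.
    $pos = strpos($target, '?');
    if ($pos !== false) {
        $target = substr($target, 0, $pos);
    }

    // "http://site.example" leaves nothing behind; treat that as the root.
    return $target === '' ? '/' : $target;
}
```

For the request shown in the transcript, request_path('http://localhost:1337/server.php') gives "/server.php".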

But what reason do you have to think that someone would make such an HTTP
request to your server? Also, what reason do you have to think that someone
would make an HTTP request to your server that contains “..”, i.e. that this
would not be resolved by the HTTP client already?

Thomas 'PointedEars' Lahn

Feb 20, 2016, 3:17:32 PM

James Harris wrote:

> On 18/02/2016 21:03, Thomas 'PointedEars' Lahn wrote:
>> James Harris wrote:
>>> On 17/02/2016 22:26, Thomas 'PointedEars' Lahn wrote:

>>>> […] Do not [use $_SERVER["REQUEST_URI"]]. Request parameter values are
>>>> provided in properly decoded form through specific superglobal arrays
>>>> such as $_GET, and the $_SERVER['PATH_INFO'] value.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> No good. PATH_INFO does not contain the query string, AIUI, whereas
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> REQUEST_URI does.
>>>
>>> ...
>> There is no need for a "query string" if you use PATH_INFO. Whether the
>> latter is feasible with your server setup and use-case I do not know yet.
>
> I need to ensure that there is no query string.

1. You are contradicting your earlier statement.

2. Why?

3. Then you should definitely use PATH_INFO instead of REQUEST_URI.

>>>>> trim "/" from the ends in order to normalise
>>>> Nonsense.
>>> Is it *guaranteed* that $_SERVER["REQUEST_URI"] will return a string
>>> with a leading slash?
>> Yes, see RFCs 1945 (HTTP/1.0), 2616 (HTTP/1.1), and 7540 (HTTP/2). Your
>> server-side PHP script would not be executed if the request URI would not
>> refer to it in some way.
> Thanks for the links but IIRC I found in at least one test that a
> request for the home page as in
>
> http://site.com
>
> will not have a leading slash on the $_SERVER["REQUEST_URI"].

Yes, see <news:20227200....@PointedEars.de>; I stand corrected.

> On balance, I think I will keep the check for a leading slash.

Which HTTP client produced this request URI? What was the HTTP server?

> In fact, I found that adding a leading slash if PHP did not pass me one
> made for easier later processing - especially on PHP functions which did
> not behave well given an empty string.

“Talk is cheap. Show me the code.”
—Linus Torvalds

James Harris

Feb 20, 2016, 3:36:18 PM

On 20/02/2016 20:07, Thomas 'PointedEars' Lahn wrote:
> James Harris wrote:
>
>> Sorry. I think I have been at fault in not being clear that although I
>> am getting data from a URL my application is really using the URL path
>> as a unique key. What the URL contains after the site name is taken as a
>> hierarchical sequence of key values. In
>>
>> http://site.com/folder/subfolder/file
>
> Please use the “example” TLD for examples instead:
>
> http://site.example/folder/subfolder/file

I see there is http://example.com as well. I might use that instead as
people could find the .example TLD a bit odd.

> See also <http://tools.ietf.org/html/rfc2606>.

Thanks.

>> the "folder" is a key within the site; the "subfolder" is a key within
>> the first key, etc.
>>
>> As such, any // element is invalid because it specifies a null key.
>> Also, /../ is invalid. It does not indicate a parent folder but the ".."
>> key. And a trailing slash is (or at least could be seen as) invalid
>> because it falsely indicates that another key is to follow.
>>
>> So I am applying rules on top of those that apply to URLs, not taking or
>> having to vet all URL rules as they stand.

...

> But what reason do you have to think that someone would make such an HTTP
> request to your server? Also, what reason do you have to think that someone
> would make an HTTP request to your server that contains “..”, i.e. that this
> would not be resolved by the HTTP client already?

I don't think they would. I think they might be able to. IMO vetting
inputs is not about what sensible people might do but about what those
with bad intent could do. And there are so many different versions of
browser out there that we cannot test on anything like all of them.

James

Thomas 'PointedEars' Lahn

Feb 20, 2016, 3:37:25 PM

James Harris wrote:

> Well, as was pointed out to me, after passing the request_uri through
> (raw)urldecode the resulting string might not have ASCII coding and so

ASCII _encoding_, and it *certainly* is not encoded using US-ASCII.
AISB, US-ASCII is a *7*-bit encoding.

> would not be suitable for byte-wise processing. That is a bit of a gotcha.

<http://php.net/manual/en/ref.mbstring.php>

> 2. check that the result is ASCII coded

Superfluous if you can handle different encodings. Unicode support in
databases and filenames, for example, is common nowadays. PHP is oblivious
to the character encoding of a string (you need to tell it), which only is a
problem if you need to access individual characters (and then see above for
the solution).
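
To illustrate the byte-versus-character point, a small sketch, assuming the mbstring extension is loaded and that the decoded string is UTF-8 (which PHP cannot know by itself):

```php
<?php
// "caf%C3%A9" is the percent-encoded UTF-8 form of "café".
$decoded = rawurldecode('caf%C3%A9');

$byteCount = strlen($decoded);               // counts bytes
$charCount = mb_strlen($decoded, 'UTF-8');   // counts characters

// $decoded[3] is only the first byte of "é"; mb_substr() returns the
// whole character because it is told the encoding explicitly.
$fourthByte = $decoded[3];
$fourthChar = mb_substr($decoded, 3, 1, 'UTF-8');
```

Here the byte count is 5 and the character count is 4, so any index-based loop over the string has to pick one of the two views and stick to it.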

> 3. add a leading slash if there is not one

To what end?

> 4. split by slash characters

That would leave an empty first element of the resulting array, …

> 5. check that each part is nonblank and has a suitable form for a key

… so this test would fail.

I still think that your approach is wrong, but unless you state your use-
case in more detail, I cannot be certain.

> I might add a check that all letters are lower case. That would help
> generate a consistent response irrespective of the OS the script were to
> run on.

So you are mapping URI paths to files. Know then that it is not so much a
matter of the operating system, but of the *file* system how filenames
are handled. Also, it is inherently unsafe. Why are you doing this?

Thomas 'PointedEars' Lahn

Feb 20, 2016, 3:40:26 PM

James Harris wrote:

> On 20/02/2016 20:07, Thomas 'PointedEars' Lahn wrote:

>> But what reason do you have to think that someone would make [an HTTP
>> request with an absolute URI] to your server? Also, what reason do you
>> have to think that someone would make an HTTP request to your server that
>> contains “..”, i.e. that this would not be resolved by the HTTP client
>> already?
>
> I don't think they would. I think they might be able to. IMO vetting
> inputs is not about what sensible people might do but about what those
> with bad intent could do. And there are so many different versions of
> browser out there that we cannot test on anything like all of them.

The “people with bad intent” would literally GET *nothing* in this case.
What is the harm in that?

James Harris

Feb 20, 2016, 3:56:30 PM

On 20/02/2016 20:17, Thomas 'PointedEars' Lahn wrote:
> James Harris wrote:
>
>> On 18/02/2016 21:03, Thomas 'PointedEars' Lahn wrote:
>>> James Harris wrote:
>>>> On 17/02/2016 22:26, Thomas 'PointedEars' Lahn wrote:
>>>>> […] Do not [use $_SERVER["REQUEST_URI"]]. Request parameter values are
>>>>> provided in properly decoded form through specific superglobal arrays
>>>>> such as $_GET, and the $_SERVER['PATH_INFO'] value.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> No good. PATH_INFO does not contain the query string, AIUI, whereas
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> REQUEST_URI does.
>>>>
>>>> ...
>>> There is no need for a "query string" if you use PATH_INFO. Whether the
>>> latter is feasible with your server setup and use-case I do not know yet.
>>
>> I need to ensure that there is no query string.
>
> 1. You are contradicting your earlier statement.
>
> 2. Why?
>
> 3. Then you should definitely use PATH_INFO instead of REQUEST_URI.

I don't think there is a contradiction. Let me give an example with this URL

http://example.com/folder/file?greet=hello;lang=en#top

If that was presented to the server I would need to see the following string

/folder/file?greet=hello&lang=en

In my PHP code I would report that as invalid because the query string
is unacceptable.

I need to see the query string so that I can report a request containing
one as invalid. The ?, = and & characters would all be reported as
errors. Current report:

Invalid character in URL: "?"
Invalid character in URL: "="
Invalid character in URL: "&"
Invalid character in URL: "="

I could make the reporting a bit more user-friendly but it does the job
for now.
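
For illustration, a report like that could be produced along these lines. The function name and the allowed character set are assumptions, not the actual code:

```php
<?php
// Hypothetical sketch: one message per character outside the allowed set.
function invalid_char_messages(string $uri): array
{
    $messages = [];
    // Assumed allowed set: lower-case letters, digits, "/", "-", "_" and ".".
    if (preg_match_all('~[^a-z0-9/_.-]~', $uri, $matches)) {
        foreach ($matches[0] as $char) {
            $messages[] = 'Invalid character in URL: "' . $char . '"';
        }
    }
    return $messages;
}
```

The pattern works on bytes, so a multi-byte character would produce one message per byte unless the input has been restricted to ASCII beforehand.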

BTW, is there a $_SERVER element that would return the #top part as well?

>>>>>> trim "/" from the ends in order to normalise
>>>>> Nonsense.
>>>> Is it *guaranteed* that $_SERVER["REQUEST_URI"] will return a string
>>>> with a leading slash?
>>> Yes, see RFCs 1945 (HTTP/1.0), 2616 (HTTP/1.1), and 7540 (HTTP/2). Your
>>> server-side PHP script would not be executed if the request URI would not
>>> refer to it in some way.
>> Thanks for the links but IIRC I found in at least one test that a
>> request for the home page as in
>>
>> http://site.com
>>
>> will not have a leading slash on the $_SERVER["REQUEST_URI"].
>
> Yes, see <news:20227200....@PointedEars.de>; I stand corrected.
>
>> On balance, I think I will keep the check for a leading slash.
>
> Which HTTP client produced this request URI? What was the HTTP server?

I don't know now if that was correct. I just tried a couple of likely
browsers but neither had that effect. The server was Apache.

James

James Harris

Feb 20, 2016, 4:02:51 PM

On 20/02/2016 20:40, Thomas 'PointedEars' Lahn wrote:
> James Harris wrote:
>
>> On 20/02/2016 20:07, Thomas 'PointedEars' Lahn wrote:
>>> But what reason do you have to think that someone would make [an HTTP
>>> request with an absolute URI] to your server? Also, what reason do you
>>> have to think that someone would make an HTTP request to your server that
>>> contains “..”, i.e. that this would not be resolved by the HTTP client
>>> already?
>>
>> I don't think they would. I think they might be able to. IMO vetting
>> inputs is not about what sensible people might do but about what those
>> with bad intent could do. And there are so many different versions of
>> browser out there that we cannot test on anything like all of them.
>
> The “people with bad intent” would literally GET *nothing* in this case.
> What is the harm in that?

I am not sure what that means. We may be talking about different things.
No problem.

James

James Harris

Feb 20, 2016, 4:37:56 PM

On 20/02/2016 20:37, Thomas 'PointedEars' Lahn wrote:
> James Harris wrote:
>
>> Well, as was pointed out to me, after passing the request_uri through
>> (raw)urldecode the resulting string might not have ASCII coding and so
>
> ASCII _encoding_, and it *certainly* is not encoded using US-ASCII.
> AISB, US-ASCII is a *7*-bit encoding.

I know. As with a previous post of yours that I haven't got round to
replying to yet I was referring to the report from mb_detect_encoding().
It reports ASCII or UTF-8 in my tests. I can choose to permit only ASCII
strings.

>> would not be suitable for byte-wise processing. That is a bit of a gotcha.
>
> <http://php.net/manual/en/ref.mbstring.php>

Thanks. I have a lot to read.

...

>> 3. add a leading slash if there is not one
>
> To what end?

I thought that I got a response from one test where the result did not
have a leading slash. I didn't make a note of the details so am not so
sure now but it does not matter much. Forcing there to be a leading
slash costs next to nothing, guards against the possibility of that
happening, and makes later processing (including certain misbehaving PHP
functions) simpler.

>> 4. split by slash characters
>
> That would leave an empty first element of the resulting array, …
>
>> 5. check that each part is nonblank and has a suitable form for a key
>
> … so this test would fail.

Yes. I was not as precise in my description as I was in the code.

> I still think that your approach is wrong, but unless you state your use-
> case in more detail, I cannot be certain.
>
>> I might add a check that all letters are lower case. That would help
>> generate a consistent response irrespective of the OS the script were to
>> run on.
>
> So you are mapping URI paths to files. Know then that it is not so much a
> matter of the operating system, but of the *file* system how filenames
> are handled. Also, it is inherently unsafe. Why are you doing this?

Not exactly. Let me take a small step backwards to try to explain better.

The idea is, first, that the entire URL is a globally unique key for the
data being addressed. Except as defined by certain rules which
complicate this a bit but are not relevant here there should not be
multiple URLs which map to the same data. I therefore vet the URLs that
my PHP script sees to make sure that they do not contain any redundant
information, and I do not allow directory manipulations such as are
often allowed for filesystem access (especially dot and double-dot entries).

Second, the elements of the URL are not filesystem locations. They are
keys. My current code maps those keys to directories but it could just
as well map them to something else like a database.

Third, the URLs are to be permanent or as long-lasting as is feasible. A
given URL should remain valid even if the underlying storage system is
changed. Basically, the unique URLs and the keys they contain are
intended to outlive the implementation method.

IMO it is best initially to require that the keys (which are presented
to the PHP script by means of the URL) be simple. Restrictions can then
be relaxed without breaking existing URLs.

Do you still think that the approach is unsafe?

James

Thomas 'PointedEars' Lahn

Feb 20, 2016, 5:49:05 PM

James Harris wrote:

> […] Let me give an example with this URL
>
> http://example.com/folder/file?greet=hello;lang=en#top
>
> If that was presented to the server I would need to see the following
> string
>
> /folder/file?greet=hello&lang=en

You would not, unless either

a) the used index file of the document root,
b) “folder”, or
c) “folder/file”

were PHP programs, and as for the latter two you had enabled content
negotiation. But then, either

a) $_SERVER['PATH_INFO'] === '/folder/file',
b) $_SERVER['PATH_INFO'] === '/file', or
c) you had found the file already.

Because you would not rewrite *every* request to the same PHP program, would
you?

> In my PHP code I would report that as invalid because the query string
> is unacceptable.
>
> I need to see the query string so that I can report a request containing
> one as invalid. […]

Again, why? If you are only interested in the path, what does it matter if
there is also a query _part_?

And if you are using $_SERVER['PATH_INFO'] for mapping, if you must exclude
the case that people also specified a query part you can still test for
$_SERVER['QUERY_STRING'] or a set and non-empty $_GET array. Although it
can be done, I can still see no reason to sanitize $_SERVER['REQUEST_URI'].
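
A sketch of that test, with the superglobals passed in as parameters so that it can be tried outside a web server. The function name is made up, and the ?? operator needs PHP 7 or later:

```php
<?php
// Hypothetical helper: true if the request carried a query part.
function has_query_part(array $server, array $get): bool
{
    $query = $server['QUERY_STRING'] ?? '';
    return $query !== '' || count($get) > 0;
}

// In a real request this would be called as:
//   has_query_part($_SERVER, $_GET)
```

A bare trailing "?" with nothing after it would presumably go unnoticed by this test, since the query string is then empty.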

> BTW, is there a $_SERVER element that would return the #top part as well?

No, as that is not a part of the *request* URI.

--
PointedEars
Zend Certified PHP Engineer
<http://www.zend.com/en/yellow-pages/ZEND024953> | Twitter: @PointedEars2

Please do not cc me. / Bitte keine Kopien per E-Mail.

James Harris

Feb 21, 2016, 4:37:51 AM

On 20/02/2016 22:48, Thomas 'PointedEars' Lahn wrote:
> James Harris wrote:
>
>> […] Let me give an example with this URL
>>
>> http://example.com/folder/file?greet=hello;lang=en#top
>>
>> If that was presented to the server I would need to see the following
>> string
>>
>> /folder/file?greet=hello&lang=en
>
> You would not, unless either
>
> a) the used index file of the document root,
> b) “folder”, or
> c) “folder/file”
>
> were PHP programs, and as for the latter two you had enabled content
> negotiation. But then, either
>
> a) $_SERVER['PATH_INFO'] === '/folder/file',
> b) $_SERVER['PATH_INFO'] === '/file', or
> c) you had found the file already.
>
> Because you would not rewrite *every* request to the same PHP program, would
> you?

Pretty much, yes. All of this is about processing such URLs in PHP code.

>> In my PHP code I would report that as invalid because the query string
>> is unacceptable.
>>
>> I need to see the query string so that I can report a request containing
>> one as invalid. […]
>
> Again, why? If you are only interested in the path, what does it matter if
> there is also a query _part_?

As I explained in another post, the URL is treated for this as a unique
key. Aside from certain rules for matching, it would be wrong in this
application for two URLs to map to the same data.

> And if you are using $_SERVER['PATH_INFO'] for mapping, if you must exclude
> the case that people also specified a query part you can still test for
> $_SERVER['QUERY_STRING'] or a set and non-empty $_GET array. Although it
> can be done, I can still see no reason to sanitize $_SERVER['REQUEST_URI'].

Yes, that's true, I could check that the query string is absent. Good point.

>> BTW, is there a $_SERVER element that would return the #top part as well?
>
> No, as that is not a part of the *request* URI.

OK.

James

Thomas 'PointedEars' Lahn

Feb 21, 2016, 8:42:36 AM

James Harris wrote:

> On 20/02/2016 22:48, Thomas 'PointedEars' Lahn wrote:
>> Because you would not rewrite *every* request to the same PHP program,
>> would you?
>
> Pretty much, yes. All of this is about processing such URLs in PHP code.

That is a stupid idea.

Thomas 'PointedEars' Lahn

Feb 21, 2016, 9:02:09 AM

James Harris wrote:

> On 20/02/2016 20:37, Thomas 'PointedEars' Lahn wrote:
>> James Harris wrote:
>>> I might add a check that all letters are lower case. That would help
>>> generate a consistent response irrespective of the OS the script were to
>>> run on.
>> So you are mapping URI paths to files. Know then that it is not so much
>> a matter of the operating system, but of the *file* system how filenames
>> are handled. Also, it is inherently unsafe. Why are you doing this?
> Not exactly. Let me take a small step backwards to try to explain better.
>
> The idea is, first, that the entire URL is a globally unique key for the
> data being addressed.

That is implicit, you do not need PHP for it.

> Except as defined by certain rules which complicate this a bit but are not
> relevant here there should not be multiple URLs which map to the same
> data.

Also implicit, hence Uniform Resource *Identifier*.

> I therefore vet the URLs that my PHP script sees to make sure that they do
> not contain any redundant information,

Such as?

> and I do not allow directory manipulations such as are
> often allowed for filesystem access (especially dot and double-dot
> entries).

You would not have that problem if you would not access the filesystem
through PHP in the first place.

> Second, the elements of the URL are not filesystem locations. They are
> keys. My current code maps those keys to directories but it could just
> as well map them to something else like a database.

That does not mean that every request URI needs to be rewritten to the same
PHP program.

> Third, the URLs are to be permanent or as long-lasting as is feasible. A
> given URL should remain valid even if the underlying storage system is
> changed. Basically, the unique URLs and the keys they contain are
> intended to outlive the implementation method.

Server-side redirection can take care of that, too.

<https://www.w3.org/QA/Tips/reback>

> IMO it is best initially to require that the keys (which are presented
> to the PHP script by means of the URL) be simple. Restrictions can then
> be relaxed without breaking existing URLs.
>
> Do you still think that the approach is unsafe?

More than that; I think it is nonsense. But as you keep being nebulous I am
getting the impression that I am wasting my precious time with this.

<http://meta.stackexchange.com/a/66378/178570>

James Harris

Feb 21, 2016, 10:02:48 AM

On 21/02/2016 14:02, Thomas 'PointedEars' Lahn wrote:

...

> More than that; I think it is nonsense. But as you keep being nebulous I am
> getting the impression that I am wasting my precious time with this.

You have recently been talking about things which are not relevant to
what I asked so while I thank you for the earlier points you made which
were useful I agree with you that it's best for you not to waste any
more of your time on this topic.

James

James Harris

Feb 21, 2016, 10:03:33 AM

On 21/02/2016 13:42, Thomas 'PointedEars' Lahn wrote:
> James Harris wrote:
>
>> On 20/02/2016 22:48, Thomas 'PointedEars' Lahn wrote:
>>> Because you would not rewrite *every* request to the same PHP program,
>>> would you?
>>
>> Pretty much, yes. All of this is about processing such URLs in PHP code.
>
> That is a stupid idea.

No, it's the right idea for my application.

James

Jerry Stuckle

Feb 21, 2016, 2:09:36 PM

One of the rare times I agree with Pointed Head. This is a stupid idea.


James Harris

Feb 21, 2016, 2:35:01 PM

On 21/02/2016 19:09, Jerry Stuckle wrote:
> On 2/21/2016 10:03 AM, James Harris wrote:
>> On 21/02/2016 13:42, Thomas 'PointedEars' Lahn wrote:
>>> James Harris wrote:
>>>
>>>> On 20/02/2016 22:48, Thomas 'PointedEars' Lahn wrote:
>>>>> Because you would not rewrite *every* request to the same PHP program,
>>>>> would you?
>>>>
>>>> Pretty much, yes. All of this is about processing such URLs in PHP code.
>>>
>>> That is a stupid idea.
>>
>> No, it's the right idea for my application.
>
> One of the rare times I agree with Pointed Head. This is a stupid idea.

OK, I'll bite. Why?

James

Jerry Stuckle

Feb 21, 2016, 2:49:58 PM

For all the reasons others and I have mentioned previously. Read back
through the thread.

James Harris

Feb 21, 2016, 2:57:17 PM

On 21/02/2016 19:49, Jerry Stuckle wrote:
> On 2/21/2016 2:34 PM, James Harris wrote:
>> On 21/02/2016 19:09, Jerry Stuckle wrote:

...

>>> One of the rare times I agree with Pointed Head. This is a stupid idea.
>>
>> OK, I'll bite. Why?
>
> For all the reasons others and I have mentioned previously. Read back
> through the thread.

I read every post. I wouldn't ask a question and then not read the replies.

James

Thomas 'PointedEars' Lahn

Feb 21, 2016, 6:10:46 PM

James Harris wrote:

> On 21/02/2016 19:09, Jerry Stuckle wrote:
>> On 2/21/2016 10:03 AM, James Harris wrote:
>>> On 21/02/2016 13:42, Thomas 'PointedEars' Lahn wrote:
>>>> James Harris wrote:
>>>>> On 20/02/2016 22:48, Thomas 'PointedEars' Lahn wrote:
>>>>>> Because you would not rewrite *every* request to the same PHP
>>>>>> program, would you?
>>>>> Pretty much, yes. All of this is about processing such URLs in PHP
>>>>> code.
>>>> That is a stupid idea.
>>> No, it's the right idea for my application.
>> […] This is a stupid idea.
>
> OK, I'll bite. Why?

You want to run PHP on top of Web server software (Apache) and have PHP do
what Web server software is supposed to do instead.

James Harris

Feb 21, 2016, 6:46:19 PM

On 21/02/2016 23:10, Thomas 'PointedEars' Lahn wrote:
> James Harris wrote:
>
>> On 21/02/2016 19:09, Jerry Stuckle wrote:
>>> On 2/21/2016 10:03 AM, James Harris wrote:
>>>> On 21/02/2016 13:42, Thomas 'PointedEars' Lahn wrote:
>>>>> James Harris wrote:
>>>>>> On 20/02/2016 22:48, Thomas 'PointedEars' Lahn wrote:
>>>>>>> Because you would not rewrite *every* request to the same PHP
>>>>>>> program, would you?
>>>>>> Pretty much, yes. All of this is about processing such URLs in PHP
>>>>>> code.
>>>>> That is a stupid idea.
>>>> No, it's the right idea for my application.
>>> […] This is a stupid idea.
>>
>> OK, I'll bite. Why?
>
> You want to run PHP on top of Web server software (Apache) and have PHP do
> what Web server software is supposed to do instead.

Before I explain the details is this an objection to do with
performance? If Apache and PHP could both carry out some or all of the
work why exactly would you choose Apache over PHP?

To explain why PHP is needed (and I think this was one of the points we
discussed), yes, the requests get directed to a PHP script but then the
PHP script transforms the URL in ways that Apache cannot.

Specifically, given the URL http://example.com/B/C/D:

* B is the partition. The PHP script uses the partition name as a folder
and looks in that folder for a configuration file. Each partition will
be able to have a separate configuration.

* The configuration file tells the script how to transform the rest of
the URL.

* The PHP script transforms the C/D part of the URL into a set of keys
and uses those to locate a resource.

If nothing else, Apache cannot do this as the configuration file format
will not be meaningful to Apache.
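To make the B/C/D split concrete, here is a minimal sketch of the first step only: separating the partition from the remaining keys and rejecting anything outside an allow-list. The function name, the return shape and the permitted character set are assumptions for illustration, not the actual application code, and reading the per-partition configuration file is left out.

```php
<?php
// Sketch only: function name, return shape and allowed characters are assumptions.
function split_request_path(string $requestUri): ?array
{
    // Drop any query string; only the path part is of interest here.
    $path = explode('?', $requestUri, 2)[0];

    // Split first and decode afterwards, so that an encoded "/" (%2F)
    // cannot create extra segments.
    $segments = array_map('rawurldecode', explode('/', trim($path, '/')));

    foreach ($segments as $segment) {
        // Allow-list check: this rejects empty segments (from "//"),
        // "." and "..", and any character not explicitly permitted.
        if (preg_match('/\A[A-Za-z0-9_-]+\z/', $segment) !== 1) {
            return null;
        }
    }

    // The first segment is the partition; the rest are the keys.
    $partition = array_shift($segments);
    return ['partition' => $partition, 'keys' => $segments];
}
```

With this sketch, split_request_path('/B/C/D') gives partition "B" and keys "C" and "D", while a path containing ".." or "//" gives null.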

--
James

Jerry Stuckle

Feb 21, 2016, 8:29:45 PM

Obviously you didn't, or you didn't understand. There have been many
objections to what you're trying to do by several people.

James Harris

Feb 22, 2016, 2:30:47 AM

On 22/02/2016 01:29, Jerry Stuckle wrote:
> On 2/21/2016 2:56 PM, James Harris wrote:
>> On 21/02/2016 19:49, Jerry Stuckle wrote:
>>> On 2/21/2016 2:34 PM, James Harris wrote:
>>>> On 21/02/2016 19:09, Jerry Stuckle wrote:
>>
>> ...
>>
>>>>> One of the rare times I agree with Pointed Head. This is a stupid
>>>>> idea.
>>>>
>>>> OK, I'll bite. Why?
>>
>>> For all the reasons others and I have mentioned previously. Read back
>>> through the thread.
>>
>> I read every post. I wouldn't ask a question and then not read the replies.

> Obviously you didn't,

Yes, I did.

> or you didn't understand.

Well, it seems that someone didn't understand.

> There have been many
> objections to what you're trying to do by several people.

As a matter of courtesy I have tried to respond to all the questions I
was asked during the discussion as it went ahead. I don't remember
anyone having remaining issues with my responses. I don't see what more
a person can do to respond to your objections.

James

Jerry Stuckle

Feb 22, 2016, 8:13:48 AM

You're hopeless. I'm not going to waste any more time trying to help you.

James Harris

Feb 22, 2016, 10:02:12 AM

You're arrogant. A lot of your comments are not helpful but are about
your opinions. I'm not going to waste any more time replying to your
efforts to keep this thread going.

Not my style but that seems to be how you like to communicate.

James

Jerry Stuckle

Feb 22, 2016, 10:31:10 AM

No, you just don't take advice from those who know more than you. And
I'm including virtually everyone who has replied to you in this thread.
You have taken none of their advice.

No, I'm not arrogant. I just know it's a waste of time to try to teach
a pig to sing.

Gordon Burditt

Feb 29, 2016, 10:30:07 PM

A URL is a string. If you don't do anything potentially dangerous
with a string, there is no security issue. Contrary to popular
opinion, there is no computer equivalent of the "killer joke" (which
kills anyone who hears it): a string the computer never processes
cannot hurt it.

You haven't said whether or not the URL points to YOUR servers.
Example: the vast majority of the URLs in Google's search
engine point to somewhere besides Google. That's the whole point
of a worldwide search engine, right? It's also not at all uncommon
for ads on a website to have URLs that point to the advertiser's
website, not the website you're viewing. A PHP page might be set up
to select a random ad, output a link to it, and log the page view,
so subsequent runs prefer ads with fewer views.

Some potentially dangerous things to do with a string in PHP, especially
if it contains user-supplied data, or comes from a database that might
have unvetted user-supplied data:
(1) Use parts of the string to reference the file system
(bypassing web server permission restrictions). Quoting doesn't
really work here - you need to reject attempts to access anything
outside the permitted section of the filesystem.
(2) Use parts of the string in a SQL query without proper quoting.
(SQL injection)
(3) Use the string in content passed to eval() without proper quoting.
(Executing arbitrary code)
(4) Use the string in content passed to a shell without proper quoting.
(Executing arbitrary code, `rm -rf /` being the standard bad example)
WARNING: DO NOT USE THIS POST AS A SHELL "HERE" DOCUMENT.
(5) Use the string as an email address passed to a mail transport service.
(Spamming, injecting extra destinations into email headers.)
(6) Output the string to a web page. (Possible XSS attack.)
(7) Validate the string using only JavaScript, which can be turned off,
or with HTML (such as input length limits), which can be bypassed,
say, by telnet to the HTTP port, with a URL manually typed in.
(Bots tend not to use real browsers anyway.) The string should also
be validated on the server side, although letting the user know of a
problem BEFORE pressing submit is more user-friendly.
(8) Use a non-constant string (subject to variable substitution) in
a filesystem (or, *MUCH WORSE*, URL) reference, such as include,
require, etc.
(9) Use even a constant string that refers to a website you don't
control in a reference that executes the output as code, such as
include, require, etc. DNS spoofing or playing games with ARP
could make even references to *YOUR* sites dangerous.
(10) It's generally a bad idea to alter data in response to an HTTP GET
request, such as transferring money, ordering merchandise, or
deleting records (think about what a webcrawler might do to your
database!). Use HTTP POST for that. Logging hits and page view
counts is an exception.
(11) The data fetched from a URL should be treated as user-supplied.
Lots of other stuff I forgot to mention.
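
As a rough illustration of items (4) and (6), the sketch below escapes a string for the context it is about to be used in. htmlspecialchars() and escapeshellarg() are standard PHP functions; the two wrapper names and the grep command are made up for this example. The second function only builds a command string and does not run it.

```php
<?php
// Item (6): escape a string before writing it into HTML text or an
// attribute value. Wrapper name is hypothetical.
function html_text(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Item (4): quote each user-supplied value as a single shell argument.
// The command is only assembled here, never executed.
function grep_command(string $needle, string $file): string
{
    return 'grep -F -- ' . escapeshellarg($needle) . ' ' . escapeshellarg($file);
}

echo html_text('<script>alert(1)</script>'), "\n";
echo grep_command('x; echo hi', 'access.log'), "\n";
```

Escaping of this kind belongs at the point of use, since the same string needs different treatment in HTML, in a shell command and in an SQL query.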

For example:
http://www.google.com/news/../../../../../etc/passwd
is not dangerous because:
(1) It does not refer to YOUR server, so it's not YOUR security problem.
(2) If it *DOES* refer to YOUR server, rejecting the string doesn't
improve the situation much if a direct request to your web server
processes it anyway. However, you need to avoid introducing new
problems of this type by copying a string that is part of a URL
reference to a filesystem reference.
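
If part of a URL does end up in a filesystem reference, one common check is to resolve the path and confirm it is still inside the directory you meant to serve. A minimal sketch, assuming the target must already exist (realpath() returns false otherwise); the function name and directory layout are hypothetical:

```php
<?php
// Resolve $relative under $baseDir and return the real path only if it
// stays strictly inside $baseDir; otherwise return null.
function resolve_inside(string $baseDir, string $relative): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null; // base directory does not exist
    }

    // realpath() collapses "..", "." and symlinks; false if nothing is there.
    $target = realpath($base . DIRECTORY_SEPARATOR . $relative);
    if ($target === false) {
        return null;
    }

    // Compare against the base plus a trailing separator so that
    // "/srv/data-old" is not mistaken for something inside "/srv/data".
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($target, $prefix, strlen($prefix)) !== 0) {
        return null; // escaped the base directory (or is the base itself)
    }

    return $target;
}
```

A request such as news/../../../../../etc/passwd then yields null rather than a path outside the base directory.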

> Sorry. I think I have been at fault in not being clear that although I
> am getting data from a URL my application is really using the URL path
> as a unique key.

You appear to be using it as part of a *reference into your
filesystem*, which is dangerous (but it can be made safe with
appropriate checking), and that should suggest what checks are
necessary. You also may have to deal with the "unique key" issue
of not really being unique if your filesystem is case-insensitive
but the URL is case-sensitive.

If the URL is supposed to refer to YOUR site, you can put arbitrary
restrictions on what is valid. For example, you might allow / (if
it's first, and only one of them), and 0, O, o, i, I, L, l, 1, !,
and | (Note: look carefully - there are no repeat characters in
that list, which intentionally contains characters that are hard
to distinguish from each other) (and absolutely *NO* other characters).
This might not make a whole lot of sense, but it's allowed. That
ends the problem with ../ . It's no longer just a URL, it's a
reference into your filesystem. Or database, or whatever.
../ may be a problem in a filesystem reference but not in a database reference.
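
A sketch of that deliberately odd allow-list as a single regular expression, assuming at least one character must follow the optional leading slash (the function name is made up):

```php
<?php
// Accept an optional leading "/" followed by one or more characters from
// the set 0 O o i I L l 1 ! | and absolutely nothing else.
function is_allowed_path(string $path): bool
{
    // \A and \z anchor to the true start and end of the string, so a
    // trailing newline is not silently accepted.
    return preg_match('~\A/?[0OoiIlL1!|]+\z~', $path) === 1;
}

var_dump(is_allowed_path('/lO1!|'));       // bool(true)
var_dump(is_allowed_path('/../etc'));      // bool(false)
var_dump(is_allowed_path('/l/1'));         // bool(false): second slash
```

Because "." is not in the permitted set, "../" can never pass, which is the point of listing what is allowed instead of what is forbidden.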

> What the URL contains after the site name is taken as a
> hierarchical sequence of key values. In

Sequence of key values, or filesystem reference? There's a difference.
A sequence of key values doesn't have an issue with ../ other than that
it's probably an invalid key. It has other meanings in the filesystem.
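
To illustrate the difference, here is a small sketch that treats the segments purely as lookup keys in a nested array. The data and the function name are invented for the example; the point is only that ".." is just another key that is not present.

```php
<?php
// Walk a nested array using a list of keys; return null if any key is
// missing. Nothing here touches the filesystem.
function lookup_by_keys(array $tree, array $keys)
{
    $node = $tree;
    foreach ($keys as $key) {
        if (!is_array($node) || !array_key_exists($key, $node)) {
            return null; // unknown key, including ".."
        }
        $node = $node[$key];
    }
    return $node;
}

// Hypothetical data keyed the same way as the example URL.
$tree = ['folder' => ['subfolder' => ['file' => 'resource-1']]];

var_dump(lookup_by_keys($tree, ['folder', 'subfolder', 'file'])); // string(10) "resource-1"
var_dump(lookup_by_keys($tree, ['folder', '..', 'file']));        // NULL
```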

> http://site.com/folder/subfolder/file

pittendrigh

Mar 22, 2016, 8:23:20 AM

Haven't logged in for a decade. Not surprised to find it's deja vu all over again.