[PSR-7] URI encoding clarification

Michael Dowling

Feb 25, 2015, 9:22:02 PM
to php...@googlegroups.com
The UriInterface doesn't say anything about how the data in the URI should be encoded. I would assume that calling withPath() with characters that are not allowed in the path would percent-encode them. I would assume the same for the query, host, and fragment, meaning that if I cast the URI to a string I get a valid URI, and if I call getPath() (or the other getters) I get the properly percent-encoded value.

Because this isn't explicitly called out, I think it could leave room for inconsistencies between implementations. The way that ZF2 handles this (and my PSR-7 implementation as well) is to percent-encode characters that aren't allowed, without double-encoding characters that are already percent-encoded. This means that you are free to percent-encode the value before you give it to the URI using one of the with() methods, or you can just rely on the URI to do the encoding for you. I would propose that the URI strictly follow RFC 3986 and not encode characters that are allowed (i.e., don't just use rawurlencode() on each section, but maybe use a regex based on the reserved and unreserved characters allowed in each part of a URI).

See RFC 3986 for more information: http://www.ietf.org/rfc/rfc3986.txt (specifically section 2.1).

For example, here's how the query and path are filtered in my implementation: https://github.com/guzzle/psr7/blob/master/src/Uri.php#L491
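For illustration, here is a minimal sketch of the behavior proposed above, written in Python rather than PHP; it is not a port of the linked implementation. It assumes UTF-8 input and uses the RFC 3986 character sets: unreserved characters, sub-delims, ':' and '@' (together forming pchar), plus '/' for paths and '/' and '?' for the query and fragment. Existing %XX escapes pass through untouched, so a value that was already encoded is not double-encoded. The names filter_path and filter_query_or_fragment are hypothetical.

```python
import string

# RFC 3986 character classes (sections 2.2 and 2.3).
UNRESERVED = string.ascii_letters + string.digits + "-._~"
SUB_DELIMS = "!$&'()*+,;="
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = UNRESERVED + SUB_DELIMS + ":@"
HEXDIGITS = set(b"0123456789ABCDEFabcdef")


def _filter(value: str, allowed: str) -> str:
    """Percent-encode every byte of `value` that is not in `allowed`,
    leaving existing %XX escapes untouched (no double-encoding)."""
    allowed_bytes = set(allowed.encode("ascii"))
    data = value.encode("utf-8")
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        is_escape = (
            b == 0x25  # '%'
            and i + 2 < len(data)
            and data[i + 1] in HEXDIGITS
            and data[i + 2] in HEXDIGITS
        )
        if is_escape:
            # Already percent-encoded: copy the three characters as-is.
            out.append(data[i:i + 3].decode("ascii"))
            i += 3
        elif b in allowed_bytes:
            out.append(chr(b))
            i += 1
        else:
            # Not allowed in this component (including a lone '%').
            out.append("%{:02X}".format(b))
            i += 1
    return "".join(out)


def filter_path(path: str) -> str:
    # Paths allow pchar plus "/" as the segment separator.
    return _filter(path, PCHAR + "/")


def filter_query_or_fragment(value: str) -> str:
    # Query and fragment allow pchar plus "/" and "?".
    return _filter(value, PCHAR + "/?")


print(filter_path("/foo bar/baz"))            # /foo%20bar/baz
print(filter_path("/foo%20bar"))              # /foo%20bar
print(filter_query_or_fragment("a=1&b=c d"))  # a=1&b=c%20d
```

By contrast, applying an rawurlencode()-style function to each component (Python's urllib.parse.quote(value, safe="") behaves similarly) would also escape '/', '&', and '=', which RFC 3986 allows in these components, and would turn an existing %20 into %2520.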

Because this is just a clarification in behavior and a docblock change, I don't think this is significant enough to knock this out of review, and I think it will help greatly with interoperability.

-Michael

Beau Simensen

Feb 26, 2015, 2:24:00 AM
to php...@googlegroups.com
On Wednesday, February 25, 2015 at 8:22:02 PM UTC-6, Michael Dowling wrote:
> Because this is just a clarification in behavior and a docblock change, I don't think this is significant enough to knock this out of review, and I think it will help greatly with interoperability.

I'm of the mindset that this would qualify as a clarification provided there turns out to be nothing controversial in how we have to define the encoding. For example, if enforcing RFC 3986 is objectionable to some, I can see a case for this moving away from simply being a clarification.

Larry Garfield

Feb 26, 2015, 1:24:03 PM
to php...@googlegroups.com
I haven't looked at the specifics yet, but in general "more carefully
enforce the RFC that this interface is modeling" sounds like a safe and
unobjectionable thing to do.

--Larry Garfield

Matthew Weier O'Phinney

Feb 26, 2015, 3:29:23 PM
to php...@googlegroups.com
Agreed; this is in the same category as ensuring that
multipart/form-data POST requests are represented in ParsedBody; it's
not a technical change, but a slight tightening of the intention.



--
Matthew Weier O'Phinney
mweiero...@gmail.com
https://mwop.net/

Matthew Weier O'Phinney

Mar 3, 2015, 5:15:26 PM
to php...@googlegroups.com
I've created pull requests for this:

- https://github.com/php-fig/fig-standards/pull/449
- https://github.com/php-fig/http-message/pull/27

Michael and Evert, can you please review?

tob...@gmail.com

Mar 4, 2015, 12:05:53 PM
to php...@googlegroups.com
Btw, I already tried to initiate the discussion about encoding (which is part of normalization) in https://github.com/php-fig/fig-standards/issues/426

Matthew Weier O'Phinney

Mar 4, 2015, 2:11:43 PM
to php...@googlegroups.com
On Wed, Mar 4, 2015 at 9:05 AM, <tob...@gmail.com> wrote:
> Btw, I already tried to initiate the discussing about encoding (which is
> part of normalization) in
> https://github.com/php-fig/fig-standards/issues/426

You mention percent-encoding in passing in that, but it's more fully
addressed in the PR I've submitted.

You have two other concerns in the issue you opened, however, that do
not have anything to do with encoding, but do touch on normalization
concerns:

- whether an empty path or lack of a path should return '' or '/'.
- whether getPath() should resolve relative paths (i.e., paths that
have '..' notation)

Regarding the second, I strongly feel this is something to be done in
a utility library; the URI should reflect how it was provided (with
the exception of percent-encoding characters not allowed in that component). If you want it
resolved, pass the path to a utility function first, and pass the
resolved path to the instance.
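A utility of the kind suggested here could implement the "Remove Dot Segments" algorithm from RFC 3986, section 5.2.4. The following Python sketch follows that algorithm's steps (labelled A to E in the comments, matching the RFC's 2A to 2E); the function name remove_dot_segments is hypothetical and nothing like it is part of PSR-7. In PHP terms, the resolved result would then be passed to withPath().

```python
def remove_dot_segments(path: str) -> str:
    """Resolve '.' and '..' segments as in RFC 3986, section 5.2.4."""
    output = []  # segments moved so far, each with its leading "/" if any
    buf = path   # the RFC's "input buffer"
    while buf:
        # A: drop a leading "../" or "./"
        if buf.startswith("../"):
            buf = buf[3:]
        elif buf.startswith("./"):
            buf = buf[2:]
        # B: replace a leading "/./" or a complete "/." with "/"
        elif buf.startswith("/./"):
            buf = buf[2:]
        elif buf == "/.":
            buf = "/"
        # C: replace a leading "/../" or a complete "/.." with "/",
        #    and remove the last output segment (with its "/")
        elif buf.startswith("/../"):
            buf = buf[3:]
            if output:
                output.pop()
        elif buf == "/..":
            buf = "/"
            if output:
                output.pop()
        # D: a buffer of just "." or ".." is discarded
        elif buf in (".", ".."):
            buf = ""
        # E: move the first segment (including a leading "/") to output
        else:
            start = 1 if buf.startswith("/") else 0
            idx = buf.find("/", start)
            if idx == -1:
                idx = len(buf)
            output.append(buf[:idx])
            buf = buf[idx:]
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))    # /a/g
print(remove_dot_segments("mid/content=5/../6"))  # mid/6
```

Keeping this outside the URI object means getPath() keeps returning exactly what was provided, and only callers that want resolution pay for it.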

Regarding the first, as you note in the issue, phly/http actually has
the second behavior (returning '/' for empty path), but PSR-7 says to
return an empty string. I found with phly/conduit that it was far
easier to assume I have a path, than to need to check for an empty
string. As such, I'm of the mind that we should specify this
normalization as part of the specification, but, as noted on the
issue, I'm unsure if you're recommending that or not.
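To make the '/' option concrete, here is a minimal Python sketch of a getter that applies that normalization. The Uri class and its with_path()/get_path() methods are hypothetical stand-ins for UriInterface's withPath()/getPath(), not PSR-7 code. Restricting the rewrite to URIs that have a host is an assumption of this sketch, based on RFC 3986 section 6.2.3, which treats an empty path and '/' as equivalent for http URIs with an authority; for a relative reference without an authority, '' and '/' mean different things.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Uri:
    """Hypothetical immutable URI value, reduced to the parts needed here."""
    scheme: str = ""
    host: str = ""
    path: str = ""

    def with_path(self, path: str) -> "Uri":
        # Immutable "wither": return a modified copy, never mutate self.
        return replace(self, path=path)

    def get_path(self) -> str:
        # Report "/" instead of "" when an authority is present, so
        # callers can always assume they have a path.
        if self.path == "" and self.host != "":
            return "/"
        return self.path


uri = Uri("http", "example.com")
print(uri.get_path())                    # /
print(uri.with_path("/foo").get_path())  # /foo
```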

If you agree, I'll get a PR to make that clarification in place.