Confusion with PSR-7 UriInterface

pipfros...@gmail.com

Mar 30, 2019, 12:50:18 PM
to PHP Framework Interoperability Group
Blocker Confusion -

The interface method:

    public function withPath($path);

requires

    * @throws \InvalidArgumentException for invalid paths.

However in https://tools.ietf.org/html/rfc3986#section-3.3 I do not see what constitutes an invalid path.

Is that a bug in the interface, or is it expected that the class validates the path according to the RFCs specific to each scheme?

The problem I have with the latter interpretation is twofold:

A) It makes it impossible to create a class that is generic in scope for scheme, which means if class A implements the interface and class B implements the interface you can't just swap class A for B in use because they may implement different schemes.

B) The "withScheme($scheme)" method requires producing an identical instance where only the scheme has changed, but that may then result in a new instance where the path is not valid for that particular scheme because all the other properties of the class must be retained.

So... are we really supposed to validate the path according to the RFC for the scheme, or would it be better to leave validation of the path part of the URI to scheme-specific tools in the software that uses the class to build a URI?

Matthew Weier O'Phinney

Mar 31, 2019, 2:05:07 PM
to php...@googlegroups.com
On Sat, Mar 30, 2019 at 11:50 AM <pipfros...@gmail.com> wrote:
> Blocker Confusion -
>
> The interface method:
>
>     public function withPath($path);
>
> requires
>
>     * @throws \InvalidArgumentException for invalid paths.
>
> However in https://tools.ietf.org/html/rfc3986#section-3.3 I do not see what constitutes an invalid path.

That section defines the ABNF (Augmented Backus-Naur Form, a specification for defining syntax rules), and defines what character sequences are allowed for paths. When we indicate that the path must be valid, we are indicating it must be valid per the ABNF defined in that section.

It is specifically NOT requiring that you validate that the path is valid _for the domain or the scheme_, only that it is well-formed.
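
As a rough illustration of such a well-formedness check, the sketch below tests a path against the character rules of that section (`pchar` plus `/`). The function name `isWellFormedPath` is hypothetical and not part of PSR-7, and the context rules (such as a path not starting with `//` when there is no authority) are left out. PSR-7 also lets callers pass unencoded characters, so an implementation may prefer to percent-encode rather than reject; this only shows the syntax test.

```php
<?php
/**
 * Hypothetical helper (not part of PSR-7): returns true when $path contains
 * only characters allowed by the RFC 3986 section 3.3 grammar, i.e. "/",
 * pchar characters, and complete percent-encoded triplets.
 * Context-dependent rules (authority present or not) are NOT checked here.
 */
function isWellFormedPath(string $path): bool
{
    $unreserved = 'A-Za-z0-9\-._~';   // ALPHA / DIGIT / "-" / "." / "_" / "~"
    $subDelims  = '!$&\'()*+,;=';     // sub-delims from RFC 3986
    $pattern    = '#^(?:[' . $unreserved . $subDelims . ':@/]|%[0-9A-Fa-f]{2})*\z#';

    return preg_match($pattern, $path) === 1;
}

var_dump(isWellFormedPath('/foo/bar'));  // bool(true)
var_dump(isWellFormedPath('/foo bar'));  // bool(false): raw space is not allowed
var_dump(isWellFormedPath('/a%2'));      // bool(false): incomplete percent-encoding
```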
 
> Is that a bug in the interface, or is it expected that the class validates the path according to the RFCs specific to each scheme?

The path segment of a URI has a single ABNF, as defined in the section you link. I'm not quite sure what you're inferring here, but I'm wondering if perhaps you're looking at a different specification?

> The problem I have with the latter interpretation is twofold:
>
> A) It makes it impossible to create a class that is generic in scope for scheme, which means if class A implements the interface and class B implements the interface you can't just swap class A for B in use because they may implement different schemes.
>
> B) The "withScheme($scheme)" method requires producing an identical instance where only the scheme has changed, but that may then result in a new instance where the path is not valid for that particular scheme because all the other properties of the class must be retained.
>
> So... are we really supposed to validate the path according to the RFC for the scheme, or would it be better to leave validation of the path part of the URI to scheme-specific tools in the software that uses the class to build a URI?

PSR-7 only defines URIs for the HTTP and HTTPS schemes. This was a very deliberate choice, as we felt defining something that was generic enough to fit any possible scheme was out-of-scope for a specification focusing on HTTP messages. Since both schemes share the same path ABNF, there's no reason to revalidate the path when the scheme changes.
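
To make that concrete, here is a minimal sketch of a `withScheme()` that accepts only `http`, `https`, or an empty scheme and leaves the path untouched. `HttpUri` is a hypothetical class used purely for illustration; it is not a complete `UriInterface` implementation, and the exception message is an assumption.

```php
<?php
// Hypothetical, deliberately incomplete class: just enough to show that
// switching between http and https never needs to revalidate the path.
final class HttpUri
{
    private string $scheme = '';
    private string $path = '';

    public function getScheme(): string
    {
        return $this->scheme;
    }

    public function getPath(): string
    {
        return $this->path;
    }

    public function withPath(string $path): self
    {
        $new = clone $this;      // immutability: never modify $this
        $new->path = $path;
        return $new;
    }

    public function withScheme(string $scheme): self
    {
        $scheme = strtolower($scheme);   // schemes are case-insensitive
        if (!in_array($scheme, ['', 'http', 'https'], true)) {
            throw new InvalidArgumentException("Unsupported scheme: $scheme");
        }

        $new = clone $this;      // path and other components carry over as-is
        $new->scheme = $scheme;
        return $new;
    }
}

$uri = (new HttpUri())->withPath('/docs')->withScheme('HTTPS');
echo $uri->getScheme(), ' ', $uri->getPath(), "\n"; // https /docs
```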

There was some talk at one point of creating a general specification for URI interface(s), but nobody has moved forward as far as a proposal at this point.

--
he/him

pipfros...@gmail.com

Apr 4, 2019, 7:44:53 AM
to PHP Framework Interoperability Group
Okay, thanks. My implementation is going to be a general implementation, but with http/https/ftp/sftp the same path rules apply, so what I'm doing is throwing an exception when changing to a scheme where the path no longer validates.

One question still remains on key=value query normalization: I can't seem to find an RFC on it, or even a best-practices guide, for when there are duplicate keys.
It seems some give priority to the first, some give priority to the last, some serialize them as a comma-delimited key=[value1,...,valueN] array, and some only do the array thing if they were presented as key=[value1]&key=[value2].

Is there any kind of RFC or best-practices guide for dealing with key=value pairs in the context of a GET query?

Obviously for the email scheme one can't serialize them as an array, but for HTTP/HTTPS they potentially could be and sometimes are. I just don't know whether that is a frowned-upon practice or what is *supposed* to be done. Currently I just throw an InvalidArgumentException for duplicate keys, but if there are normalization rules I would rather follow them.

Matthew Weier O'Phinney

Apr 4, 2019, 10:40:05 AM
to php...@googlegroups.com
On Thu, Apr 4, 2019 at 6:44 AM <pipfros...@gmail.com> wrote:
> Okay, thanks. My implementation is going to be a general implementation, but with http/https/ftp/sftp the same path rules apply, so what I'm doing is throwing an exception when changing to a scheme where the path no longer validates.
>
> One question still remains on key=value query normalization: I can't seem to find an RFC on it, or even a best-practices guide, for when there are duplicate keys.
> It seems some give priority to the first, some give priority to the last, some serialize them as a comma-delimited key=[value1,...,valueN] array, and some only do the array thing if they were presented as key=[value1]&key=[value2].
>
> Is there any kind of RFC or best-practices guide for dealing with key=value pairs in the context of a GET query?
>
> Obviously for the email scheme one can't serialize them as an array, but for HTTP/HTTPS they potentially could be and sometimes are. I just don't know whether that is a frowned-upon practice or what is *supposed* to be done. Currently I just throw an InvalidArgumentException for duplicate keys, but if there are normalization rules I would rather follow them.

IIRC, RFC 3986 says that how they are normalized is up to the consumer (which could be the server, php-fpm, a library, etc.).

Maybe see what PHP's parse_str() does?
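
For reference, a quick way to see how `parse_str()` treats duplicate keys. The variable names `$plain` and `$list` are arbitrary, and this only shows PHP's own behaviour, not a rule that other consumers follow.

```php
<?php
// Plain duplicate keys: later values overwrite earlier ones.
parse_str('key=a&key=b', $plain);
var_dump($plain === ['key' => 'b']);          // bool(true)

// Keys ending in []: the values are collected into an array instead.
parse_str('key[]=a&key[]=b', $list);
var_dump($list === ['key' => ['a', 'b']]);    // bool(true)
```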
 


Dan Hunsaker

Apr 4, 2019, 5:09:46 PM
to PHP Framework Interoperability Group
When browsers build query strings for, say, multiselect inputs, they simply repeat the key with the new value. That's why, when sending forms to PHP, you have to set the name of such fields to end in `[]` - the way PHP deserializes query strings only preserves the last value unless you tell it to stuff them in an array. So while there isn't exactly a "canonical" way to serialize duplicate keys, I'd probably follow the browsers' lead and add them all as `key=value1&key=value2&key=value3&...`, and let the developer worry about making sure to use `key[]` for the names when necessary.
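
A minimal sketch of that browser-style serialization, assuming parameters arrive as an array whose values are either a single string or a list of strings. The helper name `buildRepeatedKeyQuery` is made up for this example; it simply repeats the key once per value and leaves any `[]` naming to the caller.

```php
<?php
/**
 * Hypothetical helper: serializes parameters by repeating the key for each
 * value, as browsers do for multi-select inputs.
 *
 * @param array<string, string|string[]> $params
 */
function buildRepeatedKeyQuery(array $params): string
{
    $pairs = [];
    foreach ($params as $key => $values) {
        // (array) turns a single string into a one-element list.
        foreach ((array) $values as $value) {
            $pairs[] = urlencode((string) $key) . '=' . urlencode((string) $value);
        }
    }

    return implode('&', $pairs);
}

echo buildRepeatedKeyQuery(['key' => ['value1', 'value2'], 'page' => '2']), "\n";
// key=value1&key=value2&page=2
```

If the receiving side is PHP, passing `'key[]'` as the name keeps every value after parsing, since `urlencode()` only escapes the brackets and PHP decodes them again.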
