URIs


Chris Wilkinson

Jan 21, 2013, 8:17:02 AM
to php...@googlegroups.com
Hi,

On my HTTP message proposal I only had the URL as a string, which is simple to implement but not very useful for the end user.

Most of the libraries that I looked at had ways of "doing things" with it:

Has an object/interface

Methods on request

Property on request (ie just a string)

There are also some libraries (albeit not widely adopted) that are just URI objects:


As a result, I think it would be useful to have a URL interface for the HTTP message PSR to use (so setUrl() would accept a URL object or a string, and getUrl() would return a URL object).

Since they are part of the defined URI standard though, I think it would be even better to have a URI PSR "done properly".

I've had a go at creating an interface (which is split into two sub-interfaces - the main one is down the bottom!):


Don't worry about exact syntax etc yet; I'm sure there are things wrong and things that I've overlooked (hence not a PR yet). It also doesn't handle normalization, resolution, and relativization (maybe it shouldn't directly?); URI templates and IRIs haven't been considered (yet!) either (Requests is the only library above that has the latter).

There isn't an actual UrlInterface there either (there could be one extending HierarchicalUriInterface, but I haven't worked out if it's actually useful to have one separated or not).

What do you think to this idea though? Too much? Or is this an area worth completing?

Chris

Paul Jones

Jan 21, 2013, 9:48:07 AM
to php...@googlegroups.com

On Jan 21, 2013, at 7:17 AM, Chris Wilkinson wrote:

> What do you think to this idea though? Too much? Or is this an area worth completing?
First off, and once again, great research!

It is my opinion that the HTTP interface doesn't strictly require a URI/URL interface. It seems to me that passing a plain string is sufficient. (Creating a separate PSR for URI/URL manipulation might be wise; it would then be straightforward to manipulate a URI object and cast it to string when setting it into the HTTP interface.)


-- pmj
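
For illustration, a minimal sketch of the "manipulate a URI object, then cast it to string" idea, assuming PHP 7.1 or later. The class names SimpleUri and Request and their methods are hypothetical and exist only for this example; they are not taken from any proposal or library.

```php
<?php
// Hypothetical URI value object; only __toString() matters for this sketch.
final class SimpleUri
{
    private $scheme;
    private $host;
    private $path;

    public function __construct(string $scheme, string $host, string $path = '/')
    {
        $this->scheme = $scheme;
        $this->host = $host;
        $this->path = $path;
    }

    public function __toString()
    {
        return $this->scheme . '://' . $this->host . $this->path;
    }
}

// Hypothetical HTTP message that only knows about plain strings.
final class Request
{
    private $url = '';

    public function setUrl(string $url): void
    {
        $this->url = $url;
    }

    public function getUrl(): string
    {
        return $this->url;
    }
}

$request = new Request();
// The URI object is cast to a string at the boundary of the HTTP interface.
$request->setUrl((string) new SimpleUri('http', 'example.com', '/gir/zim'));
echo $request->getUrl(), "\n"; // prints "http://example.com/gir/zim"
```

The trade-off is visible in getUrl(): once the cast has happened, the HTTP side only holds a string and has to reparse it if it needs individual components.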

Paul Dragoonis

Jan 21, 2013, 9:49:26 AM
to php...@googlegroups.com
I agree with string first, and fancy objects later. Thanks.

John Patrick Gerdeman

Jan 21, 2013, 4:13:54 PM
to php...@googlegroups.com
Looking at the examples and the gist, I think UriInterface can be boiled down to what is already provided by parse_url and parse_str. That is a __toString() function and accessors for:

scheme
host
port
user
pass
path
query
fragment

where query is an array of query parameters. The compound parts, i.e. the hierarchical part, authority and scheme-specific part, are easily deducible from this. I'd also suggest a normalize function (see RFC 3986 and Wikipedia), which would make it possible to compare URIs.

Derived from your gist https://gist.github.com/4589180

Could you elaborate on the reason why - in your gist - one shouldn't use UriInterface directly?
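
As a rough illustration of what parse_url() and parse_str() already provide, here is a small sketch, assuming PHP 7 or later. The example URI is made up; the array keys are the ones parse_url() itself returns.

```php
<?php
$uri = 'http://user:pass@example.com:8080/gir/zim?dib=dab#top';

// parse_url() yields the fine-grained components listed above.
$parts = parse_url($uri);

// parse_str() turns the raw query string into an array of parameters.
parse_str($parts['query'] ?? '', $query);
$parts['query'] = $query;

// The compound parts can be rebuilt from the fine-grained ones.
$authority = $parts['user'] . ':' . $parts['pass'] . '@'
    . $parts['host'] . ':' . $parts['port'];

print_r($parts);
echo $authority, "\n"; // prints "user:pass@example.com:8080"
```

Note that parse_url() returns the port as an integer and omits keys for components that are absent, so real code would need to guard each lookup.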

Ryan McCue

Jan 21, 2013, 7:13:31 PM
to php...@googlegroups.com
Chris Wilkinson wrote:
> Don't worry about exact syntax etc yet, I'm sure there's things wrong,
> things that I've overlooked (hence not a PR yet). It also doesn't handle
> normalization, resolution, and relativization (maybe it shouldn't
> directly?); URI templates and IRIs haven't been considered (yet!) either
> (Requests is the only library above that has the latter).

The fact that Requests has IRI support is more a historical quirk than
anything: the person that wrote it basically wrote directly off the IRI
spec. It's actually made things harder in terms of creating URIs from
it, and there's not really any need, so I'd say it's not something we need.

That said, Requests_IRI is there for the taking if anyone wants it. :)

--
Ryan McCue
<http://ryanmccue.info/>

Larry Garfield

Jan 21, 2013, 7:15:45 PM
to php...@googlegroups.com
On 1/21/13 3:13 PM, John Patrick Gerdeman wrote:
> Looking at the examples and the gist, I think UriInterface can be boiled
> down to what is already provided by parse_url
> <http://php.net/manual/en/function.parse-url.php> and parse_str
> <http://php.net/manual/en/function.parse-str.php>. That is a
> __toString() function and accessors for:
>
> scheme
> host
> port
> user
> pass
> path
> query
> fragment
>
> where query is an array of query parameters. The compound parts, i.e
> hierarchical part, authority and scheme-specific part are easily
> deducible from this. I'd also suggest a normalize function (see rfc 3986
> <http://tools.ietf.org/html/rfc3986#section-6> and wikipedia
> <http://en.wikipedia.org/wiki/URL_normalization>), which would make it
> possible to compare uris.
>
> Derived from your gist https://gist.github.com/4589180
>
> Could you elaborate on the reason why - in your gist - one shouldn't use
> UriInterface directly?

Using the terminology from parse_url() seems sensible, unless that is
inconsistent with the actual HTTP spec. Speaking of, is anyone
following HTTP 2.0 at all? Is there anything going on there that would
affect this effort? (I have no idea, personally.)

My concern with "strings are fine for now" is that if by "for now" we
mean "a released PSR", then it means we have to either make a follow-up
Http PSR after the URI PSR is sorted out, or just assume that you'll
always pass a string URI to the Http objects. However, if that Http
object needs that data parsed out, it then has to reparse it itself,
even though the parsed data was already available. That's wasteful and
error prone.

If by "for now" we mean "let's work on these in parallel and then merge
them before a for-reals PSR is published", then I think that's quite
sensible.

--Larry Garfield

Ryan McCue

Jan 21, 2013, 7:21:44 PM
to php...@googlegroups.com
John Patrick Gerdeman wrote:
> Looking at the examples and the gist, I think UriInterface can be boiled
> down to what is already provided by parse_url
I think one thing that definitely needs to be in UriInterface is a way
to make relative URIs absolute. It's a fairly heavy parsing process and
it's best handled by UriInterface rather than at the Http level.
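
To give a sense of how much work "making relative URIs absolute" involves, here is a sketch of the reference resolution algorithm from RFC 3986 (sections 5.2.2 to 5.3, with the Appendix B regular expression for splitting), assuming PHP 7 or later. The function names splitReference(), removeDotSegments() and resolveReference() are hypothetical, and the code does no validation of its input.

```php
<?php
// Split a URI reference using the regular expression from RFC 3986, Appendix B.
// Absent components come back as null; the path is always a string.
function splitReference(string $uri): array
{
    preg_match('~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~', $uri, $m);
    return [
        'scheme'    => ($m[1] ?? '') !== '' ? $m[2] : null,
        'authority' => ($m[3] ?? '') !== '' ? ($m[4] ?? '') : null,
        'path'      => $m[5] ?? '',
        'query'     => ($m[6] ?? '') !== '' ? ($m[7] ?? '') : null,
        'fragment'  => ($m[8] ?? '') !== '' ? ($m[9] ?? '') : null,
    ];
}

// Remove "." and ".." segments, following RFC 3986 section 5.2.4.
function removeDotSegments(string $path): string
{
    $out = '';
    while ($path !== '') {
        if (strpos($path, '../') === 0) {
            $path = substr($path, 3);
        } elseif (strpos($path, './') === 0 || strpos($path, '/./') === 0) {
            $path = substr($path, 2);
        } elseif ($path === '/.') {
            $path = '/';
        } elseif (strpos($path, '/../') === 0 || $path === '/..') {
            $path = $path === '/..' ? '/' : substr($path, 3);
            $last = strrpos($out, '/');
            $out = $last === false ? '' : substr($out, 0, $last);
        } elseif ($path === '.' || $path === '..') {
            $path = '';
        } else {
            // Move the first path segment (with its leading "/") to the output.
            $next = strpos($path, '/', 1);
            $out .= $next === false ? $path : substr($path, 0, $next);
            $path = $next === false ? '' : substr($path, $next);
        }
    }
    return $out;
}

// Resolve $reference against an absolute $base (RFC 3986 sections 5.2.2 and 5.3).
function resolveReference(string $base, string $reference): string
{
    $b = splitReference($base);
    $r = splitReference($reference);

    if ($r['scheme'] !== null || $r['authority'] !== null) {
        $t = $r;
        $t['scheme'] = $r['scheme'] ?? $b['scheme'];
        $t['path'] = removeDotSegments($r['path']);
    } else {
        $t = $b;
        if ($r['path'] === '') {
            $t['query'] = $r['query'] ?? $b['query'];
        } else {
            if ($r['path'][0] === '/') {
                $merged = $r['path'];
            } elseif ($b['authority'] !== null && $b['path'] === '') {
                $merged = '/' . $r['path'];
            } else {
                $slash = strrpos($b['path'], '/');
                $merged = ($slash === false ? '' : substr($b['path'], 0, $slash + 1)) . $r['path'];
            }
            $t['path'] = removeDotSegments($merged);
            $t['query'] = $r['query'];
        }
        $t['fragment'] = $r['fragment'];
    }

    return ($t['scheme'] !== null ? $t['scheme'] . ':' : '')
        . ($t['authority'] !== null ? '//' . $t['authority'] : '')
        . $t['path']
        . ($t['query'] !== null ? '?' . $t['query'] : '')
        . ($t['fragment'] !== null ? '#' . $t['fragment'] : '');
}

echo resolveReference('http://a/b/c/d;p?q', '../g'), "\n"; // prints "http://a/b/g"
```

The base URI and references used here and in any checks come from the examples in RFC 3986 section 5.4.1.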

Chris Wilkinson

Jan 22, 2013, 4:00:12 AM
to php...@googlegroups.com
On Monday, 21 January 2013 21:13:54 UTC, John Patrick Gerdeman wrote:
> Looking at the examples and the gist, I think UriInterface can be boiled down to what is already provided by parse_url and parse_str. That is a __toString() function and accessors for:
>
> scheme
> host
> port
> user
> pass
> path
> query
> fragment
>
> where query is an array of query parameters. The compound parts, i.e hierarchical part, authority and scheme-specific part are easily deducible from this. I'd also suggest a normalize function (see rfc 3986 and wikipedia), which would make it possible to compare uris.
>
> Derived from your gist https://gist.github.com/4589180

This covers some URIs (to be honest the ones that we are actually interested in, aka URLs), but unless it covers them all I don't think it should be called UriInterface. If it were to be just a single UrlInterface that would work.

I didn't include separate user/password parts as that's deprecated (see http://tools.ietf.org/html/rfc3986#section-3.2.1). I think it should be down to the implementer if they want it.
 
Also, I had getHierarchicalPart() on HierarchicalUriInterface to overwrite the docblock from UriInterface (all URIs have it, not just "hierarchical" ones). I've just added it to my OpaqueUriInterface as well for clarity.

On Monday, 21 January 2013 21:13:54 UTC, John Patrick Gerdeman wrote:
> Could you elaborate on the reason why - in your gist - one shouldn't use UriInterface directly?

I created the two sub-interfaces as URIs are either one or the other. This means that an HTTP request interface can require/provide the right URI form (as opposed to being open to any URI), and, inversely, if you need a URN you can also require/provide the right form (you would still have to check the scheme though).
 
On Tuesday, 22 January 2013 00:15:45 UTC, Larry Garfield wrote:
> Speaking of, is anyone following HTTP 2.0 at all?  Is there anything
> going on there that would affect this effort?  (I have no idea, personally.)

Not this part as far as I'm aware. There are some header changes which will affect the HTTP message proposal.

Chris
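
A rough sketch of the split described above, assuming PHP 7 or later. The interface names come from this thread, but the method lists, the Urn and HttpUrl classes, and the fetch() function are guesses made for illustration, not the contents of the gist.

```php
<?php
// Common base: every URI has a scheme and a hierarchical part.
interface UriInterface
{
    public function getScheme(): string;
    public function getHierarchicalPart(): string;
    public function __toString();
}

// Hierarchical URIs additionally expose authority and path components.
interface HierarchicalUriInterface extends UriInterface
{
    public function getHost(): string;
    public function getPath(): string;
}

// Opaque URIs add nothing: there is no host or path to ask for.
interface OpaqueUriInterface extends UriInterface
{
}

final class Urn implements OpaqueUriInterface
{
    private $rest;

    public function __construct(string $rest) { $this->rest = $rest; }
    public function getScheme(): string { return 'urn'; }
    public function getHierarchicalPart(): string { return $this->rest; }
    public function __toString() { return 'urn:' . $this->rest; }
}

final class HttpUrl implements HierarchicalUriInterface
{
    private $host;
    private $path;

    public function __construct(string $host, string $path)
    {
        $this->host = $host;
        $this->path = $path;
    }

    public function getScheme(): string { return 'http'; }
    public function getHost(): string { return $this->host; }
    public function getPath(): string { return $this->path; }
    public function getHierarchicalPart(): string { return '//' . $this->host . $this->path; }
    public function __toString() { return 'http:' . $this->getHierarchicalPart(); }
}

// An HTTP client can demand the hierarchical form through its type hint.
function fetch(HierarchicalUriInterface $uri): string
{
    return 'GET ' . $uri->getPath() . ' from ' . $uri->getHost();
}

echo fetch(new HttpUrl('example.com', '/gir/zim')), "\n";
echo new Urn('isbn:0451450523'), "\n";
```

Passing the Urn to fetch() would fail with a TypeError, which is the "require the right URI form" behaviour; the scheme would still need a separate check.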

John Patrick Gerdeman

Jan 22, 2013, 11:05:27 AM
to php...@googlegroups.com
Larry Garfield:
> Using the terminology from parse_url() seems sensible, unless that is
> inconsistent with the actual HTTP spec.

The RFC defines Scheme, Authority, Path, Query and Fragment, where Authority consists of User Information, Host and Port, and User Information consists of User and Password. So the terminology of the smallest items is identical.

> My concern with "strings are fine for now" is that if by "for now" we
> mean "a released PSR", then it means we have to either make a follow-up
> Http PSR after the URI PSR is sorted out, or just assume that you'll
> always pass a string URI to the Http objects.

Or you pass something which is castable as string. But, yeah, either way you lose the object after the cast.
 
Chris Wilkinson:
> This covers some URIs (to be honest the ones that we are actually interested in, aka URLs), but unless it covers them all I don't think it should be called UriInterface. If it were to be just a single UrlInterface that would work.

Which ones are we missing? I think we got them all. At least that is what I think after looking at the examples in the RFC:

> 1.1.2. Examples
>
> The following example URIs illustrate several URI schemes and variations in their common syntax components:
>
>    ftp://ftp.is.co.za/rfc/rfc1808.txt
>    http://www.ietf.org/rfc/rfc2396.txt
>    ldap://[2001:db8::7]/c=GB?objectClass?one
>    mailto:John...@example.com
>    news:comp.infosystems.www.servers.unix
>    tel:+1-816-555-1212
>    telnet://192.0.2.16:80/
>    urn:oasis:names:specification:docbook:dtd:xml:4.1.2
 
 
> I didn't include separate user/password parts as that's deprecated (see http://tools.ietf.org/html/rfc3986#section-3.2.1). I think it should be down to the implementer if they want it.
 
Nevertheless it's in the RFC, and leaving it to the implementer seems wrong. Rather, an implementer SHOULD issue an E_NOTICE.

 
> Also, I had getHierarchicalPart() on HierarchicalUriInterface to overwrite the docblock from UriInterface (all URIs have it, not just "hierarchical" ones). I've just added it to my OpaqueUriInterface as well for clarity.

You're right, they're in the RFC. They're just not the pieces of finest granularity. Anyway, arguing that username/password should stay because it's in the RFC, and then turning around and saying the compounds shouldn't although they're in the RFC, is skewed. So they should be added.

I just don't want this to end up with an interface for each compound. I'd rather have one complete Interface.
 
> I created the two sub-interfaces as URIs are either one or the other. This means that a HTTP request interface can require/provide the right URI form (as opposed to be open to any URI), and, inversely, if you need a URN you can also require/provide the right form (you would still have to check the scheme though).

I think I understand what you mean. Shouldn't those special cases be handled by the implementor? If the URI-interface is complete (i.e. all granulars and compounds), wouldn't the other two interfaces be reduced to empty marker interfaces?

John

Chris Wilkinson

Jan 23, 2013, 5:44:05 AM
to php...@googlegroups.com
On Tuesday, 22 January 2013 16:05:27 UTC, John Patrick Gerdeman wrote:
> Which ones are we missing? I think we got them all. At least that is what I think after looking at the examples
 
I'm confused why there is still a HierarchicalUriInterface when parts of it are on the UriInterface (did you mean to remove it too?).

I mean more that it wouldn't handle opaque URIs well. Say I create a URN object: it has to have setHost() methods etc, which is wrong. Being able to implement OpaqueUriInterface would mean that I only have to implement the parts that my URI can actually have.

> Nevertheless its in the RFC and leaving it to the implementer seems wrong. Rather an implementer SHOULD issue an E_NOTICE.

I'm not sure that a PSR should include something that requires that errors are triggered. While I think it's better not to split them I don't think it matters that much, if there's a consensus to split it then that's ok. :)
 
(For the record, 8 of the libraries I looked at do split them, 4 don't.)

> I just don't want this to end up with an interface for each compound. I'd rather have one complete Interface.

To me a single interface is less useful, (as above) it means that implementers have to handle all potential URIs (to follow the PSR correctly) even if they only ever want one form (eg HTTP client only ever needs a hierarchical URI). Since URIs are either one form or the other (and can't change) implementing sub-interfaces makes things easier (so the HTTP client library only has to implement the hierarchical form).
 
> I think I understand what you mean. Shouldn't those special cases be handled by the implementor? If the URI-interface is complete (i.e. all granulars and compounds), wouldn't the other two interfaces be reduced to empty marker interfaces?

If everything is moved to UriInterface then the sub-interfaces are redundant, you would have to have a hasAuthority() or isOpaque() method or something to be able to determine if it's the right type.

Chris

Chris Wilkinson

Jan 25, 2013, 4:59:02 AM
to php...@googlegroups.com
Had a thought about setScheme() this morning: it should be moved to HierarchicalUriInterface. This means that it's no longer defined for OpaqueUriInterface, which is better: opaque URIs are always absolute, so it gives implementations the option of putting the scheme on the constructor, or ignoring it completely. As a result, a URN implementation isn't forced to have a method which makes no sense for it (its scheme must always be urn).

I'll create a PR over the weekend for the proposal itself, obviously it's still up for debate whether to have 3 interfaces or just 1 etc.

Chris

John Patrick Gerdeman

Jan 26, 2013, 3:34:25 AM
to php...@googlegroups.com
On Wednesday, January 23, 2013 11:44:05 AM UTC+1, Chris Wilkinson wrote:

> I more mean that it wouldn't handle opaque URIs well. Say I create a URN object, this has to have setHost() methods etc, which is wrong. Being able to implement OpaqueUriInterface would mean that I only have to implement the parts that my URI can actually have.

URN is one (of hundreds) specific scheme, URI being a superset of all schemes. If I understand you correctly OpaqueUriInterface should actually be URNInterface? Wouldn't we then have to define Interfaces for all schemes, at least the most common ones?

From the RFC:

> The URI syntax defines a grammar that is a superset of all
> valid URIs, allowing an implementation to parse the common components
> of a URI reference without knowing the scheme-specific requirements
> of every possible identifier.
 
> To me a single interface is less useful, (as above) it means that implementers have to handle all potential URIs (to follow the PSR correctly) even if they only ever want one form (eg HTTP client only ever needs a hierarchical URI). Since URIs are either one form or the other (and can't change) implementing sub-interfaces makes things easier (so the HTTP client library only has to implement the hierarchical form).
 
Here we disagree :) IMO a UriInterface should cover all of the URI definition. This time I didn't forget to update the gist accordingly: https://gist.github.com/4589180
 

John 

Chris Wilkinson

Jan 26, 2013, 4:26:43 AM
to php...@googlegroups.com
On Saturday, 26 January 2013 08:34:25 UTC, John Patrick Gerdeman wrote:
> URN is one (of hundreds) specific scheme, URI being a superset of all schemes. If I understand you correctly OpaqueUriInterface should actually be URNInterface? Wouldn't we then have to define Interfaces for all schemes, at least the most common ones?
 
No, a URN would be just an implementation of OpaqueUriInterface.
 
>> To me a single interface is less useful, (as above) it means that implementers have to handle all potential URIs (to follow the PSR correctly) even if they only ever want one form (eg HTTP client only ever needs a hierarchical URI). Since URIs are either one form or the other (and can't change) implementing sub-interfaces makes things easier (so the HTTP client library only has to implement the hierarchical form).
>
> Here we disagree :) IMO a UriInterface should cover all of the URI definition. This time I didn't forget to updated the gist accordingly https://gist.github.com/4589180

I'll hold off on writing a PR and wait to see what others think.

Chris

Chris Wilkinson

Apr 4, 2013, 4:00:45 PM
to php...@googlegroups.com
As this discussion has gone quiet, I've decided to write a PR which is now at https://github.com/php-fig/fig-standards/pull/104

It's similar to my original gist, the main change is that I've removed the setters (it's up to implementers to handle creation, and when modifying a URI later on you are actually creating a new one). I have added normalise(), resolve() and relativize() to the hierarchical form to cover common use cases.

I've tried to heavily document the interfaces as it's a well-defined external standard (the PR itself could definitely be improved/expanded). A few bits are borrowed (*cough*) from the Java docs and would need to be rewritten a bit, but the ideas are the same. I'm sure there's things not quite right/clear enough though.

Thoughts?

Chris
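
To illustrate the "modifying a URI later on actually creates a new one" idea from the post above, here is a minimal sketch of an immutable value object, assuming PHP 7 or later. The class ImmutableUri and its withPath() method are hypothetical names chosen for this example; the PR does not define them.

```php
<?php
final class ImmutableUri
{
    private $scheme;
    private $host;
    private $path;

    public function __construct(string $scheme, string $host, string $path)
    {
        $this->scheme = $scheme;
        $this->host = $host;
        $this->path = $path;
    }

    public function getPath(): string
    {
        return $this->path;
    }

    // There is no setter: a "change" returns a modified copy instead.
    public function withPath(string $path): self
    {
        $copy = clone $this;
        $copy->path = $path;
        return $copy;
    }

    public function __toString()
    {
        return $this->scheme . '://' . $this->host . $this->path;
    }
}

$original = new ImmutableUri('http', 'example.com', '/gir');
$changed = $original->withPath('/zim');

echo $original, "\n"; // prints "http://example.com/gir"
echo $changed, "\n";  // prints "http://example.com/zim"
```

Because the original instance never changes, it can be shared between a request and other code without defensive copying.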

Larry Garfield

Apr 14, 2013, 3:37:57 PM
to php...@googlegroups.com
Chris, thank you for picking this back up!  This is the sort of thing we should be discussing, according to a majority of our membership. :-)

- 1.1: The description of Opaque and Hierarchical URIs needs examples.  From the descriptions in the bullet points... I have no idea what they are. :-)  It's not until 2 paragraphs later ("For example, a HTTP client...") that I have any idea what you're talking about.

- 1.2: I'm not comfortable with leaving setters up to implementations.  If these objects are intended to be immutable, say so and ban post-constructor changes.  If not, then we do need to standardize the manipulation operations as well.  Otherwise if I need to touch even a single property I'm suddenly coupled to a particular implementation.  Reading through the rest of the spec, I think a value object makes sense along with a SHOULD NOT to discourage (but not forbid) adding mutators.

- Providing the meat of the implementation as part of the docblock is inconsistent with what we did in PSR-3 and what is being discussed for PSR-Cache.  I'm not against it necessarily; Drupal does that a lot, for instance.  But I know not everyone likes that.  (Fabien in particular is not a fan, IIRC.)  I think a lot of the extensive details, like the charts and samples, should be pulled out to spec sections rather than baked into the interfaces.

- "URIs in string form have the syntax": the example here specifically says hierarchical-part, which implicitly excludes opaque URIs.  From my read of the RFC, however, opaque URIs should have the same special meaning for ? and # and such.  So shouldn't that be "body" or something?

Hm, no, reading the RFC they use hier-part too.  That's silly. :-)

- The docblocks on toDecodedString() and toEncodedString() are very good, however, as they relate specifically to the behavior of that method.  Nicely done!

- For getScheme(), are you sure that's case-insensitive?  I thought schemes were case sensitive, in that http:// and HTTP:// were different things.

- Should getScheme() return NULL, or empty string?  Because if the scheme isn't there, isn't that the same as it being an empty string?  (NULL is a byatch to code around, so I try to avoid it whenever I can.)

- Is my understanding correct that getHierarchicalPart() on "http://example.com/gir/zim?dib=dab" would be "example.com/gir/zim"?  An example here could help, especially since HTTP is likely to be the most common URI used.  That would slightly duplicate the big docblock at the top, but as stated above I think it's best to break that part out of the interface to the RFC proper.  (A consistent example for all of the getters would be good.)

- The docblock for getQueryAsArray() implies that it does not support nested arrays.  That's extremely common in PHP for form handling, so I think we want to be clear on how that should be handled.  "It can't be" is an answer that would torpedo this spec's usefulness. :-(  It should also explicitly say that in case of no query, array() is returned rather than NULL.

- I don't understand why I'd want the preceding // on the hierarchical part in getHierarchicalPart().  Is that what the RFC says to do?  Because I don't know how useful I'd find that as a developer.

- Is it possible to break getUserInfo() into getUser() and getPassword()?  Or in addition to?  Or is it expected that a using system needs to parse those out of the UserInfo string itself?  (As long as it wouldn't violate the spec, I'd prefer to include those extra methods for ease of use.)

- getPath() should explicitly specify if the path includes the leading / or not.  Even if the RFC says, we should reiterate here because that's where people will look to say "Wait, am I going to get a / back or no?"

- In the normalize() docblock, I'd pair the examples up with their before/after forms rather than having a before list and after list.  Same for resolve() and relativize().  The examples are good, but better organization would make them clearer.

- The resolve() and relativize() methods have a lot of conditionals in their docblocks.  That to me is a code smell.  See below.

- UnexpectedValueException's docblock should have a single line short summary.  The rest of the text should be a longdesc.  Also, "if a value does not match with a set of values" took me 3 reads to understand. :-)  It looks like it's copied from the base PHP exception.  Given that it's only used in one place, getQueryAsArray(), I think we can make it much more explicit: InvalidQueryString extends \UnexpectedValueException.

On the whole, this reads like a direct port of the RFC to PHP.  I therefore really really like it. :-)  The language is overall very precise and to the point, which is good.  And it lays a good foundation for getting back to the HTTP work, which I think is going to be some of our most important PSRs.  Well done, Chris!

The only significant pushback I'd offer is that it feels like HierarchicalUriInterface needs to be split into two: A complete URI interface and a relative URI.  That would, I think, greatly simplify the descriptions (and therefore implementations) of resolve() and relativize().  That would also then replace isAbsolute() with simply an instanceof check.

If there's a specific reason why they can't be split let me know, because it feels to me like they should.  There's too much conditional logic embedded in those descriptions otherwise.

--Larry Garfield

Chris Wilkinson

Apr 15, 2013, 3:21:40 PM
to php...@googlegroups.com
Hi Larry,

Thanks for the comments. I've put some responses below and will make some changes over the next couple of days.


> - 1.1: The description of Opaque and Hierarchical URIs need examples.  From the descriptions in the bullet points... I have no idea what they are. :-)  It's not until 2 paragraphs later ("For example, a HTTP client...") that i have any idea what you're talking about.

Yep, parts definitely need improvement - glad some bits are vaguely up to scratch though. ;)
  
> - Providing the meat of the implementation as part of the docblock is inconsistent with what we did in PSR-3 and what is being discussed for PSR-Cache.  I'm not against it necessarily; Drupal does that a lot, for instance.  But I know not everyone likes that.  (Fabien in particular is not a fan, IIRC.)  I think a lot of the extensive details, like the charts and samples, should be pulled out to spec sections rather than baked into the interfaces.

Yeah it probably could be reduced, but I do think that clarity is important here. Does there need to be a bylaw on what should go on the interface's docblocks and what shouldn't? (Is that even possible to easily define?)
 
> - "URIs in string form have the syntax": the example here specifically says hierarchical-part, which implicitly excludes opaque URIs.  From my read of the RFC, however, opaque URIs should have the same special meaning for ? and # and such.  So shouldn't that be "body" or something?
>
> Hm, no, reading the RFC they use hier-part too.  That's silly. :-)

Indeed. :)

(Might be an idea to try and clear that up somehow? I don't think the terminology should be changed, so some kind of note might be in order.)
 
> - For getScheme(), are you sure that's case-insensitive?  I thought schemes were case sensitive, in that http:// and HTTP:// were different things.

The scheme and host are both case-insensitive in the RFC.
 
> - Is my understanding correct that getHierarchicalPart() on "http://example.com/gir/zim?dib=dab" would be "example.com/gir/zim"?  An example here could help, especially since HTTP is likely to be the most common URI used.  That would slightly duplicate the big docblock at the top, but as stated above I think it's best to break that part out of the interface to the RFC proper.  (A consistent example for all of the getters would be good.)

Good idea! (It's that plus the double slash, so //example.com/gir/zim.)
 
> - The docblock for getQueryAsArray() implies that it does not support nested arrays.  That's extremely common in PHP for form handling, so I think we want to be clear on how that should be handled.  "It can't be" is an answer that would torpedo this spec's usefulness. :-(  It should also explicitly say that in case of no query, array() is returned rather than NULL.

Good idea on both counts. I agree that it definitely needs to handle nested arrays.
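
For reference, this is roughly how PHP's own parse_str() treats nested arrays and an empty query, which is one behaviour getQueryAsArray() could mirror. The query string is made up, and whether the PSR should follow parse_str() exactly is an open question, not something decided here.

```php
<?php
// Bracket syntax in parameter names produces nested arrays.
parse_str('filter[status]=open&tags[]=php&tags[]=fig&page=2', $query);
print_r($query);

// An empty query string yields an empty array rather than NULL.
parse_str('', $empty);
var_dump($empty === []); // prints "bool(true)"
```

All leaf values come back as strings (for example '2' for page), so any typing would be up to the caller.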
 
> - I don't understand why I'd want the preceding // on the hierarchical part in getHierarchicalPart().  Is that what the RFC says to do?  Because I don't know how useful I'd find that as a developer.

It is part of it (the form is scheme ":" hier-part [ "?" query ] [ "#" fragment ]). As for making it less useful... not sure what to do about that. Contradicting the RFC's definition isn't really a good idea...
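
A small sketch of where the double slash comes from, using the regular expression given in RFC 3986, Appendix B, and assuming PHP 7 or later. The helper name hierarchicalPart() is hypothetical; it simply joins the "//authority" and path captures, which together make up the hier-part.

```php
<?php
// Extract the hier-part (optional "//authority" plus path) of a URI.
function hierarchicalPart(string $uri): string
{
    // Group 3 is "//authority" (including the slashes), group 5 is the path.
    preg_match('~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~', $uri, $m);
    return ($m[3] ?? '') . ($m[5] ?? '');
}

echo hierarchicalPart('http://example.com/gir/zim?dib=dab'), "\n"; // prints "//example.com/gir/zim"
echo hierarchicalPart('urn:isbn:0451450523'), "\n";                 // prints "isbn:0451450523"
```

The second call shows that a URI without an authority has no leading "//", so the slashes carry information rather than being decoration.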
 
> - Is it possible to break getUserInfo() into getUser() and getPassword()?  Or in addition to?  Or is it expected that a using system needs to parse those out of the UserInfo string itself?  (As long as it wouldn't violate the spec, I'd prefer to include those extra methods for ease of use.)

Having a password is deprecated in the RFC. If there is a consensus to separate it out, then we can go with it, but I would question whether it is actually used. (I have been through various implementations, and more do include it than not, however; that might be due to its inclusion in parse_url().)
 
> - getPath() should explicitly specify if the path includes the leading / or not.  Even if the RFC says, we should reiterate here because that's where people will look to say "Wait, am I going to get a / back or no?"

Good idea; it will be returned as it is part of the path (assuming there is a path of course!).
 
> - UnexpectedValueException's docblock should have a single line short summary.  The rest of the text should be a longdesc.  Also, "if a value does not match with a set of values" took me 3 reads to understand. :-)  It looks like it's copied from the base PHP exception.  Given that it's only used in one place, getQueryAsArray(), I think we can make it much more explicit: InvalidQueryString extends \UnexpectedValueException.

Yep it is just the base description; I agree with making it more explicit, but as it's not invalid it should be something like NonArrayQueryStringException (I'm sure there's a better name that's eluding me though!).

I have also been wondering, should all FIG exceptions actually be interfaces? (I think I did see someone mention it on here somewhere before too, but I can't remember if a discussion followed.)
 
> The only significant pushback I'd offer is that it feels like HierarchicalUriInterface needs to be split into two: A complete URI interface and a relative URI.  That would, I think, greatly simplify the descriptions (and therefore implementations) of resolve() and relativize().  That would also then replace isAbsolute() with simply an instanceof check.
>
> If there's a specific reason why they can't be split let me know, because it feels to me like they should.  There's too much conditional logic embedded in those descriptions otherwise.

I did think about that too, it makes sense especially if they're to be treated as immutable. I didn't include it as I wasn't sure whether people would actually like further separation (I haven't found anywhere else that has actually split URIs into the two forms, let alone splitting one of them again). I would personally like it though.

> - In the normalize() docblock, I'd pair the examples up with their before/after forms rather than having a before list and after list.  Same for resolve() and relativize().  The examples are good, but better organization would make them clearer.
> - The resolve() and relativize() methods have a lot of conditionals in their docblocks.  That to me is a code smell.

These are the pieces borrowed from Javadoc, so they do need rewriting anyhow. I have been able to remove opaque URIs from them already though from having the 2 separate sub-interfaces.

One change I've been thinking of making is to allow plain strings on relativize() etc. That would need the introduction of an InvalidHierarchicalUriSyntaxException for cases where it's not valid. Is it worth including a few other exceptions that aren't actually used in the interfaces (such as InvalidOpaqueUriSyntaxException)?

Chris

Derrick Nelson

Jun 11, 2013, 1:48:32 PM
to php...@googlegroups.com
In response to the usefulness of the leading "//" on the hierarchical part, I would point you guys to the fact that URLs can be scheme-relative in HTML (e.g. "//example.com/foo/bar") in order to keep them functional across http/https.  I'm fairly sure this format takes a backseat to fully relative URLs (e.g. "/foo/bar"), and it has some legacy browser support quirks, but it's there, nonetheless.
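
A short sketch of the scheme-relative case, assuming PHP 7 or later (parse_url() has recognised the host in such references since PHP 5.4.7). The helper applyScheme() is a hypothetical name used only for this example.

```php
<?php
// A scheme-relative reference keeps the "//" so the authority is still recognisable.
$parts = parse_url('//example.com/foo/bar');
print_r($parts); // contains 'host' => 'example.com' and 'path' => '/foo/bar'

// Prefix the scheme of the current page to a scheme-relative reference;
// anything else is returned unchanged.
function applyScheme(string $reference, string $currentScheme): string
{
    return strpos($reference, '//') === 0
        ? $currentScheme . ':' . $reference
        : $reference;
}

echo applyScheme('//example.com/foo/bar', 'https'), "\n"; // prints "https://example.com/foo/bar"
```

Without the leading "//", the same string would be read as a relative path starting with the segment "example.com", which is why the slashes matter.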