A URI library might certainly be useful. Some comments:
> The accessors is_absolute and is_opaque have been renamed absolute and opaque in order to be more consistent.
More consistent with what?
I think the is_ prefixes are much clearer.
> Example Usage
> std::network::uri uri("http://www.example.com/glynos/?key=value#frag");
> assert(*uri.scheme() == "http");
The return type is optional<string_ref>
I think using string_ref is great, but using optional is not so great. Is there a need to distinguish between null and empty?
The danger of undefined behavior seems too high. For example, I think it should've been assert(uri.scheme() && *uri.scheme() == "http"); if you use optional.
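To make the hazard concrete, here is a minimal sketch that uses std::optional and std::string_view (C++17) as stand-ins for the proposal's optional and string_ref. The functions scheme_of and scheme_is are hypothetical, are not part of the proposed interface, and do not implement a real URI parser.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-in for uri::scheme(): the text before the first ':',
// or nullopt when the input contains no ':' at all.
std::optional<std::string_view> scheme_of(std::string_view s) {
    const auto colon = s.find(':');
    if (colon == std::string_view::npos) return std::nullopt;
    return s.substr(0, colon);
}

// Checked comparison: never dereferences a disengaged optional.
bool scheme_is(std::string_view s, std::string_view expected) {
    const auto scheme = scheme_of(s);
    return scheme && *scheme == expected;
}

void scheme_example() {
    assert(scheme_is("http://www.example.com/glynos/?key=value#frag", "http"));
    // This relative reference has no scheme. Writing *scheme_of("/glynos/")
    // would be undefined behavior; the checked form simply yields false.
    assert(!scheme_is("/glynos/", "http"));
}
```

With std::optional specifically, comparing the optional directly against a value (scheme_of(s) == std::string_view("http")) is also well defined and yields false when no value is present. Whether the proposed optional offers the same comparison is an assumption.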
--
On Wed, Jan 23, 2013 at 10:05:29AM +0100, Olaf van der Spek wrote:
> On Wed, Jan 23, 2013 at 10:01 AM, Christof Meerwald <cme...@cmeerw.org> wrote:
> > Sorry, null was the wrong term, RFC 3986 talks about empty and
> > undefined, for example in 5.3 "Note that we are careful to preserve
> > the distinction between a component that is undefined, meaning that
> > its separator was not present in the reference, and a component that
> > is empty, meaning that the separator was present and was immediately
> > followed by the next component separator or the end of the reference."
> >
> > It probably needs to be reviewed if the modelling of what can be
> > undefined matches the RFC (I think a path can be empty, but not
> > undefined).
> Do you know of any real world code that uses that distinction?
> What % of use cases require that distinction?
for the fragment part: any web browser behaves differently between
having an empty fragment or no fragment.
for the query part: any HTTP proxy server (and probably also web
browser) will want to know if there is an empty query part or none at
all (as it will affect caching behaviour).
For the other parts the distinction is probably only needed if you
wanted to implement operations like normalize or resolve yourself.
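The empty-versus-undefined distinction for the query and fragment can be sketched as follows. The helpers query_of and fragment_of are hypothetical and only split on the '?' and '#' separators; std::optional and std::string_view again stand in for the proposal's types.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical helper: the fragment is everything after the first '#'.
std::optional<std::string_view> fragment_of(std::string_view ref) {
    const auto hash = ref.find('#');
    if (hash == std::string_view::npos) return std::nullopt;  // undefined
    return ref.substr(hash + 1);                              // may be empty
}

// Hypothetical helper: the query sits between the first '?' and the
// fragment separator (or the end of the reference).
std::optional<std::string_view> query_of(std::string_view ref) {
    const auto before_fragment = ref.substr(0, ref.find('#'));
    const auto qmark = before_fragment.find('?');
    if (qmark == std::string_view::npos) return std::nullopt;  // undefined
    return before_fragment.substr(qmark + 1);                  // may be empty
}

void empty_vs_undefined_example() {
    // No '?' at all: the query is undefined.
    assert(!query_of("http://example.com/a"));
    // '?' present but nothing after it: the query is defined and empty.
    assert(query_of("http://example.com/a?") && query_of("http://example.com/a?")->empty());
}
```

A cache keyed on the full reference would treat the two inputs in empty_vs_undefined_example as different resources, which is the case described for proxies above.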
On Wed, Jan 23, 2013 at 10:26 PM, Christof Meerwald <cme...@cmeerw.org> wrote:
> On Wed, Jan 23, 2013 at 08:46:25PM +0100, Olaf van der Spek wrote:
>> On Wed, Jan 23, 2013 at 10:47 AM, Christof Meerwald <cme...@cmeerw.org> wrote:
>> > For the other parts the distinction is probably only needed if you
>> > wanted to implement operations like normalize or resolve yourself.
>> So for the majority of use cases the distinction isn't necessary?
>
> Probably, but not having that information available would seem like a
> very arbitrary limitation.
The info would be available via bool has_*() properties.
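For comparison, the has_*() alternative might look like the sketch below. The class name uri_sketch and its members are hypothetical; only the query component is modelled, and the stored view refers to the caller's buffer, so that buffer must outlive the object.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical sketch of the has_*() style: the accessor always returns a
// plain view, and a separate predicate reports whether the '?' was present.
class uri_sketch {
public:
    explicit uri_sketch(std::string_view ref) {
        const auto before_fragment = ref.substr(0, ref.find('#'));
        const auto qmark = before_fragment.find('?');
        if (qmark != std::string_view::npos) {
            has_query_ = true;
            query_ = before_fragment.substr(qmark + 1);
        }
    }

    bool has_query() const { return has_query_; }
    // Empty both when the query is undefined and when it is defined but empty.
    std::string_view query() const { return query_; }

private:
    bool has_query_ = false;
    std::string_view query_;
};

void has_query_example() {
    // Code that does not care about the distinction just reads query().
    assert(uri_sketch("http://example.com/a").query().empty());
    // Code that does care asks has_query() first.
    assert(uri_sketch("http://example.com/a?").has_query());
    assert(!uri_sketch("http://example.com/a").has_query());
}
```

The trade-off is the one debated in this thread: query() can be used without any check, at the cost of one extra member per optional component.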
>> >> Could the issue be solved by having bool has_*() properties?
>> >
>> > It could be, but I am not convinced that it would be a better
>> > approach.
>> Why not?
>> It nicely avoids lots of potential for big trouble.
>
> If optional gets standardised and on the other hand the URI library
> uses a different approach to model optional components because it
> "nicely avoids lots of potential for big trouble" then I would claim
> that something is wrong with "optional" (I haven't looked in detail at
> optional, but I would expect that it does what the name suggests).
I don't agree. Optional is a fine tool, but that tool is not
appropriate in all cases.
Your original proposal did not use optional. Why did you move to optional?
Does it benefit existing users of the class? Does it make their code
simpler? Or more complex?
On 24/01/2013 7:32 PM, "Olaf van der Spek" <olafv...@gmail.com> wrote:
>
> On Thu, Jan 24, 2013 at 9:22 AM, Dean Michael Berris <dbe...@google.com> wrote:
> >>> You need a way to differentiate between two states: something having a value
> >>> and something not having a value. There is a difference between having an
> >>> empty value and not having a value at all.
> >>
> >> That's my question: does the majority of client code make that distinction?
> >
> > My implementation of an HTTP client already does.
>
> Where do I find your code?
>
> >> And what approach would result in simpler client code, both if the
> >> code makes the distinction and if it doesn't.
> >
> > Consistency wise this is important.
>
> That doesn't answer the question, does it?
>
It does. Consistent interfaces make simpler client code. There's no point in limiting the design when there's a perfectly acceptable and reasonable implementation.
Also note that these are URI instances not just specific to HTTP. So you cannot make the assumption that all other schemes take the same approach as HTTP in dealing with empty or nonexistent paths.
> >> AFAIK the original proposal returned a string_ref-like value, so
> >> existing code doesn't have this problem.
> >>
> >
> > Existing code uses boost::optional<>. Look at the released version of
> > the library in cpp-netlib version 0.9.4.
>
> Ah. But your original proposal didn't, right?
>
The original returns the range including the delimiters. This was deemed unnecessary as optional string refs would model the situation better. I tend to agree.
On Wed, Jan 23, 2013 at 11:33 PM, Dean Michael Berris
<dbe...@google.com> wrote:
> I'm opposed to adding more members to the URI class unless it's
> absolutely necessary. std::optional<> does this nicely for us
> conveying the correct semantics. It keeps the URI class simple and
> easy to understand and easy to keep a mental model of.
But ...
On Wed, Jan 23, 2013 at 10:39 PM, Olaf van der Spek
<olafv...@gmail.com> wrote:
> Does it benefit existing users of the class? Does it make their code
> simpler? Or more complex?
On Thu, Jan 24, 2013 at 1:11 AM, Nicol Bolas <jmck...@gmail.com> wrote:
>> I don't agree. Optional is a fine tool, but that tool is not
>> appropriate in all cases.
>
>
> Fair enough. So what exactly is inappropriate about this use case?
> You need a way to differentiate between two states: something having a value
> and something not having a value. There is a difference between having an
> empty value and not having a value at all.
That's my question: does the majority of client code make that distinction?
On Wed, Jan 23, 2013 at 12:33:02AM -0800, Vincent Jacquet wrote:
> I've found a lot of occurrences of "empty" in RFC3986 ("the path may be empty
> (no characters)", page 16; "the registered name is empty (zero length)",
> page 21... ) but no occurrences of the word "null".
> Could you please tell us which RFC you are referring to and where the
> distinction between null and empty is made?
Sorry, null was the wrong term, RFC 3986 talks about empty and
undefined, for example in 5.3 "Note that we are careful to preserve
the distinction between a component that is undefined, meaning that
its separator was not present in the reference, and a component that
is empty, meaning that the separator was present and was immediately
followed by the next component separator or the end of the reference."
It probably needs to be reviewed if the modelling of what can be
undefined matches the RFC (I think a path can be empty, but not
undefined).
I think we all agree that knowing whether a component of the uri is defined matters.
It matters when the uri should be normalized (true for Scheme-Based Normalization [§6.2.3], but apparently not for Syntax-Based Normalization [§6.2.2]), when a uri reference should be transformed to a target uri [§5.2.2], when parsed uri components should be recomposed to a uri [§5.3].

But, when we need to parse the query string or retrieve the fragment, should we (always) process differently not defined and empty?

Also, unless I am mistaken, the normalization, transformation and recomposition use cases should be encapsulated by the uri type.

If I were to make a function to parse the uri and return a tuple with different components, I'd go with optional<>. But aren't we talking about a uri *type*?

The filesystem's path class returns path for the different accessors (parent_path(), filename(), stem(), extension(), ...).

So, maybe the question is not whether to return string_ref or optional<string_ref>, but uri?

It would then be easy to do

    uri source = "foo://example.com:8042/over/there?name=ferret#nose";
    uri q = source.query();
    uri target = uri("http://www.example.com/") + q;

target would be "http://www.example.com/?name=ferret"
q would be "?name=ferret"

Yes, there is the "?", because "name=ferret" cannot be parsed to a uri, whereas "?name=ferret" can.

With the current proposal, how would I make "q"?

    uri q = source.query() ? uri("?" + *source.query()) : uri("");
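The recomposition step of RFC 3986 §5.3 mentioned above is one place where defined-but-empty and undefined produce different output. A minimal sketch follows, assuming a hypothetical components struct that is not part of the proposal; the path is a plain string because the RFC treats it as always defined, though possibly empty.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical holder for the five generic components.
struct components {
    std::optional<std::string> scheme;
    std::optional<std::string> authority;
    std::string path;  // always defined, may be empty
    std::optional<std::string> query;
    std::optional<std::string> fragment;
};

// Recomposition following the pseudocode in RFC 3986 section 5.3: a
// separator is emitted whenever its component is defined, even if empty.
std::string recompose(const components& c) {
    std::string result;
    if (c.scheme)    { result += *c.scheme; result += ':'; }
    if (c.authority) { result += "//"; result += *c.authority; }
    result += c.path;
    if (c.query)     { result += '?'; result += *c.query; }
    if (c.fragment)  { result += '#'; result += *c.fragment; }
    return result;
}

void recompose_example() {
    components c;
    c.scheme = "foo";
    c.authority = "example.com:8042";
    c.path = "/over/there";
    c.query = "";  // defined but empty: the '?' must survive
    assert(recompose(c) == "foo://example.com:8042/over/there?");
}
```

If the query were stored as a plain string with no defined/undefined flag, recompose could not tell whether to emit the trailing '?'.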
It can use the URI type directly if it wanted to use the correct
enforcement of the RFC -- or more to the point, if it wanted to use it
as part of its internal APIs.
I realize the proposal actually supports relative URIs (I must admit
I let Glyn define that part) and that this is even now useful in the
context of an HTTP server. Of course it has to make assumptions on the
scheme and in that context it's definitely possible.
Speaking of *uri*, I sincerely think there is something wrong with accessors returning an optional<string_ref>: the string_ref part!
I want to manipulate uri, not its components, because the manipulation of the components of a uri is scheme dependent, and I do not want to write scheme dependent code all over the place.
It may be a little extreme but please consider:
"W3C recommends that all web servers support semicolon separators in addition to ampersand separators[6] to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity escape ampersands." (see <http://en.wikipedia.org/wiki/Query_string>).
I didn't know that until 5 minutes ago. If I had started to parse the query string into a multi_map, my parsing would have been incomplete.
IMVHO the design of the uri type should not encourage accessing the components as a string (or string_ref or optional<string_ref>), because it is way too tempting to "handle it myself". Ideally, it should give me a scheme object that would interpret its parts.
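As an illustration of what such scheme-aware interpretation would have to cover, here is a minimal sketch of a query splitter that accepts both '&' and ';' as pair separators. The function parse_query is hypothetical, performs no percent-decoding, and is not part of the proposal.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: split a query on '&' or ';', then split each pair on
// its first '='. A pair without '=' gets an empty value.
std::multimap<std::string, std::string> parse_query(std::string_view query) {
    std::multimap<std::string, std::string> result;
    while (!query.empty()) {
        const auto end = query.find_first_of("&;");
        const auto pair = query.substr(0, end);
        if (!pair.empty()) {
            const auto eq = pair.find('=');
            std::string key(pair.substr(0, eq));
            std::string value;
            if (eq != std::string_view::npos) value = std::string(pair.substr(eq + 1));
            result.emplace(std::move(key), std::move(value));
        }
        if (end == std::string_view::npos) break;
        query.remove_prefix(end + 1);
    }
    return result;
}

void parse_query_example() {
    // A parser that only knew about '&' would see a single pair here.
    const auto params = parse_query("key=value;other=1");
    assert(params.size() == 2);
    assert(params.find("other")->second == "1");
}
```

Every caller that splits the raw query string by hand has to get details like this right independently, which is the concern raised above.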
No, I always wanted to return a reference to the underlying string. Returning a uri object means an additional string copy.
Regards,
Glyn
Note that if string_ref itself distinguished between null and empty,
we wouldn't need optional at all.
ie a string_ref which is pointer + length would be:
[ 0xabcd1234, 0 ] = points to one-past-question-mark, has length 0
[ 0, 0 ] = no question-mark in URI
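std::string_view (C++17) happens to allow this representation, so the idea can be sketched with it as a stand-in for string_ref. The helpers query_view and is_defined are hypothetical; whether the proposed string_ref would guarantee the same null-pointer state is an assumption.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: returns a default-constructed view (data() == nullptr)
// when there is no '?', and a non-null view of length >= 0 otherwise.
std::string_view query_view(std::string_view ref) {
    const auto before_fragment = ref.substr(0, ref.find('#'));
    const auto qmark = before_fragment.find('?');
    if (qmark == std::string_view::npos) return {};  // [ 0, 0 ]
    return before_fragment.substr(qmark + 1);        // [ one-past-'?', n ]
}

// "Defined" here means the view points into the original reference.
bool is_defined(std::string_view v) { return v.data() != nullptr; }

void null_vs_empty_example() {
    const std::string_view with_mark = query_view("http://example.com/a?");
    const std::string_view without_mark = query_view("http://example.com/a");
    assert(is_defined(with_mark) && with_mark.empty());
    assert(!is_defined(without_mark) && without_mark.empty());
    // The comparison operators look only at the characters, so the two
    // states compare equal and the distinction is easy to lose.
    assert(with_mark == without_mark);
}
```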
"this is a scheme-specific detail and the proposed uri is limited to being generic".
How do you implement scheme_based_normalization or protocol_based_normalization then?