A URI Library for C++


Olaf van der Spek

Jan 22, 2013, 4:05:07 PM
to std-pr...@isocpp.org, gly...@acm.org, dbe...@google.com

A URI library might certainly be useful. Some comments:


> The accessors is_absolute and is_opaque have been renamed absolute and opaque in order to be more consistent.


More consistent with what?

I think the is_ prefixes are much clearer.


> Example Usage

> std::network::uri uri("http://www.example.com/glynos/?key=value#frag");

> assert(*uri.scheme() == "http");


The return type is optional<string_ref>

I think using string_ref is great, but using optional is not so great. Is there a need to distinguish between null and empty?

The danger of undefined behavior seems too high. For example, I think it should've been assert(uri.scheme() && *uri.scheme() == "http"); if you use optional.
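
To make the concern concrete, here is a minimal sketch that uses C++17 `std::optional` and `std::string_view` as stand-ins for the proposal's `optional<string_ref>`. The `scheme_of` and `has_scheme` helpers are hypothetical and not part of the proposal; `scheme_of` only looks for the first ':' and does not validate the scheme against the RFC grammar.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper: returns the text before the first ':' as the scheme,
// or a disengaged optional when the reference contains no ':' at all.
// (Simplified: a real parser would also validate the scheme characters.)
std::optional<std::string_view> scheme_of(std::string_view ref) {
    const auto colon = ref.find(':');
    if (colon == std::string_view::npos) return std::nullopt;
    return ref.substr(0, colon);
}

// Safe comparison: test for engagement before dereferencing.
bool has_scheme(std::string_view ref, std::string_view expected) {
    const auto s = scheme_of(ref);
    return s && *s == expected;   // *s is only evaluated when s is engaged
}
```

With `std::optional` as it was eventually standardized, writing `s == expected` also compares safely, because a disengaged optional never compares equal to a value.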

Zhihao Yuan

Jan 22, 2013, 5:00:14 PM
to std-pr...@isocpp.org, gly...@acm.org, dbe...@google.com
On Tue, Jan 22, 2013 at 3:05 PM, Olaf van der Spek <olafv...@gmail.com> wrote:
> The danger of undefined behavior seems too high. For example, I think it
> should've been assert(uri.scheme() && *uri.scheme() == "http"); if you use
> optional.

A hard-coded assertion needs no "testing". Production code can
do some checking, or just:

foo(uri.scheme().get_value_or(""));
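
`get_value_or` is the Boost.Optional spelling. In `std::optional` as standardized in C++17, the equivalent member is `value_or`. A small sketch under that assumption follows; `describe_scheme` and `describe` are hypothetical names used only for illustration.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical consumer that only wants a string and does not care whether
// the component was absent or empty.
std::string describe_scheme(std::string_view scheme) {
    return scheme.empty() ? "(none)" : std::string(scheme);
}

// Collapse "absent" into "empty" at the call site with value_or.
std::string describe(const std::optional<std::string_view>& scheme) {
    return describe_scheme(scheme.value_or(std::string_view{}));
}
```

The fallback is spelled out where the optional is consumed, so a reader can see that the two states are being merged deliberately.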

--
Zhihao Yuan, ID lichray
The best way to predict the future is to invent it.
___________________________________________________
4BSD -- http://4bsd.biz/

Christof Meerwald

Jan 22, 2013, 5:05:12 PM
to std-pr...@isocpp.org, gly...@acm.org, dbe...@google.com
On Tue, Jan 22, 2013 at 01:05:07PM -0800, Olaf van der Spek wrote:
> A URI library might certainly be useful. Some comments:
>
> > The accessors is_absolute and is_opaque have been renamed absolute and
> > opaque in order to be more consistent.
>
> More consistent with what? I think the is_ prefixes are much clearer.
>
> > Example Usage
> > std::network::uri uri("http://www.example.com/glynos/?key=value#frag");
> > assert(*uri.scheme() == "http");
>
> The return type is optional<string_ref>. I think using string_ref is great,
> but using optional is not so great. Is there a need to distinguish between
> null and empty?

Yes, the RFC makes a distinction between null and empty and therefore
the library should also provide that level of information.


Christof

--

http://cmeerw.org sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org xmpp:cmeerw at cmeerw.org

Klaim - Joël Lamotte

Jan 22, 2013, 6:55:58 PM
to std-pr...@isocpp.org

On Tue, Jan 22, 2013 at 10:05 PM, Olaf van der Spek <olafv...@gmail.com> wrote:

> > The accessors is_absolute and is_opaque have been renamed absolute and opaque in order to be more consistent.
>
> More consistent with what?
>
> I think the is_ prefixes are much clearer.


With the rest of the standard library, I suppose. All the boolean accessors in the standard library types omit the is_ prefix, like the empty() or size() functions.

Joel Lamotte

Olaf van der Spek

Jan 22, 2013, 7:27:18 PM
to std-pr...@isocpp.org
size isn't a boolean property. is_open (fstream) is, though.
So the standard library isn't entirely consistent itself.


Olaf

Beman Dawes

Jan 22, 2013, 8:29:30 PM
to std-pr...@isocpp.org, gly...@acm.org, dbe...@google.com
On Tue, Jan 22, 2013 at 4:05 PM, Olaf van der Spek <olafv...@gmail.com> wrote:
> A URI library might certainly be useful.

See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3507.html

--Beman

Vincent Jacquet

Jan 23, 2013, 3:33:02 AM
to std-pr...@isocpp.org, gly...@acm.org, dbe...@google.com, cme...@cmeerw.org
I've found a lot of occurrences of "empty" in RFC 3986 ("the path may be empty (no characters)", page 16; "the registered name is empty (zero length)", page 21...) but no occurrences of the word "null".
Could you please tell us which RFC you are referring to and where the distinction between null and empty is made?

Vincent

Christof Meerwald

Jan 23, 2013, 4:01:48 AM
to std-pr...@isocpp.org, Vincent Jacquet, gly...@acm.org, dbe...@google.com
On Wed, Jan 23, 2013 at 12:33:02AM -0800, Vincent Jacquet wrote:
> I've found a lot of occurrences of "empty" in RFC3986 ("the path may be empty
> (no characters)", page 16; "the registered name is empty (zero length)",
> page 21... ) but no occurrences of the word "null".
> Could you please tell us which RFC you are referring to and where the
> distinction between null and empty is made?

Sorry, null was the wrong term, RFC 3986 talks about empty and
undefined, for example in 5.3 "Note that we are careful to preserve
the distinction between a component that is undefined, meaning that
its separator was not present in the reference, and a component that
is empty, meaning that the separator was present and was immediately
followed by the next component separator or the end of the reference."

It probably needs to be reviewed if the modelling of what can be
undefined matches the RFC (I think a path can be empty, but not
undefined).
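
As an illustration of the distinction quoted from section 5.3, the sketch below splits off the query and fragment of a reference and keeps "separator absent" apart from "separator present, nothing after it". It uses C++17 `std::optional` and `std::string_view` in place of the proposal's types; `tail_parts` and `split_tail` are hypothetical names, and no validation of the characters is attempted.

```cpp
#include <optional>
#include <string_view>

struct tail_parts {
    std::optional<std::string_view> query;     // disengaged: no '?' present
    std::optional<std::string_view> fragment;  // disengaged: no '#' present
};

// Hypothetical splitter for the query and fragment of a URI reference.
// The fragment starts after the first '#'; the query runs from the first
// '?' before that '#' up to the '#' (or to the end of the reference).
tail_parts split_tail(std::string_view ref) {
    tail_parts out;
    const auto hash = ref.find('#');
    if (hash != std::string_view::npos) {
        out.fragment = ref.substr(hash + 1);   // may be empty, but is defined
        ref = ref.substr(0, hash);
    }
    const auto qmark = ref.find('?');
    if (qmark != std::string_view::npos) {
        out.query = ref.substr(qmark + 1);     // may be empty, but is defined
    }
    return out;
}
```

With this model, `http://a/b` has an undefined query while `http://a/b?` has a defined, empty one.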

Olaf van der Spek

Jan 23, 2013, 4:05:29 AM
to std-pr...@isocpp.org, Vincent Jacquet, gly...@acm.org, dbe...@google.com
On Wed, Jan 23, 2013 at 10:01 AM, Christof Meerwald <cme...@cmeerw.org> wrote:
> On Wed, Jan 23, 2013 at 12:33:02AM -0800, Vincent Jacquet wrote:
>> I've found a lot of occurrences of "empty" in RFC3986 ("the path may be empty
>> (no characters)", page 16; "the registered name is empty (zero length)",
>> page 21... ) but no occurrences of the word "null".
>> Could you please tell us which RFC you are referring to and where the
>> distinction between null and empty is made?
>
> Sorry, null was the wrong term, RFC 3986 talks about empty and
> undefined, for example in 5.3 "Note that we are careful to preserve
> the distinction between a component that is undefined, meaning that
> its separator was not present in the reference, and a component that
> is empty, meaning that the separator was present and was immediately
> followed by the next component separator or the end of the reference."
>
> It probably needs to be reviewed if the modelling of what can be
> undefined matches the RFC (I think a path can be empty, but not
> undefined).

Do you know of any real world code that uses that distinction?
What % of use cases require that distinction?

Could the issue be solved by having bool has_*() properties?
--
Olaf
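
For comparison, here is a rough sketch of what a has_*-style interface could look like for a single component. The class name `query_view` and its members are hypothetical and are not taken from the proposal; only the query is modelled.

```cpp
#include <string_view>

// Hypothetical has_*-style interface: the accessor always returns a view
// (empty when the component is absent) and a separate predicate reports
// whether the '?' separator was present.
class query_view {
public:
    explicit query_view(std::string_view ref) {
        const auto hash = ref.find('#');
        if (hash != std::string_view::npos) ref = ref.substr(0, hash);
        const auto qmark = ref.find('?');
        if (qmark != std::string_view::npos) {
            has_query_ = true;
            query_ = ref.substr(qmark + 1);
        }
    }

    bool has_query() const { return has_query_; }
    std::string_view query() const { return query_; }  // always safe to call

private:
    bool has_query_ = false;
    std::string_view query_;
};
```

The trade-off in this shape is that `query()` can never trigger undefined behaviour, but a caller who forgets `has_query()` silently treats an absent query and an empty query as the same thing.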

Christof Meerwald

Jan 23, 2013, 4:47:13 AM
to std-pr...@isocpp.org
for the fragment part: any web browser behaves differently between
having an empty fragment or no fragment.

for the query part: any HTTP proxy server (and probably also web
browser) will want to know if there is an empty query part or none at
all (as it will affect caching behaviour).

For the other parts the distinction is probably only needed if you
wanted to implement operations like normalize or resolve yourself.
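
One place where a hand-written resolve needs the distinction is the recomposition step in RFC 3986 section 5.3, which appends a separator only when the component is defined. The sketch below follows that pseudocode using C++17 `std::optional`; the `components` struct and `recompose` function are hypothetical names, not part of the proposal.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical holder for the five components of a reference.
struct components {
    std::optional<std::string_view> scheme;
    std::optional<std::string_view> authority;
    std::string_view path;                      // may be empty, never undefined
    std::optional<std::string_view> query;
    std::optional<std::string_view> fragment;
};

// Recomposition in the style of RFC 3986 section 5.3: a separator is
// emitted only for components that are defined, even if they are empty.
std::string recompose(const components& c) {
    std::string result;
    if (c.scheme)    { result += *c.scheme; result += ':'; }
    if (c.authority) { result += "//"; result += *c.authority; }
    result += c.path;
    if (c.query)     { result += '?'; result += *c.query; }
    if (c.fragment)  { result += '#'; result += *c.fragment; }
    return result;
}
```

If an empty query were stored the same way as an undefined one, `http://a/b?` could not be round-tripped through this function.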


> Could the issue be solved by having bool has_*() properties?

It could be, but I am not convinced that it would be a better
approach.

Daniel Krügler

Jan 23, 2013, 5:08:04 AM
to std-pr...@isocpp.org
2013/1/23 Christof Meerwald <cme...@cmeerw.org>:
I agree. My gut feeling is that this is actually a problem of the
optional interface. I'm currently suggesting to add a function that
retrieves its content in a checked manner, since calling operator* for
a disengaged value would be undefined.

- Daniel
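
A checked accessor of this kind is available in `std::optional` as standardized in C++17: `value()` throws `std::bad_optional_access` when the optional is disengaged. The sketch below assumes that interface; `checked_read` is a hypothetical helper.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Reads a component through the checked accessor and turns the exception
// into a readable message instead of risking undefined behaviour.
std::string checked_read(const std::optional<std::string_view>& part) {
    try {
        return std::string(part.value());      // throws if part is disengaged
    } catch (const std::bad_optional_access&) {
        return "<undefined component>";
    }
}
```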

Vincent Jacquet

Jan 23, 2013, 5:54:35 AM
to std-pr...@isocpp.org
Who is responsible for knowing whether a component of the URI is defined? The URI or the component?

Klaim - Joël Lamotte

Jan 23, 2013, 7:34:20 AM
to std-pr...@isocpp.org
Another question:
Maybe I missed that part, but it is not clear to me how I can check that a string contains a valid URI without
trying to build a uri object that would throw an exception.
For example, if I want to do that check as part of my program's logic, I don't want that check to be done
through an exception handling mechanism.

Joel Lamotte

Christof Meerwald

Jan 23, 2013, 7:53:46 AM
to std-pr...@isocpp.org
there is a noexcept make_uri factory function:

// factory functions
template <class String>
uri make_uri(const String &u, std::error_code &e) noexcept;

This factory function is provided in order to be able to construct a
uri object without throwing an exception. The error code is stored
in the std::error_code object, if there is a syntax error.
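
A sketch of how such a factory might be used for validation without exceptions. The `uri` class here is a hypothetical stand-in that only stores the text, and the "validation" is a placeholder that rejects any input containing a space, not the RFC 3986 grammar. The sketch leaves `noexcept` off because constructing a `std::string` can allocate.

```cpp
#include <string>
#include <system_error>
#include <utility>

// Hypothetical stand-in for the proposed uri class; it only stores the text.
class uri {
public:
    uri() = default;
    explicit uri(std::string text) : text_(std::move(text)) {}
    const std::string& string() const { return text_; }
private:
    std::string text_;
};

// Sketch of an error_code-reporting factory. Syntax errors are reported
// through ec; the check below is a placeholder for a real parser.
template <class String>
uri make_uri(const String& u, std::error_code& ec) {
    std::string text(u);
    if (text.find(' ') != std::string::npos) {
        ec = std::make_error_code(std::errc::invalid_argument);
        return uri{};
    }
    ec.clear();
    return uri{std::move(text)};
}
```

Program logic can then branch on `ec` instead of wrapping the constructor in a try block.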

Daniel Krügler

Jan 23, 2013, 8:15:53 AM
to std-pr...@isocpp.org
2013/1/23 Christof Meerwald <cme...@cmeerw.org>:
> On Wed, Jan 23, 2013 at 01:34:20PM +0100, Klaim - Joël Lamotte wrote:
>> Another question:
>> Maybe I missed that part but it is not clear to me how I can check that a
>> string contains a valid uri without
>> trying to build a uri object that would throw an exception?
>> For example if I want to do that check as part of my program's logic, I
>> don't want that check to be done
>> through an exception handling mechanism.
>
> there is a noexcept make_uri factory function:
>
> // factory functions
> template <class String>
> uri make_uri(const String &u, std::error_code &e) noexcept;
>
> This factory function is provided in order to be able to construct a
> uri object without throwing an exception. The error code is stored
> in the std::error_code object, if there is a syntax error.

I think this function cannot be safely declared as noexcept, because
uri is a type that acquires (memory) resources and this can lead to an
out-of-memory situation that throws an exception, which will have the
effect of an immediate termination because of the noexcept. This
problem was also fixed in the filesystem library, e.g. have a look at

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3505.html#canonical

where the error_code overload has also not been marked as noexcept anymore.

- Daniel
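
The difference between the two declarations can be checked at compile time with the `noexcept` operator. The two functions below are hypothetical and merely copy their input; they stand in for any factory that acquires memory.

```cpp
#include <string>
#include <system_error>
#include <utility>

// Hypothetical factory that promises not to throw. Copying into the returned
// std::string may allocate; if that allocation throws std::bad_alloc, the
// noexcept turns it into a call to std::terminate.
std::string copy_noexcept(const std::string& s, std::error_code& ec) noexcept {
    ec.clear();
    return s;
}

// Same body without the promise: an allocation failure propagates to the
// caller as std::bad_alloc, while syntax errors would still go through ec.
std::string copy_may_throw(const std::string& s, std::error_code& ec) {
    ec.clear();
    return s;
}

// The noexcept operator reports what each declaration promises.
static_assert(noexcept(copy_noexcept(std::declval<const std::string&>(),
                                     std::declval<std::error_code&>())));
static_assert(!noexcept(copy_may_throw(std::declval<const std::string&>(),
                                       std::declval<std::error_code&>())));
```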

Glyn Matthews

Jan 23, 2013, 12:36:38 PM
to std-pr...@isocpp.org
Olaf,


On 22 January 2013 22:05, Olaf van der Spek <olafv...@gmail.com> wrote:

> A URI library might certainly be useful. Some comments:
>
> > The accessors is_absolute and is_opaque have been renamed absolute and opaque in order to be more consistent.
>
> More consistent with what?
>
> I think the is_ prefixes are much clearer.


As someone else said on this thread, they are more consistent with other examples in the standard library (e.g. std::string::empty()).
 

> > Example Usage
> >
> > std::network::uri uri("http://www.example.com/glynos/?key=value#frag");
> >
> > assert(*uri.scheme() == "http");
>
> The return type is optional<string_ref>
>
> I think using string_ref is great, but using optional is not so great. Is there a need to distinguish between null and empty?
>
> The danger of undefined behavior seems too high. For example, I think it should've been assert(uri.scheme() && *uri.scheme() == "http"); if you use optional.

I agree, the example in the document should explicitly show the check for validity of the part.

Regards,
Glyn

Glyn Matthews

Jan 23, 2013, 12:37:22 PM
to std-pr...@isocpp.org
On 23 January 2013 10:47, Christof Meerwald <cme...@cmeerw.org> wrote:
> On Wed, Jan 23, 2013 at 10:05:29AM +0100, Olaf van der Spek wrote:
> > On Wed, Jan 23, 2013 at 10:01 AM, Christof Meerwald <cme...@cmeerw.org> wrote:
> > > Sorry, null was the wrong term, RFC 3986 talks about empty and
> > > undefined, for example in 5.3 "Note that we are careful to preserve
> > > the distinction between a component that is undefined, meaning that
> > > its separator was not present in the reference, and a component that
> > > is empty, meaning that the separator was present and was immediately
> > > followed by the next component separator or the end of the reference."
> > >
> > > It probably needs to be reviewed if the modelling of what can be
> > > undefined matches the RFC (I think a path can be empty, but not
> > > undefined).
> > Do you know of any real world code that uses that distinction?
> > What % of use cases require that distinction?
>
> for the fragment part: any web browser behaves differently between
> having an empty fragment or no fragment.
>
> for the query part: any HTTP proxy server (and probably also web
> browser) will want to know if there is an empty query part or none at
> all (as it will affect caching behaviour).
>
> For the other parts the distinction is probably only needed if you
> wanted to implement operations like normalize or resolve yourself.


Which you may want to do with the uri in this proposal, if you want scheme or protocol-based normalization:


Regards,
Glyn

Olaf van der Spek

Jan 23, 2013, 2:46:25 PM
to std-pr...@isocpp.org
On Wed, Jan 23, 2013 at 10:47 AM, Christof Meerwald <cme...@cmeerw.org> wrote:
> On Wed, Jan 23, 2013 at 10:05:29AM +0100, Olaf van der Spek wrote:
>> On Wed, Jan 23, 2013 at 10:01 AM, Christof Meerwald <cme...@cmeerw.org> wrote:
>> > Sorry, null was the wrong term, RFC 3986 talks about empty and
>> > undefined, for example in 5.3 "Note that we are careful to preserve
>> > the distinction between a component that is undefined, meaning that
>> > its separator was not present in the reference, and a component that
>> > is empty, meaning that the separator was present and was immediately
>> > followed by the next component separator or the end of the reference."
>> >
>> > It probably needs to be reviewed if the modelling of what can be
>> > undefined matches the RFC (I think a path can be empty, but not
>> > undefined).
>> Do you know of any real world code that uses that distinction?
>> What % of use cases require that distinction?
>
> for the fragment part: any web browser behaves differently between
> having an empty fragment or no fragment.
>
> for the query part: any HTTP proxy server (and probably also web
> browser) will want to know if there is an empty query part or none at
> all (as it will affect caching behaviour).

Does it? Seems kinda unnatural. But I'm no expert on that matter.

> For the other parts the distinction is probably only needed if you
> wanted to implement operations like normalize or resolve yourself.

So for the majority of use cases the distinction isn't necessary?

>> Could the issue be solved by having bool has_*() properties?
>
> It could be, but I am not convinced that it would be a better
> approach.

Why not?
It nicely avoids lots of potential for big trouble.

Olaf

Christof Meerwald

Jan 23, 2013, 4:26:15 PM
to std-pr...@isocpp.org
On Wed, Jan 23, 2013 at 08:46:25PM +0100, Olaf van der Spek wrote:
> On Wed, Jan 23, 2013 at 10:47 AM, Christof Meerwald <cme...@cmeerw.org> wrote:
> > For the other parts the distinction is probably only needed if you
> > wanted to implement operations like normalize or resolve yourself.
> So for the majority of use cases the distinction isn't necessary?

Probably, but not having that information available would seem like a
very arbitrary limitation.

> >> Could the issue be solved by having bool has_*() properties?
> >
> > It could be, but I am not convinced that it would be a better
> > approach.
> Why not?
> It nicely avoids lots of potential for big trouble.

If optional gets standardised and on the other hand the URI library
uses a different approach to model optional components because it
"nicely avoids lots of potential for big trouble" then I would claim
that something is wrong with "optional" (I haven't looked in detail at
optional, but I would expect that it does what the name suggests).

Daniel Krügler

Jan 23, 2013, 4:27:53 PM
to std-pr...@isocpp.org
2013/1/23 Christof Meerwald <cme...@cmeerw.org>:
> If optional gets standardised and on the other hand the URI library
> uses a different approach to model optional components because it
> "nicely avoids lots of potential for big trouble" then I would claim
> that something is wrong with "optional" (I haven't looked in detail at
> optional, but I would expect that it does what the name suggests).

I completely agree. The URI proposal shouldn't add functionality that
either is or should be provided by a referred to component (like
optional in this case).

- Daniel

Olaf van der Spek

Jan 23, 2013, 4:39:14 PM
to std-pr...@isocpp.org
On Wed, Jan 23, 2013 at 10:26 PM, Christof Meerwald <cme...@cmeerw.org> wrote:
> On Wed, Jan 23, 2013 at 08:46:25PM +0100, Olaf van der Spek wrote:
>> On Wed, Jan 23, 2013 at 10:47 AM, Christof Meerwald <cme...@cmeerw.org> wrote:
>> > For the other parts the distinction is probably only needed if you
>> > wanted to implement operations like normalize or resolve yourself.
>> So for the majority of use cases the distinction isn't necessary?
>
> Probably, but not having that information available would seem like a
> very arbitrary limitation.

The info would be available via bool has_*() properties.

>> >> Could the issue be solved by having bool has_*() properties?
>> >
>> > It could be, but I am not convinced that it would be a better
>> > approach.
>> Why not?
>> It nicely avoids lots of potential for big trouble.
>
> If optional gets standardised and on the other hand the URI library
> uses a different approach to model optional components because it
> "nicely avoids lots of potential for big trouble" then I would claim
> that something is wrong with "optional" (I haven't looked in detail at
> optional, but I would expect that it does what the name suggests).

I don't agree. Optional is a fine tool, but that tool is not
appropriate in all cases.

Your original proposal did not use optional. Why did you move to optional?
Does it benefit existing users of the class? Does it make their code
simpler? Or more complex?

--
Olaf

Christof Meerwald

Jan 23, 2013, 5:05:51 PM
to std-pr...@isocpp.org
It's not my proposal (all the credit goes to Glyn Matthews and Dean
Michael Berris) - I am just voicing my opinion (sorry if you got a
different impression)...

When the proposal was changed to not include the '?'-delimiter in the
query part, I pointed out that it wouldn't be possible to distinguish
between an empty and missing query part any more. Therefore some other
way was needed to model that: using optional or adding has_ methods.
And I voiced my preference for using optional.

Dean Michael Berris

Jan 23, 2013, 5:33:40 PM
to std-pr...@isocpp.org
I tend to only watch this list for interesting discussions and I
wanted to let Glyn address the points as he's mostly written the paper
with input from me. That said, let me address these points.
Yes. The "has_*" methods introduce additional members that don't
really make sense to have.

std::optional<std::string_ref> is perfect in this use case. Here's the
list of reasons:

- If the URI has a part, it should return a reference to that part of
the string.
- If the URI does not have that part explicitly defined, you return an
uninitialised std::optional<>.
- If the URI does have that part explicitly defined but empty, you
return a std::optional<> that had a default-constructed
std::string_ref.
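
The three cases in the list above can be told apart by client code as in the following sketch, which uses C++17 `std::optional<std::string_view>` as a stand-in for `std::optional<std::string_ref>`. The `classify` function is hypothetical and exists only to show the three states side by side.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Classify a component according to the three cases in the list.
std::string classify(const std::optional<std::string_view>& part) {
    if (!part)          return "undefined";           // part not present at all
    if (part->empty())  return "defined but empty";   // present, zero length
    return "defined: " + std::string(*part);          // present with content
}
```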

It's also good for value semantics.

I'm opposed to adding more members to the URI class unless it's
absolutely necessary. std::optional<> does this nicely for us
conveying the correct semantics. It keeps the URI class simple and
easy to understand and easy to keep a mental model of.

Nicol Bolas

Jan 23, 2013, 7:11:06 PM
to std-pr...@isocpp.org


On Wednesday, January 23, 2013 1:39:14 PM UTC-8, Olaf van der Spek wrote:
> On Wed, Jan 23, 2013 at 10:26 PM, Christof Meerwald <cme...@cmeerw.org> wrote:
> > On Wed, Jan 23, 2013 at 08:46:25PM +0100, Olaf van der Spek wrote:
> >> On Wed, Jan 23, 2013 at 10:47 AM, Christof Meerwald <cme...@cmeerw.org> wrote:
> >> > For the other parts the distinction is probably only needed if you
> >> > wanted to implement operations like normalize or resolve yourself.
> >> So for the majority of use cases the distinction isn't necessary?
> >
> > Probably, but not having that information available would seem like a
> > very arbitrary limitation.
>
> The info would be available via bool has_*() properties.
>
> >> >> Could the issue be solved by having bool has_*() properties?
> >> >
> >> > It could be, but I am not convinced that it would be a better
> >> > approach.
> >> Why not?
> >> It nicely avoids lots of potential for big trouble.
> >
> > If optional gets standardised and on the other hand the URI library
> > uses a different approach to model optional components because it
> > "nicely avoids lots of potential for big trouble" then I would claim
> > that something is wrong with "optional" (I haven't looked in detail at
> > optional, but I would expect that it does what the name suggests).
>
> I don't agree. Optional is a fine tool, but that tool is not
> appropriate in all cases.

Fair enough. So what exactly is inappropriate about this use case?

You need a way to differentiate between two states: something having a value and something not having a value. There is a difference between having an empty value and not having a value at all.

This is exactly the scenario that `optional` is made to cover. If there is an empty string, you return an active optional that contains an empty string. If there is no string at all, you return an inactive optional. So what's wrong with using it here, besides a lack of familiarity with the idiom?

> Your original proposal did not use optional. Why did you move to optional?
> Does it benefit existing users of the class? Does it make their code
> simpler? Or more complex?

It makes the standard library as a whole more cohesive, as we now have an idiomatic way to tell the difference between "empty value" and "no value at all." `optional` should be used throughout the standard library in such places.

Olaf van der Spek

Jan 24, 2013, 3:20:34 AM
to std-pr...@isocpp.org
On Wed, Jan 23, 2013 at 11:33 PM, Dean Michael Berris
<dbe...@google.com> wrote:
> I'm opposed to adding more members to the URI class unless it's
> absolutely necessary. std::optional<> does this nicely for us
> conveying the correct semantics. It keeps the URI class simple and
> easy to understand and easy to keep a mental model of.

But ...

On Wed, Jan 23, 2013 at 10:39 PM, Olaf van der Spek
<olafv...@gmail.com> wrote:
> Does it benefit existing users of the class? Does it make their code
> simpler? Or more complex?


On Thu, Jan 24, 2013 at 1:11 AM, Nicol Bolas <jmck...@gmail.com> wrote:
>> I don't agree. Optional is a fine tool, but that tool is not
>> appropriate in all cases.
>
>
> Fair enough. So what exactly is inappropriate about this use case?

> You need a way to differentiate between two states: something having a value
> and something not having a value. There is a difference between having an
> empty value and not having a value at all.

That's my question: does the majority of client code make that distinction?
And what approach would result in simpler client code, both if the
code makes the distinction and if it doesn't.
AFAIK the original proposal returned a string_ref-like value, so
existing code doesn't have this problem.



--
Olaf

Dean Michael Berris

Jan 24, 2013, 3:22:53 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 7:20 PM, Olaf van der Spek <olafv...@gmail.com> wrote:
> On Wed, Jan 23, 2013 at 11:33 PM, Dean Michael Berris
> <dbe...@google.com> wrote:
>> I'm opposed to adding more members to the URI class unless it's
>> absolutely necessary. std::optional<> does this nicely for us
>> conveying the correct semantics. It keeps the URI class simple and
>> easy to understand and easy to keep a mental model of.
>
> But ...
>
> On Wed, Jan 23, 2013 at 10:39 PM, Olaf van der Spek
> <olafv...@gmail.com> wrote:
>> Does it benefit existing users of the class? Does it make their code
>> simpler? Or more complex?
>
>
> On Thu, Jan 24, 2013 at 1:11 AM, Nicol Bolas <jmck...@gmail.com> wrote:
>>> I don't agree. Optional is a fine tool, but that tool is not
>>> appropriate in all cases.
>>
>>
>> Fair enough. So what exactly is inappropriate about this use case?
>
>> You need a way to differentiate between two states: something having a value
>> and something not having a value. There is a difference between having an
>> empty value and not having a value at all.
>
> That's my question: does the majority of client code make that distinction?

My implementation of an HTTP client already does.

> And what approach would result in simpler client code, both if the
> code makes the distinction and if it doesn't.

Consistency wise this is important.

> AFAIK the original proposal returned a string_ref-like value, so
> existing code doesn't have this problem.
>

Existing code uses boost::optional<>. Look at the released version of
the library in cpp-netlib version 0.9.4.

Olaf van der Spek

Jan 24, 2013, 3:32:04 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 9:22 AM, Dean Michael Berris <dbe...@google.com> wrote:
>>> You need a way to differentiate between two states: something having a value
>>> and something not having a value. There is a difference between having an
>>> empty value and not having a value at all.
>>
>> That's my question: does the majority of client code make that distinction?
>
> My implementation of an HTTP client already does.

Where do I find your code?

>> And what approach would result in simpler client code, both if the
>> code makes the distinction and if it doesn't.
>
> Consistency wise this is important.

That doesn't answer the question, does it?

>> AFAIK the original proposal returned a string_ref-like value, so
>> existing code doesn't have this problem.
>>
>
> Existing code uses boost::optional<>. Look at the released version of
> the library in cpp-netlib version 0.9.4.

Ah. But your original proposal didn't, right?

--
Olaf

Dean Michael Berris

Jan 24, 2013, 3:39:16 AM
to std-pr...@isocpp.org


On 24/01/2013 7:32 PM, "Olaf van der Spek" <olafv...@gmail.com> wrote:
>
> On Thu, Jan 24, 2013 at 9:22 AM, Dean Michael Berris <dbe...@google.com> wrote:
> >>> You need a way to differentiate between two states: something having a value
> >>> and something not having a value. There is a difference between having an
> >>> empty value and not having a value at all.
> >>
> >> That's my question: does the majority of client code make that distinction?
> >
> > My implementation of an HTTP client already does.
>
> Where do I find your code?
>

cpp-netlib.org

> >> And what approach would result in simpler client code, both if the
> >> code makes the distinction and if it doesn't.
> >
> > Consistency wise this is important.
>
> That doesn't answer the question, does it?
>

It does. Consistent interfaces make for simpler client code. There's no point in limiting the design when there's a perfectly acceptable and reasonable implementation.

Also note that these are URI instances not just specific to HTTP. So you cannot make the assumption that all other schemes take the same approach as HTTP in dealing with empty or nonexistent paths.

> >> AFAIK the original proposal returned a string_ref-like value, so
> >> existing code doesn't have this problem.
> >>
> >
> > Existing code uses boost::optional<>. Look at the released version of
> > the library in cpp-netlib version 0.9.4.
>
> Ah. But your original proposal didn't, right?
>

The original returns the range including the delimiters. This was deemed unnecessary as optional string refs would model the situation better. I tend to agree.

Olaf van der Spek

Jan 24, 2013, 3:48:42 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 9:39 AM, Dean Michael Berris <dbe...@google.com> wrote:
>> > My implementation of an HTTP client already does.
>>
>> Where do I find your code?
>>
>
> cpp-netlib.org

Found that site, found github, but really can't find your
"implementation of an HTTP client"

>> >> And what approach would result in simpler client code, both if the
>> >> code makes the distinction and if it doesn't.
>> >
>> > Consistency wise this is important.
>>
>> That doesn't answer the question, does it?
>>
>
> It does. Consistent interfaces make for simpler client code. There's no point

Consistency isn't the only thing that counts. Nobody will say that
consistency is bad, but often it's a trade-off.

> in limiting the design when there's a perfectly acceptable and reasonable
> implementation.
>
> Also note that these are URI instances not just specific to HTTP. So you
> cannot make the assumption that all other schemes take the same approach as
> HTTP in dealing with empty or nonexistent paths.

I'm not making that assumption.

>> Ah. But your original proposal didn't, right?
>>
>
> The original returns the range including the delimiters.

Even for host, scheme and path?
Anyway, let's have a look at client code, please.

Olaf

Nicol Bolas

Jan 24, 2013, 4:33:12 AM
to std-pr...@isocpp.org


On Thursday, January 24, 2013 12:20:34 AM UTC-8, Olaf van der Spek wrote:
> On Wed, Jan 23, 2013 at 11:33 PM, Dean Michael Berris
> <dbe...@google.com> wrote:
> > I'm opposed to adding more members to the URI class unless it's
> > absolutely necessary. std::optional<> does this nicely for us
> > conveying the correct semantics. It keeps the URI class simple and
> > easy to understand and easy to keep a mental model of.
>
> But ...
>
> On Wed, Jan 23, 2013 at 10:39 PM, Olaf van der Spek
> <olafv...@gmail.com> wrote:
> > Does it benefit existing users of the class? Does it make their code
> > simpler? Or more complex?
>
>
> On Thu, Jan 24, 2013 at 1:11 AM, Nicol Bolas <jmck...@gmail.com> wrote:
> >> I don't agree. Optional is a fine tool, but that tool is not
> >> appropriate in all cases.
> >
> >
> > Fair enough. So what exactly is inappropriate about this use case?
>
> > You need a way to differentiate between two states: something having a value
> > and something not having a value. There is a difference between having an
> > empty value and not having a value at all.
>
> That's my question: does the majority of client code make that distinction?

Does it have to? Does it matter what the majority of client code does?

And more importantly, do we want them to?

I admit that I'm not experienced in networking stuff, so I don't know the general sense of how this is written or how often the data will not be present. But there is clearly a three-state paradigm at play. And the way to handle a 3-state return-by-value (present and not empty, present and empty, not present) like this is with `optional<T>`. This is one of the prime use-cases of optional<T>, one of the defining reasons why we need the feature.

optional<T> should be idiomatic C++ for any return-by-value with this 3-state paradigm. How frequently one expects the value to be present or absent is irrelevant; this is the idiom that should be used. If the user doesn't want to care whether the value is present or absent, they can use the features of optional<T> to retrieve a default value. This will be explicitly spelled out in their code, thus making it easy for the reader to see what's happening.

Olaf van der Spek

Jan 24, 2013, 4:43:21 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 10:33 AM, Nicol Bolas <jmck...@gmail.com> wrote:
>> That's my question: does the majority of client code make that
>> distinction?
>
>
> Does it have to? Does it matter what the majority of client code does?

Yes
If say 90% of the cases don't make that distinction then IMO that code
should be as simple as possible. Optimize for the common case.
Hence my suggestion to look at some client code.

> And more importantly, do we want them to?
>
> optional<T> should be idiomatic C++ for any return-by-value with this
> 3-state paradigm. How frequently one expects the value to be present or
> absent is irrelevant; this is the idiom that should be used. If the user
> doesn't want to care whether the value is present or absent, they can use
> the features of optional<T> to retrieve a default value. This will be
> explicitly spelled out in their code, thus making it easy for the reader to
> see what's happening.

That feature of optional is about to be removed from the current
optional proposal.

Shouldn't the primary goal be to have client code as simple as
possible (in most cases)?

--
Olaf

Dean Michael Berris

Jan 24, 2013, 5:01:56 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 8:43 PM, Olaf van der Spek <olafv...@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 10:33 AM, Nicol Bolas <jmck...@gmail.com> wrote:
>>> That's my question: does the majority of client code make that
>>> distinction?
>>
>>
>> Does it have to? Does it matter what the majority of client code does?
>
> Yes
> If say 90% of the cases don't make that distinction then IMO that code
> should be as simple as possible. Optimize for the common case.
> Hence my suggestion to look at some client code.
>

You're not making an argument here. You're making rhetorical questions.

Code should be as simple as possible. How could it be more simple than
optional<T>? It's obviously the right idiom and the right type.

>> And more importantly, do we want them to?
>>
>> optional<T> should be idiomatic C++ for any return-by-value with this
>> 3-state paradigm. How frequently one expects the value to be present or
>> absent is irrelevant; this is the idiom that should be used. If the user
>> doesn't want to care whether the value is present or absent, they can use
>> the features of optional<T> to retrieve a default value. This will be
>> explicitly spelled out in their code, thus making it easy for the reader to
>> see what's happening.
>
> That feature of optional is about to be removed from the current
> optional proposal.
>

What feature? That an optional can be checked whether it's set?

optional<string_ref> maybe = uri.path();
string path = maybe? string(*maybe) : string("/");

Why is this not simple?

> Shouldn't the primary goal be to have client code as simple as
> possible (in most cases)?
>

You haven't shown me code that's "much more simple" than code using
optional. Line per line optional<> is simpler than any other
alternative conveying the same concept.

Vincent Jacquet

Jan 24, 2013, 5:09:08 AM
to std-pr...@isocpp.org
"A path is always defined for a URI, though the defined path may be empty (zero length)" [§3.3, p23].

Therefore, the path accessor should not return optional<string_ref>. So we would have all accessors returning optional<string_ref> but one...

Isn't that a problem in itself?

Olaf van der Spek

Jan 24, 2013, 5:23:46 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 11:01 AM, Dean Michael Berris
<dbe...@google.com> wrote:
> On Thu, Jan 24, 2013 at 8:43 PM, Olaf van der Spek <olafv...@gmail.com> wrote:
>> On Thu, Jan 24, 2013 at 10:33 AM, Nicol Bolas <jmck...@gmail.com> wrote:
>>>> That's my question: does the majority of client code make that
>>>> distinction?
>>>
>>>
>>> Does it have to? Does it matter what the majority of client code does?
>>
>> Yes
>> If say 90% of the cases don't make that distinction then IMO that code
>> should be as simple as possible. Optimize for the common case.
>> Hence my suggestion to look at some client code.
>>
>
> You're not making an argument here. You're making rhetorical questions.
>
> Code should be as simple as possible. How could it be more simple than
> optional<T>? It's obviously the right idiom and the right type.

optional<T> is just a return type, it's not client code. Speaking of
which, could you link me to some real-world client code, please?

>> That feature of optional is about to be removed from the current
>> optional proposal.
>>
>
> What feature? That an optional can be checked whether it's set?

get_value_or()

> optional<string_ref> maybe = uri.path();
> string path = maybe? string(*maybe) : string("/");
>
> Why is this not simple?

It's a lot less simple than string path = uri.path(); isn't it?
If there's no need to differentiate between null and empty path, then
that's all that's needed.

BTW, string path = uri.path().get_value_or("/"); is simpler than your
code (but about to be removed).

>> Shouldn't the primary goal be to have client code as simple as
>> possible (in most cases)?
>>
>
> You haven't shown me code that's "much more simple" than code using
> optional. Line per line optional<> is simpler than any other
> alternative conveying the same concept.

See above


--
Olaf

Dean Michael Berris

Jan 24, 2013, 5:29:26 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 7:48 PM, Olaf van der Spek <olafv...@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 9:39 AM, Dean Michael Berris <dbe...@google.com> wrote:
>>> > My implementation of an HTTP client already does.
>>>
>>> Where do I find your code?
>>>
>>
>> cpp-netlib.org
>
> Found that site, found github, but really can't find your
> "implementation of an HTTP client"
>
>>> >> And what approach would result in simpler client code, both if the
>>> >> code makes the distinction and if it doesn't.
>>> >
>>> > Consistency wise this is important.
>>>
>>> That doesn't answer the question, does it?
>>>
>>
>> It does. Consistent interfaces makes simpler client code. There's no point
>
> Consistency isn't the only thing that counts.

Yes it is. We're talking about standard interfaces here.

> Nobody will say that
> consistency is bad, but often it's a trade-off.
>

It's not a tradeoff -- if your design is not consistent, there is no
excuse for it. That is the hallmark of bad design. If you can't make
it consistent, there's a failure somewhere.

So sorry, you cannot trade consistency away. This is the reason why
the standard has been successful in defining algorithms and concepts
and this is why it has lasted this long. The algorithms have a
consistent design -- they work with iterators, they take predicates,
and they take binary or unary functions as parameters.

If you cannot make an API consistent, you have failed.

So consistency is a requirement. Then you have to model the solution
accordingly with the features available in the language.

C++ is type-safe, and so far the type-safe way of conveying
"potentially undefined value" is optional<T>.

>> in limiting the design when there's a perfectly acceptable and reasonable
>> implementation.
>>
>> Also note that these are URI instances not just specific to HTTP. So you
>> cannot make the assumption that all other schemes take the same approach as
>> HTTP in dealing with empty or nonexistent paths.
>
> I'm not making that assumption.
>

So why then would you want to have these "has_" checks when
optional<string_ref> is much more idiomatic and conveys the intent
exactly? I don't know what you're getting at.

>>> Ah. But your original proposal didn't, right?
>>>
>>
>> The original returns the range including the delimiters.
>
> Even for host, scheme and path?

The host part is not optional, because there will always be a host
part in a URI -- it may be empty though (mailto:user). The scheme is
required for URI's the last time I checked. You can't construct a URI
without a scheme, and the ":" delimiter. Paths can be omitted (again,
like mailto:someone@host).

I don't have the RFC memorized but I know Glyn took pains in modeling
the RFC as closely as he can. He's conveyed the intent in the
interface in a consistent manner with the RFC.

> Anyway, let's have a look at client code, please.
>

Look at it at your own peril:

https://github.com/cpp-netlib/cpp-netlib/blob/0.9-devel/boost/network/protocol/http/algorithms/linearize.hpp#L48

In this iteration the URI always returned a range (or, if we had had it, a
string_ref). This version does not handle cases like this properly:

http://cpp-netlib.org/?

The accessor treats this case as "http://cpp-netlib.org/", which is
obviously wrong. This is an attempt at cleaning this up:

https://github.com/cpp-netlib/cpp-netlib/blob/master/http/src/network/protocol/http/algorithms/linearize.hpp#L49

But the logic in this algorithm still has the same bug.
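
For illustration, the "http://cpp-netlib.org/?" case can be sketched like this. split_query is a hypothetical helper, not cpp-netlib code, and it assumes C++17's std::optional and std::string_view in place of optional<string_ref>. It does no validation; it only shows how an empty query stays distinguishable from a missing one.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical helper: returns the query of a URI reference, or nullopt when
// no "?" delimiter is present. Any "#fragment" is cut off first.
std::optional<std::string_view> split_query(std::string_view uri) {
    uri = uri.substr(0, uri.find('#'));   // npos keeps the whole string
    const auto mark = uri.find('?');
    if (mark == std::string_view::npos) return std::nullopt;  // undefined
    return uri.substr(mark + 1);                               // may be empty
}

void check_split_query() {
    assert(!split_query("http://cpp-netlib.org/"));
    assert(split_query("http://cpp-netlib.org/?") == std::string_view{});
    assert(split_query("http://cpp-netlib.org/?a=b") == std::string_view{"a=b"});
}
```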

We're in the process of re-writing this, and this code will be much
cleaner with optionals now that the interface already provides it in
the URI library's current form.

Thank you for asking BTW.

Vincent Jacquet

Jan 24, 2013, 5:32:06 AM
to std-pr...@isocpp.org, Vincent Jacquet, gly...@acm.org, dbe...@google.com, cme...@cmeerw.org
I think we all agree that knowing whether a component of the uri is defined matters.
It matters when the uri should be normalized (true for Scheme-Based Normalization [§6.2.3], but apparently not for Syntax-Based Normalization [§6.2.2]), when a uri reference should be transformed to a target uri [§5.2.2], and when parsed uri components should be recomposed into a uri [§5.3].

But when we need to parse the query string or retrieve the fragment, should we (always) process undefined and empty differently?

Also, unless I am mistaken, the normalization, transformation and recomposition use cases should be encapsulated by the uri type.

If I were to make a function to parse the uri and return a tuple with different components, I'd go with optional<>. But aren't we talking about a uri *type*?
The filesystem library's path class returns path from the different accessors (parent_path(), filename(), stem(), extension(), ...).
So maybe the question is not whether to return string_ref or optional<string_ref>, but uri?

It would then be easy to do
uri q = source.query();
uri target = uri("http://www.example.com/") + q;
q would be "?name=ferret" 

Yes, there is the "?", because "name=ferret" cannot be parsed as a uri, whereas "?name=ferret" can.

With the current proposal, how would I make "q"?
uri q = source.query() ? uri("?" + *source.query()) : uri("");


Regards,
Vincent

On Wednesday, January 23, 2013 10:01:48 AM UTC+1, Christof Meerwald wrote:
On Wed, Jan 23, 2013 at 12:33:02AM -0800, Vincent Jacquet wrote:
> I've found a lot of occurences of "empty" in RFC3986 ("the path may be empty
> (no characters)", page 16; "the registered name is empty (zero length)",
> page 21... ) but no occurences of the word "null".
> Could you please tell us which RFC you are referring to and where is the
> distinction between null and empty is made?

Sorry, null was the wrong term, RFC 3986 talks about empty and
undefined, for example in 5.3 "Note that we are careful to preserve
the distinction between a component that is undefined, meaning that
its separator was not present in the reference, and a component that
is empty, meaning that the separator was present and was immediately
followed by the next component separator or the end of the reference."

It probably needs to be reviewed if the modelling of what can be
undefined matches the RFC (I think a path can be empty, but not
undefined).
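
The distinction quoted from section 5.3 shows up directly in recomposition. The sketch below follows the shape of the RFC's recomposition pseudocode; the components struct and the recompose function are hypothetical names, and std::optional (C++17) stands in for the proposal's optional type.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical plain aggregate; not the proposal's uri class.
struct components {
    std::optional<std::string> scheme;
    std::optional<std::string> authority;
    std::string path;                     // always defined, may be empty
    std::optional<std::string> query;
    std::optional<std::string> fragment;
};

// A delimiter is emitted only when its component is defined, even if empty.
std::string recompose(const components& c) {
    std::string result;
    if (c.scheme)    { result += *c.scheme; result += ':'; }
    if (c.authority) { result += "//"; result += *c.authority; }
    result += c.path;
    if (c.query)     { result += '?'; result += *c.query; }
    if (c.fragment)  { result += '#'; result += *c.fragment; }
    return result;
}

void check_recompose() {
    components c;
    c.scheme = "http";
    c.authority = "cpp-netlib.org";
    c.path = "/";
    assert(recompose(c) == "http://cpp-netlib.org/");
    c.query = "";   // defined but empty: the "?" must survive
    assert(recompose(c) == "http://cpp-netlib.org/?");
}
```

Note that path is a plain string here, which matches the reading that a path can be empty but not undefined.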


Dean Michael Berris

Jan 24, 2013, 5:36:10 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 9:23 PM, Olaf van der Spek <olafv...@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 11:01 AM, Dean Michael Berris
> <dbe...@google.com> wrote:
>> On Thu, Jan 24, 2013 at 8:43 PM, Olaf van der Spek <olafv...@gmail.com> wrote:
>>> On Thu, Jan 24, 2013 at 10:33 AM, Nicol Bolas <jmck...@gmail.com> wrote:
>>>>> That's my question: does the majority of client code make that
>>>>> distinction?
>>>>
>>>>
>>>> Does it have to? Does it matter what the majority of client code does?
>>>
>>> Yes
>>> If say 90% of the cases don't make that distinction then IMO that code
>>> should be as simple as possible. Optimize for the common case.
>>> Hence my suggestion to look at some client code.
>>>
>>
>> You're not making an argument here. You're making rhetorical questions.
>>
>> Code should be as simple as possible. How could it be more simple than
>> optional<T>? It's obviously the right idiom and the right type.
>
> optional<T> is just a return type, it's not client code. Speaking of
> which, could you link me to some real-world client code, please?
>

I just did.

>>> That feature of optional is about to be removed from the current
>>> optional proposal.
>>>
>>
>> What feature? That an optional can be checked whether it's set?
>
> get_value_or()
>

I wouldn't use this -- I've been living with Boost.Optional for a long
time and I've never needed this feature. So yes, it's fine that it
goes away. What's your point?

>> optional<string_ref> maybe = uri.path();
>> string path = maybe? string(*maybe) : string("/");
>>
>> Why is this not simple?
>
> It's a lot less simple than string path = uri.path(); isn't it?

But how do you convey that the URI never had a path defined? An empty
path is different from one that was not provided. Tell me what the
path is in this URI:

mailto:dbe...@google.com

> If there's no need to differentiate between null and empty path, then
> that's all that's needed.
>

But there is a need for that in the concept of a URI. We're being as
faithful to the model as we possibly can be, and there's no point in
excluding a state "just because it's not common".

> BTW, string path = uri.path().get_value_or("/"); is simpler than your
> code (but about to be removed).
>

I disagree. get_value_or should go away.

string_ref path = uri.path() ? *uri.path() : "/";

This is C++, not any other programming language. We have a ternary
operator -- we should use it.

>>> Shouldn't the primary goal be to have client code as simple as
>>> possible (in most cases)?
>>>
>>
>> You haven't shown me code that's "much more simple" than code using
>> optional. Line per line optional<> is simpler than any other
>> alternative conveying the same concept.
>
> See above
>

No, your code is verbose and doesn't convey the meaning properly. You
and I may have different opinions on what's simple -- a ternary
operator is a primitive, it works with pointers, values, and it's been
there since day one. C++ programmers don't need this "get_value_or"
member. It's easy to implement it with a ternary operator.

Olaf van der Spek

Jan 24, 2013, 5:47:26 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 11:29 AM, Dean Michael Berris
<dbe...@google.com> wrote:
>> Consistency isn't the only thing that counts.
>
> Yes it is. We're talking about standard interfaces here.

So a consistent but unusable interface is good?

> So sorry, you cannot trade consistency away. This is the reason why
> the standard has been successful in defining algorithms and concepts
> and this is why it has lasted this long. The algorithms have a
> consistent design -- they work with iterators, they take predicates,
> and they take binary or unary functions as parameters.

Are you saying C++ doesn't have any inconsistencies?

>> Even for host, scheme and path?
>
> The host part is not optional, because there will always be a host
> part in a URI

What about relative URIs?
If it can't be optional, why is it returned as an optional?

> -- it may be empty though (mailto:user). The scheme is
> required for URI's the last time I checked.

My browser doesn't require me to enter a scheme. Does that mean it
can't use the URI class directly?

> https://github.com/cpp-netlib/cpp-netlib/blob/0.9-devel/boost/network/protocol/http/algorithms/linearize.hpp#L48

Thanks, but that doesn't look like normal client code. Where's the
HTTP client you spoke of?
To be honest I can't even find the usage of the URI class.

> We're in the process of re-writing this, and this code will be much
> cleaner with optionals now that the interface already provides it in
> the URI library's current form.

It'd be interesting to see the version with optional and compare it.

> Thank you for asking BTW.

You're welcome.


--
Olaf

Olaf van der Spek

Jan 24, 2013, 5:54:45 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 11:36 AM, Dean Michael Berris
<dbe...@google.com> wrote:
>>> optional<string_ref> maybe = uri.path();
>>> string path = maybe? string(*maybe) : string("/");
>>>
>>> Why is this not simple?
>>
>> It's a lot less simple than string path = uri.path(); isn't it?
>
> But how do you convey that the URI never had a path defined? An empty
> path is different from one that was not provided.

What's the difference?

> Tell me what the
> path is in this URI:
>
> mailto:dbe...@google.com

Null or empty, I don't know the difference.

>> If there's no need to differentiate between null and empty path, then
>> that's all that's needed.
>>
>
> But there is a need for that in the concept of a URI. We're being as
> faithful to the model as we possibly can be, and there's no point in
> excluding a state "just because it's not common".

You're right, the *concept* should define it. This does not mean it
should use optional.

But not every user has to use it.

>> BTW, string path = uri.path().get_value_or("/"); is simpler than your
>> code (but about to be removed).
>>
>
> I disagree. get_value_or should go away.

> string_ref path = uri.path() ? *uri.path() : "/";
>
> This is C++, not any other programming language. We have a ternary
> operator -- we should use it.

Did you miss the discussion about get_value_or()?
The conditional operator isn't usable with rvalues.

>>>> Shouldn't the primary goal be to have client code as simple as
>>>> possible (in most cases)?
>>>>
>>>
>>> You haven't shown me code that's "much more simple" than code using
>>> optional. Line per line optional<> is simpler than any other
>>> alternative conveying the same concept.
>>
>> See above
>>
>
> No, your code is verbose and doesn't convey the meaning properly. You

It's shorter than yours.

> and I may have different opinions on what's simple -- a ternary
> operator is a primitive, it works with pointers, values, and it's been
> there since day one. C++ programmers don't need this "get_value_or"
> member. It's easy to implement it with a ternary operator.

The conditional operator isn't usable with rvalues.

--
Olaf

Christof Meerwald

Jan 24, 2013, 5:55:53 AM
to std-pr...@isocpp.org, Dean Michael Berris
On Thu, Jan 24, 2013 at 09:36:10PM +1100, Dean Michael Berris wrote:
> But how do you convey that the URI never had a path defined? An empty
> path is different from one that was not provided. Tell me what the
> path is in this URI:
>
> mailto:dbe...@google.com

The path is "dbe...@google.com" - this case is similar to the xmpp
example in your proposal, also see RFC 3986, section 3.3:

"A path consists of a sequence of path segments separated by a slash
("/") character. A path is always defined for a URI, though the
defined path may be empty (zero length). Use of the slash character
to indicate hierarchy is only required when a URI will be used as
the context for relative references. For example, the URI
<mailto:fr...@example.com> has a path of "fr...@example.com", whereas
the URI <foo://info.example.com?fred> has an empty path."
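
As a rough illustration of that paragraph, the sketch below pulls the path out of an absolute URI. path_of is a hypothetical helper that assumes a scheme is present and does no validation; it uses C++17's std::string_view in place of string_ref. Its return type is not optional, since the path is always defined.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: extracts the path from an absolute URI ("scheme:...")
// without validating it.
std::string_view path_of(std::string_view uri) {
    // Drop the scheme and its ":" delimiter.
    const auto colon = uri.find(':');
    if (colon != std::string_view::npos) uri.remove_prefix(colon + 1);
    // Drop "//authority" if present; the authority ends at "/", "?" or "#".
    if (uri.substr(0, 2) == "//") {
        uri.remove_prefix(2);
        const auto end = uri.find_first_of("/?#");
        uri.remove_prefix(end == std::string_view::npos ? uri.size() : end);
    }
    // The path runs up to the query or fragment delimiter, whichever is first.
    return uri.substr(0, uri.find_first_of("?#"));
}

void check_path_of() {
    assert(path_of("mailto:someone@host") == "someone@host");
    assert(path_of("foo://info.example.com?fred").empty());
    assert(path_of("http://www.example.com/glynos/?key=value#frag") == "/glynos/");
}
```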

Dean Michael Berris

Jan 24, 2013, 6:00:43 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 9:47 PM, Olaf van der Spek <olafv...@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 11:29 AM, Dean Michael Berris
> <dbe...@google.com> wrote:
>>> Consistency isn't the only thing that counts.
>>
>> Yes it is. We're talking about standard interfaces here.
>
> So a consistent but unusable interface is good?
>

Can you quantify "unusable"? I can quantify consistency in the
deviation from similar components.

Seriously I don't know why I bother answering these kinds of questions.

>> So sorry, you cannot trade consistency away. This is the reason why
>> the standard has been successful in defining algorithms and concepts
>> and this is why it has lasted this long. The algorithms have a
>> consistent design -- they work with iterators, they take predicates,
>> and they take binary or unary functions as parameters.
>
> Are you saying C++ doesn't have any inconsistencies?
>

Did I say that? I think you can read what I wrote.

>>> Even for host, scheme and path?
>>
>> The host part is not optional, because there will always be a host
>> part in a URI
>
> What about relative URIs?
> If it can't be optional, why is it returned as an optional?
>

Sorry, I had my brain wrapped around just absolute URIs. You're right,
they're all optional.

>> -- it may be empty though (mailto:user). The scheme is
>> required for URI's the last time I checked.
>
> My browser doesn't require me to enter a scheme. Does that mean it
> can't use the URI class directly?
>

What does this have to do with anything? Seriously, your browser -- if
it is a web browser we're talking about -- will pick a default scheme
(typically HTTP) to parse the thing you enter. It can use the URI type
directly if it wanted to use the correct

>> https://github.com/cpp-netlib/cpp-netlib/blob/0.9-devel/boost/network/protocol/http/algorithms/linearize.hpp#L48
>
> Thanks, but that doesn't look like normal client code. Where's the
> HTTP client you spoke of?

Really, you want me to spoon-feed you the links?

Done:

http://cpp-netlib.org/latest/reference/http_client.html

This is the facade:

https://github.com/cpp-netlib/cpp-netlib/blob/master/http/src/network/protocol/http/client/facade.hpp

This is the interface for the HTTP request object:

https://github.com/cpp-netlib/cpp-netlib/blob/master/http/src/network/protocol/http/request/request.hpp#L28

And this is the implementation of one of the accessors to a request object:

https://github.com/cpp-netlib/cpp-netlib/blob/master/http/src/network/protocol/http/message/wrappers/anchor.ipp

This is exactly what client code using the URI will look like.

> To be honest I can't even find the usage of the URI class.
>

See above.

>> We're in the process of re-writing this, and this code will be much
>> cleaner with optionals now that the interface already provides it in
>> the URI library's current form.
>
> It'd be interesting to see the version with optional and compare it.
>

See above.

Dean Michael Berris

Jan 24, 2013, 6:03:02 AM
to Christof Meerwald, std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 9:55 PM, Christof Meerwald <cme...@cmeerw.org> wrote:
> On Thu, Jan 24, 2013 at 09:36:10PM +1100, Dean Michael Berris wrote:
>> But how do you convey that the URI never had a path defined? An empty
>> path is different from one that was not provided. Tell me what the
>> path is in this URI:
>>
>> mailto:dbe...@google.com
>
> The path is "dbe...@google.com" - this case is similar to the xmpp
> example in your proposal, also see RFC 3986, section 3.3:
>
> "A path consists of a sequence of path segments separated by a slash
> ("/") character. A path is always defined for a URI, though the
> defined path may be empty (zero length). Use of the slash character
> to indicate hierarchy is only required when a URI will be used as
> the context for relative references. For example, the URI
> <mailto:fr...@example.com> has a path of "fr...@example.com", whereas
> the URI <foo://info.example.com?fred> has an empty path."
>

Gah, you're right. My bad.

I should have said "host" or "anchor".

Olaf van der Spek

Jan 24, 2013, 6:14:20 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 12:00 PM, Dean Michael Berris
<dbe...@google.com> wrote:
>>> So sorry, you cannot trade consistency away. This is the reason why
>>> the standard has been successful in defining algorithms and concepts
>>> and this is why it has lasted this long. The algorithms have a
>>> consistent design -- they work with iterators, they take predicates,
>>> and they take binary or unary functions as parameters.
>>
>> Are you saying C++ doesn't have any inconsistencies?
>>
>
> Did I say that? I think you can read what I wrote.

You said consistency is the only thing that counts.
If that's true we'd better make sure C++ doesn't have any inconsistencies.


>> My browser doesn't require me to enter a scheme. Does that mean it
>> can't use the URI class directly?
>>
>
> What does this have to do with anything? Seriously, your browser -- if
> it is a web browser we're talking about -- will pick a default scheme
> (typically HTTP) to parse the thing you enter. It can use the URI type
> directly if it wanted to use the correct

The correct what?
I'm asking because you said scheme is required. Which made me assume
the URI class would not parse the user input (without scheme).

> And this is the implementation of one of the accessors to a request object:
>
> https://github.com/cpp-netlib/cpp-netlib/blob/master/http/src/network/protocol/http/message/wrappers/anchor.ipp
>
> This is exactly how client code using the URI will look like.

::network::uri uri_;
request_.get_uri(uri_);
auto fragment = uri_.fragment();
return fragment? std::string(*fragment) : std::string();

Looks like you map null to empty here. If fragment() had returned
string_ref, the last two lines would be just return uri_.fragment()
The second line appears to prevent you from using an rvalue, otherwise
the entire function could've been just return
request_.get_uri().fragment()
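
A rough sketch of that alternative, with the has_* check kept separate for callers that need it. The names uri_sketch, has_fragment and anchor are made up for illustration and are not the proposal's interface; std::optional and std::string_view (C++17) stand in for optional and string_ref.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for a uri type; only the fragment is modelled.
class uri_sketch {
public:
    explicit uri_sketch(std::optional<std::string> fragment)
        : fragment_(std::move(fragment)) {}

    // True when a "#" delimiter was present, even with nothing after it.
    bool has_fragment() const { return fragment_.has_value(); }

    // Undefined and empty both come back as an empty view.
    std::string_view fragment() const {
        return fragment_ ? std::string_view(*fragment_) : std::string_view();
    }

private:
    std::optional<std::string> fragment_;
};

// The accessor quoted above then reduces to a single return statement.
std::string anchor(const uri_sketch& u) { return std::string(u.fragment()); }

void check_uri_sketch() {
    uri_sketch undefined{std::nullopt};
    uri_sketch empty{std::string{}};
    uri_sketch frag{std::string{"frag"}};
    assert(!undefined.has_fragment() && empty.has_fragment());
    assert(anchor(undefined) == anchor(empty));   // both map to ""
    assert(anchor(frag) == "frag");
}
```

The trade-off is that nothing in the type reminds the caller to ask has_fragment() first, which is the information an optional return value would carry.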



--
Olaf

Dean Michael Berris

Jan 24, 2013, 6:14:30 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 9:54 PM, Olaf van der Spek <olafv...@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 11:36 AM, Dean Michael Berris
> <dbe...@google.com> wrote:
>>>> optional<string_ref> maybe = uri.path();
>>>> string path = maybe? string(*maybe) : string("/");
>>>>
>>>> Why is this not simple?
>>>
>>> It's a lot less simple than string path = uri.path(); isn't it?
>>
>> But how do you convey that the URI never had a path defined? An empty
>> path is different from one that was not provided.
>
> What's the difference?
>

That's dependent on the scheme. It may be important in some schemes,
it may not be important in others. Still it's a matter of
interpretation so you let the user decide.

>> Tell me what the
>> path is in this URI:
>>
>> mailto:dbe...@google.com
>
> Null or empty, I don't know the difference.
>

I should have picked "anchor".

The difference is "there's no anchor defined" and "there's an anchor
specified to be empty". It depends on the scheme. You should not throw
that information away especially if it may be important.

>>> If there's no need to differentiate between null and empty path, then
>>> that's all that's needed.
>>>
>>
>> But there is a need for that in the concept of a URI. We're being as
>> faithful to the model as we possibly can be, and there's no point in
>> excluding a state "just because it's not common".
>
> You're right, the *concept* should define it. This does not mean it
> should use optional.
>

But why not?

> But not every user has to use it.
>

Has to use what?

>>> BTW, string path = uri.path().get_value_or("/"); is simpler than your
>>> code (but about to be removed).
>>>
>>
>> I disagree. get_value_or should go away.
>
>> string_ref path = uri.path() ? *uri.path() : "/";
>>
>> This is C++, not any other programming language. We have a ternary
>> operator -- we should use it.
>
> Did you miss the discussion about get_value_or()?

I have it muted. I never needed get_value_or before, so I deemed it
not important.

> The conditional operator isn't usable with rvalues.
>

I don't see wording in the standard that says this. I'm looking at
5.16. What are you talking about "conditional operator isn't usable
with rvalues"?

>>>>> Shouldn't the primary goal be to have client code as simple as
>>>>> possible (in most cases)?
>>>>>
>>>>
>>>> You haven't shown me code that's "much more simple" than code using
>>>> optional. Line per line optional<> is simpler than any other
>>>> alternative conveying the same concept.
>>>
>>> See above
>>>
>>
>> No, your code is verbose and doesn't convey the meaning properly. You
>
> It's shorter than yours.
>

It's also totally unnecessary.

>> and I may have different opinions on what's simple -- a ternary
>> operator is a primitive, it works with pointers, values, and it's been
>> there since day one. C++ programmers don't need this "get_value_or"
>> member. It's easy to implement it with a ternary operator.
>
> The conditional operator isn't usable with rvalues.
>

Isn't usable? Please point me to the wording in the standard that states this.

Ville Voutilainen

Jan 24, 2013, 6:22:56 AM
to std-pr...@isocpp.org
On 24 January 2013 13:14, Dean Michael Berris <dbe...@google.com> wrote:
>> The conditional operator isn't usable with rvalues.
> I don't see wording in the standard that says this. I'm looking at
> 5.16. What are you talking about "conditional operator isn't usable
> with rvalues"?

That's not the point. You can't do

get_temp_value() ? magically_refer_to_that_value.something() : something_else();

You can do that with get_value_or(). That would speak _for_ returning
an optional<string_ref>
from the uri, as far as I can see - I can use get_value_or with it. If
I want to check whether
it is non-null and I have to do that with a conditional operator, I
have to store the return
value, whereas with get_value_or I don't have to do that, I can
compute the things I need
from a temporary.
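
A small sketch of the three spellings, for comparison. The accessor and the call counter are hypothetical and exist only to make the number of calls visible. std::optional::value_or is used here as the C++17 counterpart of Boost.Optional's get_value_or.

```cpp
#include <cassert>
#include <optional>
#include <string>

int accessor_calls = 0;  // counts how often the accessor runs

// Hypothetical accessor that returns a fresh temporary on every call.
std::optional<std::string> get_temp_value(bool present) {
    ++accessor_calls;
    if (present) return std::string("/glynos/");
    return std::nullopt;
}

// Conditional operator without a named local: the accessor runs twice
// whenever the value is present.
std::string with_conditional(bool present) {
    return get_temp_value(present) ? *get_temp_value(present) : std::string("/");
}

// Named local: one call, but two statements.
std::string with_local(bool present) {
    auto value = get_temp_value(present);
    return value ? *value : std::string("/");
}

// value_or: one call, one expression, applied directly to the temporary.
std::string with_value_or(bool present) {
    return get_temp_value(present).value_or("/");
}

void check_spellings() {
    accessor_calls = 0;
    assert(with_conditional(true) == "/glynos/" && accessor_calls == 2);
    accessor_calls = 0;
    assert(with_local(true) == "/glynos/" && accessor_calls == 1);
    accessor_calls = 0;
    assert(with_value_or(false) == "/" && accessor_calls == 1);
}
```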

Dean Michael Berris

Jan 24, 2013, 6:26:45 AM
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 10:14 PM, Olaf van der Spek
<olafv...@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 12:00 PM, Dean Michael Berris
> <dbe...@google.com> wrote:
>>>> So sorry, you cannot trade consistency away. This is the reason why
>>>> the standard has been successful in defining algorithms and concepts
>>>> and this is why it has lasted this long. The algorithms have a
>>>> consistent design -- they work with iterators, they take predicates,
>>>> and they take binary or unary functions as parameters.
>>>
>>> Are you saying C++ doesn't have any inconsistencies?
>>>
>>
>> Did I say that? I think you can read what I wrote.
>
> You said consistency is the only thing that counts.

When designing APIs.

> If that's true we'd better make sure C++ doesn't have any inconsistencies.
>

You're putting words in my mouth. I was talking about designing APIs.

That being said, yes, this is why there are committee members who are
dedicated to making this a reality. We're human and fallible too. If
you want to help, that would be very much appreciated.

>
>>> My browser doesn't require me to enter a scheme. Does that mean it
>>> can't use the URI class directly?
>>>
>>
>> What does this have to do with anything? Seriously, your browser -- if
>> it is a web browser we're talking about -- will pick a default scheme
>> (typically HTTP) to parse the thing you enter. It can use the URI type
>> directly if it wanted to use the correct
>
> The correct what?
> I'm asking because you said scheme is required. Which made me assume
> the URI class would not parse the user input (without scheme).
>

In my haste, I left that hanging.

It can use the URI type directly if it wanted to use the correct
enforcement of the RFC -- or more to the point, if it wanted to use it
as part of its internal APIs.

I realize the proposal actually supports relative URI's (I must admit
I let Glyn define that part) and that this is even now useful in the
context of an HTTP server. Of course it has to make assumptions on the
scheme and in that context it's definitely possible.

>> And this is the implementation of one of the accessors to a request object:
>>
>> https://github.com/cpp-netlib/cpp-netlib/blob/master/http/src/network/protocol/http/message/wrappers/anchor.ipp
>>
>> This is exactly how client code using the URI will look like.
>
> ::network::uri uri_;
> request_.get_uri(uri_);
> auto fragment = uri_.fragment();
> return fragment? std::string(*fragment) : std::string();
>
> Looks like you map null to empty here.

This is a shortcoming. As I said, this is in the process of being rewritten.

> If fragment() would've returned
> string_ref, the last two lines would be just return uri_.fragment()

Yes. The string_ref implementation in cpp-netlib has only been
recently added. I have yet to use it throughout (or actually, just use
Boost.string_ref when that's released). That's off-topic though.

> The second line appears to prevent you from using a rvalue, otherwise
> the entire function could've been just return
> request_.get_uri().fragment()
>

No, this is just a matter of style. Clang compiles this just fine and
I can't see why it's a problem from reading the standard:

struct convertible {
  operator bool () {
    return true;
  }
};

convertible f() {
  return {};
}

int main() {
  int x = f() ? 0 : 1;
}

Dean Michael Berris

Jan 24, 2013, 6:30:51 AM1/24/13
to std-pr...@isocpp.org
Hmmm... so what's the difference between that and:

auto value = get_temp_value();
value ? value.something() : something_else();

Or for that matter:

if (auto value = get_temp_value()) {
// more statements
} else {
// more statements
}

All I'm saying here is that whether there's a get_value_or() on
optional is irrelevant to whether the URI should return an optional.

Ville Voutilainen

Jan 24, 2013, 6:34:00 AM1/24/13
to std-pr...@isocpp.org
On 24 January 2013 13:30, Dean Michael Berris <dbe...@google.com> wrote:
>> That's not the point. You can't do
>> get_temp_value() ? magically_refer_to_that_value.something() : something_else();
> Hmmm... so what's the difference between that and:
> auto value = get_temp_value();
> value ? value.something() : something_else();

Seriously? The difference is glaringly visible, you need two
statements and a temporary
variable there, as opposed to having an expression there. :)

> All I'm saying here is that whether there's a get_value_or() on
> optional is irrelevant to whether the URI should return an optional.

I'm saying being able to use get_value_or with the return type of URI
is a big plus. You can't achieve that with has_* functions and such.
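
To make the get_value_or point concrete, here is a minimal sketch. It assumes C++17's std::optional and std::string_view as stand-ins for the proposal's optional and string_ref, and uses value_or, which is the std::optional spelling of Boost.Optional's get_value_or. The helpers fragment_of and fragment_or_empty are hypothetical and are not part of the proposal.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not from the proposal): returns the fragment of a URI
// string, or an empty optional when the URI contains no '#' at all.
std::optional<std::string_view> fragment_of(std::string_view uri) {
    const auto pos = uri.find('#');
    if (pos == std::string_view::npos) return std::nullopt;
    return uri.substr(pos + 1);
}

// A single expression with no named temporary: the default is supplied inline.
std::string fragment_or_empty(std::string_view uri) {
    return std::string(fragment_of(uri).value_or(""));
}

// Small usage check for both the "present" and the "absent" case.
void demo_value_or() {
    assert(fragment_or_empty("http://www.example.com/#frag") == "frag");
    assert(fragment_or_empty("http://www.example.com/") == "");
}
```

With only has_* style accessors, the body of fragment_or_empty would need a named variable and a conditional, which is the difference being argued here.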

Olaf van der Spek

Jan 24, 2013, 6:36:53 AM1/24/13
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 12:26 PM, Dean Michael Berris
<dbe...@google.com> wrote:
>>> And this is the implementation of one of the accessors to a request object:
>>>
>>> https://github.com/cpp-netlib/cpp-netlib/blob/master/http/src/network/protocol/http/message/wrappers/anchor.ipp
>>>
>>> This is exactly how client code using the URI will look like.
>>
>> ::network::uri uri_;
>> request_.get_uri(uri_);
>> auto fragment = uri_.fragment();
>> return fragment? std::string(*fragment) : std::string();
>>
>> Looks like you map null to empty here.
>
> This is a short-coming. As I said this is in the process of being re-written.

Let's wait for the re-written code then.

>> If fragment() would've returned
>> string_ref, the last two lines would be just return uri_.fragment()
>
> Yes. The string_ref implementation in cpp-netlib has only been
> recently added. I have yet to use it throughout (or actually, just use
> Boost.string_ref when that's released). That's off-topic though.

Is it? The point is that return uri_.fragment() is only possible if
fragment() doesn't return an optional.

>> The second line appears to prevent you from using a rvalue, otherwise
>> the entire function could've been just return
>> request_.get_uri().fragment()
>>
>
> No, this is just a matter of style. Clang compiles this just fine and
> I can't see why it's a problem from reading the standard:

Of course it compiles just fine, that's not the point. The point is simplicity.

BTW, what coding style is that? Are you using trailing underscores for
both local- and member vars?


--
Olaf

Dean Michael Berris

Jan 24, 2013, 6:39:46 AM1/24/13
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 10:34 PM, Ville Voutilainen
<ville.vo...@gmail.com> wrote:
> On 24 January 2013 13:30, Dean Michael Berris <dbe...@google.com> wrote:
>>> That's not the point. You can't do
>>> get_temp_value() ? magically_refer_to_that_value.something() : something_else();
>> Hmmm... so what's the difference between that and:
>> auto value = get_temp_value();
>> value ? value.something() : something_else();
>
> Seriously? The difference is glaringly visible, you need two
> statements and a temporary
> variable there, as opposed to having an expression there. :)
>

I know what the actual difference is, I'm asking what the conceptual
difference is.

Would something like "optional.get(default_value)" (like how Python
dict's work) make better sense? I don't know.

Maybe I'm just blinded by familiarity with Boost.Optional and the
preference for clear constructs.

I'm just saying I won't miss it because I've never had it. ;)

>> All I'm saying here is that whether there's a get_value_or() on
>> optional is irrelevant to whether the URI should return an optional.
>
> I'm saying being able to use get_value_or with the return type of URI
> is a big plus. You can't achieve that with has_* functions and such.
>

We're in violent agreement then. :)

Dean Michael Berris

Jan 24, 2013, 6:44:13 AM1/24/13
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 10:36 PM, Olaf van der Spek
<olafv...@gmail.com> wrote:
> On Thu, Jan 24, 2013 at 12:26 PM, Dean Michael Berris
> <dbe...@google.com> wrote:
>>>> And this is the implementation of one of the accessors to a request object:
>>>>
>>>> https://github.com/cpp-netlib/cpp-netlib/blob/master/http/src/network/protocol/http/message/wrappers/anchor.ipp
>>>>
>>>> This is exactly how client code using the URI will look like.
>>>
>>> ::network::uri uri_;
>>> request_.get_uri(uri_);
>>> auto fragment = uri_.fragment();
>>> return fragment? std::string(*fragment) : std::string();
>>>
>>> Looks like you map null to empty here.
>>
>> This is a short-coming. As I said this is in the process of being re-written.
>
> Let's wait for the re-written code then.
>

Let's.

>>> If fragment() would've returned
>>> string_ref, the last two lines would be just return uri_.fragment()
>>
>> Yes. The string_ref implementation in cpp-netlib has only been
>> recently added. I have yet to use it throughout (or actually, just use
>> Boost.string_ref when that's released). That's off-topic though.
>
> Is it? The point is that return uri_.fragment() is only possible if
> fragment() doesn't return an optional.
>

Uh, if I were going to return an optional, I'd just return an optional.
That's what's going to happen anyway. This is a shortcoming of the
current implementation. This was written when there was no string_ref
and it didn't occur to me at the time that returning an optional would
have been a better implementation.

I certainly have learned quite a bit between then and now. ;)

>>> The second line appears to prevent you from using a rvalue, otherwise
>>> the entire function could've been just return
>>> request_.get_uri().fragment()
>>>
>>
>> No, this is just a matter of style. Clang compiles this just fine and
>> I can't see why it's a problem from reading the standard:
>
> Of course it compiles just fine, that's not the point. The point is simplicity.
>

It's simple to me. I don't get it.

> BTW, what coding style is that? Are you using trailing underscores for
> both local- and member vars?
>

This is off-topic.

Ville Voutilainen

Jan 24, 2013, 7:00:03 AM1/24/13
to std-pr...@isocpp.org
On 24 January 2013 13:39, Dean Michael Berris <dbe...@google.com> wrote:
> Maybe I'm just blinded by familiarity with Boost.Optional and the
> preference for clear constructs.
> I'm just saying I won't miss it because I've never had it. ;)

Indeed. I will miss it because I have written enough value extractor
helper functions.
Widening horizons further beyond boost, I also have QSettings::value
that can take
a default value, and QVariant::value which can default-construct a
value. Such things
have been found rather useful, and conditional operator doesn't quite cut it.

>> I'm saying being able to use get_value_or with the return type of URI
>> is a big plus. You can't achieve that with has_* functions and such.
> We're in violent agreement then. :)

Yep! :)

Klaim - Joël Lamotte

Jan 24, 2013, 7:26:49 AM1/24/13
to std-pr...@isocpp.org, Vincent Jacquet, gly...@acm.org, dbe...@google.com, cme...@cmeerw.org

On Thu, Jan 24, 2013 at 11:32 AM, Vincent Jacquet <vjac...@flowgroup.fr> wrote:
Also, unless I am mistaken, the normalization, transformation and recomposition use cases should be encapsulated by the uri type.

If I were to make a function to parse the uri and return a tuple with different components, I'd go with optional<>. But aren't we talking about a uri *type*?
The filesystem's path class returns path for the different accessors (parent_path(), filename(), stem(), extension(), ...).
So maybe the question is not whether to return string_ref or optional<string_ref>, but uri?

It would then be easy to do
uri q = source.query();
uri target = uri("http://www.example.com/") + q;
q would be "?name=ferret" 

Yes, there is the "?", because "name=ferret" cannot be parsed as a uri, whereas "?name=ferret" can.

With the current proposal, how would I make "q"?
uri q = source.query() ? uri("?" + *source.query()) : uri("");

This is interesting.
Dean and Glyn, did you consider returning uri?

Joel Lamotte

Dean Michael Berris

Jan 24, 2013, 7:42:06 AM1/24/13
to Klaim - Joël Lamotte, std-pr...@isocpp.org, Vincent Jacquet, gly...@acm.org, Christof Meerwald
No. Just off the top of my head, I think this is wasteful.

Nothing's wrong with returning an optional<string_ref>. If there's
something wrong with optional, then it should be fixed but the concept
stays the same and is consistent as far as I understand.
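
For reference, the three states that optional<string_ref> is meant to convey can be sketched as follows. This is only an illustration under stated assumptions: std::optional and std::string_view from C++17 stand in for optional and string_ref, and query_of is a hypothetical free function, not the proposed uri::query().

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical helper (not the proposed API): extracts the query component.
//   no '?' at all      -> empty optional     (component not defined)
//   '?' with no text   -> optional holding "" (component defined but empty)
//   '?' followed by x  -> optional holding x
std::optional<std::string_view> query_of(std::string_view uri) {
    // Drop the fragment first, since a '?' after '#' belongs to the fragment.
    uri = uri.substr(0, uri.find('#'));
    const auto pos = uri.find('?');
    if (pos == std::string_view::npos) return std::nullopt;
    return uri.substr(pos + 1);
}

// Small usage check covering the three states.
void demo_three_states() {
    const auto undefined = query_of("http://www.example.com/glynos/");
    const auto empty = query_of("http://www.example.com/glynos/?");
    const auto present = query_of("http://www.example.com/glynos/?key=value#frag");

    assert(!undefined);
    assert(empty && empty->empty());
    assert(present && *present == "key=value");
}
```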

Vincent Jacquet

Jan 24, 2013, 9:49:31 AM1/24/13
to std-pr...@isocpp.org, Klaim - Joël Lamotte, Vincent Jacquet, gly...@acm.org, Christof Meerwald
Speaking of *uri*, I sincerely think there is something wrong with accessors returning an optional<string_ref>: the string_ref part!

I want to manipulate uri, not its components, because the manipulation of the components of a uri is scheme dependent, and I do not want to write scheme dependent code all over the place.

It may be a little extreme but please consider:
"W3C recommends that all web servers support semicolon separators in addition to ampersand separators[6] to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity escape ampersands." (see <http://en.wikipedia.org/wiki/Query_string>).
I didn't know that until 5 minutes ago. If I had started to parse the query string into a multi_map, my parsing would have been incomplete.

IMVHO the design of the uri type should not encourage accessing the components as a string (or string_ref or optional<string_ref>), because it is way too tempting to "handle it myself". Ideally, it should give me a scheme object that would interpret its parts.
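
To show how easy it is to get this wrong by hand, here is a rough sketch of a query splitter that accepts both separators mentioned in the quote above. It assumes C++17's std::string_view in place of string_ref; split_query and query_params are hypothetical names, the result is an ordered vector of pairs rather than a multi_map, and no percent-decoding is attempted.

```cpp
#include <cassert>
#include <string_view>
#include <utility>
#include <vector>

using query_params = std::vector<std::pair<std::string_view, std::string_view>>;

// Hypothetical helper: splits a raw query string into key/value views,
// treating both '&' and ';' as separators. The views refer to the caller's
// buffer, so that buffer must outlive the result.
query_params split_query(std::string_view query) {
    // Precondition: the caller passes the query without the leading '?'.
    assert(query.empty() || query.front() != '?');

    query_params result;
    while (!query.empty()) {
        const auto end = query.find_first_of("&;");
        const std::string_view item = query.substr(0, end);
        if (!item.empty()) {  // skip empty items such as in "a=1&&b=2"
            const auto eq = item.find('=');
            if (eq == std::string_view::npos) {
                result.emplace_back(item, std::string_view{});  // key only
            } else {
                result.emplace_back(item.substr(0, eq), item.substr(eq + 1));
            }
        }
        if (end == std::string_view::npos) break;  // last item consumed
        query.remove_prefix(end + 1);
    }
    return result;
}
```

A splitter that only looked for '&' would return a single pair for "name=ferret;color=purple", which is the incomplete parsing described above.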


Regards,
Vincent

Glyn Matthews

Jan 24, 2013, 10:57:10 AM1/24/13
to std-pr...@isocpp.org
On 24 January 2013 11:32, Vincent Jacquet <vjac...@flowgroup.fr> wrote:
I think we all agree that knowing whether a component of the uri is defined matters.
It matters when the uri should be normalized (true for Scheme-Based Normalization [§6.2.3], but apparently not for Syntax-Based Normalization [§6.2.2]), when a uri reference should be transformed to a target uri [§5.2.2], and when parsed uri components should be recomposed to a uri [§5.3].

But, when we need to parse the query string or retrieve the fragment, should we (always) treat "not defined" and "empty" differently?

Also, unless I am mistaken, the normalization, transformation and recomposition use cases should be encapsulated by the uri type.

If I were to make a function to parse the uri and return a tuple with different components, I'd go with optional<>. But aren't we talking about a uri *type*?
The filesystem's path class returns path for the different accessors (parent_path(), filename(), stem(), extension(), ...).
So maybe the question is not whether to return string_ref or optional<string_ref>, but uri?

It would then be easy to do
uri q = source.query();
uri target = uri("http://www.example.com/") + q;
q would be "?name=ferret" 

Yes, there is the "?", because "name=ferret" cannot be parsed as a uri, whereas "?name=ferret" can.

With the current proposal, how would I make "q"?
uri q = source.query() ? uri("?" + *source.query()) : uri("");



I acknowledge that this isn't clear in the current proposal and I thank you for your question, but what you ask above should be possible using the uri_builder:

std::network::uri_builder builder;
builder.query("name=ferret"); // the builder query method should be able to add the prefix ? automatically if it is missing
std::network::uri q = builder.uri();

And to make "target":

std::network::uri base("http://www.example.com/");
std::network::uri target = base.resolve(q);

Regards,
Glyn 

Glyn Matthews

Jan 24, 2013, 10:57:16 AM1/24/13
to std-pr...@isocpp.org
On 24 January 2013 12:26, Dean Michael Berris <dbe...@google.com> wrote:


It can use the URI type directly if it wanted to use the correct
enforcement of the RFC -- or more to the point, if it wanted to use it
as part of its internal APIs.

I realize the proposal actually supports relative URI's (I must admit
I let Glyn define that part) and that this is even now useful in the
context of an HTTP server. Of course it has to make assumptions on the
scheme and in that context it's definitely possible.


The proposal does support relative URIs. Relative URIs can be created in one of two ways:

1. As the return value of "uri::relativize"
2. By using "uri_builder"

I will acknowledge now that "uri_builder" is a weak part of this proposal, as it doesn't go far enough into the details of how to validate each part, and I hope to encourage comments on this part of the proposal.

Regards,
Glyn


Glyn Matthews

Jan 24, 2013, 10:57:22 AM1/24/13
to std-pr...@isocpp.org
No, I always wanted to return a reference to the underlying string. Returning a uri object means an additional string copy.

Regards,
Glyn


Glyn Matthews

Jan 24, 2013, 10:57:32 AM1/24/13
to std-pr...@isocpp.org
On 24 January 2013 15:49, Vincent Jacquet <vjac...@flowgroup.fr> wrote:
Speaking of *uri*, I sincerely think there is something wrong with accessors returning an optional<string_ref>: the string_ref part!

I want to manipulate uri, not its components, because the manipulation of the components of a uri is scheme dependent, and I do not want to write scheme dependent code all over the place.

It may be a little extreme but please consider:
"W3C recommends that all web servers support semicolon separators in addition to ampersand separators[6] to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity escape ampersands." (see <http://en.wikipedia.org/wiki/Query_string>).
I didn't know that until 5 minutes ago. If I had started to parse the query string into a multi_map, my parsing would have been incomplete.


I was surprised to see this as well when I first encountered it while implementing the URI in cpp-netlib. However, this doesn't really affect this proposal as this is a scheme-specific detail and the proposed uri is limited to being generic.
 
IMVHO the design of the uri type should not encourage accessing the components as a string (or string_ref or optional<string_ref>), because it is way too tempting to "handle it myself". Ideally, it should give me a scheme object that would interpret its parts.


I'm sorry, I don't understand what you want in the last sentence. If you want scheme-specific processing, I think there is definitely room for other proposals for scheme-specific URIs, particularly for HTTP.

Regards,
Glyn

Tony V E

Jan 24, 2013, 12:26:07 PM1/24/13
to std-pr...@isocpp.org
On Wed, Jan 23, 2013 at 5:33 PM, Dean Michael Berris <dbe...@google.com> wrote:
>
> std::optional<std::string_ref> is perfect in this use case. Here's the
> list of reasons:
>
> - If the URI has a part, it should return a reference to that part of
> the string.
> - If the URI does not have that part explicitly defined, you return an
> uninitialised std::optional<>.
> - If the URI does have that part explicitly defined but empty, you
> return a std::optional<> that had a default-constructed
> std::string_ref.
>
> It's also good for value semantics.
>
> I'm opposed to adding more members to the URI class unless it's
> absolutely necessary. std::optional<> does this nicely for us
> conveying the correct semantics. It keeps the URI class simple and
> easy to understand and easy to keep a mental model of.
>

Note that if string_ref itself distinguished between null and empty,
we wouldn't need optional at all.
ie a string_ref which is pointer + length would be:

[ 0xabcd1234, 0 ] = points to one-past-question-mark, has length 0
[ 0, 0 ] = no question-mark in URI

And similar for [ begin, end ] style of string_ref:

[ 0xabcd1234, 0xabcd1234 ]
[ 0, 0 ]
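
A minimal sketch of that encoding, assuming C++17's std::string_view as a stand-in for string_ref (its default constructor is specified to give data() == nullptr and size() == 0). The function query_or_null is hypothetical and only illustrates the pointer + length idea.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: encodes "no query" as a null view [0, 0] and
// "empty query" as a zero-length view pointing one past the '?'.
std::string_view query_or_null(std::string_view uri) {
    // Drop the fragment first, since a '?' after '#' belongs to the fragment.
    uri = uri.substr(0, uri.find('#'));
    const auto pos = uri.find('?');
    if (pos == std::string_view::npos) return std::string_view{};  // [0, 0]
    return uri.substr(pos + 1);  // non-null data(), possibly length 0
}

// Small usage check: the two states differ only in data().
void demo_null_vs_empty() {
    const std::string_view none = query_or_null("http://www.example.com/");
    const std::string_view empty = query_or_null("http://www.example.com/?");

    assert(none.data() == nullptr && none.empty());
    assert(empty.data() != nullptr && empty.empty());
    assert(none == empty);  // operator== compares characters only
}
```

The last assertion shows the catch with this approach: callers have to inspect data() explicitly, because ordinary comparison cannot tell the two states apart.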

Tony

Vincent Jacquet

Jan 24, 2013, 12:47:35 PM1/24/13
to std-pr...@isocpp.org, gly...@acm.org

No, I always wanted to return a reference to the underlying string. Returning a uri object means an additional string copy.

Regards,
Glyn 


There might be a few more assignments involved but, if your uri holds a shared_ptr<char*>, you might not need an additional string copy. Your assertion might be implementation specific. Anyway, I agree it would cost more to create a new uri than to return string_ref or optional<string_ref>.

I am curious: you are saying that you always wanted to return a reference to the underlying string. Why? What do you want to do with it?

Thanks,
Vincent

Nevin Liber

Jan 24, 2013, 12:52:22 PM1/24/13
to std-pr...@isocpp.org
On 24 January 2013 11:26, Tony V E <tvan...@gmail.com> wrote:

Note that if string_ref itself distinguished between null and empty,
we wouldn't need optional at all.

That just moves the problem.
 
ie a string_ref which is pointer + length would be:

[ 0xabcd1234, 0 ]   = points to one-past-question-mark, has length 0
[ 0, 0 ] = no question-mark in URI

It also means one cannot easily create a string_ref out of a vector<char> (since vector.data() is allowed to return nullptr).

If the URI library wants to reserve its own special address for meaning a "null" string_ref for itself, that would be fine, but I wouldn't use nullptr as that address.
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com>  (847) 691-1404

Olaf van der Spek

Jan 24, 2013, 1:05:35 PM1/24/13
to std-pr...@isocpp.org
On Thu, Jan 24, 2013 at 6:52 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
> That just moves the problem.

Why? IMO it'd solve it.

> It also means one cannot easily create a string_ref out of a vector<char>
> (since vector.data() is allowed to return nullptr).

Why not?
It may just mean the resulting string_ref would be null.

--
Olaf

Nevin Liber

Jan 24, 2013, 1:35:05 PM1/24/13
to std-pr...@isocpp.org

Because that is not what is meant by vector<char>, as it doesn't model the nullable concept.  For a typical implementation:

vector<char> v;
string_ref vnull(v.data(), v.size());
v.reserve(1);
string_ref vempty(v.data(), v.size());

It is unexpected that vnull and vempty represent two different things, and that is only because of a side effect of the implementation.

Vincent Jacquet

Jan 24, 2013, 3:22:13 PM1/24/13
to std-pr...@isocpp.org, gly...@acm.org
"this is a scheme-specific detail and the proposed uri is limited to being generic". 
How do you implement scheme_based_normalization or protocol_based_normalization then ?

Glyn Matthews

Jan 25, 2013, 8:44:17 AM1/25/13
to std-pr...@isocpp.org
That was what I wanted when I first started implementing the URI in cpp-netlib, long before I'd heard of string_ref. The only reason is one of efficiency, I saw no need to copy each string part as some other implementations do.

Regards,
Glyn

Glyn Matthews

Jan 25, 2013, 8:44:26 AM1/25/13
to std-pr...@isocpp.org
On 24 January 2013 21:22, Vincent Jacquet <vjac...@flowgroup.fr> wrote:
"this is a scheme-specific detail and the proposed uri is limited to being generic". 
How do you implement scheme_based_normalization or protocol_based_normalization then ?

 
The proposal currently states that it won't implement each of these steps:


An open question therefore is how to extend the proposal to allow at least scheme-based normalization.

Regards,
Glyn

Olaf van der Spek

Jan 25, 2013, 8:47:07 AM1/25/13
to std-pr...@isocpp.org
On Fri, Jan 25, 2013 at 2:44 PM, Glyn Matthews <gly...@acm.org> wrote:
>> I am curious: you are saying that you always wanted to return a reference
>> to the underlying string. Why? What do you want to do with it?
>>
>
> That was what I wanted when I first started implementing the URI in
> cpp-netlib, long before I'd heard of string_ref. The only reason is one of
> efficiency, I saw no need to copy each string part as some other
> implementations do.

But the implementation still copies the input string, right?
Would there be a way to avoid this copy too?

--
Olaf