[boost] boost.url review

128 views
Skip to first unread message

Klemens Morgenstern via Boost

unread,
Aug 12, 2022, 10:30:13 PM8/12/22
to bo...@lists.boost.org, Klemens Morgenstern
Hi all,

the formal boost review of the boost.url library starts today, the 13th of
August and will last until and including the 22nd of this month.

The library has been developed by Vinnie Falco and Alan de Freitas.
The master branch is frozen during the review and can be found here:
https://github.com/CPPAlliance/url

The current documentation can be found here: https://master.url.cpp.al/

Boost.URL is a portable C++ library which provides containers and
algorithms which model a "URL" and understands the various grammars related
to URLs and provides for validating and parsing of strings, manipulation of
URL strings, and algorithms operating on URLs such as normalization and
resolution.

Please explicitly state that you either *accept* or *reject* the inclusion
of this library into boost.
Also please indicate the time & effort spent on the evaluation and give the
reasons for your decision.

Some questions to consider:

- Does this library bring real benefit to C++ developers for real world
use-case?
- Do you have an application for this library?
- Does the API match with current best practices?
- Is the documentation helpful and clear?
- Did you try to use it? What problems or surprises did you encounter?
- What is your evaluation of the implementation?

More information about the Boost Formal Review Process can be found
at: http://www.boost.org/community/reviews.html

The review is open to anyone who is prepared to put in the work of
evaluating and reviewing the library. Prior experience in contributing to
Boost reviews is not a requirement.

Reviews can also be submitted privately to me, and I will not disclose who
sent them.
I might however cite passages from those reviews in my conclusion.

Thanks to Vinnie & Alan for providing us with a new library & thanks for
all the reviews in advance.

Klemens

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Vinnie Falco via Boost

unread,
Aug 13, 2022, 12:13:58 AM8/13/22
to bo...@lists.boost.org, Vinnie Falco
On Fri, Aug 12, 2022 at 7:30 PM Klemens Morgenstern via Boost
<bo...@lists.boost.org> wrote:
> ...

The library depends on the tip of the current Boost master branch (in
particular, constexpr support in boost::core::empty_value) - the
documentation does not reflect this, and Boost.URLs master branch is
frozen.

Thanks

Vinnie Falco via Boost

unread,
Aug 14, 2022, 10:17:49 AM8/14/22
to bo...@lists.boost.org, Vinnie Falco
On Fri, Aug 12, 2022 at 7:30 PM Klemens Morgenstern via Boost
<bo...@lists.boost.org> wrote:
> https://github.com/CPPAlliance/url

There was a hiccup - the library requires a constexpr
boost::core::empty_value, which is available in the developmental
superproject but not the latest Boost release. Therefore, to ensure
ease of installation and usage of the library during the review I have
reverted a commit on the Boost.URL master branch so that the library
will work with the just-released Boost 1.80.

This does not affect the public API or documentation.

Thanks

Ruben Perez via Boost

unread,
Aug 16, 2022, 2:03:48 PM8/16/22
to boost@lists.boost.org List, Ruben Perez
Hi all,

Here is my review of Boost.Url.

> - Does this library bring real benefit to C++ developers for real world
> use-case? Do you have an application for this library?

I'm happy to see this library being proposed. I strongly believe
we need such functionality in Boost. If it gets accepted, I'm
planning on using it in Boost.Mysql to create type-erased
connections from connection URLs (such as
mysql://user@localhost:3306/database?ssl-mode=disable).
I think URL functionality is a must for the C++ for web environment
we're developing.

- Does the API match with current best practices?

The API is clear and concise. It allows parsing and modifying
URLs in a pretty nice way.

I've found the naming for view containers confusing (e.g.
params and params_view). When I saw it, I thought params
would take ownership of the query string params buffer, while
params_view would not. If I've understood correctly, the difference
is that params allows for modification (as it's returned from url containers)
and params_view does not (as it's returned from url_view). So it's
more about mutability than ownership. If that's the case, I would
either emphasize it in the docs, or rename the containers to
something that reflects this fact (e.g. const_params and mutable_params,
as asio does).

I'm happy to see percent-encoding standalone functions being exposed,
as this is something present in almost all other languages. I'd consider
exposing higher-level functions here, maybe taking a lvalue reference
to a std::string, allowing the user to percent-encode strings easier
(so they resemble JavaScript encodeURIComponent).

Something I've found when playing with mutating URLs is that modifying
a query parameter to a certain value is a little bit verbose:

url u; // get it from somewhere
auto it = u.params().find("k");
if (it == u.params().end())
{
u.params().append("k", "value");
}
else
{
u.params().replace_value(it, "value");
}

I had this use case in the last web application I wrote (although it
wasn't C++).
I don't know if this is a common use case, but I'd consider adding a function
that simplifies it (i.e. params::set_value(string_view key, string_view value).

I'm not very keen on exposing the parsing infrastructure and helper
functions (other than percent encoding/decoding) as public API,
as I think it violates the principle of API surface minimization.
I understand that these parsing facilities may help libraries
such as HTTPProto, but I'm not sure if the general public should
be using them. I may be missing something, but I haven't
found the example on magnet links very compelling - I feel
the same result could be achieved in a cleaner way by using
the higher level API. If the parsing facilities are to be kept
public, I think we need a more compelling example for them.
From reading this comment
https://github.com/CPPAlliance/url/issues/379#issuecomment-1212026752
I get the impression that some schemes have different reserved
character sets, and that these functions help with that problem
- maybe an example of such an scheme could be helpful.

I can see other public types in the reference such as recycled,
recycled_ptr and aligned_storage, which again have the feeling
should not be exposed as public types.

- Is the documentation helpful and clear?

The documentation is good and the reference is complete, but I miss
some examples. We've just got two examples, and one of them is
about a very concrete feature that most users will likely not be using
day-to-day.

In particular, I miss an example (and possibly a page) about composing
and mutating URLs (starting with a base URL, add some segments and query
parameters, emphasizing on when percent encoding gets applied).
This is a pretty typical task when you're consuming a third party
REST API.

I'd add a page and an example on percent encoding functions, as they
are only explained in the reference and the explanation assumes
familiarity with the CharSet concept.

As there are several container view types, I would also suggest
creating a comparison table, like in json::value
(https://www.boost.org/doc/libs/1_80_0/libs/json/doc/html/json/dom/value.html)
(I've found that one particularly useful).

The docstring for segments::insert seems to be wrong
(https://master.url.cpp.al/url/ref/boost__urls__segments/insert/overload1.html),
as I've passed a non-percent encoded string and it has accepted it,
encoding the value. It seems an incorrect copy/paste from
segments_encoded::insert. params::insert overloads with key only
and with key/value also seem to have swapped docstrings.

I'm also curious about the security review that is announced
in the main page - when is that going to take place?

- Did you try to use it? What problems or surprises did you encounter?

I used it in a toy project via CMake FetchContent, with Boost 1.80.0,
Linux and clang12. I had no problems building it. I initially
tried `mkdir build && cmake -DCMAKE_PREFIX_PATH=path/to/boost ..`
but got errors about linking to non existent targets.

I only built the source (not the official examples and tests) and
my toy programs. I focused on composing and mutating URLs,
but also tried parsing and percent encoding. Except for the
mismatched docstrings I've already mentioned, no surprises.

- What is your evaluation of the implementation?

I haven't looked into it.

- How much effort did you put into your evaluation? A glance?
A quick reading? In-depth study?

I've dedicated half day to build and play with the library,
and had read most of the docs (and some code) beforehand.

- Are you knowledgeable about the problem domain?

I'm not a URL expert, just a regular URL user. I've written
several web applications where URL handling is a day-to-day task.

My recommendation is to CONDITIONALLY ACCEPT
this library as part of Boost, subject to reviewing which
types/functions in the lower-level API are exposed and which
ones are not.

Thanks Vinnie and Alan for your work.

Regards,
Ruben.

Vinnie Falco via Boost

unread,
Aug 16, 2022, 2:31:33 PM8/16/22
to Ruben Perez, Vinnie Falco, boost@lists.boost.org List
On Tue, Aug 16, 2022 at 11:03 AM Ruben Perez <rubenp...@gmail.com> wrote:
> Here is my review of Boost.Url.

Thank you for graciously contributing your time to review the library!

> I've found the naming for view containers confusing (e.g.
> params and params_view).

Well...yeah. I am not entirely happy about having 8 different
containers between the segments and params ranges. However the
decision was made to offer the user an interface that resembles
standard containers, so that standard algorithms could be composed on
them (more or less). I think that the ForwardRange and
BidirectionalRange interfaces were the right choice, so we followed
the design where it led us. The need for having read-only views,
modifiable views, over separate encoded versus unencoded strings left
us with the 8 containers.

> I would either emphasize it in the docs, or rename the containers to
> something that reflects this fact (e.g. const_params and mutable_params,
> as asio does).

I would much rather emphasize it in the docs, because writing "const"
and "mutable" over and over is bound to be a chore. Asio gets away
with it because the mutability of buffers is central to the role of
asynchronous operations but I'm not so sure the same applies here.

We are very happy to hear proposals for alternate container names,
especially if they are complete proposals (give an alternative for
each existing name) and if necessary what the url_view and url member
functions would be renamed to.

> I'd consider exposing higher-level functions here, maybe taking a lvalue
> reference to a std::string, allowing the user to percent-encode strings easier
> (so they resemble JavaScript encodeURIComponent).

Yes, I am not joyous about our current percent-encoding APIs. There's
a lot of them, with small variations, and some simple operations are
missing (appending or assigning to a MutableString out-param as you
pointed out). They are still great but I think they could be greater.
We will work on them some more, and if you or anyone else would like
to provide input into that process, it is welcomed.

> I don't know if this is a common use case, but I'd consider adding a function
> that simplifies it (i.e. params::set_value(string_view key, string_view value).

Yep, we need that! There are open questions (like what do you do if
more than one key exists). Here is an open issue for it:

<https://github.com/CPPAlliance/url/issues/412>

> I'm not very keen on exposing the parsing infrastructure and helper
> functions (other than percent encoding/decoding) as public API,
> as I think it violates the principle of API surface minimization.

I feel you there but they have to go somewhere because the
alternatives are worse. It isn't perfect, I know.

> The documentation is good and the reference is complete, but I miss
> some examples.

Yep. Its not easy to come up with a complete application that just
does stuff with URLs.

> I'd add a page and an example on percent encoding functions, as they
> are only explained in the reference and the explanation assumes
> familiarity with the CharSet concept.

Yes I think these routines need some tidying and docs.

> I'm also curious about the security review that is announced
> in the main page - when is that going to take place?

Once the code settles down a little, if the library is accepted then
the version which first appears in Boost would be reviewed. Possibly
the second release if the company doing the review is very busy.

Regards

Ruben Perez via Boost

unread,
Aug 16, 2022, 5:27:16 PM8/16/22
to Vinnie Falco, Ruben Perez, boost@lists.boost.org List
>
> > I would either emphasize it in the docs, or rename the containers to
> > something that reflects this fact (e.g. const_params and mutable_params,
> > as asio does).
>
> I would much rather emphasize it in the docs, because writing "const"
> and "mutable" over and over is bound to be a chore. Asio gets away
> with it because the mutability of buffers is central to the role of
> asynchronous operations but I'm not so sure the same applies here.
>
> We are very happy to hear proposals for alternate container names,
> especially if they are complete proposals (give an alternative for
> each existing name) and if necessary what the url_view and url member
> functions would be renamed to.

I would suggest leaving url and url_view as-is, and replacing the "view"
part for "const" for the rest of the containers:
- params stays as is
- params_view becomes const_params
- params_encoded stays as is
- params_encoded_view becomes const_params_encoded

And so on; I'd leave the member function names as they are.

>
> > I don't know if this is a common use case, but I'd consider adding a function
> > that simplifies it (i.e. params::set_value(string_view key, string_view value).
>
> Yep, we need that! There are open questions (like what do you do if
> more than one key exists). Here is an open issue for it:
>
> <https://github.com/CPPAlliance/url/issues/412>

Just commented there.

>
> > I'm not very keen on exposing the parsing infrastructure and helper
> > functions (other than percent encoding/decoding) as public API,
> > as I think it violates the principle of API surface minimization.
>
> I feel you there but they have to go somewhere because the
> alternatives are worse. It isn't perfect, I know.

Are the functions intended to be used just by other Boost libraries,
or also the general public? If the former is the case, could they
not be separated into a BSL-licensed repository shared between
libraries and just used as a submodule? I understand you don't want
to put them into a separate Boost library because of the review process
it would involve and the overlapping it would create, but I'd say
a submodule repo could work-around it and not violate any
of the Boost guidelines (AFAIK).

>
> > The documentation is good and the reference is complete, but I miss
> > some examples.
>
> Yep. Its not easy to come up with a complete application that just
> does stuff with URLs.

I would find this useful (this builds a QR code using Google's
Infographic API):


#include <boost/url.hpp>
#include <iostream>

using namespace boost::urls;

int
main(int argc, char **argv)
{
url u ("https://chart.googleapis.com/");
u.segments().insert(u.segments().end(), "charts");
u.params().append("cht", "qr");
u.params().append("chs", "600x600");
u.params().append("chs", argv[1]);

std::cout << u << std::endl;
}

Regards,
Ruben.

Vinnie Falco via Boost

unread,
Aug 16, 2022, 6:15:54 PM8/16/22
to Ruben Perez, Vinnie Falco, boost@lists.boost.org List
On Tue, Aug 16, 2022 at 2:27 PM Ruben Perez <rubenp...@gmail.com> wrote:
> Are the functions intended to be used just by other Boost libraries,
> or also the general public?

Some enterprising folks are going to want to use them to do things
like say, parse cookies, parse URLs-within-URLs (like when you put a
whole URL in the fragment), parse a larger grammar that contains URLs,
or even just parse things that are not URLs but they are already using
Boost.URL (and Boost.Beast and Boost.JSON) for their server and the
algorithms do exactly what they want with the performance and runtime
characteristics which are ideal for the use-case.

> If the former is the case, could they
> not be separated into a BSL-licensed repository shared between
> libraries and just used as a submodule?

That is the equivalent of just making them private. Which I could
easily do, just remove them from the docs and that's that. Submodules
in boost library include/ directories don't work out too well (I'm not
sure they are even allowed).

> I understand you don't want to put them into a separate Boost
> library because of the review process it would involve and the
> overlapping it would create

I think my current trajectory is to leave it the way it is for now,
continue building on it in other soon to be proposed libraries to
validate the design, and continue to explore new use cases. People
keep wanting to parse things, but Beast's offerings are subpar:

<https://github.com/boostorg/beast/blob/76043dec2cf67a2ba33b32bdcc129f5f0027b8be/include/boost/beast/http/rfc7230.hpp>

The ideas in this rfc7230.hpp file are ancient and they have over time
turned into what you see in Boost.URL now. The code in Beast is
showing its age and doesn't do everything people want - I would point
them to this new code. We could even update Beast to use Boost.URL and
replace the ramshackle parsers with stuff that actually works although
I am reluctant to invest heavily in new things in Beast. On the other
hand some of the new parsers that I am writing could probably just be
copied in so there's that. For example this rule handles the legacy
"extra commas and whitespace allowed in lists" grammar which was
specified in the errata:

This could be ported directly to Beast. At some point I think I will
propose it as a separate library, and by then there will be several
downstream users so the design will have more evidence supporting it.

> I would find this useful (this builds a QR code using Google's
> Infographic API):
> ...

IDK.... that's a pretty small example that just does a printf after a
few mods to the URL... I feel like an example should have a lot more
meat on the bone. On the other hand we do need more examples...

Thanks

Vinnie Falco via Boost

unread,
Aug 16, 2022, 6:17:23 PM8/16/22
to Ruben Perez, Vinnie Falco, boost@lists.boost.org List
On Tue, Aug 16, 2022 at 3:15 PM Vinnie Falco <vinnie...@gmail.com> wrote:
> For example this rule handles the legacy
> "extra commas and whitespace allowed in lists" grammar which was
> specified in the errata:

Forgot to include the link to this rule implementation:

<https://github.com/CPPAlliance/http_proto/blob/7cd9b63bbf5924fb7519d6288c1605ff0b9253e9/include/boost/http_proto/rfc/list_rule.hpp#L21>

Ruben Perez via Boost

unread,
Aug 16, 2022, 7:27:52 PM8/16/22
to Vinnie Falco, Ruben Perez, boost@lists.boost.org List
>
>
>
> > I would find this useful (this builds a QR code using Google's
> > Infographic API):
> > ...
>
> IDK.... that's a pretty small example that just does a printf after a
> few mods to the URL... I feel like an example should have a lot more
> meat on the bone. On the other hand we do need more examples...
>

I disagree, small examples may be
even more useful to newcomers.
Having this would have saved me some
time when coding my use case.

Regards,
Ruben.

Alex Christensen via Boost

unread,
Aug 17, 2022, 1:47:53 AM8/17/22
to bo...@lists.boost.org, Alex Christensen
It seems like you decided not to handle anything related to non-ASCII characters. Are there plans to do this in a future library, or are there plans to update or add to this library? A lot has happened since RFC 3986.

Andrzej Krzemienski via Boost

unread,
Aug 17, 2022, 2:13:08 PM8/17/22
to bo...@lists.boost.org, Andrzej Krzemienski
wt., 16 sie 2022 o 20:31 Vinnie Falco via Boost <bo...@lists.boost.org>
napisał(a):

Maybe an alternative would be to keep them in the public API but give them
a separate namespace, like `urls::encoding`. This would indicate more
clearly that you have a relatively separate sub-library there, that needs
to stay for logistic reasons. Boost.Lexical_cast took a similar approach
for try_lexical_convert:
https://www.boost.org/doc/libs/1_80_0/doc/html/boost_lexical_cast/synopsis.html#boost_lexical_cast.synopsis.try_lexical_convert

Regards,
&rzej;

Vinnie Falco via Boost

unread,
Aug 17, 2022, 2:17:37 PM8/17/22
to Andrzej Krzemienski, Vinnie Falco, bo...@lists.boost.org
On Wed, Aug 17, 2022 at 11:12 AM Andrzej Krzemienski <akrz...@gmail.com> wrote:
> Maybe an alternative would be to keep them in the public API but give
> them a separate namespace, like `urls::encoding`. This would indicate
> more clearly that you have a relatively separate sub-library there,
> that needs to stay for logistic reasons.

Yes that is precisely the design choice that we made, that namespace
is `boost::urls::grammar`

Thanks

Richard Hodges via Boost

unread,
Aug 21, 2022, 3:25:21 AM8/21/22
to boost@lists.boost.org List, klemens.m...@gmx.net, Richard Hodges
My review of Boost.URL

Before answering the questions presented by the review manager, I thought
I’d share a few words about what Boost represents to me and why I take an
interest in it.

Many moons ago, Boost was pretty much the only way to use the C++
programming language in any kind of useful, cross-platform way.

Today, the C++ standard has evolved and grown considerably. Many of the
original drivers to use Boost are seemingly “solved” in the language
itself. Notwithstanding that some of the design decisions taken by the
Committee over the years have resulted in an anaemic half-copy of Boost
features being standardised.

Because of this (in my view) failure of the committee decision process in
the face of reality, I still use many of the fundamental Boost utilities in
place of their standard counterparts.

Therefore, to my mind, Boost is still a necessary component of any
non-trivial program I write in C++.

Boost also has a collection of utility libraries, some of which have waned
in relevance over time. For example, Boost.Format is now pretty much
obsolete in the face of fmtlib (not std::format, which is another example
of a half-hearted attempt to standardise a solution).

When I program, I do so almost exclusively in C++. I don’t really want the
bother of having to reach for other languages or to stay current in their
dependency management solutions of the day. What I really want is one
massive library that I can pull in which does everything I need, or at
least the common things.

Common things for me are:

-

Communications, including HTTP, WebSockets, databases and messages buses.
-

Console IO
-

Logging
-

Graphics & Human Interfaces
-

Sound


For me, anything that takes us a step closer to having one library that
shrink-wraps all that into one `find_package` call in CMake is a win.

Now I will admit that once upon a time, I believed that all this stuff
should be in the standard library. However since then, having worked as a
library maintainer myself, I have come to realise that this is an error.
Giving compiler maintainers the burden of maintaining humongous libraries
will simply discourage innovation in compiler technology and result in
incomplete implementations and porting to new computing platforms. Boost’s
cross-platform compatibility is unparalleled.

Therefore I am strongly of the view that the place for C++ library
innovation is Boost, and the place for standardising the language changes
that are demonstrated as necessary by Boost innovation lies with the
Committee and compiler developers.

For me, any new library that increases Boost’s real-world utility, is fully
cross platform and reduces my workload when writing application programs is
very welcome.

For for the review:

> Does this library bring real benefit to C++ developers for real world
use-case?

Yes. URL parsing can be fluked with regex, split and a few hand-rolled
utilities. But why give developers the burden of hand-rolling this
boilerplate nonsense? Make simple things simple!

> Do you have an application for this library?

Absolutely. I am happy to retrofit Boost.URL into existing code and would
certainly use it in all future code.

> Does the API match with current best practices?

I have no idea what “best practices” are in API design. The API allows me
to interrogate, build and update internet URLs. I may have made different
naming choices and if I had written it I would probably have made more use
of free functions, but I’m not going to argue about style. Utility is all
that matters to me.

> Is the documentation helpful and clear?

Much like the Boost.ASIO documentation, it is 100% correct and tersely
describes the exact functionality of the API. Use of the documentation
requires at least a cursory understanding of the terms in RFC3986. Twenty
years ago, it would probably have been reasonable to require users to read
this RFC prior to using the library. Today, attention spans are shorter and
the internet is a commodity. I would prefer to see a little more narrative
and descriptions of real-world uses of some of the methods, along with code
examples.

> Did you try to use it?

Yes, I retrofitted Boost.URL into a program that I used as the basis of a
blog post - a program to demonstrate following redirect responses when
connecting a WebSocket.
I have also integrated Boost.URL into a new program which connects to an
exchange in the past week. Doing so saved probably 200 lines of hand-rolled
code.

> What problems or surprises did you encounter?

The library lacked a specific facility to extract just the HTTP TARGET
field from a URL. After reporting this to the author with a supporting use
case and hand-rolled work-arwound, this has now been added.

> What is your evaluation of the implementation?

It is clear that a great deal of care has gone into making the
implementation robust and efficient.

> How much effort did you put into your evaluation? A glance?

A quick reading? In-depth study?

I retrofitted Boost.URL into an existing program and wrote a new program
that uses the library. I spent some time discussing what I felt were
missing features with the author, who has been responsive. My required
features subsequently appeared.

> Are you knowledgeable about the problem domain?

I’ve been writing internet-facing servers and clients in C++ for a decade.
In all that time I have never read RFC3986. I suspect like 99% of
developers, I have simply inferred the rules of URL encoding through use.
So perhaps a time-served journeyman rather than domain expert.

Recommendation: ACCEPT

The day I can write a CML that has only one dependency, Boost, will be a
happy day.


--
Richard Hodges
hodg...@gmail.com
office: +44 2032 898 513
home: +376 861 195
mobile: +376 380 212

Rainer Deyke via Boost

unread,
Aug 21, 2022, 10:15:27 AM8/21/22
to bo...@lists.boost.org, Rainer Deyke
Here is my quick review of Boost.URL. I vote to ACCEPT this library.
It may not be perfect, but it's a good enough solution for a real problem.

I do have some concerns:

- The lack of IRI support is unfortunate. It's 2022; we should all
be writing software with Unicode support by default. However, this can
be built on top of Boost.URL, and isn't needed in all cases.

- The "string+offsets" implementation means that a lot of characters
need to be copied every time something at the left end of a long URL is
modified - for example when changing an http URL to use https. This is
unlikely to cause a problem in most circumstances, but it's something to
be aware of.

On 13.08.22 04:29, Klemens Morgenstern via Boost wrote:

> Some questions to consider:
>
> - Does this library bring real benefit to C++ developers for real world
> use-case?

Yes.

> - Do you have an application for this library?

Yes.

> - Does the API match with current best practices?

It's good enough.

> - Is the documentation helpful and clear?

Yes, but I did find a few errors that should be fixed. One of the
examples on https://master.url.cpp.al/url/containers/authority.html does
not match the text. I have to admit I skipped over the customization
section.

> - Did you try to use it? What problems or surprises did you encounter?

No.

> - What is your evaluation of the implementation?

Like many Boost libraries, I found the implementation fairly opaque.
Class url delegates to url_base, which delegates to url_view_base, which
delegates to detail::url_impl, with much of the functionality spread out
over various view classes. This isn't a problem per se, but it makes
casual reading difficult.


--
Rainer Deyke (rai...@eldwood.com)

Vinnie Falco via Boost

unread,
Aug 21, 2022, 10:15:33 AM8/21/22
to bo...@lists.boost.org, Vinnie Falco
On Sun, Aug 21, 2022 at 12:25 AM Richard Hodges via Boost
<bo...@lists.boost.org> wrote:
> My review of Boost.URL

Thank you for taking the time to review the library

> I would prefer to see a little more narrative
> and descriptions of real-world uses of some of the methods, along with code
> examples.

Yes we agree, the docs will be improved.

Regards

Vinnie Falco via Boost

unread,
Aug 21, 2022, 11:13:40 AM8/21/22
to bo...@lists.boost.org, Vinnie Falco, Rainer Deyke
On Sun, Aug 21, 2022 at 7:15 AM Rainer Deyke via Boost
<bo...@lists.boost.org> wrote:
> - The lack of IRI support is unfortunate. It's 2022; we should all
> be writing software with Unicode support by default. However, this can
> be built on top of Boost.URL, and isn't needed in all cases.

We will probably add something to parse an IRI but in all likelihood
it would convert it to UTF-8 as a regular URL. I don't know if I have
the stomach for a total duplication of the existing library except
names like iri_view, iri, static_iri, the duplication of all the
segments and params containers, and the modification of all those
mutating algorithms to support Unicode. I'm not even sure that it is
called for, given that IRIs are for more user-facing purposes. Such
things cannot be submitted in HTTP requests, and as far as I know,
unicode host names would need to be converted to punycode anyway
(which we could do) to submit them to a DNS server.

> - The "string+offsets" implementation means that a lot of characters
> need to be copied every time something at the left end of a long URL is
> modified.

Yep! Thanks for noticing that :) Was it the right implementation
choice? I sure hope so, but we won't know for sure without field
experience. And I hope we get lots of it, as I am already writing new
libraries whose containers use this same model.

> Yes, but I did find a few errors that should be fixed. One of the
> examples on https://master.url.cpp.al/url/containers/authority.html does
> not match the text. I have to admit I skipped over the customization
> section.

Happy to have any issues reported in the repository. The Customization
section is only aimed for the most advanced users, so no loss there.

Thank you for taking the time to review the library.

Regards

Rainer Deyke via Boost

unread,
Aug 21, 2022, 1:55:45 PM8/21/22
to bo...@lists.boost.org, Rainer Deyke
On 21.08.22 17:13, Vinnie Falco via Boost wrote:
> On Sun, Aug 21, 2022 at 7:15 AM Rainer Deyke via Boost
> <bo...@lists.boost.org> wrote:
>> - The lack of IRI support is unfortunate. It's 2022; we should all
>> be writing software with Unicode support by default. However, this can
>> be built on top of Boost.URL, and isn't needed in all cases.
>
> We will probably add something to parse an IRI but in all likelihood
> it would convert it to UTF-8 as a regular URL. I don't know if I have
> the stomach for a total duplication of the existing library except
> names like iri_view, iri, static_iri, the duplication of all the
> segments and params containers, and the modification of all those
> mutating algorithms to support Unicode. I'm not even sure that it is
> called for, given that IRIs are for more user-facing purposes. Such
> things cannot be submitted in HTTP requests, and as far as I know,
> unicode host names would need to be converted to punycode anyway
> (which we could do) to submit them to a DNS server.

I see IRIs not as a different datatype, but as a specific interpretation
of URIs. The transparent percent en-/decoding of Boost.URL already gets
us most of the way there. Additional IRI support would mean:
- Decoding accessors that perform utf-8 validation. (Arbitrary
percent-encoded 8-bit values are legal in URLs, but not in IRIs.)
- Additional mutators that perform NFC normalization or validation.
- Punycode encoding/decoding.


--
Rainer Deyke (rai...@eldwood.com)

Peter Dimov via Boost

unread,
Aug 21, 2022, 2:36:54 PM8/21/22
to bo...@lists.boost.org, Peter Dimov
Rainer Deyke wrote:
> (Arbitrary percent-encoded 8-bit values are legal in URLs, but not in IRIs.)

Aren't they? I didn't find anything prohibiting them in the RFC, although I
might well have missed it.

The sections that specify the recommended way to convert between URI
and IRI do say how these are handled - a percent-encoded sequence in a
URI that doesn't correspond to a valid UTF-8 encoded code point is left
alone in the IRI. (Valid UTF-8 percent encodings are percent-decoded.)

Rainer Deyke via Boost

unread,
Aug 21, 2022, 4:01:51 PM8/21/22
to bo...@lists.boost.org, Rainer Deyke
On 21.08.22 20:36, Peter Dimov via Boost wrote:
> Rainer Deyke wrote:
>> (Arbitrary percent-encoded 8-bit values are legal in URLs, but not in IRIs.)
>
> Aren't they? I didn't find anything prohibiting them in the RFC, although I
> might well have missed it.
>
> The sections that specify the recommended way to convert between URI
> and IRI do say how these are handled - a percent-encoded sequence in a
> URI that doesn't correspond to a valid UTF-8 encoded code point is left
> alone in the IRI. (Valid UTF-8 percent encodings are percent-decoded.)

OK, it is possible to partially decode a URL containing a mix of utf-8
and arbitrary 8-bit values to get something that looks like a URL, but
with Unicode. And this half-decoded URL is an IRI, so on a technical
level you are correct.

On the other hand, if calling the 'path' member function of a URL object
returns a fully percent-decoded path, then calling the utf-8 equivalent
of that member function should return something that is both legal utf-8
and fully percent-decoded. Which is only possible if the path contains
no percent-encoded values that are not utf-8.


--
Rainer Deyke (rai...@eldwood.com)

Vinnie Falco via Boost

unread,
Aug 21, 2022, 4:18:35 PM8/21/22
to bo...@lists.boost.org, Vinnie Falco, Rainer Deyke
On Sun, Aug 21, 2022 at 1:01 PM Rainer Deyke via Boost
<bo...@lists.boost.org> wrote:
> ...IRI...

What are the use-cases for IRI?

Thanks

Alan de Freitas via Boost

unread,
Aug 21, 2022, 6:55:34 PM8/21/22
to bo...@lists.boost.org, Alan de Freitas
Thank you for your review.


> I would prefer to see a little more narrative
> and descriptions of real-world uses of some of the methods, along with code
> examples.
>

Yes. We have a number of issues related to improving the documentation.
People have been proposing a number of interesting examples.

Use of the documentation
> requires at least a cursory understanding of the terms in RFC3986


Sure. We have also been improving that by interleaving basic concepts
from RFC3986 with the documentation.
Over time, I think we will infer the most common questions, and the
documentation will become easier for newcomers.

Thanks again

Rainer Deyke via Boost

unread,
Aug 22, 2022, 1:49:40 AM8/22/22
to bo...@lists.boost.org, Rainer Deyke
On 21.08.22 22:18, Vinnie Falco via Boost wrote:
> On Sun, Aug 21, 2022 at 1:01 PM Rainer Deyke via Boost
> <bo...@lists.boost.org> wrote:
>> ...IRI...
>
> What are the use-cases for IRI?

I think that anytime a URL is shown to or input by the user, whole or in
part, it should be in IRI form by default.


--
Rainer Deyke (rai...@eldwood.com)

Дмитрий Архипов via Boost

unread,
Aug 22, 2022, 9:15:23 AM8/22/22
to bo...@lists.boost.org, Дмитрий Архипов
This is my review of Boost.Url.

Disclaimer I am employed by The C++ Alliance, that is by Vinnie. Nevertheless,
I haven't followed development of this library closely, so I did not have much
prior knowledge about design choices it makes.

My vote is to ACCEPT the library, since it does solve the use cases for URLs
that I currently can come up with, it is quite easy to use (and not too easy
to use incorrectly) and the flaws that it has doesn't seem to be fundamental.

> What is your evaluation of the design?

Overall, I found the API fairly easy to use. I understand the concern about
parsing into a view type, but I consider it a lesser evil than allocating on
every parse or creating two kinds of functions, one that allocates and one that
doesn't.

I did also find the naming situation for encoded/decoded a bit confusing. In
particular, url::path returns pct_encoded_view, but url::encoded_path returns
string_view. I would expect the opposite. Maybe this needs additional
explanation in the docs?

Finally, I find the use of system::result as the sole method of reporting
errors intriguing. I have not yet encountered a library that uses this
approach, and I have been thinking about employing it myself in the past. If
this library is accepted, it will become a great experiment for this sort of
API and will provide us with the experience on whether it's a good replacement
for dual APIs using error_code.

> What is your evaluation of the implementation?

I haven't looked into that.

> What is your evaluation of the documentation?

Documentation seems sparse but ultimately I have been able to successfully
find all the information that I needed. Still, a lot more examples and detailed
explanations are necessary.

> What is your evaluation of the potential usefulness of the library?

The obvious application is networking, but given that URIs were designed as a
universal way to identify entities, there are other potential use cases.

> Did you try to use the library? With what compiler? Did you have any problems?

I have implemented a very simple program that constructed URLs (including ones
with custom schemes), compared them against each other, and deconstructed them
for "routing". b2 was used as the build system and GCC 10.3 as the
compiler. I did not encounter any problems.

> How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

3 hours of coding and 2 hours of reading documentation.

> Are you knowledgeable about the problem domain?

I'm not an expert, but I have been dealing with URLs in Python and in C++ using
Qt's QUrl.

сб, 13 авг. 2022 г. в 05:30, Klemens Morgenstern via Boost
<bo...@lists.boost.org>:

Alan de Freitas via Boost

unread,
Aug 22, 2022, 1:36:23 PM8/22/22
to boost@lists.boost.org List, Alan de Freitas
Hi, everyone.

This is a summary of issues we have opened during the review so anyone
interested can track them.


* Public grammar namespace:

https://github.com/CPPAlliance/url/issues/444

I find the creation of a mini-parsing library and the expectation that
> users will use this mini-library to write their own parsers to be
> somewhat problematic.


Regarding the parser interface, I feel this should mature into its own
> library.


I'm not very keen on exposing the parsing infrastructure and helper
> functions

I can see other public types in the reference such as recycled,
> recycled_ptr and aligned_storage, which again have the feeling
> should not be exposed as public types.


* parse_url vs. parse_uri:

https://github.com/CPPAlliance/url/issues/443

When I call parse_uri(), I get a result of type result<url_view>? Why
> isn't the function spelled parse_url()


In contrast to others, I think the choice of Nomenclature, namely URL
> everywhere except in the grammar domain (the parse function names) is a
> very
> elegant one

* Documentation issues:

https://github.com/CPPAlliance/url/issues/442

(About 20 minor issues with the docs)


* Exploring IRIs:

https://github.com/CPPAlliance/url/issues/441

The lack of IRI support is unfortunate. It's 2022; we should all
> be writing software with Unicode support by default. However, this can
> be built on top of Boost.URL, and isn't needed in all cases.

* Add lowercase/uppercase in pct_encode_opts:

https://github.com/CPPAlliance/url/issues/440

In the past I have had situations where it was necessary to specify the
> case for hex letters

* Optional query values:

https://github.com/CPPAlliance/url/issues/439

I would be inclined to use a std::optional for the value instead.

* Name of the decoding view is confusing:

https://github.com/CPPAlliance/url/issues/438

I would expect a "decoded_view" to refer to underlying encoded data


pct_encoded_view is an unfortunate choice for a view that actively decodes
> URL-encoded characters


I did also find the naming situation for encoded/decoded a bit confusing. I
> would expect the opposite.

* Using std::string in url:

https://github.com/CPPAlliance/url/issues/437

In your non-view url object, you seem to allocate storage using new char[].
> Why not use a std::string?


* url::persist docs:

https://github.com/CPPAlliance/url/issues/436

so url_view doesn't own the underlying storage, except when it
> does as a result of this mysterious method. What's the rationale for
> this?


* url lifetimes:

https://github.com/CPPAlliance/url/issues/435

There's no indication here that r is a non-owning view and that
> the caller is responsible for keeping s alive for the lifetime of r.


* Examples:

https://github.com/CPPAlliance/url/issues/419

I miss an example (and possibly a page) about composing and mutating URLs

I'd add a page and an example on percent encoding functions

small examples may be even more useful to newcomers.


I found resolve a bit mystifying. (...) it will be a big help to have
> inspirational examples


Perhaps the examples would be better when fleshed out in long form


* pct-encoding overloads:

https://github.com/CPPAlliance/url/issues/417

I'd consider exposing higher-level functions here, maybe taking a lvalue
> reference to a std::string


* container names:

https://github.com/CPPAlliance/url/issues/416

I've found the naming for view containers confusing (e.g. params and
> params_view).

I would suggest leaving url and url_view as-is, and replacing the "view"
> part for "const" for the rest of the containers

Em sex., 12 de ago. de 2022 às 23:30, Klemens Morgenstern via Boost <
bo...@lists.boost.org> escreveu:


--
Alan Freitas
https://github.com/alandefreitas

Christopher Bläsius via Boost

unread,
Aug 22, 2022, 2:56:15 PM8/22/22
to bo...@lists.boost.org, Christopher Bläsius
Hi,

I'm not a boost developer, but as someone who has to deal with URLs (for
calling or implementing Web-APIs) at his job frequently, I want to give
my two cents as a most likely user of this library. I hope this is OK (I
won't give a recommendation to accept or reject though).

I spent about 2.5h integrating the library into our program and tried
some simple parsing and constructing of urls. I used MSVC2022 which
worked without problems.

> - Does this library bring real benefit to C++ developers for real > world use-case? Yes definitely. I believe probably every programmer
has written a (mostly crappy) URL-parser before, and I would be more
than happy to replace our parser with this library.

> - Do you have an application for this library?

Yes. Besides parsing incoming HTTP-Requests, I would use it to
programmatically construct urls to call external Web-APIs.

One of the more common scenarios in our applications is the construction
of urls after the user provided some login credentials and often
hosts/port combinations. Also a prefix path is often needed (for example
to differentiate between a demo or beta service and the real production
api). I feel like the current  API design lacks a bit in this regard.

To construct a new url i did the following:

    auto url = boost::urls::url{};
    url.set_scheme("https")
     .set_host("special-api.com")
     .set_port("443")
     .set_path("/prefix/api/v1/");
    url.segments().push_back("applications");
    url.segments().push_back(applicationId);
    url.segments().push_back("devices");
    url.params().append("since", "2022-01-01");

The url::set_... methods seem to follow a fluent api design which is
nice, but unfortunately they return url_base&. So I can't construct a
url inside a function call like:

httpClientGet(url{}.set_host(host).set_path("some/path"))

Being able to add segments is great. Personally I'm a big fan of
overloading the /-operator to append paths like in the filesystem
library, because I think this results in more readable code. The example
of above could become:

httpClientGet(url{host} / "some/path")


I really like that the library offers different parse_ functions, so an
user can choose the appropriate one depending on the context.

But it is a shame that there are no semantic types to go along with
them. As mentioned above we often have the case that we have to
construct a url by combining a prefix path with the actual path of the
resource. As this prefix path has to be absolute and the resource path
relative I want to use 2 distinct types for them. It seems to me, the
resolve() function has a similar problem. The documentation states that
the base url has to satisfy the absolute-URI grammar. Wouldn't it be
great to have a distinct type for this?

BTW why is there only an authority_view and not an owning container?
Seems inconsistent to me.

Its nice that there are functions to manually do percentage encoding,
but I would prefer (additional) simpler functions which just return a
std::string.

It would be nice if the library also provided the default port numbers
for the schemes. I know the documentation states that this is not part
of the URL-protocol, but I gonna need that information if I use the library.

> - Is the documentation helpful and clear?

Mostly yes. Reading the "Quick Look" page is enough to get started with
using the library which I appreciate. I also like the help card, which
would be greatly improved by adding links to the reference pages.

I skipped / skimmed the Customization and Concept pages.

I would recommend not to start your example page with a 700 lines
implementation of magnet links, that's super intimidating.

Something I missed was a simple example showing that the parse functions
also accept unknown schemes like opc.tcp:// for example.


Thanks a lot for all your work on providing useful high quality
libraries for everyone!

Best Regards,
Chris

Vinnie Falco via Boost

unread,
Aug 22, 2022, 4:15:11 PM8/22/22
to bo...@lists.boost.org, Vinnie Falco
On Mon, Aug 22, 2022 at 11:56 AM Christopher Bläsius via Boost
<bo...@lists.boost.org> wrote:
> I hope this is OK (I won't give a recommendation to accept or reject though).

Yes it is great! Feedback from the trenches is highly valued.

> One of the more common scenarios in our applications is the construction

> of urls...I feel like the current API design lacks a bit in this regard.

Now that I am looking at your example code, yes I agree. I don't like
some things that I'm seeing. We would be very happy to work with you
and fine tune the current API to work better for your use case. To
start, I would instead prefer to see your example written thusly:

> auto url = boost::urls::url{};
> url.set_scheme("https")
> .set_host("special-api.com")
> .set_port("443")
> .set_path("/prefix/api/v1/");

> url.segments().append({ "applications", applicationId, "devices" });


> url.params().append("since", "2022-01-01");

> The url::set_... methods seem to follow a fluent api design which is
> nice, but unfortunately they return url_base&.

Hmm yeah I see what you mean. I didn't think of this - that's why
field experience is so vital. The reason they return url_base& is
because that's the class that has all the mutation logic. This is
shared between `url` and `static_url`. To get functions like
`set_host` to return `url&` instead would be challenging. We would
have to put a copy of the entire set of mutating API member function
signatures into both `url` and `static_url_base` and have those return
the correct type.

> So I can't construct a
> url inside a function call like:
>
> httpClientGet(url{}.set_host(host).set_path("some/path"))

What if there was a helper function or a helper constructor? For
example that takes the five parts and assembles them into one URL:

// here, scheme, query, and fragment are `none`
httpClientGet( url( none, host, "some/path", none, none );

> Being able to add segments is great. Personally I'm a big fan of
> overloading the /-operator to append paths like in the filesystem
> library, because I think this results in more readable code. The example
> of above could become:
>
> httpClientGet(url{host} / "some/path")

This is going to have the same problem, operator/ will return
url_base&. I agree the operators are nice. We were thinking to first
get some more field experience and fine tune the current APIs, and
then once things are stable - pour some operators into the mix.

> it is a shame that there are no semantic types to go along with

> them. ...The documentation states that the base url has to satisfy


> the absolute-URI grammar. Wouldn't it be great to have a distinct type
> for this?

That would be a challenge to design and implement. Consider the following:

absolute_uri u( "http://example.com" );

what happens if I then write:

u.remove_scheme();

Does this return a new type? If the library offers uri and
relative_ref as separate semantic types, what would
parse_uri_reference return, given that its grammar is:

URI-reference = URI / relative-ref

Would the return type be `result< variant< uri_view, relative_ref_view
> >`? I explored the use of alternative types when I first started
writing the library but it became obviously fairly quickly that it
places significant constraints on the resulting APIs, and gets quite
unergonomic. Not to mention, that there would be an enormous
duplication of function signatures and the documentation that goes
with it.

While the current design loses the compile-time distinctions between
the various flavors of compliant URLs, it does gain convenience as a
vocabulary type. Each container is capable of holding any valid
string, and users never have to worry about the type changing upon
conversion.

> BTW why is there only an authority_view and not an owning container?
> Seems inconsistent to me.

The authority_view is a concession to users of HTTP, in order to parse
request-target. The authority_view grammar is called "authority-form"
in rfc7230:

request-target = origin-form
/ absolute-form
/ authority-form
/ asterisk-form

Implementing authority_view was quite simple. We made a copy of
url_view and just trimmed away the stuff that wasn't relevant. But
implementing an authority container with mutation operations, would
not be as simple. We could have done it, and nothing stops us from
adding it in the future, but I do not have sufficient evidence that
such a container would be as popular or widely used as the main url
containers. If enough users clamour for this feature in the future
then we might add it.

Generating a valid URL can be quite an undertaking. But generating a
valid authority is a lot more simple. Especially since we provide the
ipv4 and ipv6 address classes which will correctly format those
address types for you. So while the url containers offer significant
value in assisting users in producing valid URLs, we do not feel that
an "authority" container would offer as much additional value beyond
what the user can already accomplish by composing the authority string
themselves. Furthermore there are far fewer circumstances when a user
has to build an authority, compared to when they have to build a URL.

> Its nice that there are functions to manually do percentage encoding,
> but I would prefer (additional) simpler functions which just return a
> std::string.

Yeah, that's fair. We have on our todo to review and refactor the
percent-encoding suite of functions.

> It would be nice if the library also provided the default port numbers
> for the schemes. I know the documentation states that this is not part
> of the URL-protocol, but I gonna need that information if I use the library.

That's already been added to the develop branch!

> I would recommend not to start your example page with a 700 lines
> implementation of magnet links, that's super intimidating.

LOL! point taken...

> Something I missed was a simple example showing that the parse functions
> also accept unknown schemes like opc.tcp:// for example.

Yes that's a good idea

> Thanks a lot for all your work on providing useful high quality
> libraries for everyone!

Thank you for taking the time to review the library!

Regards

Gavin Lambert via Boost

unread,
Aug 22, 2022, 7:58:32 PM8/22/22
to bo...@lists.boost.org, Gavin Lambert
On 23/08/2022 08:14, Vinnie Falco wrote:
> Hmm yeah I see what you mean. I didn't think of this - that's why
> field experience is so vital. The reason they return url_base& is
> because that's the class that has all the mutation logic. This is
> shared between `url` and `static_url`. To get functions like
> `set_host` to return `url&` instead would be challenging. We would
> have to put a copy of the entire set of mutating API member function
> signatures into both `url` and `static_url_base` and have those return
> the correct type.

Another option (not necessarily a better one) is to make url_base (or
whatever provides those methods, if moved) use CRTP.

This could potentially even be just a thin wrapper around private
implementation that still uses a concrete base internally, at the cost
of a slightly dodgy downcast in the wrapper.
Reply all
Reply to author
Forward
0 new messages