Maybe an alternative would be to keep them in the public API but give them
a separate namespace, like `urls::encoding`. This would indicate more
clearly that you have a relatively separate sub-library there that needs
to stay for logistical reasons. Boost.Lexical_cast took a similar approach
for try_lexical_convert:
https://www.boost.org/doc/libs/1_80_0/doc/html/boost_lexical_cast/synopsis.html#boost_lexical_cast.synopsis.try_lexical_convert
Regards,
&rzej;
Before answering the questions presented by the review manager, I thought
I’d share a few words about what Boost represents to me and why I take an
interest in it.
Many moons ago, Boost was pretty much the only way to use the C++
programming language in any kind of useful, cross-platform way.
Today, the C++ standard has evolved and grown considerably. Many of the
original drivers to use Boost are seemingly “solved” in the language
itself. That said, some of the design decisions taken by the
Committee over the years have resulted in an anaemic half-copy of Boost
features being standardised.
Because of this (in my view) failure of the committee decision process in
the face of reality, I still use many of the fundamental Boost utilities in
place of their standard counterparts.
Therefore, to my mind, Boost is still a necessary component of any
non-trivial program I write in C++.
Boost also has a collection of utility libraries, some of which have waned
in relevance over time. For example, Boost.Format is now pretty much
obsolete in the face of fmtlib (not std::format, which is another example
of a half-hearted attempt to standardise a solution).
When I program, I do so almost exclusively in C++. I don’t really want the
bother of having to reach for other languages or to stay current in their
dependency management solutions of the day. What I really want is one
massive library that I can pull in which does everything I need, or at
least the common things.
Common things for me are:
- Communications, including HTTP, WebSockets, databases and message buses.
- Console IO
- Logging
- Graphics & Human Interfaces
- Sound
For me, anything that takes us a step closer to having one library that
shrink-wraps all that into one `find_package` call in CMake is a win.
Now I will admit that once upon a time, I believed that all this stuff
should be in the standard library. However since then, having worked as a
library maintainer myself, I have come to realise that this is an error.
Giving compiler maintainers the burden of maintaining humongous libraries
will simply discourage innovation in compiler technology, result in
incomplete implementations, and hinder porting to new computing platforms. Boost’s
cross-platform compatibility is unparalleled.
Therefore I am strongly of the view that the place for C++ library
innovation is Boost, and the place for standardising the language changes
that are demonstrated as necessary by Boost innovation lies with the
Committee and compiler developers.
For me, any new library that increases Boost’s real-world utility, is fully
cross-platform, and reduces my workload when writing application programs is
very welcome.
As for the review:
> Does this library bring real benefit to C++ developers for real world
use-case?
Yes. URL parsing can be fluked with regex, split, and a few hand-rolled
utilities. But why give developers the burden of hand-rolling this
boilerplate nonsense? Make simple things simple!
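To illustrate the kind of boilerplate being talked about, here is a minimal hand-rolled percent-decoder of the sort a library like Boost.URL spares you from writing. This is a sketch using only the standard library; the function name and error-handling choice (`std::optional`) are my own, not anything from the library:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hand-rolled percent-decoder: returns std::nullopt on a malformed
// escape sequence (truncated "%x" or non-hex digits).
std::optional<std::string> pct_decode(std::string_view s)
{
    auto hex = [](char h) -> int {
        if (h >= '0' && h <= '9') return h - '0';
        if (h >= 'a' && h <= 'f') return h - 'a' + 10;
        if (h >= 'A' && h <= 'F') return h - 'A' + 10;
        return -1; // not a hex digit
    };
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '%') {
            out.push_back(s[i]);
            continue;
        }
        if (i + 2 >= s.size()) return std::nullopt; // truncated escape
        int hi = hex(s[i + 1]), lo = hex(s[i + 2]);
        if (hi < 0 || lo < 0) return std::nullopt;  // invalid hex digit
        out.push_back(static_cast<char>(hi * 16 + lo));
        i += 2;
    }
    return out;
}
```

Even this toy version has to get truncation and invalid digits right; every codebase that hand-rolls it will get some edge case wrong.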
> Do you have an application for this library?
Absolutely. I am happy to retrofit Boost.URL into existing code and would
certainly use it in all future code.
> Does the API match with current best practices?
I have no idea what “best practices” are in API design. The API allows me
to interrogate, build and update internet URLs. I may have made different
naming choices and if I had written it I would probably have made more use
of free functions, but I’m not going to argue about style. Utility is all
that matters to me.
> Is the documentation helpful and clear?
Much like the Boost.ASIO documentation, it is 100% correct and tersely
describes the exact functionality of the API. Use of the documentation
requires at least a cursory understanding of the terms in RFC3986. Twenty
years ago, it would probably have been reasonable to require users to read
this RFC prior to using the library. Today, attention spans are shorter and
the internet is a commodity. I would prefer to see a little more narrative
and descriptions of real-world uses of some of the methods, along with code
examples.
> Did you try to use it?
Yes, I retrofitted Boost.URL into a program that I used as the basis of a
blog post - a program to demonstrate following redirect responses when
connecting a WebSocket.
In the past week, I have also integrated Boost.URL into a new program
which connects to an exchange. Doing so saved probably 200 lines of
hand-rolled code.
> What problems or surprises did you encounter?
The library lacked a specific facility to extract just the HTTP TARGET
field from a URL. After reporting this to the author with a supporting use
case and hand-rolled work-around, this has now been added.
> What is your evaluation of the implementation?
It is clear that a great deal of care has gone into making the
implementation robust and efficient.
> How much effort did you put into your evaluation? A glance?
A quick reading? In-depth study?
I retrofitted Boost.URL into an existing program and wrote a new program
that uses the library. I spent some time discussing what I felt were
missing features with the author, who has been responsive. My required
features subsequently appeared.
> Are you knowledgeable about the problem domain?
I’ve been writing internet-facing servers and clients in C++ for a decade.
In all that time I have never read RFC3986. I suspect that, like 99% of
developers, I have simply inferred the rules of URL encoding through use.
So perhaps a time-served journeyman rather than domain expert.
Recommendation: ACCEPT
The day I can write a CMakeLists file that has only one dependency, Boost,
will be a happy day.
--
Richard Hodges
hodg...@gmail.com
office: +44 2032 898 513
home: +376 861 195
mobile: +376 380 212
Disclaimer: I am employed by The C++ Alliance, that is, by Vinnie. Nevertheless,
I haven't followed the development of this library closely, so I did not have much
prior knowledge about the design choices it makes.
My vote is to ACCEPT the library, since it solves the use cases for URLs
that I can currently come up with, it is quite easy to use (and not too easy
to use incorrectly), and the flaws it has don't seem to be fundamental.
> What is your evaluation of the design?
Overall, I found the API fairly easy to use. I understand the concern about
parsing into a view type, but I consider it a lesser evil than allocating on
every parse or creating two kinds of functions, one that allocates and one that
doesn't.
I did also find the naming situation for encoded/decoded a bit confusing. In
particular, url::path returns pct_encoded_view, but url::encoded_path returns
string_view. I would expect the opposite. Maybe this needs additional
explanation in the docs?
Finally, I find the use of system::result as the sole method of reporting
errors intriguing. I have not yet encountered a library that uses this
approach, though I have considered employing it myself in the past. If
this library is accepted, it will become a great experiment for this sort of
API and will provide us with the experience on whether it's a good replacement
for dual APIs using error_code.
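For readers who have not met this style, here is a stripped-down sketch of what a single, result-returning API looks like compared with the classic dual overloads (one throwing, one taking an `error_code&`). The `result` class below is a toy stand-in for `boost::system::result`, and `parse_scheme` is a hypothetical example function, not part of Boost.URL:

```cpp
#include <string>
#include <string_view>
#include <system_error>
#include <utility>
#include <variant>

// Toy stand-in for boost::system::result<T>: holds either a value
// or an error_code.
template <class T>
class result {
    std::variant<T, std::error_code> v_;
public:
    result(T t) : v_(std::move(t)) {}
    result(std::error_code ec) : v_(ec) {}
    bool has_value() const { return v_.index() == 0; }
    const T& value() const { return std::get<0>(v_); } // the real type throws on error
    std::error_code error() const {
        return has_value() ? std::error_code{} : std::get<1>(v_);
    }
};

// One function serves both styles of caller, replacing the usual pair
// of overloads: parse_scheme(s) and parse_scheme(s, ec).
result<std::string> parse_scheme(std::string_view s)
{
    auto pos = s.find(':');
    if (pos == std::string_view::npos || pos == 0)
        return std::make_error_code(std::errc::invalid_argument);
    return std::string(s.substr(0, pos));
}
```

A caller who wants exceptions calls `.value()`; a caller who wants to inspect the error checks `.has_value()` and `.error()`, so the two traditional signatures collapse into one.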
> What is your evaluation of the implementation?
I haven't looked into that.
> What is your evaluation of the documentation?
Documentation seems sparse but ultimately I have been able to successfully
find all the information that I needed. Still, a lot more examples and detailed
explanations are necessary.
> What is your evaluation of the potential usefulness of the library?
The obvious application is networking, but given that URIs were designed as a
universal way to identify entities, there are other potential use cases.
> Did you try to use the library? With what compiler? Did you have any problems?
I have implemented a very simple program that constructed URLs (including ones
with custom schemes), compared them against each other, and deconstructed them
for "routing". b2 was used as the build system and GCC 10.3 as the
compiler. I did not encounter any problems.
> How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
3 hours of coding and 2 hours of reading documentation.
> Are you knowledgeable about the problem domain?
I'm not an expert, but I have been dealing with URLs in Python and in C++ using
Qt's QUrl.
Sat, 13 Aug 2022 at 05:30, Klemens Morgenstern via Boost
<bo...@lists.boost.org>:
This is a summary of issues we have opened during the review so anyone
interested can track them.
* Public grammar namespace:
https://github.com/CPPAlliance/url/issues/444
I find the creation of a mini-parsing library and the expectation that
> users will use this mini-library to write their own parsers to be
> somewhat problematic.
Regarding the parser interface, I feel this should mature into its own
> library.
I'm not very keen on exposing the parsing infrastructure and helper
> functions
I can see other public types in the reference such as recycled,
> recycled_ptr and aligned_storage, which again have the feeling that they
> should not be exposed as public types.
* parse_url vs. parse_uri:
https://github.com/CPPAlliance/url/issues/443
When I call parse_uri(), I get a result of type result<url_view>? Why
> isn't the function spelled parse_url()
In contrast to others, I think the choice of nomenclature, namely URL
> everywhere except in the grammar domain (the parse function names), is a
> very elegant one
* Documentation issues:
https://github.com/CPPAlliance/url/issues/442
(About 20 minor issues with the docs)
* Exploring IRIs:
https://github.com/CPPAlliance/url/issues/441
The lack of IRI support is unfortunate. It's 2022; we should all
> be writing software with Unicode support by default. However, this can
> be built on top of Boost.URL, and isn't needed in all cases.
* Add lowercase/uppercase in pct_encode_opts:
https://github.com/CPPAlliance/url/issues/440
In the past I have had situations where it was necessary to specify the
> case for hex letters
* Optional query values:
https://github.com/CPPAlliance/url/issues/439
I would be inclined to use a std::optional for the value instead.
* Name of the decoding view is confusing:
https://github.com/CPPAlliance/url/issues/438
I would expect a "decoded_view" to refer to underlying encoded data
pct_encoded_view is an unfortunate choice for a view that actively decodes
> URL-encoded characters
I did also find the naming situation for encoded/decoded a bit confusing. I
> would expect the opposite.
* Using std::string in url:
https://github.com/CPPAlliance/url/issues/437
In your non-view url object, you seem to allocate storage using new char[].
> Why not use a std::string?
* url::persist docs:
https://github.com/CPPAlliance/url/issues/436
so url_view doesn't own the underlying storage, except when it
> does as a result of this mysterious method. What's the rationale for
> this?
* url lifetimes:
https://github.com/CPPAlliance/url/issues/435
There's no indication here that r is a non-owning view and that
> the caller is responsible for keeping s alive for the lifetime of r.
* Examples:
https://github.com/CPPAlliance/url/issues/419
I miss an example (and possibly a page) about composing and mutating URLs
I'd add a page and an example on percent encoding functions
small examples may be even more useful to newcomers.
I found resolve a bit mystifying. (...) it will be a big help to have
> inspirational examples
Perhaps the examples would be better when fleshed out in long form
* pct-encoding overloads:
https://github.com/CPPAlliance/url/issues/417
I'd consider exposing higher-level functions here, maybe taking a lvalue
> reference to a std::string
* container names:
https://github.com/CPPAlliance/url/issues/416
I've found the naming for view containers confusing (e.g. params and
> params_view).
I would suggest leaving url and url_view as-is, and replacing the "view"
> part for "const" for the rest of the containers
On Fri, 12 Aug 2022 at 23:30, Klemens Morgenstern via Boost
<bo...@lists.boost.org> wrote:
--
Alan Freitas
https://github.com/alandefreitas
I'm not a boost developer, but as someone who has to deal with URLs (for
calling or implementing Web-APIs) at his job frequently, I want to give
my two cents as a most likely user of this library. I hope this is OK (I
won't give a recommendation to accept or reject though).
I spent about 2.5h integrating the library into our program and tried
some simple parsing and constructing of urls. I used MSVC2022 which
worked without problems.
> - Does this library bring real benefit to C++ developers for real
> world use-case?
Yes definitely. I believe probably every programmer
has written a (mostly crappy) URL-parser before, and I would be more
than happy to replace our parser with this library.
> - Do you have an application for this library?
Yes. Besides parsing incoming HTTP-Requests, I would use it to
programmatically construct urls to call external Web-APIs.
One of the more common scenarios in our applications is the construction
of urls after the user has provided login credentials and, often,
host/port combinations. Also, a prefix path is often needed (for example
to differentiate between a demo or beta service and the real production
api). I feel like the current API design lacks a bit in this regard.
To construct a new url I did the following:
auto url = boost::urls::url{};
url.set_scheme("https")
.set_host("special-api.com")
.set_port("443")
.set_path("/prefix/api/v1/");
url.segments().push_back("applications");
url.segments().push_back(applicationId);
url.segments().push_back("devices");
url.params().append("since", "2022-01-01");
The url::set_... methods seem to follow a fluent API design, which is
nice, but unfortunately they return url_base&. So I can't construct a
url inside a function call like:
httpClientGet(url{}.set_host(host).set_path("some/path"))
Being able to add segments is great. Personally I'm a big fan of
overloading the /-operator to append paths like in the filesystem
library, because I think this results in more readable code. The example
above could become:
httpClientGet(url{host} / "some/path")
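A rough sketch of what such an operator could look like, on a toy type rather than Boost.URL's actual classes (all names here are hypothetical), in the spirit of `std::filesystem::path::operator/`:

```cpp
#include <string>
#include <string_view>
#include <utility>

// Toy URL holder; Boost.URL does not currently offer operator/.
struct toy_url {
    std::string s;
    explicit toy_url(std::string base) : s(std::move(base)) {}
};

// Appends a path segment, ensuring exactly one '/' between the parts.
toy_url operator/(toy_url u, std::string_view segment)
{
    if (!u.s.empty() && u.s.back() != '/')
        u.s += '/';
    // strip a single leading '/' on the segment to avoid "//"
    if (!segment.empty() && segment.front() == '/')
        segment.remove_prefix(1);
    u.s.append(segment);
    return u;
}
```

With this, `toy_url("https://host") / "some/path"` yields `"https://host/some/path"` regardless of whether either side carried a trailing or leading slash; a real implementation would of course also have to percent-encode the appended segment.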
I really like that the library offers different parse_ functions, so a
user can choose the appropriate one depending on the context.
But it is a shame that there are no semantic types to go along with
them. As mentioned above we often have the case that we have to
construct a url by combining a prefix path with the actual path of the
resource. As this prefix path has to be absolute and the resource path
relative, I want to use two distinct types for them. It seems to me that the
resolve() function has a similar problem. The documentation states that
the base url has to satisfy the absolute-URI grammar. Wouldn't it be
great to have a distinct type for this?
BTW why is there only an authority_view and not an owning container?
Seems inconsistent to me.
It's nice that there are functions to manually do percent encoding,
but I would prefer (additional) simpler functions which just return a
std::string.
It would be nice if the library also provided the default port numbers
for the schemes. I know the documentation states that this is not part
of the URL protocol, but I'm going to need that information if I use the library.
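(As an aside: the lookup being asked for is small enough to hand-roll in the meantime. A sketch, with an illustrative subset of the IANA defaults and a hypothetical function name:)

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Default port lookup for a few common schemes (illustrative subset of
// the IANA-registered defaults; not exhaustive).
std::optional<std::uint16_t> default_port(std::string_view scheme)
{
    if (scheme == "http")  return 80;
    if (scheme == "https") return 443;
    if (scheme == "ws")    return 80;
    if (scheme == "wss")   return 443;
    if (scheme == "ftp")   return 21;
    if (scheme == "ssh")   return 22;
    return std::nullopt; // unknown scheme: caller must supply a port
}
```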
> - Is the documentation helpful and clear?
Mostly yes. Reading the "Quick Look" page is enough to get started with
using the library which I appreciate. I also like the help card, which
would be greatly improved by adding links to the reference pages.
I skipped / skimmed the Customization and Concept pages.
I would recommend not starting your example page with a 700-line
implementation of magnet links; that's super intimidating.
Something I missed was a simple example showing that the parse functions
also accept unknown schemes like opc.tcp:// for example.
Thanks a lot for all your work on providing useful high quality
libraries for everyone!
Best Regards,
Chris
Yes it is great! Feedback from the trenches is highly valued.
> One of the more common scenarios in our applications is the construction
> of urls...I feel like the current API design lacks a bit in this regard.
Now that I am looking at your example code, yes I agree. I don't like
some things that I'm seeing. We would be very happy to work with you
and fine tune the current API to work better for your use case. To
start, I would instead prefer to see your example written thusly:
> auto url = boost::urls::url{};
> url.set_scheme("https")
> .set_host("special-api.com")
> .set_port("443")
> .set_path("/prefix/api/v1/");
> url.segments().append({ "applications", applicationId, "devices" });
> url.params().append("since", "2022-01-01");
> The url::set_... methods seem to follow a fluent api design which is
> nice, but unfortunately they return url_base&.
Hmm yeah I see what you mean. I didn't think of this - that's why
field experience is so vital. The reason they return url_base& is
because that's the class that has all the mutation logic. This is
shared between `url` and `static_url`. To get functions like
`set_host` to return `url&` instead would be challenging. We would
have to put a copy of the entire set of mutating API member function
signatures into both `url` and `static_url_base` and have those return
the correct type.
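For what it's worth, the usual workaround for "setters live in the base but should return the derived type" is CRTP. A toy sketch of the idea, not a proposal for Boost.URL's actual class hierarchy (all names hypothetical):

```cpp
#include <string>
#include <utility>

// CRTP: the mutation logic is written once in the base template, but
// each setter returns Derived&, so fluent chains keep the concrete type.
template <class Derived>
struct fluent_url_base {
    std::string scheme, host;

    Derived& set_scheme(std::string s) { scheme = std::move(s); return self(); }
    Derived& set_host(std::string h)   { host = std::move(h);   return self(); }

private:
    Derived& self() { return static_cast<Derived&>(*this); }
};

// Analogue of `url`; a fixed-capacity analogue of `static_url` would
// derive the same way and still get Derived-returning setters.
struct fluent_url : fluent_url_base<fluent_url> {};
```

The trade-off is that url_base& could no longer be a single non-template class, which complicates taking "any url" by reference; so this is a sketch of the technique, not an obviously better design.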
> So I can't construct a
> url inside a function call like:
>
> httpClientGet(url{}.set_host(host).set_path("some/path"))
What if there was a helper function or a helper constructor? For
example, one that takes the five parts and assembles them into one URL:
// here, scheme, query, and fragment are `none`
httpClientGet( url( none, host, "some/path", none, none ) );
> Being able to add segments is great. Personally I'm a big fan of
> overloading the /-operator to append paths like in the filesystem
> library, because I think this results in more readable code. The example
> of above could become:
>
> httpClientGet(url{host} / "some/path")
This is going to have the same problem: operator/ will return
url_base&. I agree the operators are nice. We were thinking to first
get some more field experience and fine tune the current APIs, and
then once things are stable - pour some operators into the mix.
> it is a shame that there are no semantic types to go along with
> them. ...The documentation states that the base url has to satisfy
> the absolute-URI grammar. Wouldn't it be great to have a distinct type
> for this?
That would be a challenge to design and implement. Consider the following:
absolute_uri u( "http://example.com" );
what happens if I then write:
u.remove_scheme();
Does this return a new type? If the library offers uri and
relative_ref as separate semantic types, what would
parse_uri_reference return, given that its grammar is:
URI-reference = URI / relative-ref
Would the return type be `result< variant< uri_view, relative_ref_view
> >`? I explored the use of alternative types when I first started
writing the library, but it became obvious fairly quickly that it
places significant constraints on the resulting APIs and gets quite
unergonomic. Not to mention that there would be an enormous
duplication of function signatures and of the documentation that goes
with them.
While the current design loses the compile-time distinctions between
the various flavors of compliant URLs, it does gain convenience as a
vocabulary type. Each container is capable of holding any valid
string, and users never have to worry about the type changing upon
conversion.
> BTW why is there only an authority_view and not an owning container?
> Seems inconsistent to me.
The authority_view is a concession to users of HTTP, in order to parse
request-target. The authority_view grammar is called "authority-form"
in rfc7230:
request-target = origin-form
/ absolute-form
/ authority-form
/ asterisk-form
Implementing authority_view was quite simple. We made a copy of
url_view and just trimmed away the stuff that wasn't relevant. But
implementing an authority container with mutation operations would
not be as simple. We could have done it, and nothing stops us from
adding it in the future, but I do not have sufficient evidence that
such a container would be as popular or widely used as the main url
containers. If enough users clamour for this feature in the future
then we might add it.
Generating a valid URL can be quite an undertaking. But generating a
valid authority is a lot simpler. Especially since we provide the
ipv4 and ipv6 address classes which will correctly format those
address types for you. So while the url containers offer significant
value in assisting users in producing valid URLs, we do not feel that
an "authority" container would offer as much additional value beyond
what the user can already accomplish by composing the authority string
themselves. Furthermore there are far fewer circumstances when a user
has to build an authority, compared to when they have to build a URL.
> It's nice that there are functions to manually do percent encoding,
> but I would prefer (additional) simpler functions which just return a
> std::string.
Yeah, that's fair. We have on our todo to review and refactor the
percent-encoding suite of functions.
> It would be nice if the library also provided the default port numbers
> for the schemes. I know the documentation states that this is not part
> of the URL protocol, but I'm going to need that information if I use the library.
That's already been added to the develop branch!
> I would recommend not starting your example page with a 700-line
> implementation of magnet links; that's super intimidating.
LOL! point taken...
> Something I missed was a simple example showing that the parse functions
> also accept unknown schemes like opc.tcp:// for example.
Yes that's a good idea
> Thanks a lot for all your work on providing useful high quality
> libraries for everyone!
Thank you for taking the time to review the library!
Regards