Variadic append for std::string

Olaf van der Spek

Dec 28, 2016, 3:50:40 AM
to ISO C++ Standard - Future Proposals
Hi,

One frequently needs to append stuff to strings, but the standard way
(s += std::string("A") + "B" + to_string(42)) isn't optimal due to temporaries.
A variadic append() for std::string seems like the obvious solution.
It could support string_view, integers, maybe floats,
but without formatting options.
It could even be extensible by calling append(s, t);

append(s, "A", "B", 42);
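A minimal sketch of how such a variadic append() could work, assuming C++17 (fold expressions, std::string_view, std::to_chars). Everything in namespace `sketch` is a hypothetical name, not a standard or proposed API, and only strings and integers are handled. Each integer is first formatted into a small stack buffer so every piece's length is known; the string then grows with a single reserve() before the pieces are copied in.

```cpp
#include <array>
#include <charconv>
#include <cstddef>
#include <string>
#include <string_view>
#include <tuple>
#include <type_traits>

namespace sketch {

// Holds the decimal text of one integer argument in a fixed stack buffer.
struct int_piece {
    std::array<char, 24> buf{};  // enough for any 64-bit integer plus a sign
    std::size_t len = 0;
};

inline std::string_view view_of(std::string_view sv) { return sv; }
inline std::string_view view_of(const int_piece& p) { return {p.buf.data(), p.len}; }

// Converts one argument into something whose length is known, without heap allocation.
template <class T>
auto to_piece(const T& v) {
    if constexpr (std::is_integral_v<T> && !std::is_same_v<T, bool> &&
                  !std::is_same_v<T, char>) {
        int_piece p;
        auto res = std::to_chars(p.buf.data(), p.buf.data() + p.buf.size(), v);
        p.len = static_cast<std::size_t>(res.ptr - p.buf.data());
        return p;
    } else {
        return std::string_view(v);  // string literals, std::string, std::string_view
    }
}

// Hypothetical variadic append: pass 1 measures, one reserve(), pass 2 copies.
template <class... Args>
std::string& append(std::string& s, const Args&... args) {
    auto pieces = std::make_tuple(to_piece(args)...);
    std::apply([&s](const auto&... p) {
        std::size_t total = s.size();
        ((total += view_of(p).size()), ...);
        s.reserve(total);
        (s.append(view_of(p)), ...);
    }, pieces);
    return s;
}

}  // namespace sketch
```

With this, `sketch::append(s, "A", "B", 42)` appends `AB42` with at most one reallocation, and no temporary std::string is created for the number.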

Would this be useful for the C++ std lib?

lnal...@gmail.com

Dec 28, 2016, 5:03:18 AM
to ISO C++ Standard - Future Proposals
Hello,

It would be useful, but it will probably create temporary strings during the variadic iteration (I'm not a variadic template expert).

But my real dream is to have a library like this one standardized: https://github.com/fmtlib/fmt

Laurent

Andrey Semashev

Dec 28, 2016, 6:12:20 AM
to std-pr...@isocpp.org
I think, conceptually, formatting should be decoupled from string. It
would probably make sense to have `append` work with strings (although
template expressions would be even better, I think), but it should not
deal with formatting stuff.

Thiago Macieira

Dec 28, 2016, 7:20:58 AM
to std-pr...@isocpp.org
On Wednesday, December 28, 2016, at 00:50:40 BRST, Olaf van der Spek
wrote:
It can also be solved without code change, at least the string parts. The
following code in Qt does exactly one allocation:

#define QT_USE_FAST_OPERATOR_PLUS
#include <qstring.h>

QString s;
s += "A" + QLatin1String("B") + '.' + QLatin1Char(';');

This expression will do a two-pass scan of all the arguments: first, it
calculates their maximum sizes (some of them may shrink when converted from
UTF-8 to UTF-16). Once that is known, it increases s's storage, memcpys the
data or does an in-place conversion into the buffer, then it shrinks s's size
to the actual size (no reallocation).

The only reason why you have to have that #define is that the plus expression
results in a QStringBuilder<...> template instance instead of QString, so an
expression like:

(QLatin1String("%1 ") + types).arg(n)

is valid without it but fails to compile with the macro. The solution is
an explicit cast to QString:

QString(QLatin1String("%1 ") + types).arg(n)

The fast operator plus is also enabled for QByteArray.

Without the macro, you can use the otherwise-unused operator%:

s += "A" % QLatin1String("B") % '.' % QLatin1Char(';');
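A minimal sketch of the same two-pass idea for std::string, assuming C++17. It is not Qt's QStringBuilder code; `sketch::frag`, `sketch::cat`, and this use of `operator%` are hypothetical, chosen to mirror the spelling above. `%` only records std::string_view fragments; the chain's total size is computed first, the target string grows once, and then each fragment is copied. The chain must start with a class type (here `frag`), because an overloaded operator needs at least one class-type operand.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

namespace sketch {

// Leaf: a non-owning view of the first fragment in a chain.
struct frag {
    std::string_view sv;
    std::size_t size() const { return sv.size(); }
    void write_to(std::string& out) const { out.append(sv); }
};

// Node: everything collected so far, followed by one more fragment.
template <class Lhs>
struct cat {
    Lhs lhs;
    std::string_view rhs;

    // Pass 1: total length of the whole chain.
    std::size_t size() const { return lhs.size() + rhs.size(); }

    // Pass 2: copy every fragment, left to right.
    void write_to(std::string& out) const {
        lhs.write_to(out);
        out.append(rhs);
    }

    // Materialize into a new string with a single allocation.
    operator std::string() const {
        std::string out;
        out.reserve(size());
        write_to(out);
        return out;
    }
};

// '%' only records views; nothing is copied yet.
inline cat<frag> operator%(frag lhs, std::string_view rhs) { return {lhs, rhs}; }

template <class Lhs>
cat<cat<Lhs>> operator%(const cat<Lhs>& lhs, std::string_view rhs) { return {lhs, rhs}; }

// s += chain: grow s once for its old contents plus every fragment.
template <class Lhs>
std::string& operator+=(std::string& s, const cat<Lhs>& chain) {
    s.reserve(s.size() + chain.size());
    chain.write_to(s);
    return s;
}

}  // namespace sketch
```

As with QStringBuilder, the chain's type is not std::string, and because it stores only views it must not outlive the strings it refers to.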

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Olaf van der Spek

Dec 28, 2016, 1:09:31 PM
to ISO C++ Standard - Future Proposals
On Wednesday, December 28, 2016 13:20:58 UTC+1, Thiago Macieira wrote:
> It can also be solved without code change, at least the string parts. The
> following code in Qt does exactly one allocation:

The non-string parts are kinda important.
Depending on a macro definition is a no-go, and overloading operator+ for all kinds of types is probably not a good idea.

Thiago Macieira

Dec 28, 2016, 10:35:15 PM
to std-pr...@isocpp.org
On Wednesday, December 28, 2016, at 10:09:30 BRST, Olaf van der Spek
wrote:
> > > append(s, "A", "B", 42);
> > >
> > > Would this be useful for the C++ std lib?
> >
> > It can also be solved without code change, at least the string parts. The
> > following code in Qt does exactly one allocation:
>
> The non-string parts are kinda important.

I disagree. Anything but other strings and elements of strings (characters)
should be done with the proper string formatting functions. Otherwise, we'll
soon have someone asking for hex formatting, zero padding, etc. We already
have the right tools for that.

Nicol Bolas

Dec 29, 2016, 12:45:19 AM
to ISO C++ Standard - Future Proposals


On Wednesday, December 28, 2016 at 10:35:15 PM UTC-5, Thiago Macieira wrote:
> I disagree. Anything but other strings and elements of strings (characters)
> should be done with the proper string formatting functions. Otherwise, we'll
> soon have someone asking for hex formatting, zero padding, etc. We already
> have the right tools for that.

We kinda have the right tools for that ;) We still don't have a decent C++-ified printf, despite having had variadic templates for 2 language revisions now.

Victor Dyachenko

Dec 29, 2016, 1:54:32 AM
to ISO C++ Standard - Future Proposals
I think operator<<() would be better here:

s << "A" << "B" << to_string(42);

As for support for non-character types: it is useful, at least for diagnostics. There is no need for powerful formatting here, just the ability to convert any fundamental type to text in some form. iostreams are too cumbersome, and printf is not generic (one needs to give the exact specifier for each type).

Olaf van der Spek

Dec 29, 2016, 3:44:33 AM
to std-pr...@isocpp.org
2016-12-29 4:35 GMT+01:00 Thiago Macieira <thi...@macieira.org>:
> On Wednesday, December 28, 2016, at 10:09:30 BRST, Olaf van der Spek
> wrote:
>> > > append(s, "A", "B", 42);
>> > >
>> > > Would this be useful for the C++ std lib?
>> >
>> > It can also be solved without code change, at least the string parts. The
>> > following code in Qt does exactly one allocation:
>>
>> The non-string parts are kinda important.
>
> I disagree. Anything but other strings and elements of strings (characters)
> should be done with the proper string formatting functions. Otherwise, we'll

What proper functions would that be?
Are they as performant as the proposed functions?

> soon have someone asking for hex formatting, zero padding, etc. We already
> have the right tools for that.

Do we?

--
Olaf

Thiago Macieira

Dec 29, 2016, 7:19:14 AM
to std-pr...@isocpp.org
On Thursday, December 29, 2016, at 09:44:31 BRST, Olaf van der Spek
wrote:
> > I disagree. Anything but other strings and elements of strings
> > (characters) should be done with the proper string formatting functions.
> > Otherwise, we'll
> What proper functions would that be?

I think in the standard library, that's std::stringstream. I don't use it, so
I wouldn't know.

QString has them built in: the .arg() overloads.

> Are they as performant as the proposed functions?

They don't have to be because they serve different purposes. We also need
something that can support internationalisation (i18n) and concatenation with
plus operators can't do that.

> > soon have someone asking for hex formatting, zero padding, etc. We already
> > have the right tools for that.
>
> Do we?

Yes.

Thiago Macieira

Dec 29, 2016, 7:21:02 AM
to std-pr...@isocpp.org
On Wednesday, December 28, 2016, at 22:54:32 BRST, Victor Dyachenko
wrote:
> I think operator<<() would be better here:
>
> s << "A" << "B" << to_string(42);
>
> And about support for non-character types. It is useful at least for
> diagnostics. No need for powerful formatting here, just the ability to
> convert any fundamental type to text of any form. iostreams are to
> cumbersom, printf is not generic (one need to specify exact specifier for
> the type).

Why can't you use std::stringstream or another std::ostream here? I know
you're saying it's cumbersome, and I agree that the iostreams part of the
standard library is an extreme overkill (using polymorphism for things that
didn't need it). Still, we have the tool.

Why not fix iostreams instead?
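For comparison, a minimal sketch (C++17) of the proposed `append(s, "A", "B", 42)` done with the existing tool, std::ostringstream; `stream_concat` is a hypothetical helper name. It accepts anything that has an `operator<<`, but it pays for the stream object, its locale and formatting state, and a copy out of the stream in `str()`.

```cpp
#include <sstream>
#include <string>

namespace sketch {

// Streams every argument into one ostringstream and returns the text.
template <class... Args>
std::string stream_concat(const Args&... args) {
    std::ostringstream os;
    (os << ... << args);  // binary left fold: ((os << a) << b) << c ...
    return os.str();      // copies the stream's buffer into a new string
}

}  // namespace sketch
```

Usage would be `s += sketch::stream_concat("A", "B", 42);`, which still builds one temporary string before appending it.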

Victor Dyachenko

Dec 29, 2016, 7:35:57 AM
to ISO C++ Standard - Future Proposals


On Thursday, December 29, 2016 at 3:21:02 PM UTC+3, Thiago Macieira wrote:
> Why can't you use std::stringstream or another std::ostream here? I know
> you're saying it's cumbersome, and I agree that the iostreams part of the
> standard library is an extreme overkill (using polymorphism for things that
> didn't need it). Still, we have the tool.

"We have an unusable tool. Nobody uses it, including myself, but we have it!" :-)

> Why not fix iostreams instead?

Because it is not fixable by design. It tries to be everything, so any implementation will be bloated. Dependency on the locales, which weigh more than 1MB by themselves, states everything (formatting parameters, flags, etc.), virtual calls, et al. I don't need any of that just to build the error message in a small function, like this:

result_t res = call(...);
if(failed(res)) throw std::logic_error(std::string() << "The call() returned " << res);
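A minimal sketch of the `operator<<` described above, assuming C++17 and placed in a hypothetical `sketch` namespace rather than in `std`. A second overload taking `std::string&&` lets a temporary such as `std::string()` start the chain, and integers are formatted with std::to_chars, so neither iostreams nor locales are involved. Only strings, single characters, and integers are covered.

```cpp
#include <charconv>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

namespace sketch {

// Strings and string literals are appended as-is.
inline std::string& operator<<(std::string& s, std::string_view sv) { return s.append(sv); }

// A single character is appended, not formatted as a number.
inline std::string& operator<<(std::string& s, char c) {
    s.push_back(c);
    return s;
}

// Integers other than bool and char are formatted in decimal via std::to_chars.
template <class Int,
          std::enable_if_t<std::is_integral_v<Int> && !std::is_same_v<Int, bool> &&
                               !std::is_same_v<Int, char>,
                           int> = 0>
std::string& operator<<(std::string& s, Int v) {
    char buf[24];  // enough for any 64-bit integer including the sign
    auto res = std::to_chars(buf, buf + sizeof buf, v);
    return s.append(buf, res.ptr);
}

// Lets a temporary start the chain: std::string() << "x" << 1
template <class T>
std::string&& operator<<(std::string&& s, const T& v) {
    s << v;  // s is an lvalue here, so one of the overloads above is chosen
    return std::move(s);
}

}  // namespace sketch
```

With `using sketch::operator<<;` in scope, a line such as `throw std::logic_error(std::string() << "The call() returned " << code);` compiles when `code` is an integer; `result_t`, `call()`, and `failed()` above are placeholders from the post.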

 

Victor Dyachenko

Dec 29, 2016, 7:42:50 AM
to ISO C++ Standard - Future Proposals
FIX: s/ states everything / states everywhere /

Michał Dominiak

Dec 29, 2016, 8:02:32 AM
to ISO C++ Standard - Future Proposals
Ah yes, because obviously the primary thing that `std::string`'s interface needs is more functionality. </s>

Andrey Semashev

Dec 29, 2016, 8:10:03 AM
to std-pr...@isocpp.org
On 12/29/16 15:35, Victor Dyachenko wrote:
>
> Why not fix iostreams instead?
>
> Because it is not fixable by design. It tries to be everything, so any
> implementation will be bloated. Dependency on the locales, which weights
> more than 1MB per se, states everything (formatting parameters, flags,
> etc), virtual calls, et al. I don't require anything of that just to
> build the error message in the small function, like this:
>
>     result_t res = call(...);
>     if(failed(res)) throw std::logical_error(std::string() << "The call() returned " << res);

std::string is already bloated too much. Adding yet more bloat is hardly
the way to go.

Olaf van der Spek

Dec 29, 2016, 8:11:58 AM
to std-pr...@isocpp.org
2016-12-29 14:09 GMT+01:00 Andrey Semashev <andrey....@gmail.com>:
> std::string is already bloated too much. Adding yet more bloat is hardly the
> way to go.

We're not requesting stuff to be added to std::string. ;)


--
Olaf

Victor Dyachenko

Dec 29, 2016, 8:16:24 AM
to ISO C++ Standard - Future Proposals
On Thursday, December 29, 2016 at 4:10:03 PM UTC+3, Andrey Semashev wrote:
> std::string is already bloated too much.

Agreed.

> Adding yet more bloat is hardly
> the way to go.

Specifically, these features can be implemented in 50 lines of code using sprintf(), and in a few hundred without any dependencies.

Thiago Macieira

Dec 29, 2016, 8:30:10 AM
to std-pr...@isocpp.org
On Thursday, December 29, 2016, at 04:35:57 BRST, Victor Dyachenko
wrote:
> > "We have unusable tool. Nobody uses it including myself, but we have it!"
> > :-)

You misunderstand me. I don't use std::string in the first place. Therefore, I
have no need for std::stringstream.

Though I also don't use iostreams, except in Hello World applications.

> > Why not fix iostreams instead?
>
> Because it is not fixable by design. It tries to be everything, so any
> implementation will be bloated. Dependency on the locales, which weights
> more than 1MB per se, states everything (formatting parameters, flags,
> etc), virtual calls, et al.

So dump it and start something new. I think we'll all benefit from it, since
most of us think its design is bloated. It came from the very depths of C++'s
origins, when the rule was to make *everything* polymorphic and overrideable.
We've learned a lot since then.

Anyway, I don't oppose having a formatting method for strings, outside of
iostreams or a replacement of it. I think we need it, even.

I just don't think we should combine that with concatenation.

Victor Dyachenko

Dec 29, 2016, 8:47:29 AM
to ISO C++ Standard - Future Proposals
On Thursday, December 29, 2016 at 4:30:10 PM UTC+3, Thiago Macieira wrote:
> I just don't think we should combine that with concatenation.

Makes sense. Indeed, "<<" for non-character types is a subset of the concatenation feature. For me, the ability to use "<<" instead of "+=" for char, const char* and std::string would be a good starting point.

Andrey Semashev

Dec 29, 2016, 8:56:56 AM
to std-pr...@isocpp.org
Adding operator<< overloads for std::string is adding to std::string. As
is creating a specialized append() algorithm that is targeted
specifically at string manipulation.

Olaf van der Spek

Dec 29, 2016, 9:00:49 AM
to std-pr...@isocpp.org
What are you suggesting?
I'm fine with s being a template parameter, which would let you use
it for a file or vector<char> too.

A string algo is exactly what (some) people need.



--
Olaf

Andrey Semashev

Dec 29, 2016, 9:06:14 AM
to std-pr...@isocpp.org
On 12/29/16 16:16, Victor Dyachenko wrote:
The exact number of lines is not my only concern, although I doubt that
your estimate is accurate, at least in the "no added dependencies" case.
Correctly formatting FP numbers, in particular, sounds like a
complicated task. Then there is support for user-defined types, some of
which are probably ostreamable - would you not want existing operator<<
to be used? This brings a dependency on iostreams.
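The fallback to an existing operator<< can be expressed directly. A minimal sketch, assuming C++17: strings are appended as-is, a type can opt in through a hypothetical `append_value(std::string&, const T&)` hook found by argument-dependent lookup, and anything else falls back to `operator<<` through a std::ostringstream, which is precisely where the iostreams dependency enters. `append`, `append_value`, `has_hook`, and the `demo` types are illustrative names, not an existing or proposed API.

```cpp
#include <sstream>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

namespace sketch {

// Detects a user-provided hook append_value(std::string&, const T&), found by ADL.
template <class T, class = void>
struct has_hook : std::false_type {};

template <class T>
struct has_hook<T, std::void_t<decltype(append_value(std::declval<std::string&>(),
                                                     std::declval<const T&>()))>>
    : std::true_type {};

template <class T>
void append_one(std::string& s, const T& v) {
    if constexpr (std::is_convertible_v<const T&, std::string_view>) {
        s.append(std::string_view(v));  // strings: no formatting at all
    } else if constexpr (has_hook<T>::value) {
        append_value(s, v);             // type-specific customization point
    } else {
        std::ostringstream os;          // fallback: reuse an existing operator<<
        os << v;
        s += os.str();
    }
}

// Hypothetical variadic append built on the three cases above.
template <class... Args>
std::string& append(std::string& s, const Args&... args) {
    (append_one(s, args), ...);
    return s;
}

}  // namespace sketch

// Example user types (illustrative only).
namespace demo {
struct point { int x, y; };
inline void append_value(std::string& s, const point& p) {
    s += '(';
    s += std::to_string(p.x);
    s += ',';
    s += std::to_string(p.y);
    s += ')';
}

struct color { int id; };
inline std::ostream& operator<<(std::ostream& os, const color& c) {
    return os << "color#" << c.id;
}
}  // namespace demo
```

Here `demo::point` uses the direct hook, while `demo::color` and plain `int` go through the stream fallback.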

Conceptually, you're adding more functionality to std::string, which IMO
should be nothing more than a container of characters. Any formatting
tools should build on top of that container (and maybe even allow
different containers to be used) instead of hijacking it.
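Olaf's suggestion earlier (making `s` a template parameter) and the idea of building on top of the container rather than inside std::string fit together. A minimal sketch, assuming C++17: a free function that appends string pieces to any container of `char` with an `insert(end, first, last)` member, calling `reserve()` once when the container has one. `append_to` and `has_reserve` are hypothetical names; `<deque>` and `<vector>` are included only for the usage example.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Detects whether C has reserve(size_t), as std::string and std::vector do.
template <class C, class = void>
struct has_reserve : std::false_type {};

template <class C>
struct has_reserve<C, std::void_t<decltype(std::declval<C&>().reserve(std::size_t{}))>>
    : std::true_type {};

// Appends string-like pieces to any container of char, independent of std::string.
template <class Container, class... Parts>
Container& append_to(Container& out, const Parts&... parts) {
    if constexpr (has_reserve<Container>::value) {
        // Grow once for all pieces when the container supports it.
        out.reserve(out.size() + (std::size_t{0} + ... + std::string_view(parts).size()));
    }
    auto append_one = [&out](std::string_view sv) {
        out.insert(out.end(), sv.begin(), sv.end());
    };
    (append_one(parts), ...);
    return out;
}

}  // namespace sketch
```

The same call then works for std::string, std::vector<char>, or a std::deque<char> (which simply skips the reserve step).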

Andrey Semashev

Dec 29, 2016, 9:13:00 AM
to std-pr...@isocpp.org
I don't have a proposal, but if that's a generic formatting algorithm
then it should not be coupled with std::string and of course it should
not be named append() (because, well, appending is the least of the
things it does). I think, a C++ version of sprintf() is a frequently
requested feature, and it seems it could potentially fit your case as well.

Olaf van der Spek

Dec 29, 2016, 9:24:35 AM
to std-pr...@isocpp.org
2016-12-29 15:12 GMT+01:00 Andrey Semashev <andrey....@gmail.com>:
> I don't have a proposal, but if that's a generic formatting algorithm then

From my first post: "It could support string_view, integers, maybe floats
**but without formatting options**.."

> it should not be coupled with std::string and of course it should not be
> named append() (because, well, appending is the least of the things it
> does). I think, a C++ version of sprintf() is a frequently requested
> feature, and it seems it could potentially fit your case as well.

Maybe, but the formatting options and format string make it vastly more
complex. It'd also have a different interface. Maybe it'd even use my
proposed append function internally. ;)


--
Olaf

Thiago Macieira

Dec 29, 2016, 9:36:07 AM
to std-pr...@isocpp.org, Victor Dyachenko
On Thursday, December 29, 2016, at 05:47:29 BRST, Victor Dyachenko
wrote:
The left-most operand is always a class type, probably a template
instantiation of std::basic_string.

Bengt Gustafsson

Dec 29, 2016, 11:14:55 AM
to ISO C++ Standard - Future Proposals, victor.d...@gmail.com
Note that deferring number formatting to existing functions reduces the performance gain of this function: Each of the number formatter functions will have to allocate a std::string or similar and return it before the Append() function or similar gets hold of it. Thus we can never get down to the one allocation solution envisioned. Furthermore, even to find out the number of characters to allocate for a formatted number without actually doing the formatting is very complicated, especially if it needs to heed formatting options. Using a simplified formatted length functionality can work but means that either there could be a significant over-allocation if guesses are too high or significant amounts of double allocation if guesses are too low.

Another solution would be to use a thread-local buffer to sprintf into, and then copy the result, whose length is now known, into the string (causing one allocation). But all of this can be viewed as QOI-level discussion. What is important is that if number formatting is excluded from a feature like this, much of its appeal when it comes to reducing allocations is lost.

So while this proposal without number capability does reduce allocations, many more allocations can be eliminated if numbers are allowed. An alternative would be to introduce another set of number formatting functions which return objects that actually do the formatting into a buffer, but can also tell how large that buffer needs to be. Unfortunately this would add to C++'s already large set of number formatting functions, and there would be no warning if older, less performant functions are used inside an Append call by mistake.
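The alternative described above, formatting objects that can report their length before writing, might look like this minimal sketch (C++17). `dec`, `length_of`, `write_to`, and `append_exact` are hypothetical names. `dec::size()` counts digits arithmetically instead of formatting, so the string can be resized to the exact final length once and every piece written straight into it.

```cpp
#include <algorithm>
#include <charconv>
#include <cstddef>
#include <string>
#include <string_view>

namespace sketch {

// A decimal-formatting object that knows its exact length before writing.
struct dec {
    long long value;

    std::size_t size() const {
        // Work on the magnitude as unsigned so that the minimum value is handled.
        unsigned long long u = value < 0 ? 0ull - static_cast<unsigned long long>(value)
                                         : static_cast<unsigned long long>(value);
        std::size_t n = value < 0 ? 2 : 1;  // sign (if any) plus at least one digit
        while (u >= 10) {
            u /= 10;
            ++n;
        }
        return n;
    }

    char* write(char* first) const {
        // The destination is exactly size() chars, so to_chars cannot run out of room.
        return std::to_chars(first, first + size(), value).ptr;
    }
};

inline std::size_t length_of(std::string_view sv) { return sv.size(); }
inline std::size_t length_of(const dec& d) { return d.size(); }

inline char* write_to(char* p, std::string_view sv) { return std::copy(sv.begin(), sv.end(), p); }
inline char* write_to(char* p, const dec& d) { return d.write(p); }

// Pass 1 sums exact lengths, one resize(), pass 2 writes in place.
template <class... Parts>
std::string& append_exact(std::string& s, const Parts&... parts) {
    const std::size_t old = s.size();
    s.resize(old + (std::size_t{0} + ... + length_of(parts)));
    char* p = s.data() + old;
    ((p = write_to(p, parts)), ...);
    return s;
}

}  // namespace sketch
```

The trade-off mentioned in the message still applies: this is yet another family of formatting types, and nothing stops a caller from passing `std::to_string(x)` instead, which allocates a temporary.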

I think, but have no proof, that it could be easier to get a customization point similar to operator<<(ostream&, T) for additional types if an operator chaining approach is implemented than for an Append function. As someone suggested, it should be possible to let this mechanism fall back to the ostream-based operator if a specific overload is not available. Allowing stream manipulators such as hex would be possible, but of course cumbersome, and would also perpetuate this old-fashioned way of specifying formatting. One thought that struck me was that maybe a library implementer could turn the tables and implement shifting into an ostream in terms of this new feature (reducing the number of reallocations of the stream buffer), but this is probably not possible due to backward compatibility issues.

Nicol Bolas

Dec 29, 2016, 11:47:02 AM
to ISO C++ Standard - Future Proposals

I find this "too bloated" argument to be unconvincing.

Does `std::basic_string` have functions that are, strictly speaking, not necessary? Yes. But that doesn't mean you shouldn't add functions which actually are necessary to the interface. The fact that a type already has unneeded member functions shouldn't stop you from putting needed ones in there.

Now, you can argue that it actually isn't necessary, that there are ways to achieve the same performance of a concatenation function without making it a member. But "appeal to bloat" isn't a valid argument against it.

Olaf van der Spek

Dec 29, 2016, 11:47:59 AM
to std-pr...@isocpp.org, victor.d...@gmail.com
2016-12-29 17:14 GMT+01:00 Bengt Gustafsson <bengt.gu...@beamways.com>:
> Note that deferring number formatting to existing functions reduces the
> performance gain of this function: Each of the number formatter functions
> will have to allocate a std::string or similar and return it before the
> Append() function or similar gets hold of it. Thus we can never get down to

http://en.cppreference.com/w/cpp/utility/to_chars ;)

> I think, but have no proof, that it could be easier to get a customization
> point similar to operator<<(ostream&, T) for additional types if an operator
> chaining approach is implemented than for a Append function. As someone

I don't think that's true; a customization point should be no trouble
with an append function.

Chaining looks good but wouldn't it make the single-allocation
optimization impossible?


--
Olaf

Nicol Bolas

Dec 29, 2016, 12:03:11 PM
to ISO C++ Standard - Future Proposals, victor.d...@gmail.com
On Thursday, December 29, 2016 at 11:47:59 AM UTC-5, Olaf van der Spek wrote:
> 2016-12-29 17:14 GMT+01:00 Bengt Gustafsson <bengt.gu...@beamways.com>:
> > Note that deferring number formatting to existing functions reduces the
> > performance gain of this function: Each of the number formatter functions
> > will have to allocate a std::string or similar and return it before the
> > Append() function or similar gets hold of it. Thus we can never get down to
>
> http://en.cppreference.com/w/cpp/utility/to_chars ;)

`to_chars` lacks the ability to tell you exactly how many characters a conversion will take. That's going to make it really difficult to pre-allocate the right amount of memory. At least, not without allocating a lot of extra space.

Andrey Semashev

Dec 29, 2016, 12:18:32 PM
to std-pr...@isocpp.org
On 12/29/16 19:47, Nicol Bolas wrote:
> On Thursday, December 29, 2016 at 8:10:03 AM UTC-5, Andrey Semashev wrote:
>
> std::string is already bloated too much. Adding yet more bloat is
> hardly
> the way to go.
>
> I find this "too bloated" argument to be unconvincing.
>
> Does `std::basic_string` have functions that are, strictly speaking, not
> necessary? Yes. But that doesn't mean you shouldn't add function which
> actually /are necessary/ to the interface. The fact that a type already
> has unneeded member functions shouldn't stop you from putting needed
> ones in there.

I don't think formatting qualifies as the "necessary" or "needed"
functions for std::string.

> Now, you can argue that it actually /isn't/ necessary, that there are
> ways to achieve the same performance of a concatenation function without
> making it a member. But "appeal to bloat" isn't a valid argument against it.

It is, because that is exactly how std::string interface became bloated.
Consider the bunch of find*/rfind/compare member functions that are
currently present in std::string interface but could perfectly be
standalone. Some of them, in fact, already exist as standalone generic
algorithms, which a decent compiler will optimize to the same degree as
the dedicated member functions.

The proposed extension is worse than the mentioned member functions in
that it potentially brings new dependencies to implement formatting.

Andrey Semashev

Dec 29, 2016, 12:23:47 PM
to std-pr...@isocpp.org
On 12/29/16 17:24, Olaf van der Spek wrote:
> 2016-12-29 15:12 GMT+01:00 Andrey Semashev <andrey....@gmail.com>:
>> I don't have a proposal, but if that's a generic formatting algorithm then
>
> From my first post: "It could support string_view, integers, maybe floats
> **but without formatting options**.."

Well, it may not support formatting options, but as long as it converts
arbitrary data to strings (or anything that can act like a string), it's
still a generic formatting algorithm.

>> it should not be coupled with std::string and of course it should not be
>> named append() (because, well, appending is the least of the things it
>> does). I think, a C++ version of sprintf() is a frequently requested
>> feature, and it seems it could potentially fit your case as well.
>
> Maybe, but the formatting options and string make it vastly more
> complex. It'd also have a different interface. Maybe it'd even use my
> proposed append function internally.. ;)

If we're talking about an sprintf() equivalent, I don't think a more
flexible formatting implementation can be based on top of a less
flexible one (that is, your proposed function).

Nicol Bolas

Dec 29, 2016, 4:58:59 PM
to ISO C++ Standard - Future Proposals
On Thursday, December 29, 2016 at 12:18:32 PM UTC-5, Andrey Semashev wrote:
> On 12/29/16 19:47, Nicol Bolas wrote:
> > On Thursday, December 29, 2016 at 8:10:03 AM UTC-5, Andrey Semashev wrote:
> >
> > > std::string is already bloated too much. Adding yet more bloat is
> > > hardly the way to go.
> >
> > I find this "too bloated" argument to be unconvincing.
> >
> > Does `std::basic_string` have functions that are, strictly speaking, not
> > necessary? Yes. But that doesn't mean you shouldn't add function which
> > actually /are necessary/ to the interface. The fact that a type already
> > has unneeded member functions shouldn't stop you from putting needed
> > ones in there.
>
> I don't think formatting qualifies as the "necessary" or "needed"
> functions for std::string.

And I do not think that the basis of the append functionality requires including a "formatting" system. The basic proposal is fast sequential string appending. Stuff added on top of that should not interfere with that basic ideal.

> > Now, you can argue that it actually /isn't/ necessary, that there are
> > ways to achieve the same performance of a concatenation function without
> > making it a member. But "appeal to bloat" isn't a valid argument against it.
>
> It is, because that is exactly how std::string interface became bloated.

That's not my understanding of the evolution of this type.

As I understand it, what became known as `std::basic_string` was a regular old concrete string class. It had exactly the kind of interface that string classes would be expected to have: searching, replacing, etc. That's typical of string classes, and there's no reason to consider that "bloat". It certainly would not have been considered "bloated" at the time it was designed.

However, when they moved to stick it into the standard, it was decided to give it an STL interface in addition to its existing interface.

> Consider the bunch of find*/rfind/compare member functions that are
> currently present in std::string interface but could perfectly be
> standalone. Some of them, in fact, already exist as standalone generic
> algorithms, which a decent compiler will optimize to the same degree as
> the dedicated member functions.

There is a fundamental difference between the argument you just outlined and the "X is bloat and bloat is bad" you said before. The difference being that the above is an actual argument, and the latter is an opinion.

If you want to say that it is not necessary to add appending functionality to `std::string` itself, then make that argument. But saying that the class is "bloated" and therefore adding anything is wrong is not a functional argument.

It should also be noted that the primary reason why those member functions continue to exist in `basic_string` (that is, why the committee did not replace their interface with the STL-based one) is because they work with integer offsets, rather than iterators/pointers. A lot of people do work with strings based on integer offsets, and they're not going to rewrite all of their code just because you think that iterators involve less "bloat". In some cases, they can't rewrite their code, since it's in C. So rather than making a string type that many people would reject, the committee made sure that the member API would use offsets, while relying on algorithms for those who want to use iterators.

If you feel that this is "bloat", that's your prerogative. But there is a genuine reason behind these APIs.

Thiago Macieira

Dec 29, 2016, 5:24:17 PM
to std-pr...@isocpp.org
On Thursday, December 29, 2016, at 09:03:10 BRST, Nicol Bolas
wrote:
> `to_chars` lacks the ability to tell you exactly how many characters a
> conversion will take. That's going to make it really difficult to
> pre-allocate the right amount of memory. At least, not without allocating a
> lot of extra space.

Which you can't know until you perform the conversion anyway. So any code that
tries to single-allocate a formatting chain needs to estimate with the worst
case scenario (which, for 'f', could be in the hundreds of characters) and
then trim the string.

As a QoI bonus, implementations can improve the estimation by performing log10
on the value, or log2 and divide by 3 (log2 is extremely fast).
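A minimal sketch of the estimate-then-trim strategy described above, for integers only (C++17). The worst case comes from `std::numeric_limits<T>::digits10`; the string grows once to that size, std::to_chars writes straight into it, and a final shrinking resize() trims the slack, which in practice does not reallocate. `append_integer` is a hypothetical name, and the tighter log2-based estimate is left out.

```cpp
#include <charconv>
#include <cstddef>
#include <limits>
#include <string>
#include <type_traits>

namespace sketch {

// Appends the decimal text of an integer with at most one reallocation.
template <class Int>
std::string& append_integer(std::string& s, Int v) {
    static_assert(std::is_integral_v<Int> && !std::is_same_v<Int, bool>, "integers only");
    // Worst case: digits10 + 1 digits, plus one char for a minus sign.
    constexpr std::size_t worst = std::numeric_limits<Int>::digits10 + 2;
    const std::size_t old = s.size();
    s.resize(old + worst);  // grow once to the estimate
    auto res = std::to_chars(s.data() + old, s.data() + s.size(), v);
    s.resize(static_cast<std::size_t>(res.ptr - s.data()));  // trim to the actual length
    return s;
}

}  // namespace sketch
```

For floating point with `'f'` formatting, the same pattern applies, but the worst-case estimate becomes the hundreds of characters mentioned above.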

Thiago Macieira

Dec 29, 2016, 5:30:07 PM
to std-pr...@isocpp.org
On Thursday, December 29, 2016, at 20:18:28 BRST, Andrey Semashev
wrote:
> It is, because that is exactly how std::string interface became bloated.
> Consider the bunch of find*/rfind/compare member functions that are
> currently present in std::string interface but could perfectly be
> standalone. Some of them, in fact, already exist as standalone generic
> algorithms, which a decent compiler will optimize to the same degree as
> the dedicated member functions.

And yet hardly any compiler will optimise as well as the dedicated copies of
those functions that exist for QString inside QtCore.

Andrey Semashev

Dec 29, 2016, 8:33:20 PM
to ISO C++ Standard - Future Proposals
On Fri, Dec 30, 2016 at 1:30 AM, Thiago Macieira <thi...@macieira.org> wrote:
> On Thursday, December 29, 2016, at 20:18:28 BRST, Andrey Semashev
> wrote:
>> It is, because that is exactly how std::string interface became bloated.
>> Consider the bunch of find*/rfind/compare member functions that are
>> currently present in std::string interface but could perfectly be
>> standalone. Some of them, in fact, already exist as standalone generic
>> algorithms, which a decent compiler will optimize to the same degree as
>> the dedicated member functions.
>
> And yet hardly any compiler will optimise as well as the dedicated copies of
> those functions that exist for QString inside QtCore.

I'm not familiar with Qt implementation, but I suspect it doesn't do
anything significantly more optimized than libc string functions. I
know at least gcc is able to convert std::copy/std::fill into
memcpy/memset calls when possible, and I see no reason why it couldn't
convert std::find/std::equal into memmem/memcmp. Does Qt do something
better than that?

Nicol Bolas

Dec 30, 2016, 12:15:50 AM
to ISO C++ Standard - Future Proposals
On Thursday, December 29, 2016 at 5:24:17 PM UTC-5, Thiago Macieira wrote:
> On Thursday, December 29, 2016, at 09:03:10 BRST, Nicol Bolas
> wrote:
> > `to_chars` lacks the ability to tell you exactly how many characters a
> > conversion will take. That's going to make it really difficult to
> > pre-allocate the right amount of memory. At least, not without allocating a
> > lot of extra space.
>
> Which you can't know until you perform the conversion anyway. So any code that
> tries to single-allocate a formatting chain needs to estimate with the worst
> case scenario (which, for 'f', could be in the hundreds of characters) and
> then trim the string.
>
> As a QoI bonus, implementations can improve the estimation by performing log10
> on the value, or log2 and divide by 3 (log2 is extremely fast).

Don't misunderstand my point; it's a perfectly fine design. I was just pointing out that this makes it essentially impossible to have both a single-allocation append design and in-situ string formatting during append operations.

So it's probably best to leave that alone.

Nicol Bolas

Dec 30, 2016, 12:23:58 AM12/30/16
to ISO C++ Standard - Future Proposals

Here's a better question: so what if it doesn't?

I'm in favor of QOI when it comes to algorithms. But at the end of the day, it costs me as a user nothing to have both `std::find` and `basic_string::find`. Does it hurt my program in any way that I could have used `std::find` instead of the member function? No. Does it make my program in any way confusing? No. Does it make my program run any slower? No. It doesn't even make my executable bigger, since either way, it'll compile down to an inlined function.

Then so long as there are genuine benefits to the member function version (like being able to take integer indices), what's the big deal?

Andrey Semashev

Dec 30, 2016, 5:08:17 AM12/30/16
to std-pr...@isocpp.org
On 12/30/16 08:23, Nicol Bolas wrote:
> On Thursday, December 29, 2016 at 8:33:20 PM UTC-5, Andrey Semashev wrote:
>
> I'm not familiar with Qt implementation, but I suspect it doesn't do
> anything significantly more optimized than libc string functions. I
> know at least gcc is able to convert std::copy/std::fill into
> memcpy/memset calls when possible, and I see no reason why it couldn't
> convert std::find/std::equal into memmem/memcmp. Does Qt do something
> better than that?
>
> Here's a better question: so what if it doesn't?
>
> I'm in favor of QOI when it comes to algorithms. But at the end of the
> day, it costs me as a user /nothing/ to have both `std::find` and
> `basic_string::find`. Does it hurt my program in any way that I could
> have used `std::find` instead of the member function? No. Does it make
> my program in any way confusing? No. Does it make my program run any
> slower? No. It doesn't even make my executable bigger, since either way,
> it'll compile down to an inlined function.
>
> Then so long as there are genuine benefits to the member function
> version (like being able to take integer indices), what's the big deal?

It affects interface conciseness. As a user you have to learn what those
functions do, why they are there, and when to use them and not the
standalone algorithms. Frankly, I did not find a definitive answer to
these questions myself after years of programming practice.

Index-based interface of these functions is not an advantage that
warrants their existence because you can trivially convert indices into
iterators yourself (and I'm sure these functions do that internally
anyway). Yet that interface requires std::string::npos, a special magic
number, which one has to check for everywhere, including those member
functions. You may argue that the number is named and that it's probably
((size_t)-1) so that you'll never have a string that large, and checking
for it is not expensive, but still it rubs me the wrong way every time I
see it. So what is the reason to use the index-based interface then?

There is also a cost if you want to implement a class that mimics
std::string. I've done that a few times, and those member functions do
add complexity to the task.

There is obviously a cost for standard library implementers and standard
writers and committee.

Andrey Semashev

Dec 30, 2016, 5:13:51 AM12/30/16
to std-pr...@isocpp.org
I'll add that the standard library has now done the same thing once again, in
std::string_view.

Thiago Macieira

Dec 30, 2016, 7:06:27 AM12/30/16
to std-pr...@isocpp.org
On Friday, December 30, 2016, at 04:33:15 BRST, Andrey Semashev wrote:
> On Fri, Dec 30, 2016 at 1:30 AM, Thiago Macieira <thi...@macieira.org> wrote:
> > And yet hardly any compiler will optimise as well as the dedicated copies
> > of those functions that exist for QString inside QtCore.
>
> I'm not familiar with Qt implementation, but I suspect it doesn't do
> anything significantly more optimized than libc string functions. I
> know at least gcc is able to convert std::copy/std::fill into
> memcpy/memset calls when possible, and I see no reason why it couldn't
> convert std::find/std::equal into memmem/memcmp. Does Qt do something
> better than that?

Now try that for std::u16string and std::wstring.

And yes, it does something better than that because the libc functions are
optimised differently. memcmp is optimised for large data blocks, but most
strings are actually quite short, to the point that the runtime detection
needed to pick the best strategy for long versus short strings is itself
significant overhead.

By having a dedicated function somewhere, an implementation can provide an
out-of-line copy, hand-rolled for the use-cases.

Thiago Macieira

Dec 30, 2016, 7:09:58 AM12/30/16
to std-pr...@isocpp.org
On Thursday, December 29, 2016, at 21:23:58 BRST, Nicol Bolas wrote:
> I'm in favor of QOI when it comes to algorithms. But at the end of the day,
> it costs me as a user *nothing* to have both `std::find` and
> `basic_string::find`.

I agree with that, but not with your conclusions.

> Does it hurt my program in any way that I could have
> used `std::find` instead of the member function? No. Does it make my
> program in any way confusing? No.

It could hurt, but it's probably not confusing.

> Does it make my program run any slower? No.

It could, if they are optimised differently. If you use the generic function,
which is optimised to amortise its setup cost over 1 kB of data, on a 64-byte
data block instead of the function optimised for less than 128 bytes, then
your code will run slower.

> It doesn't even make my executable bigger, since either way, it'll
> compile down to an inlined function.

Uh... that makes it bigger, not smaller. Compiling to more inlined code always
makes it bigger. If you want to make it smaller, you have to call the same
out-of-line function.

Thiago Macieira

Dec 30, 2016, 7:17:02 AM12/30/16
to std-pr...@isocpp.org
On Friday, December 30, 2016, at 13:08:11 BRST, Andrey Semashev wrote:
> It affects interface conciseness. As a user you have to learn what those
> functions do, why they are there, and when to use them and not the
> standalone algorithms. Frankly, I did not find a definitive answer to
> these questions myself after years of programming practice.

A function called "find" somewhere does the same as another function called
"find" elsewhere. Naming matters. So just name like functions likewise, and
unlike functions differently.

This way, the learning carries from one place to the next.

Also, I don't subscribe to the concise-interface paradigm as much as you (and
I guess Bjarne) do. I don't think a date class needs to have a
"find_next_friday" function, but I do think useful functions should be added.

> Index-based interface of these functions is not an advantage that
> warrants their existence because you can trivially convert indices into
> iterators yourself (and I'm sure these functions do that internally
> anyway). Yet that interface requires std::string::npos, a special magic
> number, which one has to check for everywhere, including those member
> functions. You may argue that the number is named and that it's probably
> ((size_t)-1) so that you'll never have a string that large, and checking
> for it is not expensive, but still it rubs me the wrong way every time I
> see it. So what is the reason to use the index-based interface then?

Because people are used to it and expect it. There's a self-reinforcing cycle
here: people learnt it, so they use that technique in their code; which makes
new developers learn it too, then use it again and again.

Breaking the cycle is possible, but you're going to cause grief to people who
are used to and expect that technique.

> There is also a cost if you want to implement a class that mimics
> std::string. I've done that a few times, and those member functions do
> add complexity to the task.

True, but that's not an issue we have to concern ourselves with. That happens
very infrequently, it's always done by experts, and if you want to implement a
very core class, you know you have an uphill battle.

> There is obviously a cost for standard library implementers and standard
> writers and committee.

True, but like the reimplementation, it happens once only, thereafter only
maintenance. But it may save hundreds of hours of work down the line by other
people.

Thiago Macieira

Dec 30, 2016, 7:19:21 AM12/30/16
to std-pr...@isocpp.org
On Thursday, December 29, 2016, at 21:15:50 BRST, Nicol Bolas wrote:
> Don't misunderstand my point; it's a perfectly fine design. I was just
> pointing out that this makes it essentially impossible to have both a
> single-allocation append design and in-situ string formatting during append
> operations.
>
> So it's probably best to leave that alone.

That's why I've been saying that single-allocation appending and formatting
should be separate things.

But to enable the latter, the former should allow for a generator object that
can specify the maximum size in constant time, then report the actual size
afterwards.
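That split can be sketched under an assumed protocol in which every appended piece reports `max_size()` in constant time and `write(dest)` returns how many characters it actually produced. The string grows once to the sum of the bounds, each piece writes in place, and the size is trimmed afterwards. `text_piece`, `int_piece` and `append_pieces` are hypothetical names for illustration, not an existing library API.

```cpp
#include <algorithm>
#include <cassert>
#include <charconv>
#include <cstddef>
#include <limits>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical piece that copies text verbatim; its bound is exact.
struct text_piece {
    std::string_view text;
    std::size_t max_size() const { return text.size(); }
    std::size_t write(char* dest) const {
        std::copy(text.begin(), text.end(), dest);
        return text.size();
    }
};

// Hypothetical piece that formats an integer; the bound covers sign plus all digits.
struct int_piece {
    long long value;
    static constexpr std::size_t bound = std::numeric_limits<long long>::digits10 + 2;
    std::size_t max_size() const { return bound; }
    std::size_t write(char* dest) const {
        std::to_chars_result res = std::to_chars(dest, dest + bound, value);
        assert(res.ec == std::errc{});
        return static_cast<std::size_t>(res.ptr - dest);
    }
};

// Pass 1: sum the O(1) bounds and grow the string once.
// Pass 2: let each piece write in place, left to right, then trim the size.
template <class... Pieces>
void append_pieces(std::string& s, const Pieces&... pieces) {
    const std::size_t old_size = s.size();
    const std::size_t bound = (pieces.max_size() + ... + std::size_t{0});
    s.resize(old_size + bound);
    char* out = &s[0] + old_size;
    ((out += pieces.write(out)), ...);
    s.resize(static_cast<std::size_t>(out - &s[0]));  // shrinking needs no reallocation
}
```

Formatting stays out of the string class: a new kind of piece only has to supply a cheap upper bound and an in-place writer.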

Andrey Semashev

Dec 30, 2016, 8:07:49 AM12/30/16
to std-pr...@isocpp.org
On 12/30/16 15:06, Thiago Macieira wrote:
> On Friday, December 30, 2016, at 04:33:15 BRST, Andrey Semashev wrote:
>> On Fri, Dec 30, 2016 at 1:30 AM, Thiago Macieira <thi...@macieira.org> wrote:
>>> And yet hardly any compiler will optimise as well as the dedicated copies
>>> of those functions that exist for QString inside QtCore.
>>
>> I'm not familiar with Qt implementation, but I suspect it doesn't do
>> anything significantly more optimized than libc string functions. I
>> know at least gcc is able to convert std::copy/std::fill into
>> memcpy/memset calls when possible, and I see no reason why it couldn't
>> convert std::find/std::equal into memmem/memcmp. Does Qt do something
>> better than that?
>
> Now try that for std::u16string and std::wstring.

The only problem is with memset, and it's partly mitigated by wmemset.
The other functions don't depend on character sizes.

> And yes, it does something better than that because the libc functions are
> optimised differently. memcmp is optimised for large data blocks, but most
> strings are actually quite short, to the point that the necessary detection at
> runtime to figure out the best strategy for long and short strings is enough
> overhead.
>
> By having a dedicated function somewhere, an implementation can provide an
> out-of-line copy, hand-rolled for the use-cases.

Ok, I see. But is there a reason to have these optimized algorithms as
class members as opposed to free functions? Why limit their use to a
particular class? I mean, std::string_view and other user-defined
analogues of std::string (maybe even Qt included) could just make use of
the optimized string algorithms in the standard library, if they were
standalone and generic enough.

Thiago Macieira

Dec 30, 2016, 8:21:12 AM12/30/16
to std-pr...@isocpp.org
On Friday, December 30, 2016, at 16:07:45 BRST, Andrey Semashev wrote:
> > Now try that for std::u16string and std::wstring.
>
> The only problem is with memset, and it's partly mitigated by wmemset.
> The other functions don't depend on character sizes.

Sure they do. memchr finds a single byte, not a word of 2 or 4 bytes. memcmp is
the same: it does a byte-by-byte comparison and returns a difference of the
first byte that compared differently. Except that the difference is very likely
incorrect for a 2-byte word on little-endian machines.

You can implement 2- and 4-byte string operations on top of the 1-byte libc
functions, but they won't be as efficient as the implementations doing direct 2-
and 4-byte ops.

> > By having a dedicated function somewhere, an implementation can provide an
> > out-of-line copy, hand-rolled for the use-cases.
>
> Ok, I see. But is there a reason to have these optimized algorithms as
> class members as opposed to free functions? Why limit their use to a
> particular class? I mean, std::string_view and other user-defined
> analogues of std::string (maybe even Qt included) could just make use of
> the optimized string algorithms in the standard library, if they were
> standalone and generic enough.

The question here is different. I'd like to have those methods accessible to
me without using std::string (though std::u16string_view would do nicely).

But I think they should be available as members for other reasons besides
efficiency.

Bo Persson

Dec 30, 2016, 8:52:37 AM12/30/16
to std-pr...@isocpp.org
On 2016-12-30 13:09, Thiago Macieira wrote:
> On Thursday, December 29, 2016, at 21:23:58 BRST, Nicol Bolas wrote:

>> It doesn't even make my executable bigger, since either way, it'll
>> compile down to an inlined function.
>
> Uh... that makes it bigger, not smaller. Compiling to more inlined code always
> makes it bigger. If you want to make it smaller, you have to call the same
> out-of-line function.
>

Not *always*. Inlined code can open up new opportunities for the
optimizer, and save lots of code compared to passing parameters and
calling out-of-line functions.

Here is one example of when extensive inlining makes the code *a lot*
smaller:

http://stackoverflow.com/questions/11638271/examples-of-when-a-bitwise-swap-is-a-bad-idea/11639305#11639305



Bo Persson


Andrey Semashev

Dec 30, 2016, 8:58:21 AM12/30/16
to std-pr...@isocpp.org
On 12/30/16 16:21, Thiago Macieira wrote:
> On Friday, December 30, 2016, at 16:07:45 BRST, Andrey Semashev wrote:
>>> Now try that for std::u16string and std::wstring.
>>
>> The only problem is with memset, and it's partly mitigated by wmemset.
>> The other functions don't depend on character sizes.
>
> Sure they do. memchr finds a single byte, not a word of 2 or 4 bytes.

I listed memmem, which searches an arbitrarily sized needle. On systems
where it is absent, it is very easy to emulate through memchr.
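Emulating `memmem` where it is missing can be sketched with the standard `memchr` and `memcmp`: scan for the needle's first byte, then verify the rest. `memmem_emulated` is a hypothetical name (`memmem` itself is not part of ISO C or C++). Like `memmem`, it works purely on bytes and will accept a match at any byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical portable stand-in for memmem(): memchr finds candidate
// positions for the needle's first byte, memcmp verifies the whole needle.
const void* memmem_emulated(const void* haystack, std::size_t hay_len,
                            const void* needle, std::size_t needle_len) {
    if (needle_len == 0) return haystack;  // empty needle matches at the start
    if (hay_len < needle_len) return nullptr;
    const auto* h = static_cast<const unsigned char*>(haystack);
    const auto* n = static_cast<const unsigned char*>(needle);
    const unsigned char* last_start = h + (hay_len - needle_len);
    const unsigned char* p = h;
    while (p <= last_start) {
        // Only consider start positions that leave room for the whole needle.
        const void* hit =
            std::memchr(p, n[0], static_cast<std::size_t>(last_start - p) + 1);
        if (hit == nullptr) return nullptr;
        p = static_cast<const unsigned char*>(hit);
        if (std::memcmp(p, n, needle_len) == 0) return p;
        ++p;  // byte granularity: the next candidate may be at any offset
    }
    return nullptr;
}
```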

> memcmp is
> the same: it does a byte-by-byte comparison and returns a difference of the
> first byte that compared differently.

No it doesn't. Its result is <0, 0 or >0, and not necessarily the
difference.

> Except that the difference is very likely
> incorrect for a 2-byte word on little-endian machines.

Ah, right. We have wmemcmp then.

>> Ok, I see. But is there a reason to have these optimized algorithms as
>> class members as opposed to free functions? Why limit their use to a
>> particular class? I mean, std::string_view and other user-defined
>> analogues of std::string (maybe even Qt included) could just make use of
>> the optimized string algorithms in the standard library, if they were
>> standalone and generic enough.
>
> The question here is different. I'd like to have those methods accessible to
> me without using std::string (though std::u16string_view would do nicely).
>
> But I think they should be available as members for other reasons besides
> efficiency.

What are those reasons?

Thiago Macieira

Dec 30, 2016, 8:58:36 AM12/30/16
to std-pr...@isocpp.org
On Friday, December 30, 2016, at 14:52:21 BRST, Bo Persson wrote:
> On 2016-12-30 13:09, Thiago Macieira wrote:
> > On Thursday, December 29, 2016, at 21:23:58 BRST, Nicol Bolas wrote:
> >> It doesn't even make my executable bigger, since either way, it'll
> >> compile down to an inlined function.
> >
> > Uh... that makes it bigger, not smaller. Compiling to more inlined code
> > always makes it bigger. If you want to make it smaller, you have to call
> > the same out-of-line function.
>
> Not *always*. Inlined code can open up new opportunities for the
> optimizer, and save lots of code compared to passing parameters and
> calling out-of-line functions.

Right, I apologise for the generalisation.

But in this case it holds true: those functions aren't trivial, so inlining
them at every call site increases the code size instead of decreasing it.

Thiago Macieira

Dec 30, 2016, 9:46:09 AM12/30/16
to std-pr...@isocpp.org
On Friday, December 30, 2016, at 16:58:18 BRST, Andrey Semashev wrote:
> On 12/30/16 16:21, Thiago Macieira wrote:
> > On Friday, December 30, 2016, at 16:07:45 BRST, Andrey Semashev wrote:
> >>> Now try that for std::u16string and std::wstring.
> >>
> >> The only problem is with memset, and it's partly mitigated by wmemset.
> >> The other functions don't depend on character sizes.
> >
> > Sure they do. memchr finds a single byte, not a word of 2 or 4 bytes.
>
> I listed memmem, which searches an arbitrarily sized needle. On systems
> where it is absent, it is very easy to emulate through memchr.

But not as efficient if you're looking at words, not just needles.

Take the following array of 16-bit elements:
{ 0x0102, 0x0304, 0x0506, 0 }

On little-endian machines, that's the byte sequence

02 01 04 03 06 05 00 00

If you search for 0x401 with memmem, you're going to find it at byte index 1,
but that's not a valid match because it straddles the boundary of two 16-bit
elements.

If you searched for 0x404 with memchr, you'd search first for 0x04, which you'd
find at byte index 2, so it's valid, only to conclude that the next byte isn't
a match.
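The example above can be reproduced portably by writing the little-endian bytes out explicitly instead of reinterpreting memory. In this sketch (the array and function names are hypothetical), a byte-level search of the kind `memmem` performs finds 0x0401 at byte offset 1, a match that straddles two elements, while an element-aware search finds nothing. The third function shows the extra alignment check a byte-based search needs before it is correct for 16-bit data.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// The array { 0x0102, 0x0304, 0x0506, 0 } as 16-bit elements...
constexpr std::array<char16_t, 4> units = {0x0102, 0x0304, 0x0506, 0};
// ...and as bytes, spelled out in little-endian order so the demo does not
// depend on the host's byte order.
constexpr std::array<unsigned char, 8> le_bytes = {0x02, 0x01, 0x04, 0x03,
                                                   0x06, 0x05, 0x00, 0x00};

// Byte-level search (what memmem does): returns the byte offset, or -1.
std::ptrdiff_t find_bytes(unsigned char b0, unsigned char b1) {
    const unsigned char needle[2] = {b0, b1};
    auto it = std::search(le_bytes.begin(), le_bytes.end(), needle, needle + 2);
    return it == le_bytes.end() ? -1 : it - le_bytes.begin();
}

// Element-aware search: only whole 16-bit units can match. Returns unit index or -1.
std::ptrdiff_t find_unit(char16_t u) {
    auto it = std::find(units.begin(), units.end(), u);
    return it == units.end() ? -1 : it - units.begin();
}

// Byte-level search made correct for 16-bit data by rejecting odd offsets.
std::ptrdiff_t find_aligned_unit(char16_t u) {
    const unsigned char needle[2] = {static_cast<unsigned char>(u & 0xFF),
                                     static_cast<unsigned char>(u >> 8)};
    auto first = le_bytes.begin();
    for (;;) {
        auto it = std::search(first, le_bytes.end(), needle, needle + 2);
        if (it == le_bytes.end()) return -1;
        const std::ptrdiff_t offset = it - le_bytes.begin();
        if (offset % 2 == 0) return offset / 2;  // starts on an element boundary
        first = it + 1;                          // straddles two elements: keep looking
    }
}
```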

> > memcmp is the same: it does a byte-by-byte comparison and returns a
> > difference of the first byte that compared differently.
>
> No it doesn't. Its result is <0, 0 or >0, and not necessarily the
> difference.

Either way, it can be wrong. Take these two strings:

u"ABC\u0180" => 41 00 42 00 43 00 80 01
u"ABC\u027F" => 41 00 42 00 43 00 7F 02

A proper char16_t comparison should find that the first string is less than the
second. But a pure memcmp will find that byte index 6 differs and that 0x7F is
less than 0x80, so the second string is less than the first.

So memcmp won't cut it. You need a function that returns the pointer to or
index of the first byte that compared unequally, so that you can inspect the
word that contains that differing byte. There's no such function in libc.
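The workaround just described (locate the first differing byte, then compare the whole code unit that contains it) can be sketched with `std::mismatch` over the object representation. `compare_u16` and `naive_byte_compare` are hypothetical names; the latter spells out the little-endian bytes from the example so that its wrong answer does not depend on the host's byte order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Compare two UTF-16 sequences by code unit value. A byte-level mismatch
// search finds the first differing byte; rounding its offset down to a
// 2-byte boundary identifies the code unit, which is then compared whole.
int compare_u16(std::u16string_view a, std::u16string_view b) {
    const std::size_t n = std::min(a.size(), b.size());
    const auto* pa = reinterpret_cast<const unsigned char*>(a.data());
    const auto* pb = reinterpret_cast<const unsigned char*>(b.data());
    const std::size_t nbytes = n * sizeof(char16_t);
    const unsigned char* diff = std::mismatch(pa, pa + nbytes, pb).first;
    if (diff != pa + nbytes) {
        const std::size_t unit = static_cast<std::size_t>(diff - pa) / sizeof(char16_t);
        return a[unit] < b[unit] ? -1 : 1;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;  // equal prefix: shorter sorts first
}

// The example strings as explicit little-endian bytes, compared like memcmp does.
int naive_byte_compare() {
    static const unsigned char s1[] = {0x41, 0x00, 0x42, 0x00, 0x43, 0x00, 0x80, 0x01};  // u"ABC\u0180"
    static const unsigned char s2[] = {0x41, 0x00, 0x42, 0x00, 0x43, 0x00, 0x7F, 0x02};  // u"ABC\u027F"
    return std::memcmp(s1, s2, sizeof s1);  // positive: wrongly orders s1 after s2
}
```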

> > Except that the difference is very likely
> > incorrect for a 2-byte word on little-endian machines.
>
> Ah, right. We have wmemcmp then.

wchar_t is 4 bytes on Unix systems, so it won't help for char16_t. Where it is
2 bytes, it won't help for char32_t.

> > But I think they should be available as members for other reasons besides
> > efficiency.
>
> What are those reasons?

I explained in another email: convenience and difference in philosophy. I do
believe a class should provide as members most of the common operations to be
done to its data. That's why QString has startsWith() and endsWith(), which
are trivially easy to implement.

Olaf van der Spek

Dec 30, 2016, 11:56:02 AM12/30/16
to std-pr...@isocpp.org
2016-12-30 15:46 GMT+01:00 Thiago Macieira <thi...@macieira.org>:
> On Friday, December 30, 2016, at 16:58:18 BRST, Andrey Semashev wrote:
> I explained in another email: convenience and difference in philosophy. I do
> believe a class should provide as members most of the common operations to be
> done to its data. That's why QString has startsWith() and endsWith(), which
> are trivially easy to implement.

However, if those are implemented as free functions taking something
like string_view, other string types would be able to take advantage
of them too...



--
Olaf
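The free-function alternative can be sketched as two functions over `std::string_view`, so that anything convertible to a string_view (std::string, string literals, character arrays) can use them. The names and signatures here are illustrative only; for comparison, C++20 later added `starts_with` and `ends_with` as members of both `std::basic_string` and `std::basic_string_view`.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical free functions; any type convertible to std::string_view works.
constexpr bool starts_with(std::string_view s, std::string_view prefix) {
    return s.size() >= prefix.size() && s.compare(0, prefix.size(), prefix) == 0;
}

constexpr bool ends_with(std::string_view s, std::string_view suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```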

Nicol Bolas

Dec 30, 2016, 12:10:22 PM12/30/16
to ISO C++ Standard - Future Proposals
On Friday, December 30, 2016 at 8:58:36 AM UTC-5, Thiago Macieira wrote:
> On Friday, December 30, 2016, at 14:52:21 BRST, Bo Persson wrote:
> > On 2016-12-30 13:09, Thiago Macieira wrote:
> > > On Thursday, December 29, 2016, at 21:23:58 BRST, Nicol Bolas wrote:
> > >> It doesn't even make my executable bigger, since either way, it'll
> > >> compile down to an inlined function.
> > >
> > > Uh... that makes it bigger, not smaller. Compiling to more inlined code
> > > always makes it bigger. If you want to make it smaller, you have to call
> > > the same out-of-line function.
> >
> > Not *always*. Inlined code can open up new opportunities for the
> > optimizer, and save lots of code compared to passing parameters and
> > calling out-of-line functions.
>
> Right, I apologise for the generalisation.
>
> But in this case it holds true: those functions aren't trivial, so inlining
> them at every call site increases the code size instead of decreasing it.

My point was that both implementations, the member version and free function, would be inlined to the same code. So neither will be bigger relative to the other.

Nicol Bolas

Dec 30, 2016, 12:21:53 PM12/30/16
to ISO C++ Standard - Future Proposals

Sure. But the problem with over-genericization is that anyone who is not steeped in the lore of all your generic algorithms can't figure out how to do something simple.

`startsWith` is a perfectly fine and descriptive name, for a string member function. The context and its parameters explain what it's doing. But the non-member generic version couldn't be simply named `starts_with`. It would have to be something like `matches_initial_sequence`. Far less descriptive. Plus, there's the fact that it can conceptually work with any forward range(s), which makes it less likely that someone looking for how to do this test on strings will find it.

What's worse is that people could easily argue that we don't need `matches_initial_sequence` at all. Once we have ranges, people can argue that `matches_initial_sequence` is equivalent to `std::equal(some_str | initial_sequence(size(other_str)), other_str);` So do we really need such a function? I could see people arguing that it's just added "bloat".
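For comparison, a generic version under that placeholder name might look like the sketch below (hypothetical, not a proposed signature). It uses the four-iterator `std::mismatch` from C++14, so it accepts any pair of ranges with `==`-comparable elements. That generality is exactly what makes it harder to discover than a string member.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical generic "starts with": true if `prefix` is an initial
// subsequence of `seq`, for any ranges whose elements compare with ==.
template <class Range1, class Range2>
bool matches_initial_sequence(const Range1& seq, const Range2& prefix) {
    using std::begin;
    using std::end;
    // mismatch stops at the first difference or when either range runs out;
    // the prefix matched if and only if it was fully consumed.
    return std::mismatch(begin(seq), end(seq), begin(prefix), end(prefix)).second ==
           end(prefix);
}
```

For instance, it accepts a `std::vector<int>` and a `std::list<int>` as readily as two strings.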

Olaf van der Spek

Dec 30, 2016, 12:24:02 PM12/30/16
to std-pr...@isocpp.org
2016-12-30 18:21 GMT+01:00 Nicol Bolas <jmck...@gmail.com>:
> On Friday, December 30, 2016 at 11:56:02 AM UTC-5, Olaf van der Spek wrote:
>>
>> 2016-12-30 15:46 GMT+01:00 Thiago Macieira <thi...@macieira.org>:
>> > On Friday, December 30, 2016, at 16:58:18 BRST, Andrey Semashev wrote:
>> > I explained in another email: convenience and difference in philosophy.
>> > I do believe a class should provide as members most of the common
>> > operations to be done to its data. That's why QString has startsWith()
>> > and endsWith(), which are trivially easy to implement.
>>
>> However, if those are implemented as free functions taking something
>> like string_view, other string types would be able to take advantage
>> of them too...
>
>
> Sure. But the problem with over-genericization is that, anyone who is not
> steeped in the lore of all your generic algorithms can't figure out how to
> do something simple.
>
> `startsWith` is a perfectly fine and descriptive name, for a string member
> function. The context and its parameters explain what it's doing. But the
> non-member generic version couldn't be simply named `starts_with`. It would

Why not? Works fine for Boost:
http://www.boost.org/doc/libs/1_63_0/doc/html/boost/algorithm/starts_with.html

> have to be something like `matches_initial_sequence`. Far less descriptive.
> Plus, there's the fact that it can conceptually work with any forward
> range(s), which makes it less likely that someone looking for how to do this
> test on strings will find it.
>
> What's worse is that people could easily argue that we don't need
> `matches_initial_sequence` at all. Once we have ranges, people can argue
> that `matches_initial_sequence` is equivalent to `std::equal(some_str |
> initial_sequence(size(other_str)), other_str);` So do we really need such a
> function? I could see people arguing that it's just added "bloat".

Hehe


--
Olaf

Nicol Bolas

Dec 30, 2016, 1:16:27 PM12/30/16
to ISO C++ Standard - Future Proposals

Interface conciseness is less important than overall readability. Consider a relatively simple programming task. You're given two strings. You want to find the last instance of string B within string A, then generate a string that consists of all characters before the instance you found.

This is what the implementation based on generic algorithms and iterators looks like:

std::string generic(std::string look, std::string pattern)
{
    auto loc = std::search(look.rbegin(), look.rend(),
        pattern.rbegin(), pattern.rend());

    if(loc != look.rend())
    {
        loc += pattern.size();
        return std::string(look.begin(), loc.base());
    }
    else
        return std::string{};
}

Understanding this code requires being well versed in how reverse iterators work. Two particularly non-obvious things I had to do were: 1) reversing the pattern as well as the string being searched and 2) offsetting `loc` by the pattern's size, since that's not the actual location.

This is what the index&member function version looks like:

std::string member(std::string look, std::string pattern)
{
    auto loc = look.rfind(pattern);

    if(loc != std::string::npos)
    {
        return look.substr(0, loc);
    }

    return std::string{};
}

That's much shorter and more easily understood. There's no need to reverse the pattern's range, nor the mysterious offset. The only thing that might be at all confusing is the test against `npos`. But that's ultimately no different from testing against `look.rend()`.

As for compiler optimizations, here are the compiled results for GCC 7, under -O3. It seems to me that `member` is much shorter in assembly than `generic`, requiring a lot fewer jumps and the like. So your "the compiler can sort it out" argument seems to not be true in this case.

> There is also a cost if you want to implement a class that mimics
> std::string. I've done that a few times, and those member functions do
> add complexity to the task.
>
> There is obviously a cost for standard library implementers and standard
> writers and committee.

So you're saying that we should avoid good APIs because they're hard to get through committee? That sounds like a problem with the committee process, not with the API.

Thiago Macieira

Dec 30, 2016, 1:50:02 PM12/30/16
to std-pr...@isocpp.org
On Friday, December 30, 2016, at 17:56:00 BRST, Olaf van der Spek wrote:
I don't see it that way. Like I said, they're trivially easy to implement, so
they're trivially easy to reimplement. They can be implemented multiple times,
one for each string-like class.

Or std::string could delegate everything to std::string_view on itself. This
achieves high code reuse.
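That delegation can be roughly sketched with a hypothetical owning class `my_string` (not std::string itself): each query member forwards to a `std::string_view` over the class's own buffer, so the algorithms are written and tested once while the convenient member spelling is kept.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical owning string whose query members forward to std::string_view.
class my_string {
public:
    explicit my_string(std::string s) : data_(std::move(s)) {}

    // Non-owning view over our own characters; every query goes through it.
    std::string_view view() const noexcept { return data_; }

    std::size_t find(std::string_view needle, std::size_t pos = 0) const noexcept {
        return view().find(needle, pos);
    }
    std::size_t rfind(std::string_view needle,
                      std::size_t pos = std::string_view::npos) const noexcept {
        return view().rfind(needle, pos);
    }
    bool starts_with(std::string_view prefix) const noexcept {
        // substr(0, n) never throws and clamps n to size()
        return view().substr(0, prefix.size()) == prefix;
    }

private:
    std::string data_;
};
```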

Nicol Bolas

Dec 30, 2016, 8:48:36 PM12/30/16
to ISO C++ Standard - Future Proposals
On Friday, December 30, 2016 at 1:50:02 PM UTC-5, Thiago Macieira wrote:
> On Friday, December 30, 2016, at 17:56:00 BRST, Olaf van der Spek wrote:
> > 2016-12-30 15:46 GMT+01:00 Thiago Macieira <thi...@macieira.org>:
> > > On Friday, December 30, 2016, at 16:58:18 BRST, Andrey Semashev wrote:
> > > I explained in another email: convenience and difference in philosophy. I
> > > do believe a class should provide as members most of the common
> > > operations to be done to its data. That's why QString has startsWith()
> > > and endsWith(), which are trivially easy to implement.
> >
> > However, if those are implemented as free functions taking something
> > like string_view, other string types would be able to take advantage
> > of them too...
>
> I don't see it that way. Like I said, they're trivially easy to implement, so
> they're trivially easy to reimplement. They can be implemented multiple times,
> one for each string-like class.

Any one such function is relatively easy to implement. But when you have twenty such functions, it starts being a significant task. Especially once you want to start testing them all comprehensively.

This is precisely why even simple algorithms like `search` and `accumulate` and so forth are not bound to a specific container. Yes, we could always rewrite them. But what's the point of that, when we don't have to?

Thiago Macieira

Dec 30, 2016, 9:05:44 PM12/30/16
to std-pr...@isocpp.org
On Friday, December 30, 2016, at 17:48:35 BRST, Nicol Bolas wrote:
> > I don't see it that way. Like I said, they're trivially easy to implement,
> > so they're trivially easy to reimplement. They can be implemented multiple
> > times, one for each string-like class.
>
> Any one such function is relatively easy to implement. But when you have
> *twenty* such functions, it starts being a significant task. *Especially*
> once you want to start testing them all comprehensively.

If they all call the same implementation, then you can test it once and it
would suffice. It is the exact same amount of testing necessary as if it were
only a free function.

> This is precisely why even simple algorithms like `search` and `accumulate`
> and so forth are not bound to a specific container. Yes, we could always
> rewrite them. But what's the point of that, when we don't *have to*?

I'm not asking for all functions to be added to each and every class. Only
those that are used very often and would benefit from conciseness of code and
discoverability.

Greg Marr

Dec 30, 2016, 10:18:40 PM12/30/16
to ISO C++ Standard - Future Proposals
On Friday, December 30, 2016 at 1:16:27 PM UTC-5, Nicol Bolas wrote:
> Interface conciseness is less important than overall readability. Consider a relatively simple programming task. You're given two strings. You want to find the last instance of string B within string A, then generate a string that consists of all characters before the instance you found.
>
> This is what the implementation based on generic algorithms and iterators looks like:
>
> std::string generic(std::string look, std::string pattern)
> {
>     auto loc = std::search(look.rbegin(), look.rend(),
>         pattern.rbegin(), pattern.rend());
>
>     if(loc != look.rend())
>     {
>         loc += pattern.size();
>         return std::string(look.begin(), loc.base());
>     }
>     else
>         return std::string{};
> }
>
> Understanding this code requires being well versed in how reverse iterators work. Two particularly non-obvious things I had to do were: 1) reversing the pattern as well as the string being searched and 2) offsetting `loc` by the pattern's size, since that's not the actual location.
>
> This is what the index&member function version looks like:
>
> std::string member(std::string look, std::string pattern)
> {
>     auto loc = look.rfind(pattern);
>
>     if(loc != std::string::npos)
>     {
>         return look.substr(0, loc);
>     }
>
>     return std::string{};
> }
>
> That's much shorter and more easily understood. There's no need to reverse the pattern's range, nor the mysterious offset. The only thing that might be at all confusing is the test against `npos`. But that's ultimately no different from testing against `look.rend()`.

Seems to me that's only because you chose a sub-optimal algorithm.  You want the one that matches string::rfind, but you used the one that matches string::find instead, so you had to add extra code to make it work right.  Try this:

std::string generic(std::string look, std::string pattern)
{
    auto loc = std::find_end(look.begin(), look.end(),
        pattern.begin(), pattern.end());

    if(loc != look.end())
    {
        return std::string(look.begin(), loc);
    }

    return std::string{};
}

or with Ranges:

std::string generic(std::string look, std::string pattern)
{
    // Ranges-style call as proposed at the time; in C++20 this is spelled
    // std::ranges::find_end(look, pattern).begin(), since it returns a subrange.
    auto loc = std::find_end(look, pattern);

    if(loc != look.end())
    {
        return std::string(look.begin(), loc);
    }

    return std::string{};
}
The generated assembly here looks to be smaller than search, but not as small as rfind.
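The three versions in this exchange can be checked against each other with a small harness. This sketch uses hypothetical function names and takes the strings by const reference (the originals take them by value). They agree on ordinary inputs, with one edge case worth knowing: for an empty pattern `std::find_end` returns the end iterator, so the find_end version returns an empty string, whereas `rfind` and the `std::search` version return the whole string.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Reverse-iterator version built on std::search.
std::string before_last_search(const std::string& look, const std::string& pattern) {
    auto loc = std::search(look.rbegin(), look.rend(), pattern.rbegin(), pattern.rend());
    if (loc == look.rend()) return std::string{};
    // Step from the match's last character to just before its first one.
    loc += static_cast<std::string::difference_type>(pattern.size());
    return std::string(look.begin(), loc.base());
}

// Forward-iterator version built on std::find_end.
std::string before_last_find_end(const std::string& look, const std::string& pattern) {
    auto loc = std::find_end(look.begin(), look.end(), pattern.begin(), pattern.end());
    if (loc == look.end()) return std::string{};  // also taken for an empty pattern
    return std::string(look.begin(), loc);
}

// Index-based version built on the rfind member.
std::string before_last_rfind(const std::string& look, const std::string& pattern) {
    auto loc = look.rfind(pattern);
    if (loc == std::string::npos) return std::string{};
    return look.substr(0, loc);
}
```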

Nicol Bolas

Dec 31, 2016, 11:15:06 AM12/31/16
to ISO C++ Standard - Future Proposals
On Friday, December 30, 2016 at 10:18:40 PM UTC-5, Greg Marr wrote:
On Friday, December 30, 2016 at 1:16:27 PM UTC-5, Nicol Bolas wrote:
Interface conciseness is less important than overall readability. Consider a relatively simple programming task. You're given two strings. You want to find the last instance of string B within string A, then generate a string that consists of all characters before the instance you found.

This is what the implementation based on generic algorithms and iterators looks like:

std::string generic(std::string look, std::string pattern)
{
    auto loc = std::search(look.rbegin(), look.rend(),
        pattern.rbegin(), pattern.rend());

    if(loc != look.rend())
    {
        loc += pattern.size();
        return std::string(look.begin(), loc.base());
    }
    else
        return std::string{};
}

Understanding this code requires being well versed in how reverse iterators work. Two particularly non-obvious things I had to do were: 1) reversing the pattern as well as the string being searched and 2) offsetting `loc` by the pattern's size, since that's not the actual location.

This is what the index&member function version looks like:

std::string member(std::string look, std::string pattern)
{
    auto loc = look.rfind(pattern);

    if(loc != std::string::npos)
    {
        return look.substr(0, loc);
    }

    return std::string{};
}

That's much shorter and more easily understood. There's no need to reverse the pattern's range, nor the mysterious offset. The only thing that might be at all confusing is the test against `npos`. But that's ultimately no different from testing against `look.rend()`.

Seems to me that's only because you chose a sub-optimal algorithm.  You want the one that matches string::rfind, but you used the one that matches string::find instead, so you had to add extra code to make it work right.  Try this:

This actually emphasizes one of the problems with having a gigantic algorithms library: the difficulty of finding appropriate operations.

I have been using C++ for decades now. And yet until you posted that, I had never even heard of `std::find_end`. It's been there for nearly two decades, and yet, I never knew it existed.

If people don't know an algorithm exists, it can't be used.

Ville Voutilainen

Dec 31, 2016, 11:48:27 AM12/31/16
to ISO C++ Standard - Future Proposals
On 31 December 2016 at 18:15, Nicol Bolas <jmck...@gmail.com> wrote:
>> Seems to me that's only because you chose a sub-optimal algorithm. You
>> want the one that matches string::rfind, but you used the one that matches
>> string::find instead, so you had to add extra code to make it work right.
>> Try this:
>
>
> This actually emphasizes one of the problems with having a gigantic
> algorithms library: the difficulty of finding appropriate operations.
>
> I have been using C++ for decades now. And yet until you posted that, I had
> never even heard of `std::find_end`. It's been there for nearly two decades,
> and yet, I never knew it existed.
>
> If people don't know an algorithm exists, it can't be used.


Right, but I'm not convinced people are any wiser when they don't know a
member function exists. In a completely anecdotal fashion, I tend to
separate "what tools do I have to solve this problem?" from "let's bang
on the keyboard and see what we'll find". Sometimes it's said that member
functions are easier to find for IDEs and tools like IntelliSense, but
when I have done the "what tools?" phase, I tend to turn the completion
tools off if they give me a single false positive, because they end up
slowing me down when I know what I intend to write. To each their own, I
guess.

But hey, what proposal are we discussing, again? :) After all this rather
philosophical discussion, I'm not quite sure what we're looking at. ;)

Andrey Semashev

Dec 31, 2016, 7:08:24 PM12/31/16
to std-pr...@isocpp.org
On 12/30/16 21:16, Nicol Bolas wrote:
>
> Interface conciseness is less important than overall readability.
> Consider a relatively simple programming task. You're given two strings.
> You want to find the last instance of string B within string A, then
> generate a string that consists of all characters /before/ the instance
> you found.
>
> This is what the implementation based on generic algorithms and
> iterators looks like:
>
> std::string generic(std::string look, std::string pattern)
> {
>     auto loc = std::search(look.rbegin(), look.rend(),
>         pattern.rbegin(), pattern.rend());
>
>     if(loc != look.rend())
>     {
>         loc += pattern.size();
>         return std::string(look.begin(), loc.base());
>     }
>     else
>         return std::string{};
> }
>
> This is what the index&member function version looks like:
>
> std::string member(std::string look, std::string pattern)
> {
>     auto loc = look.rfind(pattern);
>
>     if(loc != std::string::npos)
>     {
>         return look.substr(0, loc);
>     }
>
>     return std::string{};
> }

Well, if the `std::string::rfind` algorithm was standalone, I believe
the code would've been just as readable:

template< typename Iterator1, typename Iterator2 >
Iterator1 search_last(Iterator1 begin, Iterator1 end,
    Iterator2 needle_begin, Iterator2 needle_end);

std::string generic2(std::string look, std::string pattern)
{
    auto it = search_last(look.begin(), look.end(),
        pattern.begin(), pattern.end());

    if (it != look.end())
        return std::string(look.begin(), it);

    return std::string{};
}

That `search_last` algorithm could have been a generic algorithm with
specialized versions for optimal performance (if the compiler doesn't
already do a good job). So once again, neither the index-based interface
(I mean the choice of indices over iterators by itself) nor algorithms
being `std::string` members makes the code any clearer.

I can understand someone might be used to indices as a concept, but
really, most of the standard library is built around iterators; you'd
expect anyone more or less familiar with C++ to be used to iterators
to no lesser degree.

> As for compiler optimizations, here are the compiled results for GCC 7,
> under -O3 <https://godbolt.org/g/xDFhgs>. It seems to me that `member`
> is much shorter in assembly than `generic`, requiring a lot fewer jumps
> and the like. So your "the compiler can sort it out" argument seems to
> not be true in this case.

Ok, fair enough, the compiler is not good at optimizing reverse
iteration. Let the library provide an optimized implementation of that
algorithm then. Let it be reusable in classes other than `std::string`.
In that case there is no need to pile it in `std::string` interface.
More concise `std::string` interface, more reusable algorithms, everyone
wins.

I know it's too late for the existing algorithms in `std::string`. I
just don't want it to get worse.

> There is also a cost if you want to implement a class that mimics
> std::string. I've done that a few times, and those member functions do
> add complexity to the task.
>
> There is obviously a cost for standard library implementers and
> standard writers and committee.
>
> So you're saying that we should avoid good APIs because they're hard to
> get through committee? That sounds like a problem with the committee
> process, not with the API.

I'm saying poorly designed API that imposes coupling and additional
dependencies adds a cost both on users who want to use or implement that
API and the committee which has to maintain evolution of the library.

I'm not saying no new APIs should be added to the standard library.

ol...@join.cc

Jan 13, 2017, 11:10:33 AM1/13/17
to ISO C++ Standard - Future Proposals
On Saturday, December 31, 2016 17:48:27 UTC+1, Ville Voutilainen wrote:
But hey, what proposal are we discussing, again? :) After all this rather philosophical discussion, I'm not quite sure what we're looking at. ;)

I'm glad you asked. :D

append(s, "A", "B", 42); with support for at least strings, but perhaps more general support.

It could call append(s, v) for each argument, providing a simple customization point.
I'm not sure how to combine customization and optimal performance though.
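One way to get both customization and a single allocation is a two-pass approach, similar in spirit to the Qt technique described earlier in the thread: first ask each argument for an upper bound on its length and reserve once, then append each argument. A sketch assuming C++17; `append`, `append_one` and `size_hint` are hypothetical names, and the overloads of the latter two act as the customization point.

```cpp
#include <charconv>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Integral types other than char and bool are printed in decimal.
template <typename T>
constexpr bool is_number_v = std::is_integral_v<T> &&
    !std::is_same_v<T, char> && !std::is_same_v<T, bool>;

// size_hint: an upper bound on how many chars an argument will append.
inline std::size_t size_hint(std::string_view v) { return v.size(); }
inline std::size_t size_hint(char) { return 1; }
template <typename T, std::enable_if_t<is_number_v<T>, int> = 0>
std::size_t size_hint(T) { return 20; }   // enough for any 64-bit integer, incl. sign

// append_one: appends a single argument.
inline void append_one(std::string& s, std::string_view v) { s.append(v); }
inline void append_one(std::string& s, char c) { s.push_back(c); }
template <typename T, std::enable_if_t<is_number_v<T>, int> = 0>
void append_one(std::string& s, T v)
{
    static_assert(sizeof(T) <= 8, "buffer below is sized for 64-bit integers");
    char buf[24];
    auto result = std::to_chars(buf, buf + sizeof buf, v);
    s.append(buf, result.ptr);
}

// Pass 1: sum the size hints and reserve once. Pass 2: append in order.
template <typename... Args>
std::string& append(std::string& s, const Args&... args)
{
    s.reserve(s.size() + (std::size_t{0} + ... + size_hint(args)));
    (append_one(s, args), ...);
    return s;
}
```

A user type would opt in by providing its own size_hint and append_one overloads; the cost is that the integer hint overestimates, so the single reservation may be slightly larger than needed.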


Victor Zverovich

Jan 14, 2017, 4:02:03 PM1/14/17
to ISO C++ Standard - Future Proposals, lnal...@gmail.com
Hello,

The author of the fmt library here. FWIW I've been working on a proposal to introduce similar formatting functionality based on variadic templates to the standard: http://fmtlib.net/Text%20Formatting.html . It is still a very early and incomplete draft but I'd be glad to hear feedback.

With this functionality, one could do something like

  std::string s = std::format("{}{}{}", "A", "B", 42);

This is somewhat more general than what the OP proposes because it allows format specifiers and output targets other than strings. At the same time, if properly implemented, it solves the problem of extra allocations and can have performance similar to that of sprintf.

Any comments are very welcome.

Cheers,
Victor

gmis...@gmail.com

Jan 14, 2017, 4:35:45 PM1/14/17
to ISO C++ Standard - Future Proposals, lnal...@gmail.com

I'm looking forward to formatting being proposed for the standard.

You touched on reducing allocations. I think there should be a variant of the interface that reuses an existing string to format into, e.g.:
std::string reuse_me;
std::fmt_existing(reuse_me, ...);

This would make it possible to avoid additional allocations in some situations, particularly in a loop.
The user could also resize/reserve the string before the loop to reduce the chance of any unexpected allocations or exceptions,
aiming for a worst-case size.

I haven't looked closely at fmt so apologies if this facility already exists.

Victor Zverovich

Jan 14, 2017, 5:03:56 PM1/14/17
to ISO C++ Standard - Future Proposals, lnal...@gmail.com, gmis...@gmail.com
I agree that it would be useful to have a formatting function that appends to existing string. Incidentally, the facility to append to strings and containers with contiguous storage was recently contributed to fmt: https://github.com/fmtlib/fmt/pull/450

gmis...@gmail.com

Jan 14, 2017, 5:33:01 PM1/14/17
to ISO C++ Standard - Future Proposals, lnal...@gmail.com, gmis...@gmail.com


On Sunday, January 15, 2017 at 11:03:56 AM UTC+13, Victor Zverovich wrote:
I agree that it would be useful to have a formatting function that appends to existing string. Incidentally, the facility to append to strings and containers with contiguous storage was recently contributed to fmt: https://github.com/fmtlib/fmt/pull/450
 
I was just in the process of replying to say that decoupling formatting from string would be great, and that seems to do that.
I didn't see exactly how it's used, but it seems (and I'm sure someone will correct me here) it would be nice if this worked:

std::vector<char> v;
fmt(v, "whatever");

I wonder if the standard containers could expose themselves as buffers that would enable that, or this if it's better:

fmt(as_buffer(v), "whatever"); or fmt(v.as_buffer(), "whatever").

I don't see why std::array couldn't be made to work too, etc. Does your wrapper support array?

It seems it would be possible, at least if the standard could help here.
Nicol's/Boost's interface for default-initialized (rather than value-initialized) allocation would seem useful to employ in this interface for fmt.

Thiago Macieira

Jan 14, 2017, 9:31:47 PM1/14/17
to std-pr...@isocpp.org
On Saturday, January 14, 2017 14:33:01 PST, gmis...@gmail.com wrote:
> I was just in the process of replying to say that decoupling formatting
> from string would be great. And that seems to do that.
> I didn't see exactly how it's used but it seems (but I'm sure someone will
> correct me here) it would be nice if this worked:
>
> std::vector<char> v;
> fmt(v, "whatever");

Sorry, but... why?

Why can't you just use std::string? Why does it need to be something
different?

The formatting code is likely to be big, so it's most likely not going to be
inline. Therefore, it can't be templated.

gmis...@gmail.com

Jan 14, 2017, 10:58:30 PM1/14/17
to ISO C++ Standard - Future Proposals
I wasn't saying you can't use a string; I was trying to say that it seems to me fmt should be able to format into more than just a string,
like a std::array or std::vector or whatever, via an adaptor or otherwise: anything conforming to a buffer concept of some kind.

i.e. anything that can present a block of memory containing characters.

Quite how that works I'm not sure yet, but I specifically wouldn't be keen on a format that worked only with a string, so that I'd then need to copy the result somewhere else, unless there was a good reason. There may be a good reason, but it seems we should be aware of this.
I also don't want format to have to create that thing either, so I don't see why we should be forced to dynamically allocate memory in order to format, which a string-only interface would force.
If string has a member called fmt that defers to some outer format function, fine if that's a helper, but if a string were the sole means to format, I think that's not a great thing without convincing motivation.

gmis...@gmail.com

Jan 14, 2017, 11:04:23 PM1/14/17
to ISO C++ Standard - Future Proposals


On Sunday, January 15, 2017 at 3:31:47 PM UTC+13, Thiago Macieira wrote:
I could also imagine wanting to format into a unique_ptr or a char array from C. I can write an adaptor to expose that if nothing already adapts it. So no string at all there.
I think the OP has these situations covered. It's just about how much better they would be if the library comes into the standard and, if need be, the standard adapts to make things even smoother for the library.
I can imagine the standard helping with such adaptors if that's what it needs.

Anyway, I don't know how it would work, but what I'm saying is that the emphasis should be on not having string as the only sink for formatting, and on having the standard make those other things very usable out of the box.

Thiago Macieira

Jan 15, 2017, 3:23:09 AM1/15/17
to std-pr...@isocpp.org
On Saturday, January 14, 2017 19:58:30 PST, gmis...@gmail.com wrote:
> > Sorry, but.. why?
> >
> > Why can't you just use std::string? Why does it need to be something
> > different?
> >
> > The formatting code is likely to be big, so it's most likely not going to
> > be
> > inline. Therefore, it can't be templated.
>
> I wasn't saying you can't use a string, I was trying to say that it seems
> to be fmt should be able to format into more than just a string.
> Like an std::array, or std::vector or or whatever, via an adaptor or
> otherwise. anything conforming to a buffer concept of some kind.

Why can't we call that adaptor "std::string"?

> i.e. any thing that can present a block of memory containing characters.

And that can be reallocated to extend its size, or truncated once the true
size is known. And is contiguous, of course.

> quite how that works I'm not sure yet, I specifically wouldn't be keen to
> only a format that worked only with a string and then need to copy that
> somewhere else unless there was a good reason. There may be a good reason,
> but it seems we should be aware of this.

Let's start with the good reason. It has to be good enough to overcome the
need to make the library function more complex for the 99.99% of the uses.

I know the C library does it, and that it can reuse the same formatters to
output to a preallocated string (snprintf) or a file (fprintf). But that
probably only works because the C library internally writes to a buffer and
flushes it periodically. So in order to do the same for us, for multiple
different output classes, the formatting function would likely have to have
its own buffering. That's what I meant when I said it would be more complex
than it needs to be for the 99.99% of the uses.

> I also don't want to format have to create that thing either, so I don't
> see why we should be forced to dynamically create memory in order to format
> which a string only thing would force.

If it can't allocate or reallocate, then it must be prepared for failing in
case of buffer overrun. When formatting, you usually don't want that because
formatting can't be easily restarted from where it failed.

> If string has a member called fmt that defers to some outer fomat, fine if
> that's a helper, but if a string was the sole means to format, I think
> that's not a great thing without convincing motivation.

See above.

I'd like to hear the motivation for formatting to anything else, compared to
the added complexity to make it work. In other words: is it worth it?

Magnus Fromreide

Jan 15, 2017, 3:54:20 AM1/15/17
to std-pr...@isocpp.org
On Sun, Jan 15, 2017 at 12:23:04AM -0800, Thiago Macieira wrote:
> On sábado, 14 de janeiro de 2017 19:58:30 PST gmis...@gmail.com wrote:
> > > Sorry, but.. why?
> > >
> > > Why can't you just use std::string? Why does it need to be something
> > > different?
> > >
> > > The formatting code is likely to be big, so it's most likely not going to
> > > be
> > > inline. Therefore, it can't be templated.
> >
> > I wasn't saying you can't use a string, I was trying to say that it seems
> > to be fmt should be able to format into more than just a string.
> > Like an std::array, or std::vector or or whatever, via an adaptor or
> > otherwise. anything conforming to a buffer concept of some kind.
>
> Why can't we call that adaptor "std::string"?
>
> > i.e. any thing that can present a block of memory containing characters.
>
> And that can be reallocated to extend its size, or truncated once the true
> size is known. And is contiguous, of course.
>
> > quite how that works I'm not sure yet, I specifically wouldn't be keen to
> > only a format that worked only with a string and then need to copy that
> > somewhere else unless there was a good reason. There may be a good reason,
> > but it seems we should be aware of this.
>
> Let's start with the good reason. It has to be good enough to overcome the
> need to make the library function more complex for the 99.99% of the uses.

I think that one primary motivation for a fmt type interface is to ease
internationalization efforts. From there follows that one common output
target would be a streambuf. Now, I know that streams and performance in the
same sentence is dodgy at best but why make it even worse with an unneeded
temporary std::string?

/MF

gmis...@gmail.com

Jan 15, 2017, 6:00:52 AM1/15/17
to ISO C++ Standard - Future Proposals
>
> I wasn't saying you can't use a string, I was trying to say that it seems
> to be fmt should be able to format into more than just a string.
> Like an std::array, or std::vector or or whatever, via an adaptor or
> otherwise. anything conforming to a buffer concept of some kind.

Why can't we call that adaptor "std::string"?

Well, nobody knows for sure until we've seen the options and what the problems are with each option. And it might mean that there are several interfaces required. cpp format seems to have a few, so that probably tells you that we will need a few, and more than just string.

But to me, in theory at least, if there is going to be only one interface, it can't be string, as having to allocate to do a format is a non-starter for me.
It's too slow. I don't see it as viable that you have to allocate to format.


> i.e. any thing that can present a block of memory containing characters.

And that can be reallocated to extend its size, or truncated once the true
size is known. And is contiguous, of course.

> quite how that works I'm not sure yet, I specifically wouldn't be keen to
> only a format that worked only with a string and then need to copy that
> somewhere else unless there was a good reason. There may be a good reason,
> but it seems we should be aware of this.

Let's start with the good reason. It has to be good enough to overcome the
need to make the library function more complex for the 99.99% of the uses.

I know he C library does it, and that it can reuse the same formatters to
output to a preallocated string (snprintf) or a file (fprintf). But that
probably only works because the C library internally writes to a buffer and
flushes it periodically. So in order to do the same for us, for multiple
different output classes, the formatting function would likely have to have
its own buffering. That's what I meant when I said it would be more complex
than it needs to be for the 99.99% of the uses.

> I also don't want to format have to create that thing either, so I don't
> see why we should be forced to dynamically create memory in order to format
> which a string only thing would force.

If it can't allocate or reallocate, then it must be prepared for failing in
case of buffer overrun. When formatting, you usually don't want that because
formatting can't be easily restarted from where it failed.


I think being prepared for failure is always a reality, isn't it?
Reallocation can fail, so?
But having to do memory allocation is the main issue here for me, and string is all about that.

If I have a C interface where the caller passes me an array that's large enough, why should I have to allocate memory to use it?
Aren't I at risk of creating a string-intensive API, which is often a slowdown for many apps?

Just string isn't enough.
 

Victor Zverovich

Jan 15, 2017, 10:52:03 AM1/15/17
to std-pr...@isocpp.org
 

Why can't we call that adaptor "std::string"?

We can use std::string but it will add extra copying and memory allocation when formatting anywhere else. For this reason the fmt library decouples buffer management and formatting.
 

Let's start with the good reason. It has to be good enough to overcome the
need to make the library function more complex for the 99.99% of the uses.

See above. Also the complexity added by making buffer management more generic is very small compared to the complexity of the actual formatting. And formatting to std::string, although one of the main use cases, from my experience is not anywhere close to 99%.
 

Thiago Macieira

Jan 15, 2017, 2:13:38 PM1/15/17
to std-pr...@isocpp.org
On Sunday, January 15, 2017 09:54:16 PST, Magnus Fromreide wrote:
> I think that one primary motivation for a fmt type interface is to ease
> internationalization efforts. From there follows that one common output
> target would be a streambuf. Now, I know that streams and performance in the
> same sentence is dodgy at best but why make it even worse with an unneeded
> temporary std::string?

streambuf is fine, I guess. We need one output, not a template/concept.

Thiago Macieira

Jan 15, 2017, 2:17:05 PM1/15/17
to std-pr...@isocpp.org
On Sunday, January 15, 2017 03:00:51 PST, gmis...@gmail.com wrote:
> I think being prepared for failure is always a reality isn't it?
> reallocation can fail, so?
> but having to do memory allocation thing is the main issue here for me and
> string is all that.

When a memory allocation fails, you get std::bad_alloc, which throws away the
entire attempt. So you don't attempt to continue, you just fail completely and
your out buffer contains unspecified contents.

Sized format functions like snprintf print as much as they can, then return a value
saying how much they actually needed. That's more complex.

> if I have a C interface where the caller passes me an array that's large
> enough, why should I have to allocate memory to use it.
> aren't I at risk of creating a string intensive api which is often a slow
> down for many apps?

Understood. I am saying that has to be weighed against the normal use-case
which is to output to a std::string. That has to be EASY to write and perform
really well.

> just string isn't enough


gmis...@gmail.com

Jan 15, 2017, 5:35:08 PM1/15/17
to ISO C++ Standard - Future Proposals
I don't know for sure, but if I were to bet, I suspect a template/concept is exactly what you want here.
Or at least one of the options.

I don't know what the template/concept would be, but at a minimum it would be a buffer template/concept.
I think we need this anyway. At a maximum, maybe something like a "formattable" concept is needed
that employs the former somehow. But I've no idea, and I've hardly looked at what cpp format does.
But it seems to have such words there, and I'm sure it has the feature set covered.
The issue to me is how much nicer/simpler it can be if it's in the standard and the standard can change to accommodate making it nicer.

If the end result looked anything less than easy to use or fast for string, then I wouldn't object
to string having a special case that addressed this.

But it seems remiss if we don't support these use cases:

// Alternatives using a hypothetical API (fixed_buffer, string_buffer,
// cpp::format, std::make_fixed_buffer); they are not meant to coexist.
extern "C" bool get_logfilename(char* buffer, int buffer_length)
{
   fixed_buffer fb(buffer, buffer_length);
   auto status = cpp::format(fb, whatever);
   return !status.failed();
}

std::pair<bool, std::size_t> get_logfilename(char* buffer, int buffer_length)
{
   fixed_buffer fb(buffer, buffer_length);
   auto format_status = cpp::format(fb, whatever);
   if (format_status.failed())
       return {false, {}};
   return {true, format_status.format_length};
}

std::pair<bool, std::size_t> get_logfilename(char* buffer, int buffer_length)
{
   auto fb{ std::make_fixed_buffer(buffer, buffer_length) };
   auto format_status = cpp::format(fb, whatever);
   if (format_status.failed())
       return {false, {}};
   return {true, format_status.format_length};
}

std::string get_logfilename()
{
   std::string logfilename;

   string_buffer sb(logfilename);
   auto format_status = cpp::format(sb, whatever);
   return logfilename;
}

So we have these adaptors that adapt types as needed, e.g. string_buffer etc.
They model a buffer concept that would seem to require this:
is_fixed_size()
size()
resize()
resize_default_init(); // The boost/nicol proposal
capacity()

The fixed_buffer type provides an interface that models buffer
but allows writing into an arbitrary fixed-size memory region.
A buffer that can't resize returns true from is_fixed_size().

Then there are other adaptors like string_buffer, as needed.
I don't see why std::unique_ptr can't also be invited to the party here. But if not, oh well.
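A minimal sketch of how such adaptors might look. fixed_buffer, string_buffer, write_status and buffer_write are hypothetical, this is not fmt's actual API, and resize_default_init is left out. A fixed buffer refuses to grow, so the writer stores what fits and reports buffer_full instead of throwing; the string adaptor grows through std::string::resize, which may still throw std::bad_alloc.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

class fixed_buffer {                       // wraps caller-provided memory
public:
    fixed_buffer(char* p, std::size_t cap) : p_(p), cap_(cap) {}
    bool is_fixed_size() const { return true; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return cap_; }
    bool resize(std::size_t n)             // fails instead of growing
    {
        if (n > cap_)
            return false;
        size_ = n;
        return true;
    }
    char* data() { return p_; }
private:
    char* p_;
    std::size_t cap_;
    std::size_t size_ = 0;
};

class string_buffer {                      // wraps a std::string by reference
public:
    explicit string_buffer(std::string& s) : s_(s) {}
    bool is_fixed_size() const { return false; }
    std::size_t size() const { return s_.size(); }
    std::size_t capacity() const { return s_.capacity(); }
    bool resize(std::size_t n) { s_.resize(n); return true; }  // may throw bad_alloc
    char* data() { return s_.data(); }
private:
    std::string& s_;
};

enum class write_status { ok, buffer_full };

// Appends `text` using only the buffer interface above.
template <typename Buffer>
write_status buffer_write(Buffer& buf, std::string_view text)
{
    const std::size_t old = buf.size();
    std::size_t n = text.size();
    write_status status = write_status::ok;
    if (!buf.resize(old + n)) {            // only a fixed-size buffer refuses
        n = buf.capacity() - old;          // keep the part that still fits
        buf.resize(old + n);
        status = write_status::buffer_full;
    }
    if (n != 0)
        std::memcpy(buf.data() + old, text.data(), n);
    return status;
}
```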


But I see no reason why std::string, std::vector and std::array couldn't model a buffer directly, though they don't have to.
If they did, wouldn't you be down to:

std::string get_logfilename()
{
   std::string logfilename;

   auto format_status = cpp::format(logfilename, whatever);
   return logfilename;
}

I don't want anyone to get hung up on this particular interface. The main thing
to me is that the interface supports these goals (in order of importance):
* we don't require memory allocation to format (essential)
* we support types beyond string unless there is a good reason not to
* we can format and avoid exceptions if we wish
* we can diagnose when something failed and why, so we can then throw if it's important
* we enable default-initializing resize to be used here to help efficiency; it seems formatting can use this.

We need to know how certain formatting conditions fail and not just have only a failed bit to check.
IMO we need to know at least:
out of memory (malloc/new failure),
buffer full (i.e. wanted to resize() but is_fixed_size() said no),
bad format string,
bad argument

I think we need to attempt to format as much as possible and not throw,
but format_status can contain a std::expected or something similar that says exactly what the issue is.
You can check that, but typically wouldn't need to.

But let's see whether the evidence supports these positions.

All of this seems achievable and cpp format seems to provide a lot of this feature set.

To me it's just a matter of seeing whether cpp format supports these needs, how, and if not, why not, and then
seeing how the Standard needs to change to make formatting even easier/simpler than
whatever cpp format already does without that support today.


masse....@gmail.com

Jan 18, 2017, 5:10:47 PM1/18/17
to ISO C++ Standard - Future Proposals
Hi all,
I haven't read all the posts, but from what I've read, I'd say there are two subjects here:
- 1st: Concatenating several strings easily and efficiently
To do so, what I've done on my side is implement an algorithm (called concat) that does it.
It takes as argument several strings in a container (can be an array, an initializer_list, a vector, whatever...) and returns the concatenated string. The concatenation computes the required size first so that it does only one allocation.
Note that I also implemented a variant able to add a separator between each std::string given as a parameter.

- 2nd: Being able to create a string from a predefined format
This is what printf is often used for. Having something similar in C++ would be nice. But I think this will require adding something to the std::string interface, since I hardly see an algorithm being able to do that, because it is quite string-specific.

Just my 2 cents,
Masse Nicolas.

G M

Jan 18, 2017, 7:28:10 PM1/18/17
to std-pr...@isocpp.org
- 2nd Be able to create a string from a predefined format
This is what printf is often used for. Having something similar in cpp would be nice. But I think this will require adding something on the std::string interface since I hardly see an algorithm being able to do that because it is something quite string-specific . 

Just my 2 cents,
Masse Nicolas.


 
string exposes all that is needed to write or expand it and inquire its size.
vector also exposes all these facilities.

So format doesn't need to be a member function of string to use these services to write data into a string.
And if it were a member of string, we'd not have the ability to use a vector, even though vector offers the same services as string to access the internal array. So if vector offers the ability, why shouldn't we use it?

The ability to write into a C array seems essential to me, both for writing C-compatible libraries and to avoid memory allocation. Once you want to avoid memory allocation, you may want to format into std::array also.
So supporting non-allocating buffers is essential to my mind. And you can't do that if your format function is tied to string.

It seems to me a buffer concept would unify these and allow format to work with anything, and be no less, or only a little less, easy to use.

The cpp format library has features that enable other containers to work, but I've yet to look into it in detail. But if formatting gets into the standard, as it surely must, I'm sure it can be made even easier to use.

There may be good reasons not to go this route, but I haven't heard anything yet that convinces me this isn't the way to go. Having only a single string member function is a bad option to my mind. A member function of string and other options too, sure, if there's a good reason for it, but not as the only option.

I note we have std::size() and std::data(). A question for the Committee, if we also had std::is_fixed_size() and std::resize() could we not make an std::format(container, ...) work with any of these types without anything concepty being required?
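A sketch of that idea using hypothetical free functions alongside std::size() and std::data(). Here a single try_grow function plays the role of the suggested std::is_fixed_size()/std::resize() pair, and none of the names in namespace sketch are standard. Growable containers are resized, while built-in arrays and std::array refuse, so the text is truncated to fit.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <iterator>
#include <string_view>

namespace sketch {

// Growable containers (std::string, std::vector<char>, ...): resize to n.
template <typename C>
bool try_grow(C& c, std::size_t n) { c.resize(n); return true; }

// Fixed-size storage: "growing" succeeds only if n already fits.
template <typename T, std::size_t N>
bool try_grow(T (&)[N], std::size_t n) { return n <= N; }

template <typename T, std::size_t N>
bool try_grow(std::array<T, N>&, std::size_t n) { return n <= N; }

// Writes `text` at the start of `c`, growing it if possible, otherwise
// truncating. Returns the number of characters written.
template <typename C>
std::size_t format_into(C& c, std::string_view text)
{
    std::size_t n = text.size();
    if (!sketch::try_grow(c, n))
        n = std::min(n, std::size(c));
    std::copy_n(text.data(), n, std::data(c));
    return n;
}

} // namespace sketch
```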


Edward Catmur

Jan 20, 2017, 5:58:48 AM1/20/17
to ISO C++ Standard - Future Proposals, gmis...@gmail.com
Formatting into a container is overly restrictive; one may also want to be able to format into a preallocated subrange (e.g. character positions 10-20 of a string) or to an output stream (console, file or network).

I think the more general concept is OutputRange, which could encapsulate e.g. a back inserter for string or vector (expandable storage), a pair of char pointers for an array, C array or subrange, or an ostreambuf_iterator for an output stream.

gmis...@gmail.com

Jan 20, 2017, 7:22:36 AM
to ISO C++ Standard - Future Proposals, gmis...@gmail.com

>> I note we have std::size() and std::data(). A question for the Committee, if we also had std::is_fixed_size() and std::resize() could we not make an std::format(container, ...) work with any of these types without anything concepty being required?
>
> Formatting into a container is overly restrictive; one may also want to be able to format into a preallocated subrange (e.g. character positions 10-20 of a string) or to an output stream (console, file or network).
>
> I think the more general concept is OutputRange, which could encapsulate e.g. a back inserter for string or vector (expandable storage), a pair of char pointers for an array, C array or subrange, or an ostreambuf_iterator for an output stream.

I agree. I was experimenting today with an interface where you create a flushable buffer object to format into.
A make-buffer routine creates a wrapper object for a vector, array, C array, stream, etc., exposing a common interface:
data(), resize() and flush() methods, plus can_resize() and can_flush(). I passed that object to my format routine.

The format routine just writes into the buffer, extending it if possible until a maximum size is reached.
If the buffer advertises that it can be flushed, it is flushed and the process repeats until the formatting is done.
If the buffer is full and can be neither extended nor flushed, writing stops with an error.
A std::vector would not advertise that it can flush, and its flush method would do nothing.
A std::array would not advertise that it can be extended, and resizing it would do nothing.
A stream might advertise itself as a buffer that can be extended and flushed, or as a fixed-size buffer that can be flushed.

Vectors, strings and arrays are buffers, and streams wrap buffers, so they can all provide a unified buffer interface.

I tried a small test of this and it seemed workable: a single format routine was able to write to all these types.
The buffer interface also isn't format-specific, so any other append routine (or similar) could use it.
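A sketch of the buffer interface described above, assuming C++17. The wrapper types vector_buffer, array_buffer and stream_buffer, and the functions buffer_write() and buffer_finish(), are hypothetical names; the growth policy (doubling up to max_size) and the 8-byte stream chunk are assumptions chosen for illustration, and characters are copied one at a time for clarity rather than speed.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical wrapper: a std::vector can be extended but not flushed.
struct vector_buffer {
    std::vector<char>& v;
    char* data() { return v.data(); }
    std::size_t size() const { return v.size(); }
    bool can_resize() const { return true; }
    bool can_flush() const { return false; }
    void resize(std::size_t n) { v.resize(n); }
    void flush(std::size_t) {}  // a vector has nothing to flush to
};

// Hypothetical wrapper: a std::array can be neither extended nor flushed.
template<std::size_t N>
struct array_buffer {
    std::array<char, N>& a;
    char* data() { return a.data(); }
    std::size_t size() const { return N; }
    bool can_resize() const { return false; }
    bool can_flush() const { return false; }
    void resize(std::size_t) {}  // resizing does nothing
    void flush(std::size_t) {}
};

// Hypothetical wrapper: a stream is a fixed-size chunk that is flushed when full.
struct stream_buffer {
    std::ostream& os;
    std::array<char, 8> chunk{};  // deliberately small so flushing actually happens
    char* data() { return chunk.data(); }
    std::size_t size() const { return chunk.size(); }
    bool can_resize() const { return false; }
    bool can_flush() const { return true; }
    void resize(std::size_t) {}
    void flush(std::size_t n) { os.write(chunk.data(), static_cast<std::streamsize>(n)); }
};

// Writes text at offset `used`, growing the buffer up to max_size or flushing it when full.
// Returns false if the buffer is full and can be neither extended nor flushed.
template<class Buffer>
bool buffer_write(Buffer& buf, std::size_t& used, std::string_view text,
                  std::size_t max_size = 1024) {
    for (char ch : text) {
        if (used == buf.size()) {
            if (buf.can_resize() && buf.size() < max_size) {
                buf.resize(std::min(max_size, std::max<std::size_t>(1, buf.size() * 2)));
            } else if (buf.can_flush()) {
                buf.flush(used);
                used = 0;
            } else {
                return false;
            }
        }
        buf.data()[used++] = ch;
    }
    return true;
}

// Finishes a write: flushes what is left in a flushable buffer, trims a resizable one.
template<class Buffer>
void buffer_finish(Buffer& buf, std::size_t used) {
    if (buf.can_flush()) buf.flush(used);
    else if (buf.can_resize()) buf.resize(used);
}
```

Because buffer_write() only talks to the wrapper interface, the same routine serves all three targets; a stream never needs more than its small chunk in memory.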

ol...@join.cc

Jan 24, 2017, 10:56:00 AM
to ISO C++ Standard - Future Proposals, gmis...@gmail.com


On Friday, 20 January 2017 11:58:48 UTC+1, Edward Catmur wrote:

> Formatting into a container is overly restrictive; one may also want to be able to format into a preallocated subrange (e.g. character positions 10-20 of a string) or to an output stream (console, file or network).
>
> I think the more general concept is OutputRange, which could encapsulate e.g. a back inserter for string or vector (expandable storage), a pair of char pointers for an array, C array or subrange, or an ostreambuf_iterator for an output stream.

Sounds good, but how would that interact with calls to reserve() and memcpy() optimizations (for larger strings)?

I don't think anything is stopping one from providing both variants.

Edward Catmur

Jan 28, 2017, 1:39:40 PM
to std-pr...@isocpp.org, gmis...@gmail.com
That's a good point, and it's probably not enough (though theoretically correct) to say that exponential growth makes reserve() moot. What I'd like is an OutputRange concept that allows creation of a contiguous memory range, probably via the & (applied to *it) and += operators, giving output iterators more of the hierarchy that input iterators have. Since a back inserter knows its target container, it can know whether that container is contiguous. For example:

auto it = back_inserter(s); // it models ContiguousOutputIterator
auto p = &*it; // returns s.data() + s.size()
it += 12; // performs s.resize(s.size() + 12, uninitialized)
memcpy(p, buf, 12); // note: p is only valid if the resize above did not reallocate
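A minimal C++17 sketch of the contiguous back inserter idea, under stated assumptions: contiguous_back_inserter and append_bytes() are hypothetical names, only std::string is supported, and only the two operations from the example (&*it and it += n) are modelled, so it is not a full output iterator. One ordering detail differs from the example: because resize() may reallocate, the sketch grows the string first and takes the pointer afterwards, so the pointer cannot dangle.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical "contiguous back inserter" for std::string (not a std type).
class contiguous_back_inserter {
public:
    explicit contiguous_back_inserter(std::string& s) : s_(&s) {}

    // Refers to the one-past-the-end slot, so &*it == s.data() + s.size().
    char& operator*() const { return *(s_->data() + s_->size()); }

    // Grows the string by n characters. Standard resize() zero-fills them;
    // C++23's resize_and_overwrite is the closest standard facility to an uninitialized resize.
    contiguous_back_inserter& operator+=(std::size_t n) {
        s_->resize(s_->size() + n);
        return *this;
    }

private:
    std::string* s_;
};

// Appends n bytes from buf using only the two operations above.
inline void append_bytes(std::string& s, const char* buf, std::size_t n) {
    contiguous_back_inserter it(s);
    it += n;             // grow first: the resize may reallocate...
    char* p = &*it - n;  // ...so take the pointer afterwards (start of the new region)
    std::memcpy(p, buf, n);
}
```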

olafv...@gmail.com

Jun 17, 2017, 3:30:06 AM
to ISO C++ Standard - Future Proposals, olafv...@gmail.com
On Wednesday, 28 December 2016 09:50:40 UTC+1, Olaf van der Spek wrote:
> Hi,
>
> One frequently needs to append stuff to strings, but the standard way
> (s += "A" + "B" + to_string(42)) isn't optimal due to temporaries.
> A variadic append() for std::string seems like the obvious solution.
> It could support string_view, integers, maybe floats
> but without formatting options..
> It could even be extensible by calling append(s, t);
>
> append(s, "A", "B", 42);
>
> Would this be useful for the C++ std lib?

So I wrote some trivial functions, and IMO it works nicely. The implementation of append might not be as smart as it could be, but it's good enough for me and I really like the interface.

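#include <string>
#include <string_view>

// Appends anything convertible to string_view (std::string, literals, string_view).
inline std::string& operator<<(std::string& a, std::string_view b)
{
    return a += b;
}

// Appends an integer in decimal form.
inline std::string& operator<<(std::string& a, long long b)
{
    return a += std::to_string(b);
}

// Base case: nothing left to append.
inline void append(std::string&)
{
}

// Appends the first argument, then recurses on the rest.
template<class T, class... A>
void append(std::string& s, const T& v, const A&... a)
{
    s << v;
    append(s, a...);
}

// Builds a new string from all arguments.
template<class... A>
std::string concat(const A&... a)
{
    std::string s;
    append(s, a...);
    return s;
}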
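The append above converts each integer into a temporary std::string and may reallocate once per argument. A sketch of a single-allocation variant, following the two-pass approach Thiago described for Qt: first size the string for the worst case of every argument, then write in place and shrink to the actual length. append_fast(), max_len() and put_chars() are hypothetical names; only text and integers (via long long) are supported, assuming C++17 fold expressions and std::to_chars.

```cpp
#include <charconv>
#include <cstddef>
#include <string>
#include <string_view>

// Worst-case length of each supported argument kind (hypothetical helpers).
inline std::size_t max_len(std::string_view sv) { return sv.size(); }
inline std::size_t max_len(long long) { return 20; }  // strlen("-9223372036854775808")

// Writes one argument at p and returns the position just past it.
inline char* put_chars(char* p, std::string_view sv) {
    sv.copy(p, sv.size());
    return p + sv.size();
}
inline char* put_chars(char* p, long long v) {
    return std::to_chars(p, p + 20, v).ptr;  // 20 chars always suffice for long long
}

// Single-allocation append: size for the worst case, write in place, then shrink.
template<class... A>
void append_fast(std::string& s, const A&... a) {
    const std::size_t old = s.size();
    s.resize(old + (std::size_t{0} + ... + max_len(a)));  // at most one reallocation
    char* p = s.data() + old;
    ((p = put_chars(p, a)), ...);
    s.resize(static_cast<std::size_t>(p - s.data()));  // shrink to the actual length
}
```

The interface is the same as append(s, "A", "B", 42); only the strategy changes, trading a little over-allocation for a single growth step.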

Tony V E

Jun 17, 2017, 6:42:13 PM
to Standard Proposals
On Sat, Jan 14, 2017 at 9:31 PM, Thiago Macieira <thi...@macieira.org> wrote:
> On Saturday, 14 January 2017 14:33:01 PST, gmis...@gmail.com wrote:
>> I was just in the process of replying to say that decoupling formatting
>> from string would be great. And that seems to do that.
>> I didn't see exactly how it's used but it seems (but I'm sure someone will
>> correct me here) it would be nice if this worked:
>>
>> std::vector v;
>> So fmt(v. "whatever");
>
> Sorry, but.. why?
>
> Why can't you just use std::string? Why does it need to be something
> different?

I might want to format into a QString? Without the copy in between?


> The formatting code is likely to be big, so it's most likely not going to be
> inline. Therefore, it can't be templated.


I agree there might be trade offs that need to be weighed.
I don't think we really know what they will be yet.
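One way the trade-off above is commonly handled is type erasure: keep the large formatting core as a non-template function that writes through a small callback, and put only a thin template adapter in the header, so any target container can be reached without an intermediate copy. The names sink, format_core() and format_to_container() are hypothetical, and format_core() here only formats "name: value" as a stand-in for real formatting code; a QString adapter would be another small callback (not shown).

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Type-erased sink: a context pointer plus a function that appends characters.
struct sink {
    void* ctx;
    void (*append)(void* ctx, const char* p, std::size_t n);
};

// Non-template core: in a real library this would live in a .cpp file,
// so the (large) formatting code is compiled only once.
inline void format_core(sink out, std::string_view name, long long value) {
    out.append(out.ctx, name.data(), name.size());
    out.append(out.ctx, ": ", 2);
    const std::string digits = std::to_string(value);
    out.append(out.ctx, digits.data(), digits.size());
}

// Thin template adapter: works with any container offering insert(end, first, last).
template<class C>
void format_to_container(C& c, std::string_view name, long long value) {
    sink s{&c, [](void* ctx, const char* p, std::size_t n) {
        auto& target = *static_cast<C*>(ctx);
        target.insert(target.end(), p, p + n);
    }};
    format_core(s, name, value);
}
```

The captureless lambda converts to a plain function pointer, so each new target type costs only a few lines of template code while the core stays out of line.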
 
--
Be seeing you,
Tony