Differences between C and C++

Juha Nieminen

Jul 25, 2022, 9:14:50 AM

Some people often object if someone conflates C and C++, as if C++ were
just a pure superset of C, pointing out that they are, in fact, different
languages and, in some aspects, actually incompatible with each other.

That got me thinking: What are all the differences between the two
languages, in behavior and meaning, when it comes to their common syntax?
In other words, I'm not here talking about different keywords and different
syntax that compiles in one but not the other. I'm talking about code that
does compile as both C and C++, but will be interpreted or behave
differently depending on which (and this makes them incompatible).

Here are some of the things that come to mind. What other things are there?

1) A 'const' variable at the global scope will have external linkage by
default in C (unless explicitly made to have internal linkage with
'static'), but internal linkage by default in C++ (unless explicitly made
to have external linkage with 'extern').

2) The type of 'A' is int in C, but char in C++.

3) This one is really obscure: In C, this:

int f(int (*)(), double (*)[3]);
int f(int (*)(char *), double (*)[]);

is a valid function declaration, and equivalent to:

int f(int (*)(char *), double (*)[3]);

(This one is so obscure that I'm certain the vast majority of C programmers,
even experienced ones, would be surprised it even compiles. However, the
example is directly from the C standard, so it's pretty legit.)

In C++ the first two lines declare two different functions, taking different
parameter types. Calling one is not the same thing as calling the other.

Alf P. Steinbach

Jul 25, 2022, 10:19:15 AM

Yes, a `void` parameter list is needed to say "no arguments" in C.

I guess the most crucial thing is that type punning via union is
well-defined in C, but UB in C++.

- Alf

Paavo Helde

Jul 25, 2022, 11:47:26 AM

25.07.2022 16:14 Juha Nieminen kirjutas:
> Some people often object if someone conflates C and C++, as if C++ were
> just a pure superset of C, pointing out that they are, in fact, different
> languages and, in some aspects, actually incompatible with each other.
>
> That got me thinking: What are all the differences between the two
> languages, in behavior and meaning, when it comes to their common syntax?
> In other words, I'm not here talking about different keywords and different
> syntax that compiles in one but not the other. I'm talking about code that
> does compile as both C and C++, but will be interpreted or behave
> differently depending on which (and this makes them incompatible).

This little program outputs 0 when compiled as C, and 1 when compiled as
C++, at least with gcc 10.2. I'm not quite sure if this is meant to be
that way.

#include <stdio.h>
#include <stdlib.h>

int main() {

    double x = -0.5;
    int y = (int) 2*abs(x);
    printf("%d\n", y);
}

Öö Tiib

Jul 25, 2022, 11:51:18 AM

There are not too many little things in C that differ from C++ like that:
* C++ has a lot of keywords that C does not have, or has as a macro (bool) or
a typedef (wchar_t)
* logical expressions like 1 == 2 yield int (not bool)
* character literals like 'a' are of type int (not char)
* differences with const
* differences with inline
* differences with static
* C has VLAs
* differences in enum type size requirements

Those differences can silently cause the same code to behave differently in
C and C++, but that is usually obscure, specially constructed code.

Ben Bacarisse

Jul 25, 2022, 12:14:23 PM

Juha Nieminen <nos...@thanks.invalid> writes:

> Some people often object if someone conflates C and C++, as if C++ were
> just a pure superset of C, pointing out that they are, in fact, different
> languages and, in some aspects, actually incompatible with each other.
>
> That got me thinking: What are all the differences between the two
> languages, in behavior and meaning, when it comes to their common
> syntax?

Let's see...

"abc" is of type char * in C and of type const char * in C++.

Then there are all the keyword differences. For example, in C, new is an
ordinary ID and in C++ restrict is an ordinary ID. There are quite a
few of these!

The rules for compatible pointer types are stronger in C++. Basically,
you can only add a top-level const when passing arguments in C.

In C++ function declarations, () means (void), but in C it means an
old-style function with unspecified arguments.

In C, at file scope, int x[]; is a "tentative definition" (which will
resolve to int x[] = {0}; if there are no further declarations of x) but
in C++ it's just an error.

In C, f( (int[]){1} ), calls f with a temporary array (it's a "compound
literal"), but that's forbidden in C++.

Some things that are compound literals in C /are/ valid in C++. For
example:

struct s { int v; };
...
f((struct s){1});

is ok in both, but add a pointer and you get the same difference as
above:

g(&(struct s){1}); // ok in C, not ok in C++

And now I'm out of time...

> In other words, I'm not here talking about different keywords and different
> syntax that compiles in one but not the other. I'm talking about code that
> does compile as both C and C++, but will be interpreted or behave
> differently depending on which (and this makes them incompatible).
>
> Here are some of the things that come to mind. What other things are there?
>
> 1) A 'const' variable at the global scope will have external linkage by
> default in C (unless explicitly made to have internal linkage with
> 'static'), but internal linkage by default in C++ (unless explicitly made
> to have external linkage with 'extern').
>
> 2) The type of 'A' is int in C, but char in C++.
>
> 3) This one is really obscure: In C, this:
>
> int f(int (*)(), double (*)[3]);
> int f(int (*)(char *), double (*)[]);
>
> is a valid function declaration, and equivalent to:
>
> int f(int (*)(char *), double (*)[3]);

Yes. This is C's rules for "composite types" in action.

> (This one is so obscure that I'm certain the vast majority of C programmers,
> even experienced ones, would be surprised it even compiles. However, the
> example is directly from the C standard, so it's pretty legit.)
>
> In C++ the two first lines declare two different functions, taking different
> types of parameter. Calling one is not the same thing as calling the
> other.

However, you can, in fact, call the second f with a double (*)[] argument
because of C++'s rules about similar types (at least I think those are
the rules that apply here).

--
Ben.

Mut...@dastardlyhq.com

Jul 25, 2022, 12:20:12 PM

On Mon, 25 Jul 2022 17:14:05 +0100
Ben Bacarisse <ben.u...@bsb.me.uk> wrote:

IOW mostly contorted syntax that was probably never intended to be used but
happens to be legal due to the way the C parser works.

Chris Vine

Jul 25, 2022, 12:45:40 PM

One thing I don't think mentioned so far, for those who like
low-level twiddling of trivial types: in C, except for bit-fields all
objects are composed of contiguous sequences of one or more bytes,
which thereby comprise an array of bytes in C. In C++ trivially
copyable or standard layout types (ie C-like types) are required to be
comprised of contiguous bytes, but these do not comprise an "array" of
bytes. Hence you can use pointer arithmetic to access and/or modify the
bytes of object entities in C, and this is a fairly common practice for
some low-level work, but not in C++ (save that in C++, when within a
C-like entity you can at least successively increment a byte pointer by
one because in C++ every object, including a byte, can be treated as an
array of size 1).

One other related point is that in C a pointer to any narrow character
type is exempt from the strict aliasing rules, whereas in C++ pointers
to signed char are excluded from the exemption.

Paavo Helde

Jul 25, 2022, 12:49:46 PM

25.07.2022 18:47 Paavo Helde kirjutas:
> 25.07.2022 16:14 Juha Nieminen kirjutas:
>> Some people often object if someone conflates C and C++, as if C++ were
>> just a pure superset of C, pointing out that they are, in fact, different
>> languages and, in some aspects, actually incompatible with each other.
>>

> #include <stdio.h>
> #include <stdlib.h>
>
> int main() {
>
>     double x = -0.5;
>     int y = (int) 2*abs(x);
>     printf("%d\n", y);
> }
>

This example can actually be made a bit simpler; messing with ints is
not actually needed. I think this is now similar to what I stumbled on
myself in real code.

#include <stdio.h>
#include <stdlib.h>

int main() {

    double x = -0.5;

    double y = 2.0 * abs(x);
    printf("%g\n", y);
}

Ben Bacarisse

Jul 25, 2022, 2:28:27 PM

Mut...@dastardlyhq.com writes:

> On Mon, 25 Jul 2022 17:14:05 +0100
> Ben Bacarisse <ben.u...@bsb.me.uk> wrote:

<cut>

>> Some things that are compound literals in C /are/ valid in C++. For
>> example:
>>
>> struct s { int v; };
>> ...
>> f((struct s){1});
>>
>>is ok in both, but add a pointer and you get the same difference as
>>above:
>>
>> g(&(struct s){1}); // ok in C, not ok in C++
>
> IOW mostly contorted syntax that was probably never intended to be used but
> happens to be legal due to the way the C parser works.

Unlikely. It can be handy to be able to pass a temporary object by
"reference" (technically by pointer value). I very much doubt it was
never intended to be done. The C standard defines the object's
lifetime in a way that makes it both safe and useful.

--
Ben.

Ben Bacarisse

Jul 25, 2022, 2:38:24 PM

I would have hoped for a warning about that! gcc gives me one (two, in
fact) if I ask for the kitchen sink:

warning: using integer absolute value function ‘abs’ when argument is of floating-point type ‘double’ [-Wabsolute-value]
    7 |     double y = 2.0 * abs(x);
      |                      ^~~
warning: conversion from ‘double’ to ‘int’ may change value [-Wfloat-conversion]
    7 |     double y = 2.0 * abs(x);
      |                          ^

--
Ben.

Keith Thompson

Jul 25, 2022, 3:21:37 PM

Mut...@dastardlyhq.com writes:
> On Mon, 25 Jul 2022 17:14:05 +0100
> Ben Bacarisse <ben.u...@bsb.me.uk> wrote:

[...]

>>Some things that are compound literals in C /are/ valid in C++. For
>>example:
>>
>> struct s { int v; };
>> ...
>> f((struct s){1});
>>
>>is ok in both, but add a pointer and you get the same difference as
>>above:
>>
>> g(&(struct s){1}); // ok in C, not ok in C++
>
> IOW mostly contorted syntax that was probably never intended to be used but
> happens to be legal due to the way the C parser works.

Not at all.

In C, `(struct s){1}` is a compound literal, and it's an lvalue so
applying `&` to it is perfectly valid. There's nothing accidental about
it.

C++ doesn't have compound literals, but it has some other features (that
C doesn't) that make `f((struct s){1})` valid, but in C++ it's not an
lvalue, so `&(struct s){1}` is not valid.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Juha Nieminen

Jul 26, 2022, 2:10:07 AM

Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
> "abc" is of type char * in C and of type const char * in C++.

Actually "abc" is an array-of-char (or array-of-const-char in the latter
case). You can discern this by e.g. printing what sizeof("12345") is
(quite a nice quiz question).

Mut...@dastardlyhq.com

Jul 26, 2022, 3:54:46 AM

On Mon, 25 Jul 2022 12:21:21 -0700
Keith Thompson <Keith.S.T...@gmail.com> wrote:
>Mut...@dastardlyhq.com writes:
>> On Mon, 25 Jul 2022 17:14:05 +0100
>> Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
>[...]
>>>Some things that are compound literals in C /are/ valid in C++. For
>>>example:
>>>
>>> struct s { int v; };
>>> ...
>>> f((struct s){1});
>>>
>>>is ok in both, but add a pointer and you get the same difference as
>>>above:
>>>
>>> g(&(struct s){1}); // ok in C, not ok in C++
>>
>> IOW mostly contorted syntax that was probably never intended to be used but
>> happens to be legal due to the way the C parser works.
>
>Not at all.
>
>In C, `(struct s){1}` is a compound literal, and it's an lvalue so
>applying `&` to it is perfectly valid. There's nothing accidental about
>it.

I'm not saying the syntax was accidental, just the fact that this particular
construct also works. I've never yet seen anyone cast literal values to a
struct in a function call.

>C++ doesn't have compound literals, but it has some other features (that
>C doesn't) that make `f((struct s){1})` valid, but in C++ it's not an
>lvalue, so `&(struct s){1}` is not valid.

Certainly doesn't work with Clang, though you'd think that if the same compiler
can do it in C mode it should do it in C++ mode, provided there are no side
effects of doing so.

Juha Nieminen

Jul 26, 2022, 4:00:40 AM

Mut...@dastardlyhq.com wrote:
>>C++ doesn't have compound literals, but it has some other features (that
>>C doesn't) that make `f((struct s){1})` valid, but in C++ it's not an
>>lvalue, so `&(struct s){1}` is not valid.
>
> Certainly doesn't work with Clang though you'd think if the same compiler
> can do it in C mode it should do it in C++ mode if there are no side effects
> of doing so.

If the C++ standard doesn't consider it valid, then the compiler shouldn't
consider it valid (C++) either, even if it just so happens to be valid C.

Mut...@dastardlyhq.com

Jul 26, 2022, 4:09:14 AM

Does it specifically state that it's not valid, or does it simply not mention it?

Most C++ compilers will compile some C constructs which are not technically
legal in C++, such as non-const pointers to string literals, variable-length
arrays and variadic macros.

Öö Tiib

Jul 26, 2022, 5:42:17 AM

That is then clearly stated in the documentation of such compilers. Also, most
compilers provide a standard-compliant mode (like -pedantic in gcc) that
issues warnings about extensions used in the code (and the standard mandates
nothing more).

Mut...@dastardlyhq.com

Jul 26, 2022, 11:29:57 AM

And? Juha said:

"If the C++ standard doesn't consider it valid, then the compiler shouldn't
consider it valid (C++) either"

Clearly plenty of compilers do consider non-standard C++ valid.

Scott Lurndal

Jul 26, 2022, 12:31:15 PM

However, those compilers can be configured to reject non-standard C++
if the programmer's requirements demand it. Any compiler is allowed to
accept additional features at its discretion.

Keith Thompson

Jul 26, 2022, 2:59:37 PM

Plenty of C++ compilers accept non standard C++ *in certain modes*.

The C++ standard requires certain errors to be diagnosed. A C++
compiler that fails to diagnose, for example, an attempt to use a VLA
is not a conforming compiler.

Most C++ compilers are non-conforming by default, quietly accepting
extensions that are incompatible with the standard. All conforming C++
compilers provide a way to issue all required diagnostics.

Mut...@dastardlyhq.com

Jul 27, 2022, 3:52:22 AM

On Tue, 26 Jul 2022 11:59:21 -0700
Keith Thompson <Keith.S.T...@gmail.com> wrote:
>Mut...@dastardlyhq.com writes:
>> On Tue, 26 Jul 2022 02:42:09 -0700 (PDT)
>> Öö Tiib <oot...@hot.ee> wrote:
>>>On Tuesday, 26 July 2022 at 11:09:14 UTC+3, Mut...@dastardlyhq.com wrote:
>>>> On Tue, 26 Jul 2022 08:00:25 -0000 (UTC)
>>>> Juha Nieminen <nos...@thanks.invalid> wrote:
>>>> >Mut...@dastardlyhq.com wrote:
>>>> >>>C++ doesn't have compound literals, but it has some other features (that
>>>> >>>C doesn't) that make `f((struct s){1})` valid, but in C++ it's not an
>>>> >>>lvalue, so `&(struct s){1}` is not valid.
>>>> >>
>>>> >> Certainly doesn't work with Clang though you'd think if the same compiler

:
:

>> And? Juha said:
>>
>> "If the C++ standard doesn't consider it valid, then the compiler shouldn't
>> consider it valid (C++) either"
>>
>> Clearly plenty of compilers do consider non standard C++ valid.
>
>Plenty of C++ compilers accept non standard C++ *in certain modes*.
>
>The C++ standard requires certain errors to be diagnosed. A C++
>compiler that fails to diagnose, for example, an attempt to use a VLA
>is not a conforming compiler.
>
>Most C++ compilers are non-conforming by default, quietly accepting
>extensions that are incompatible with the standard. All conforming C++
>compilers provide a way to issue all required diagnostics.

I'm aware of all this. My point is: why are some C semantics considered
compilable (albeit with warnings), while the original example of &(struct s){1}
isn't and causes an error, even though it's perfectly legal in C?

Öö Tiib

Jul 27, 2022, 4:56:31 AM

It is because, while those compilers implemented an extension supporting
compound literals almost like in C, the result in those extensions is a
temporary whose lifetime lasts only to the end of the full-expression. Taking
the address of a temporary is illegal, and so &(struct s){1} is illegal.

The reason they did not implement the semantics fully like in C is that they
treated it as a kind of alternative syntactic sugar for list-initialization.
Since C++14, however, they cannot heap-allocate initializer lists (and did not
do it even before C++14). So the compound literal extension yields a temporary
for ease of implementation. Some of the compilers that do it are open source,
so if you want you can perhaps implement an extension closer to C
in those.

Mut...@dastardlyhq.com

Jul 27, 2022, 10:51:30 AM

On Wed, 27 Jul 2022 01:56:23 -0700 (PDT)
Öö Tiib <oot...@hot.ee> wrote:
>On Wednesday, 27 July 2022 at 10:52:22 UTC+3, Mut...@dastardlyhq.com wrote:
>> >Most C++ compilers are non-conforming by default, quietly accepting
>> >extensions that are incompatible with the standard. All conforming C++
>> >compilers provide a way to issue all required diagnostics.
>> I'm aware of all this. My point is why are some C semantics considered
>> compilable (albeit with warnings) but the original example of &(struct s){1}
>
>> isn't and causes an error even though its perfectly legal in C.
>
>It is because while the compilers implemented extension of supporting
>compound literals almost like in C, the result in those extensions is
>temporary whose lifetime will last only to end of full-expression. Taking

The expression's lifetime, and hence the temporary's, shouldn't end until the
function being called returns. It should remain on the stack of the calling
function, which is probably what happens in C.

>Why they did not implement the semantics fully like in C is because they
>used it like kind of alternative syntax sugar to list-initialization. Since
>C++14 however they can not heap-allocate initializer-lists (and did not
>do it even before C++14). So the compound literal extension is temporary
>for ease of implementation. Some of compilers that do it are open source
>so if you want you can perhaps implement more close to C extension
>to those.

Why? Since they're also already C compilers the code to do it already exists
inside them and is probably a case of setting an internal flag to call it.

Ben Bacarisse

Jul 27, 2022, 12:34:02 PM

Yes, I should have added "converts to in most contexts" because that's
usually how the difference gets spotted. To be clear, the string
literal above has type char[4] in C and const char[4] in C++.

--
Ben.

Paul N

Jul 27, 2022, 1:09:29 PM

On Monday, July 25, 2022 at 2:14:50 PM UTC+1, Juha Nieminen wrote:
> Some people often object if someone conflates C and C++, as if C++ were
> just a pure superset of C, pointing out that they are, in fact, different
> languages and, in some aspects, actually incompatible with each other.
>
> That got me thinking: What are all the differences between the two
> languages, in behavior and meaning, when it comes to their common syntax?
> In other words, I'm not here talking about different keywords and different
> syntax that compiles in one but not the other. I'm talking about code that
> does compile as both C and C++, but will be interpreted or behave
> differently depending on which (and this makes them incompatible).

As I understand it, the two languages have quite different philosophies. In BCPL there is a stated aim of eliminating hidden overhead. There seems to be nothing in the history of C to suggest that this aim was dropped; certainly B was developed on a very spartan machine. In contrast, the aim in C++ is to try to hide the underlying details.

For example, in C, a = b + c is a straightforward addition, though depending on the types it might be a floating-point addition or it might involve a hidden multiplication by the size of the pointed-to type. In C++, a = b + c could be anything - for instance, it might concatenate two files to form a third one, involving thousands of disk accesses.

Bo Persson

Jul 27, 2022, 1:29:03 PM

On 2022-07-27 at 19:09, Paul N wrote:
> On Monday, July 25, 2022 at 2:14:50 PM UTC+1, Juha Nieminen wrote:
>> Some people often object if someone conflates C and C++, as if C++ were
>> just a pure superset of C, pointing out that they are, in fact, different
>> languages and, in some aspects, actually incompatible with each other.
>>
>> That got me thinking: What are all the differences between the two
>> languages, in behavior and meaning, when it comes to their common syntax?
>> In other words, I'm not here talking about different keywords and different
>> syntax that compiles in one but not the other. I'm talking about code that
>> does compile as both C and C++, but will be interpreted or behave
>> differently depending on which (and this makes them incompatible).
>
> As I understand it, the two languages have quite different philosophies. In BCPL there is a stated aim of eliminating hidden overhead. There seems nothing in the history of C to suggest that this aim was dropped, certainly B was developed on a very spartan machine. In contrast the aim in C++ is to try to hide the underlying details.

Yes, we call that abstraction. :-)

Bjarne uses an onion as his metaphor. We don't want to see all the way to
the core all the time.

>
> For example, in C, a = b + c is a straight-forward addition, though depending on the types it might be a floating-point addition or it might involve a hidden multiplication by a pointer size. In C++, a = b + c could be anything - for instance, it might concatenate two files to form a third one, involving thousands of disk accesses.

I have never understood why this is a problem. In C you can write a =
add(b, c) and it can do anything. How is that easier to see through than
using the + operator?

Keith Thompson

Jul 27, 2022, 2:11:48 PM

Because they're not also already C compilers. A C and C++ compiler
might share the same backend and a lot of other infrastructure, but the
front ends are going to be separate.

Paul N

Jul 27, 2022, 2:43:48 PM

As I understand it, the point is that in C you can see that a = add(b, c) is calling a function that may or may not be long-winded. And you can be fairly confident that something simple like a = b + c is something quick.

In C++ you get the advantage that you can simplify all sorts of stuff by overloading operators. The downside (or trade-off) is that you can't tell at a glance whether something is quick or not, and it is a problem *if* you wrongly assume that it will be. It's a different language, with different trade-offs, some of which might be a problem.

Juha Nieminen

Jul 28, 2022, 2:53:42 AM

Paul N <gw7...@aol.com> wrote:
> For example, in C, a = b + c is a straight-forward addition, though depending on the types it might be a floating-point addition or it might involve a hidden multiplication by a pointer size. In C++, a = b + c could be anything - for instance, it might concatenate two files to form a third one, involving thousands of disk accesses.

If the types of a, b and c are the same as in the C code, then it cannot
"be anything". I'm not aware of operator overloading being possible for
basic types in C++.

As you say, even in C, if you don't know what the types are, you don't
know what operation that is actually doing. Could be pointer arithmetic
for all we know. Knowing the types is kind of crucial to understand
what's being done there.

Mut...@dastardlyhq.com

Jul 28, 2022, 4:03:33 AM

On Wed, 27 Jul 2022 19:28:15 +0200
Bo Persson <b...@bo-persson.se> wrote:
>On 2022-07-27 at 19:09, Paul N wrote:
>> On Monday, July 25, 2022 at 2:14:50 PM UTC+1, Juha Nieminen wrote:
>>> Some people often object if someone conflates C and C++, as if C++ were
>>> just a pure superset of C, pointing out that they are, in fact, different
>>> languages and, in some aspects, actually incompatible with each other.
>>>
>>> That got me thinking: What are all the differences between the two
>>> languages, in behavior and meaning, when it comes to their common syntax?
>>> In other words, I'm not here talking about different keywords and different
>>> syntax that compiles in one but not the other. I'm talking about code that
>>> does compile as both C and C++, but will be interpreted or behave
>>> differently depending on which (and this makes them incompatible).
>>
>> As I understand it, the two languages have quite different philosophies. In
>> BCPL there is a stated aim of eliminating hidden overhead. There seems nothing
>> in the history of C to suggest that this aim was dropped, certainly B was
>> developed on a very spartan machine. In contrast the aim in C++ is to try to
>> hide the underlying details.
>
>Yes, we call that abstraction. :-)
>
>Bjarne uses an onion as his metafor. We don't want to see all the way to
>the core all the time.

Unfortunately C++ doesn't hide it well enough, because you need to understand
what's happening all the way to the core or you'll end up with unexpected
results - e.g. implicit conversions, deep vs shallow copies.

Öö Tiib

Jul 28, 2022, 4:58:02 AM

C++ dictates nothing about the paradigms and idioms we follow or ignore.
Any fundamental type or aggregate can be wrapped in a class, and all unsafe
operations or conversions can be made safe or left out of the interface.
The standard library does that quite a lot. Unfortunately some programmers do
not understand the difference between freedom and anarchy well enough, but
there is always the freedom not to cooperate with them.

Mut...@dastardlyhq.com

Jul 28, 2022, 5:04:22 AM

Sometimes the STL doesn't help itself. The disaster that was auto_ptr caught
out a lot of people since - quite reasonably - they expected something that
was in the STL to work well with STL containers.

Richard Damon

Jul 28, 2022, 7:34:06 AM

The difference is that in C, if you see a + b in the code, then you KNOW
that a and b need to be primitive types and the operation will be just a
couple of operations long (the WORST it can be is a floating point
addition).

In C++, we don't have that knowledge, and we need to think about what
the types actually are and what that means for the code.

C allows you to skim code more lightly to see what might be
happening; C++ requires you to know a bit more about the code, but lets
the code express higher level operations compactly.

Bo Persson

Jul 28, 2022, 9:04:29 AM

On 2022-07-28 at 13:33, Richard Damon wrote:
> On 7/28/22 2:53 AM, Juha Nieminen wrote:
>> Paul N <gw7...@aol.com> wrote:
>>> For example, in C, a = b + c is a straight-forward addition, though
>>> depending on the types it might be a floating-point addition or it
>>> might involve a hidden multiplication by a pointer size. In C++, a =
>>> b + c could be anything - for instance, it might concatenate two
>>> files to form a third one, involving thousands of disk accesses.
>>
>> If the types of a, b and c are the same as in the C code, then it cannot
>> "be anything". I'm not aware of operator overloading being possible for
>> basic types in C++.
>>
>> As you say, even in C, if you don't know what the types are, you don't
>> know what operation that is actually doing. Could be pointer arithmetic
>> for all we know. Knowing the types is kind of crucial to understand
>> what's being done there.
>
> The difference is that in C, if you see in the code a + b, then you KNOW
> that a and b need to be primitive types and the operation will be just a
> couple of operations long (the WORSE it can be is a floating point
> addition).
>
> In C++, we don't have that knowledge, and we need to think about what
> the types actually are and what that means for the code.

Yes, but if I have

std::string full_name = first_name + last_name;

I wouldn't expect a floating point addition.

>
> C allows you to more lightly skim code to see what it might be
> happening, C++ requires you to know a bit more about the code, but lets
> the code express higher level operations compactly.

The abstraction part is that you shouldn't *have* to consider what
happens down to the hardware level - not right now. Rather you should
trust that the next level - which you wrote yesterday - works just as
well as if you were to write it now.

Malcolm McLean

Jul 28, 2022, 9:17:03 AM

The issue is that if in C, you have an object that supports arithmetical
operations, but isn't a built-in type, you have to write functions like
complex_mul() to implement those operations. It can then be very difficult
to read the expressions and check them for correctness.

C++ operator overloading overcomes this problem, but the cost is that it
isn't obvious that user code is being called. It's also prone to abuse.

Mut...@dastardlyhq.com

Jul 28, 2022, 11:11:41 AM

Unfortunately, even in C++ 2020 that won't compile unless first_name is a
string, and it won't work if it's a char* due to precedence rules. It would be
nice if a solution were found so that C++ did a silent conversion of char* in
this sort of situation instead of one having to write std::string(first_name),
which is ugly and inefficient. Maybe something like:

std::string (s = "hello") + "world";

Paavo Helde

Jul 28, 2022, 1:03:57 PM

28.07.2022 18:11 Mut...@dastardlyhq.com kirjutas:
> On Thu, 28 Jul 2022 15:04:12 +0200
> Bo Persson <b...@bo-persson.se> wrote:
>
>> std::string full_name = first_name + last_name;
>>
>> I wouldn't expect a floating point addition.
>
> Unfortunately even in C++ 2020 that won't compile unless first_name is a
> string and won't work if its char* due to precendence rules.

Why on earth are you using char* in a C++ program? The only thing they
bring is problems and slowdowns (because string length needs to be
re-calculated all the time).

#include <string>
using namespace std::string_literals;

int main() {
    auto first_name = "Bernhard"s, last_name = "Shaw"s;
    std::string full_name = first_name + last_name;
}

Juha Nieminen

Jul 29, 2022, 3:57:06 AM

Paavo Helde <ees...@osa.pri.ee> wrote:
> Why on earth are you using char* in a C++ program? The only thing they
> bring is problems and slowdowns (because string length needs to be
> re-calculated all the time).

Incorrect. In many situations using char* "strings" is more efficient than
using std::string because of the elided dynamic memory allocation. Also,
not in all situations is the length of the string needed to be separately
calculated in order to operate with the string, as many operations do not
need to know the length in advance, and can just stop when they encounter
the null byte. (For example printing a char* string doesn't need to know
its length in advance. Many other operations likewise.)

If you have some function void foo(const std::string&), and you wanted
to call that function by giving it a string literal, how is that going
to be more (or even equally) efficient than if it were foo(const char*)?
The string literal would be needlessly copied into a dynamically
allocated memory block (and immediately deleted after the function returns).

There are also some situations where, for example, a class needs a small
string as a member variable (whose maximum length, or even outright length,
is fixed and very small). Using a char array is usually much more efficient
for this than a std::string, and operating on this small string will be
more efficient with a char* than creating a std::string every time.
(If in this example the length of the string would vary and there are
tons of situations where the length is needed, you could simply add
a length member variable and use that. Still a lot more efficient than
using std::string because you are not dynamically allocating memory.)
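
The char-array-plus-length idea described above could be sketched roughly as follows. SmallText and its member names are hypothetical, invented for this illustration; the sketch assumes C++17 for std::string_view and silently truncates input that does not fit.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical fixed-capacity string: the characters live inside the object,
// so constructing or copying it never allocates dynamic memory.
template <std::size_t Capacity>
class SmallText {
public:
    SmallText() = default;

    // Copies at most Capacity characters from `text`; the rest is truncated.
    explicit SmallText(std::string_view text) {
        length_ = text.size() < Capacity ? text.size() : Capacity;
        if (length_ != 0) {
            std::memcpy(buffer_, text.data(), length_);
        }
        buffer_[length_] = '\0';
    }

    std::size_t length() const { return length_; }  // stored, no strlen() needed
    const char* c_str() const { return buffer_; }   // null-terminated, for C APIs
    std::string_view view() const { return {buffer_, length_}; }

private:
    char buffer_[Capacity + 1] = {};  // +1 for the terminating null byte
    std::size_t length_ = 0;
};
```

A class could then hold, say, a SmallText<8> member and hand c_str() to C functions. Whether this actually beats std::string depends on the implementation and the workload, so it is worth measuring.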

Paavo Helde

Jul 29, 2022, 5:03:29 AM

29.07.2022 10:56 Juha Nieminen kirjutas:
> Paavo Helde <ees...@osa.pri.ee> wrote:
>> Why on earth are you using char* in a C++ program? The only thing they
>> bring is problems and slowdowns (because string length needs to be
>> re-calculated all the time).
>
> Incorrect. In many situations using char* "strings" is more efficient than
> using std::string because of the elided dynamic memory allocation. Also,
> not in all situations is the length of the string needed to be separately
> calculated in order to operate with the string, as many operations do not
> need to know the length in advance, and can just stop when they encounter
> the null byte. (For example printing a char* string doesn't need to know
> its length in advance. Many other operations likewise.)

Yes, in some situations char* strings can be more efficient. But these
situations are pretty rare, and in order to outweigh the drawbacks the
performance benefits must be pretty large. Plus, on that level of
performance tuning, one needs to really know what one is doing, which Mr
Muttley quite clearly does not.

>
> If you have some function void foo(const std::string&), and you wanted
> to call that function by giving it a string literal, how is that going
> to be more (or even equally) efficient than if it were foo(const char*)?
> The string literal would be needlessly copied into a dynamically
> allocated memory block (and immediately deleted after the function returns).

First, if I have a std::string literal and pass it to foo(const
std::string&), then there are no copies; just a reference is passed. If I
have a char* literal, then there is indeed a strlen plus a copy, which is
what I was talking about. I think your example actually proves my point.

Second, for such functions it would be more appropriate to declare them
as taking a std::string_view parameter. For a string_view, you can also
pass either a std::string or a char* literal, and there will be no
copies made, just an extra strlen() in case of char* literal.
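
A minimal sketch of the std::string_view parameter described above, assuming C++17. The function names are made up for this illustration. Returning the data pointer makes it possible to check that the characters were not copied.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical function: reports where the characters it was given live.
// A std::string_view is only a pointer plus a length, so nothing is copied.
const char* first_char_address(std::string_view text) {
    return text.data();
}

// The length is stored in the view; for a const char* argument it was
// computed once (strlen-style) when the view was constructed.
std::size_t length_of(std::string_view text) {
    return text.size();
}
```

Called with a std::string or with a string literal, first_char_address returns the address of the caller's own characters in both cases.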

>
> There are also some situations where, for example, a class needs a small
> string as a member variable (which maximum length, or even outright length,
> is fixed and very small). Using a char array is usually much more efficient
> for this than a std::string, and operating on this small string will be
> more efficient with a char* than creating a std::string every time.
> (If in this example the length of the string would vary and there are
> tons of situations where the length is needed, you could simply add
> a length member variable and use that. Still a lot more efficient than
> using std::string because you are not dynamically allocating memory.)

Nowadays std::string is typically using small string optimization,
meaning that for sufficiently short strings there is no dynamic
allocation either.
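
Whether a particular standard library applies small string optimization can be probed from the outside. The helper below is a hypothetical sketch: it checks whether the character data lies inside the std::string object itself. What it reports for short strings depends on the implementation.

```cpp
#include <functional>
#include <string>

// Returns true when the character data of `s` lies inside the std::string
// object itself, which is how small string optimization looks from outside.
bool stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    const char* data = s.data();
    // std::less gives a well-defined total order even for unrelated pointers.
    std::less<const char*> before;
    return !before(data, object_begin) && before(data, object_end);
}
```

A string much longer than sizeof(std::string) can never be stored inline, so stored_inline(std::string(1000, 'x')) is false everywhere. For something like std::string("Shaw") the answer is true on implementations that use the optimization.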

Mut...@dastardlyhq.com

Jul 29, 2022, 5:22:24 AM

On Thu, 28 Jul 2022 20:03:41 +0300
Paavo Helde <ees...@osa.pri.ee> wrote:
>28.07.2022 18:11 Mut...@dastardlyhq.com kirjutas:
>> On Thu, 28 Jul 2022 15:04:12 +0200
>> Bo Persson <b...@bo-persson.se> wrote:
>>
>>> std::string full_name = first_name + last_name;
>>>
>>> I wouldn't expect a floating point addition.
>>
>> Unfortunately, even in C++20 that won't compile unless first_name is a
>> string, and it won't work if it's a char* due to precedence rules.
>
>Why on earth are you using char* in a C++ program? The only thing they

Is that supposed to be a serious question?

>bring is problems and slowdowns (because string length needs to be
>re-calculated all the time).

Depends on what you're doing with it. You do realise char* don't always
just point to a text string?

>#include <string>
>using namespace std::string_literals;
>
>int main() {
> auto first_name = "Bernhard"s, last_name = "Shaw"s;
> std::string full_name = first_name + last_name;

Hmm, I wonder what's more efficient: counting the length of 2 short strings or
creating 2 objects...

Mut...@dastardlyhq.com

Jul 29, 2022, 5:29:09 AM

On Fri, 29 Jul 2022 12:03:12 +0300
Paavo Helde <ees...@osa.pri.ee> wrote:
>29.07.2022 10:56 Juha Nieminen kirjutas:
>> Paavo Helde <ees...@osa.pri.ee> wrote:
>>> Why on earth are you using char* in a C++ program? The only thing they
>>> bring is problems and slowdowns (because string length needs to be
>>> re-calculated all the time).
>>
>> Incorrect. In many situations using char* "strings" is more efficient than
>> using std::string because of the elided dynamic memory allocation. Also,
>> not in all situations is the length of the string needed to be separately
>> calculated in order to operate with the string, as many operations do not
>> need to know the length in advance, and can just stop when they encounter
>> the null byte. (For example printing a char* string doesn't need to know
>> its length in advance. Many other operations likewise.)
>
>Yes, in some situations char* strings can be more efficient. But these
>situations are pretty rare, and in order to outweigh the drawbacks the
>performance benefits must be pretty large. Plus, on that level of
>performance tuning, one needs to really know what one is doing, which Mr
>Muttley quite clearly does not.

You're in no position to be patronising my friend since you seem to be
unaware of basic char* use cases and how they work.

>First, if I have a std::string literal and pass it to foo(const
>std::string&), then there are no copies; just a reference is passed. If I
>have a char* literal, then there is indeed a strlen plus a copy, which is
>what I was talking about. I think your example actually proves my point.

If you pass a char* there are no copies either. You ever heard of pointers?

>Nowadays std::string is typically using small string optimization,
>meaning that for sufficiently short strings there is no dynamic
>allocation either.

Moving the stack pointer to allocate stack memory takes non zero time compared
to using a raw pointer which is already set.

Paavo Helde

Jul 29, 2022, 6:54:06 AM

Earlier you complained that you need to needlessly write
std::string(first_name). When I showed you a way to avoid writing it
and to produce faster code, you still complain, this time about the
performance. Well, that's not wise of you, because performance can be
measured quite easily. Here is a demo showing that using std::string
literals is more than twice as fast as using char* literals. Feel free
to test it yourself.

#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
using namespace std::string_literals;

const int N = 10'000'000;

// Builds the full name from two C strings on every iteration.
std::size_t test1(const char* first_name, const char* last_name) {
    std::size_t result = 0;
    for (int i = 0; i < N; ++i) {
        std::string full_name = std::string(first_name) + last_name;
        result += full_name.length();
    }
    return result;
}

// Same work, but both operands are already std::string objects.
std::size_t test2(const std::string& first_name, const std::string& last_name) {
    std::size_t result = 0;
    for (int i = 0; i < N; ++i) {
        std::string full_name = first_name + last_name;
        result += full_name.length();
    }
    return result;
}

int main() {
    auto start1 = std::chrono::steady_clock::now();
    std::size_t n1 = test1("Hello", "World");
    auto finish1 = std::chrono::steady_clock::now();

    auto start2 = std::chrono::steady_clock::now();
    std::size_t n2 = test2("Hello"s, "World"s);
    auto finish2 = std::chrono::steady_clock::now();

    // Note: streaming a std::chrono::duration with operator<< requires C++20.
    std::cout << "const char* literal: " <<
        std::chrono::duration_cast<std::chrono::nanoseconds>(finish1 - start1)/N
        << "\n";
    std::cout << "std::string literal: " <<
        std::chrono::duration_cast<std::chrono::nanoseconds>(finish2 - start2)/N
        << "\n";

    return int(n1 - n2);
}

Output (MSVC2019 x64 Release mode):

const char* literal: 25ns
std::string literal: 11ns

Paavo Helde

Jul 29, 2022, 7:13:18 AM

29.07.2022 13:53 Paavo Helde kirjutas:

OK, I have to admit my last demo did not really test literals. Here is
another variant which tests them more directly. std::string literals
still are a bit faster than char*, but only marginally this time. The
point is they are not slower, so there is no need to avoid using them.

#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
using namespace std::string_literals;

const int N = 10'000'000;

// Concatenation starting from C string literals.
std::size_t test1() {
    std::size_t result = 0;
    for (int i = 0; i < N; ++i) {
        std::string full_name = std::string("Hello") + "World";
        result += full_name.length();
    }
    return result;
}

// Concatenation starting from std::string literals.
std::size_t test2() {
    std::size_t result = 0;
    for (int i = 0; i < N; ++i) {
        std::string full_name = "Hello"s + "World"s;
        result += full_name.length();
    }
    return result;
}

int main() {
    auto start1 = std::chrono::steady_clock::now();
    std::size_t n1 = test1();
    auto finish1 = std::chrono::steady_clock::now();

    auto start2 = std::chrono::steady_clock::now();
    std::size_t n2 = test2();
    auto finish2 = std::chrono::steady_clock::now();

    // Note: streaming a std::chrono::duration with operator<< requires C++20.
    std::cout << "const char* literal: " <<
        std::chrono::duration_cast<std::chrono::nanoseconds>(finish1 - start1)/N
        << "\n";
    std::cout << "std::string literal: " <<
        std::chrono::duration_cast<std::chrono::nanoseconds>(finish2 - start2)/N
        << "\n";

    return int(n1 - n2);
}

const char* literal: 25ns
std::string literal: 24ns

Öö Tiib

Jul 29, 2022, 8:16:21 AM

On Friday, 29 July 2022 at 10:57:06 UTC+3, Juha Nieminen wrote:
> Paavo Helde <ees...@osa.pri.ee> wrote:
> > Why on earth are you using char* in a C++ program? The only thing they
> > bring is problems and slowdowns (because string length needs to be
> > re-calculated all the time).
>
> Incorrect. In many situations using char* "strings" is more efficient than
> using std::string because of the elided dynamic memory allocation. Also,
> not in all situations is the length of the string needed to be separately
> calculated in order to operate with the string, as many operations do not
> need to know the length in advance, and can just stop when they encounter
> the null byte. (For example printing a char* string doesn't need to know
> its length in advance. Many other operations likewise.)

In such cases the cost may be hidden from the eye in C++ because of the legacy
implicit conversion from char [const] * to std::string. If performance matters,
then I suggest using std::string_view, which knows its length, has no dynamic
allocations, does no implicit conversions or copies, and can be constexpr. All
three issues are removed, with a bonus. The code will look less elegant but will
usually beat both std::string and char* in performance. That is under the
premise that it matters.

> If you have some function void foo(const std::string&), and you wanted
> to call that function by giving it a string literal, how is that going
> to be more (or even equally) efficient than if it were foo(const char*)?
> The string literal would be needlessly copied into a dynamically
> allocated memory block (and immediately deleted after the function returns).

You can write an overload that takes std::string_view when performance
matters.

> There are also some situations where, for example, a class needs a small
> string as a member variable (which maximum length, or even outright length,
> is fixed and very small). Using a char array is usually much more efficient
> for this than a std::string, and operating on this small string will be
> more efficient with a char* than creating a std::string every time.

The std::string object is typically 24 or 32 bytes, for cache friendliness, and
holds very small strings (up to about 15 or 22 characters, depending on the
implementation) locally as small string optimisation, so it must be quite a
special case. But of course use std::array<char, 16> if that suits you better.
That is orthogonal to char*.

> (If in this example the length of the string would vary and there are
> tons of situations where the length is needed, you could simply add
> a length member variable and use that. Still a lot more efficient than
> using std::string because you are not dynamically allocating memory.)

Yes. Profile and when it matters do whatever. No silver bullets.

Juha Nieminen

Jul 29, 2022, 8:27:58 AM

Paavo Helde <ees...@osa.pri.ee> wrote:
>> If you have some function void foo(const std::string&), and you wanted
>> to call that function by giving it a string literal, how is that going
>> to be more (or even equally) efficient than if it were foo(const char*)?
>> The string literal would be needlessly copied into a dynamically
>> allocated memory block (and immediately deleted after the function returns).
>
> First, if I have a std::string literal and pass it to foo(const
> std::string&), then there are no copies; just a reference is passed. If I
> have a char* literal, then there is indeed a strlen plus a copy, which is
> what I was talking about. I think your example actually proves my point.

How exactly do you create "a std::string literal" that doesn't involve
creating a normal std::string object and giving it a C string literal,
which will cause it to allocate memory and (needlessly) copy the contents
of the C string literal into it (something we were trying to avoid here
in the first place)?

> Second, for such functions it would be more appropriate to declare them
> as taking a std::string_view parameter. For a string_view, you can also
> pass either a std::string or a char* literal, and there will be no
> copies made, just an extra strlen() in case of char* literal.

I doubt you can create a std::string object from a string_view without
incurring the exact same operations and penalties as if you were using
a char* literal (ie. allocating memory, needlessly copying the contents
of the string_view into it).

> Nowadays std::string is typically using small string optimization,
> meaning that for sufficiently short strings there is no dynamic
> allocation either.

I'm not sure how common that is. And even if it is, you are still incurring
a penalty that doesn't exist for const char*'s (namely, conditionals that
the std::string code has to do in order to check what kind of storage
it's using, which it has to do in every single member function).

You were talking about "slowdowns", so let's talk about slowdowns.
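
For reference, a "std::string literal" is written with the s suffix from std::string_literals (C++14). It calls operator""s(const char*, std::size_t), which returns an ordinary std::string built from the literal's characters. The object is still created at run time. What the suffix changes is that the compiler supplies the length instead of the constructor scanning for the null byte. The function names below are hypothetical; the embedded '\0' only serves to make the difference visible.

```cpp
#include <cstddef>
#include <string>

using namespace std::string_literals;

// The s suffix passes the literal's length (7 characters) along with the pointer.
std::size_t length_via_literal() {
    return "abc\0def"s.length();
}

// The const char* constructor has to scan for the first '\0', so it sees "abc".
std::size_t length_via_char_pointer() {
    return std::string("abc\0def").length();
}
```

Here length_via_literal() returns 7 and length_via_char_pointer() returns 3.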

Öö Tiib

Jul 29, 2022, 8:44:06 AM

On Friday, 29 July 2022 at 15:27:58 UTC+3, Juha Nieminen wrote:
>
> You were talking about "slowdowns", so let's talk about slowdowns.

Yes, let's consider:

#include <iostream>
#include <string>
#include <string_view>
using namespace std::literals;

int main() {
    // we have fully compile-time literals in the language
    constexpr auto first_name = "Bernhard"sv, last_name = "Shaw"sv;

    // useful (with slightly inconvenient semantics) in std::string operations
    std::string full_name{first_name};
    full_name += last_name;

    // useful everywhere else
    std::cout << first_name << last_name << " " << full_name << '\n';
}

So why do we need char*?

Mut...@dastardlyhq.com

Jul 29, 2022, 10:11:30 AM

On Fri, 29 Jul 2022 13:53:49 +0300
Paavo Helde <ees...@osa.pri.ee> wrote:
>Earlier you complained that you need to needlessly write
>std::string(first_name). When I showed you a way how to not write it,
>and produce faster code, you still complain. About the performance, this

I never mentioned performance, you did. I just said it would be nice if C++
did hidden conversions of char* to a string in that particular example. Though
that might actually be useful, so the steering committee wouldn't be interested.

>time. Well, that's not wise from you because performance can be measured
>quite easily. Here is a demo showing that using std::string literals is
>more than twice faster than char* literals. Feel free to test it by
>yourself.

fenris$ c++ -std=c++17 t.cc
t.cc:36:46: error: invalid operands to binary expression ('basic_ostream<char,
std::__1::char_traits<char> >' and 'typename
[snip pages of warnings]
operator<<(basic_ostream<_CharT, _Traits>& __os, unique_ptr<_Yp, _Dp> co...
^
t.cc:39:46: error: invalid operands to binary expression ('basic_ostream<char,
std::__1::char_traits<char> >' and 'typename
[more snipping]
2 errors generated.
fenris$ c++ -v
Apple clang version 11.0.3 (clang-1103.0.32.62)
Target: x86_64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

So that went well.

Anyway, it's easy to pick examples where char* would be faster than string
and vice versa. Horses for courses.
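
The two errors quoted above point at the lines that stream a std::chrono::duration, and operator<< for durations was only added in C++20. Assuming that is the cause, a C++17-compatible variant could print the tick count instead. The helper names here are hypothetical.

```cpp
#include <chrono>
#include <string>

// Average time per iteration in whole nanoseconds, as a plain integer that
// any standard library since C++11 can stream with operator<<.
long long nanoseconds_per_iteration(std::chrono::steady_clock::duration elapsed,
                                    long long iterations) {
    auto total = std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
    return total.count() / iterations;
}

// Formats the value the way the benchmark output above shows it, e.g. "25ns".
std::string format_nanoseconds(long long value) {
    return std::to_string(value) + "ns";
}
```

In the demo, the output statement would then become something like std::cout << format_nanoseconds(nanoseconds_per_iteration(finish1 - start1, N)) << "\n";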

Mut...@dastardlyhq.com

Jul 29, 2022, 10:14:22 AM

argv, envp, network packets, navigating binary data (though that would more
likely be u_char*), pretty much every C API function that requires text etc.

If you don't want to deal with pointers perhaps you'd be happier using Java
or C#.
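
For the argv case in particular, the char* data handed over by the C runtime can be wrapped without copying it. The sketch below assumes C++17; collect_args is a made-up name.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Wraps each C string from an argv-style array in a std::string_view.
// Only pointers and lengths are stored; the characters are not copied.
std::vector<std::string_view> collect_args(int argc, const char* const* argv) {
    std::vector<std::string_view> args;
    args.reserve(static_cast<std::size_t>(argc));
    for (int i = 0; i < argc; ++i) {
        args.emplace_back(argv[i]);  // costs one strlen-style scan per argument
    }
    return args;
}
```

From main(int argc, char** argv) this can be called directly as collect_args(argc, argv). The underlying char* is still what crosses the API boundary.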

Öö Tiib

Jul 29, 2022, 11:25:30 AM

Every standard library template has methods to provide raw pointers if
some legacy API that needs those is used. That happens in a thin interface
layer around the legacy API, and everywhere else I can enjoy that my code
deals with things that matter.
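
A rough sketch of such a thin interface layer. Here legacy_fill is a hypothetical stand-in for a C function like read() or recv(), and the other names are likewise invented for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for a legacy C API that fills a caller-provided buffer and
// returns the number of bytes written. (Hypothetical, for illustration.)
std::size_t legacy_fill(unsigned char* buffer, std::size_t capacity) {
    static const unsigned char payload[] = {0xDE, 0xAD, 0xBE, 0xEF};
    std::size_t n = capacity < sizeof payload ? capacity : sizeof payload;
    std::memcpy(buffer, payload, n);
    return n;
}

// The interface layer: the container owns the memory, data() hands the raw
// pointer to the C function, and the result is trimmed to what was written.
std::vector<unsigned char> read_frame() {
    std::vector<unsigned char> frame(64);
    frame.resize(legacy_fill(frame.data(), frame.size()));
    return frame;
}

// c_str() yields a null-terminated const char* for text-based C functions.
std::size_t c_length(const std::string& text) {
    return std::strlen(text.c_str());
}
```

Both data() and c_str() simply return a stored pointer, which is the basis for the claim that this layer usually costs nothing once inlined.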

> If you don't want to deal with pointers perhaps you'd be happier using Java
> or C#.

It is a kind of nineties, pre-standard view of C++ that I have any need to
deal with pointers. My C++ typically beats good C# or Java by 2 to 5 times,
regardless of whether that code is compiled to a native binary ahead of time
or just in time. Those languages have unneeded dynamic layers of indirection,
and the compiler has to figure out all compile-time processing opportunities
by itself, which is tricky with run-time reflection in the language, so they
are doomed to lose forever in both performance and memory usage.

Mut...@dastardlyhq.com

Jul 29, 2022, 11:48:55 AM

On Fri, 29 Jul 2022 08:25:21 -0700 (PDT)

Great idea, put a layer on top, that'll make things more efficient. Sometimes
things that matter include raw data (if you do systems or low level programming
which I suspect you don't) and good luck processing that efficiently in
std::string.

David Brown

Jul 29, 2022, 12:04:22 PM

Try "a = b + c;" on an 8-bit AVR microcontroller, with "b" being a
64-bit "long long int" and "c" being a "double". The result will be
very large and slow library calls. Simple operators being "quick" in C,
or that "it's obvious what object code you get in C" is often an
unwarranted assumption.
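
What the innocent-looking statement involves can be made explicit. In both C and C++ the usual arithmetic conversions turn the long long operand into a double before the addition. On a target without hardware support for either type, both the conversion and the addition become library calls. The function name is hypothetical.

```cpp
#include <type_traits>

// The long long operand is converted to double first, then a floating point
// addition is performed; the result type is double.
double add_mixed(long long b, double c) {
    static_assert(std::is_same<decltype(b + c), double>::value,
                  "long long + double yields double");
    return b + c;
}
```

The source line stays a single operator either way. How much object code it expands to depends entirely on the target.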

Öö Tiib

Jul 29, 2022, 4:13:03 PM

You suspect wrongly. The effect of most library classes is just that
they compile to about the same code as what you would write in C, but are
easier to read. The hand-written code is commonly less efficient, more
verbose, more error-prone, and harder for the compiler to optimize and for
the maintainer to fix.

Juha Nieminen

Jul 29, 2022, 6:47:34 PM

Öö Tiib <oot...@hot.ee> wrote:
> On Friday, 29 July 2022 at 15:27:58 UTC+3, Juha Nieminen wrote:
>>
>> You were talking about "slowdowns", so let's talk about slowdowns.
>
> Yes lets consider:
>
> #include <iostream>
> #include <string>
> #include <string_view>
> using namespace std::literals;
>
> int main() {
> // we have fully compile time literals in language
> constexpr auto first_name = "Bernhard"sv, last_name = "Shaw"sv;

He said "std::string literal", not "std::string_view literal".

If he meant the latter then fine, but my response was made assuming he
meant what he wrote.

> So why we need char* ?

Sometimes having a pointer is more useful than having whatever
string_view is. Also, string_view is not yet universally supported, so
if you are eg. writing a library for others to use it might still be
a good idea to not demand it. char* is also useful when interfacing
with C libraries (or the C++ standard library functions that were
"inherited" from C, such as fopen()).

Richard Damon

Jul 29, 2022, 6:48:08 PM

It may be a large number of instructions, but not THAT large a
number, and unlikely to be slower than a "double" operation, which
(well, maybe long double will be slower, if supported as something
different than double) is sort of a guideline for what is considered
"quick" by that measure.

Manfred

Jul 29, 2022, 8:15:17 PM

On 7/29/2022 9:56 AM, Juha Nieminen wrote:
> Paavo Helde <ees...@osa.pri.ee> wrote:
>> Why on earth are you using char* in a C++ program? The only thing they
>> bring is problems and slowdowns (because string length needs to be
>> re-calculated all the time).
>
> Incorrect. In many situations using char* "strings" is more efficient than
> using std::string because of the elided dynamic memory allocation.

That, and below, is generally true for a generic dynamic container.
However, std::string is not a generic container. It is a very specific
and very optimized container of chars.
So, if you use a decent implementation of std the performance of
std::string and char* is usually pretty close.
Then, of course there are always corner cases where char* is faster, as
there are cases where knowing the length in advance is a deal breaker.
I'd regard these as the proverbial exceptions that confirm the rule (use
std::string in C++ and char* in C, that is).

Mut...@dastardlyhq.com

Jul 30, 2022, 5:28:45 AM

On Fri, 29 Jul 2022 13:12:55 -0700 (PDT)

Öö Tiib <oot...@hot.ee> wrote:
>On Friday, 29 July 2022 at 18:48:55 UTC+3, Mut...@dastardlyhq.com wrote:
>> On Fri, 29 Jul 2022 08:25:21 -0700 (PDT)

>> Great idea, put a layer on top, that'll make things more efficient. Sometimes
>> things that matter include raw data (if you do systems or low level programming
>> which I suspect you don't) and good luck processing that efficiently in
>> std::string.
>
>You suspect wrongly. The effect of most library classes is just that
>those compile to about same code what you would do in C, just look

If there's an interface layer there's a non zero time penalty for going through
it.

>easier to read. The hand-written code is commonly less efficient, more
>verbose, more error-prone, harder for compiler to optimize and maintainer
>to fix.

Really? Here's some code from a utility I wrote that reads network data into
a u_char frame buffer. Feel free to tell us how using std::string or similar
would improve the speed or make the code any more readable:

switch((rlen = read(bpf_fd,frame_buffer,frame_buffsize)))
{
case 0:
        assert(0);
case -1:
        perror("ERROR: read()");
        exit(1);
}
for(ptr=frame_buffer;
    ptr < frame_buffer + rlen;
    ptr += BPF_WORDALIGN(bhdr->bh_hdrlen + bhdr->bh_caplen))
{
        bhdr = (struct bpf_hdr *)ptr;
        recv_pkt = (union un_recv_pkt *)(ptr + frame_hdr_len + bhdr->bh_hdrlen);
        processPacket(rlen);
}

Öö Tiib

Jul 30, 2022, 6:39:19 AM

On Saturday, 30 July 2022 at 12:28:45 UTC+3, Mut...@dastardlyhq.com wrote:
> On Fri, 29 Jul 2022 13:12:55 -0700 (PDT)
> Öö Tiib <oot...@hot.ee> wrote:
> >On Friday, 29 July 2022 at 18:48:55 UTC+3, Mut...@dastardlyhq.com wrote:
> >> On Fri, 29 Jul 2022 08:25:21 -0700 (PDT)
> >> Great idea, put a layer on top, that'll make things more efficient. Sometimes
> >> things that matter include raw data (if you do systems or low level programming
> >> which I suspect you don't) and good luck processing that efficiently in
> >> std::string.
> >
> >You suspect wrongly. The effect of most library classes is just that
> >those compile to about same code what you would do in C, just look
>
> If there's an interface layer there's a non zero time penalty for going through
> it.

In the majority of cases with actual compilers, that claim is incorrect.
Compilers inline function calls and optimize code not in narrow cases but
aggressively and massively. The libraries are typically written in a manner
that supports it. Naive code can use some construct that makes it hard for the
compiler. Profilers have shown me plenty of that over the decades.

> >easier to read. The hand-written code is commonly less efficient, more
> >verbose, more error-prone, harder for compiler to optimize and maintainer
> >to fix.
> Really? Here's some code from a utility I wrote that reads network data into
> a u_char frame buffer. Feel free to tell us how using std::string or similar
> would improve the speed or make the code any more readable:

Odd request. The std::string is meant for processing texts. If someone puts
some other data into it then they make their life harder, not easier.

>
> switch((rlen = read(bpf_fd,frame_buffer,frame_buffsize)))
> {
> case 0:
> assert(0);

That asserts that there is never an end of file?

> case -1:
> perror("ERROR: read()");
> exit(1);

Any error ends the program?

The code that follows assumes that all read packets fit in the buffer. I don't
have such code. What embedded device is it that dies in usual situations?
Speed and readability matter only after the code is useful.

Öö Tiib

Jul 30, 2022, 6:48:45 AM

On Saturday, 30 July 2022 at 01:47:34 UTC+3, Juha Nieminen wrote:
> Öö Tiib <oot...@hot.ee> wrote:
> > On Friday, 29 July 2022 at 15:27:58 UTC+3, Juha Nieminen wrote:
> >>
> >> You were talking about "slowdowns", so let's talk about slowdowns.
> >
> > Yes lets consider:
> >
> > #include <iostream>
> > #include <string>
> > #include <string_view>
> > using namespace std::literals;
> >
> > int main() {
> > // we have fully compile time literals in language
> > constexpr auto first_name = "Bernhard"sv, last_name = "Shaw"sv;
> He said "std::string literal", not "std::string_view literal".
>
> If he meant the latter then fine, but my response was made assuming he
> meant what he wrote.

Fair enough. The std::string may be used in constant evaluation
in C++20 but only if it is destroyed by the end of that evaluation. So
no constexpr std::string literals are possible.

> > So why we need char* ?
>
> Sometimes having a pointer is more useful than having whatever
> string_view is. Also, string_view is not yet universally supported, so
> if you are eg. writing a library for others to use it might still be
> a good idea to not demand it. char* is also useful when interfacing
> with C libraries (or the C++ standard library functions that were
> "inherited" from C, such as fopen().).

Yes, it was in Boost around 2015 IIRC (from where it was taken into std),
but there are projects that have to use even older compilers. That
is still the exception rather than the rule.

David Brown

Jul 30, 2022, 8:08:26 AM

I don't have a modern AVR toolchain on this machine, but that one
statement would require library code too large for a large proportion of
AVR microcontrollers. People used to PC programming consider anything
measured in kilobytes to be negligible size - people working with
microcontrollers with 8 or 16 KB code flash have a rather different
viewpoint. Similarly, an 8-bit register-to-register add instruction on
an AVR takes 1 clock cycle. No more, no less - no cache effects, or
pipelining, or sequencing. If you have a 20 MHz clock, it takes 50 ns.
But adding a 64-bit integer and a 64-bit floating point will take
vastly longer, with huge variation - maybe something between 300 and
1000 clock cycles.

Of course you don't use these types on such a microcontroller if you
don't have need of them, and you have to accept the price. The point is
that the assumption some people have that C is simple, and that
operators on fundamental types are always small, fast, and predictable,
can be a wildly incorrect assumption. It is wrong by several orders of
magnitude in this case.

(The AVR is perhaps the most modern 8-bit microcontroller core in
popular use, and new devices are released regularly. It is the only
8-bit target with full gcc support, AFAIK. So this is not just a legacy
or dinosaur core.)

Mut...@dastardlyhq.com

Jul 30, 2022, 10:23:52 AM

On Sat, 30 Jul 2022 03:39:11 -0700 (PDT)

Öö Tiib <oot...@hot.ee> wrote:
>On Saturday, 30 July 2022 at 12:28:45 UTC+3, Mut...@dastardlyhq.com wrote:
>> On Fri, 29 Jul 2022 13:12:55 -0700 (PDT)
>> Öö Tiib <oot...@hot.ee> wrote:
>> >On Friday, 29 July 2022 at 18:48:55 UTC+3, Mut...@dastardlyhq.com wrote:
>> >> On Fri, 29 Jul 2022 08:25:21 -0700 (PDT)
>> >> Great idea, put a layer on top, that'll make things more efficient. Sometimes
>> >> things that matter include raw data (if you do systems or low level programming
>> >> which I suspect you don't) and good luck processing that efficiently in
>> >> std::string.
>> >
>> >You suspect wrongly. The effect of most library classes is just that
>> >those compile to about same code what you would do in C, just look
>>
>> If there's an interface layer there's a non zero time penalty for going through
>> it.
>
>On majority of cases with actual compilers that claim is incorrect.
>Compilers inline function calls and optimize code not on narrow cases but
>aggressively and massively. The libraries are typically written in manner to
>support it. Naive code can use some construct that makes it hard for
>compiler. Profilers have shown me plenty of that during decades.

Inlined functions still have code to be executed. You can't get an API
translation layer down to zero instructions if you're converting to complex
types such as std::string.

>> >easier to read. The hand-written code is commonly less efficient, more
>> >verbose, more error-prone, harder for compiler to optimize and maintainer
>> >to fix.
>> Really? Here's some code from a utility I wrote that reads network data into
>> a u_char frame buffer. Feel free to tell us how using std::string or similar
>> would improve the speed or make the code any more readable:
>
>Odd request. The std::string is meant for processing texts. If someone puts
>some other data into it then they make their life harder, not easier.

Not something that occurred to you when you were asking what the point of
char* was.

>
>>
>> switch((rlen = read(bpf_fd,frame_buffer,frame_buffsize)))
>> {
>> case 0:
>> assert(0);
>
>That asserts that there are never end of file?

It's reading a Berkeley Packet Filter intercepting UDP packets which has got to
this point via select() first. If read() returns 0 at this point then something
is fucked inside the API or OS.

>> case -1:
>> perror("ERROR: read()");
>> exit(1);
>
>Any error ends the program?

Yes, because it can't progress. It's a client, not a server.

>Further code assumes that all read packets fit to buffer. I don't have such code
>What embedded device it is that dies down on usual situations? Speed or
>readability matter only after it is useful.

Point proven I think.

Öö Tiib

Jul 30, 2022, 10:08:06 PM

Why does the code have to convert anything to std::string in order to call a
legacy API that takes char*? You were arguing with my claim that "Every standard
library template has methods to provide raw pointers if some legacy
API that needs those is used." Those methods do not do anything complex
and so are typically inlined into just passing a pointer to the legacy API.
Converting an iterator to a pointer is a no-op, as the address value
is the same.

> >> >easier to read. The hand-written code is commonly less efficient, more
> >> >verbose, more error-prone, harder for compiler to optimize and maintainer
> >> >to fix.
> >> Really? Here's some code from a utility I wrote that reads network data into
> >> a u_char frame buffer. Feel free to tell us how using std::string or similar
> >> would improve the speed or make the code any more readable:
> >
> >Odd request. The std::string is meant for processing texts. If someone puts
> >some other data into it then they make their life harder, not easier.
> Not something that occurred to you when you were asking what the point of
> char* was.

std::string is not the only option. The standard library also contains
std::array<char, N>, std::vector<char> etc. Honestly, I haven't used raw
pointers much for more than 15 years now.

> >
> >>
> >> switch((rlen = read(bpf_fd,frame_buffer,frame_buffsize)))
> >> {
> >> case 0:
> >> assert(0);
> >
> >That asserts that there are never end of file?
> It's reading a Berkeley Packet Filter intercepting UDP packets, and it only gets to
> this point via select() first. If read() returns 0 at this point then something
> is fucked inside the API or OS.
> >> case -1:
> >> perror("ERROR: read()");
> >> exit(1);
> >
> >Any error ends the program?
> Yes, because it can't progress. It's a client, not a server.
> >Further code assumes that all read packets fit to buffer. I don't have such
> >code
> >What embedded device it is that dies down on usual situations? Speed or
> >readability matter only after it is useful.
> Point proven I think.

Some sub-part of some function I do not see usage for proves some point?

Mut...@dastardlyhq.com

Jul 31, 2022, 3:22:22 AM

On Sat, 30 Jul 2022 19:07:58 -0700 (PDT)
Öö Tiib <oot...@hot.ee> wrote:
>On Saturday, 30 July 2022 at 17:23:52 UTC+3, Mut...@dastardlyhq.com wrote:
>> Inlined functions still have code to be executed. You can't get an API
>> translation layer down to zero instructions if you're converting to complex
>> types such as std::string.
>
>Why the code has to convert anything to std::string for to call legacy API
>that takes char*? You were arguing with my claim that "Every standard
>library template has methods to provide raw pointers if some legacy
>API that needs those is used." Those methods do not do anything complex
>and so are typically inlined into just passing pointer to legacy API.

So what's the point then?

>> >some other data into it then they make their life harder, not easier.
>> Not something that occurred to you when you were asking what the point of
>> char* was.
>
>There is not only std::string. Standard library contains std::array<char, N>,
>std::vector<char> etc. I honestly don't use raw pointers much ... for more
>than 15 years already.

Because as I said, you obviously don't do system or low level programming.
Certainly you've never been anywhere near code for a device driver.

>> >readability matter only after it is useful.
>> Point proven I think.
>
>Some sub-part of some function I do not see usage for proves some point?

There was enough code to provide us with your amazing alternative to not using
pointers. After all, you don't need pointers, right?

Öö Tiib

Jul 31, 2022, 9:07:08 AM

On Sunday, 31 July 2022 at 10:22:22 UTC+3, Mut...@dastardlyhq.com wrote:
> On Sat, 30 Jul 2022 19:07:58 -0700 (PDT)
> Öö Tiib <oot...@hot.ee> wrote:
> >On Saturday, 30 July 2022 at 17:23:52 UTC+3, Mut...@dastardlyhq.com wrote:
> >> Inlined functions still have code to be executed. You can't get an API
> >> translation layer down to zero instructions if you're converting to complex
> >> types such as std::string.
> >
> >Why the code has to convert anything to std::string for to call legacy API
> >that takes char*? You were arguing with my claim that "Every standard
> >library template has methods to provide raw pointers if some legacy
> >API that needs those is used." Those methods do not do anything complex
> >and so are typically inlined into just passing pointer to legacy API.

> So what's the point then?

The point is to remove the possibility of doing something stupid through a typo.
A raw pointer can be used in a lot of different roles, and there is no role in
which all of its operations make sense. With the C++ library types, the
operations that make no sense don't compile.
So it++ internally does different things for std::set and for std::array, but
I don't need to think about those details. A std::array does not silently
decay to an iterator to its first element, and that iterator does not implicitly
convert into a pointer to a base class sub-object ... so I don't need to worry
about those possibilities. As an unfortunate legacy there is an implicit
constructor from char* to std::string. That may cause some illusions,
which usually do not matter. If they do matter, then use std::string_view.
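
A small compile-time sketch of those conversions, as an illustration only. `Base`, `Derived` and `has_plus` are names invented for this example. It checks which implicit conversions exist for a built-in array and for std::array, and whether `it + 1` compiles for a given iterator type.

```cpp
#include <array>
#include <set>
#include <string>
#include <type_traits>
#include <utility>

struct Base { int id; };
struct Derived : Base { int extra; };

// A built-in array silently decays to a pointer to its first element...
static_assert(std::is_convertible<Derived (&)[4], Derived*>::value, "decay");
// ...and that pointer silently converts to a pointer to the base class,
// after which pointer arithmetic would step by the wrong element size.
static_assert(std::is_convertible<Derived (&)[4], Base*>::value, "decay + upcast");

// std::array offers neither implicit conversion; data() has to be asked for.
static_assert(!std::is_convertible<std::array<Derived, 4>&, Derived*>::value, "");
static_assert(!std::is_convertible<std::array<Derived, 4>&, Base*>::value, "");

// The legacy mentioned above: const char* converts implicitly to std::string.
static_assert(std::is_convertible<const char*, std::string>::value, "");

// Detects whether `it + 1` compiles for an iterator type.
template <class It, class = void>
struct has_plus : std::false_type {};

template <class It>
struct has_plus<It, decltype(void(std::declval<It>() + 1))> : std::true_type {};
```

With this, `has_plus<std::set<int>::iterator>::value` is false while `has_plus<std::array<int, 4>::iterator>::value` is true, so arithmetic that makes no sense for a tree iterator is rejected at compile time.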

> >> >some other data into it then they make their life harder, not easier.
> >> Not something that occurred to you when you were asking what the point of
> >> char* was.
> >
> >There is not only std::string. Standard library contains std::array<char, N>,
> >std::vector<char> etc. I honestly don't use raw pointers much ... for more
> >than 15 years already.
> Because as I said, you obviously don't do system or low level programming.
> Certainly you've never been anywhere near code for a device driver.
> >> >readability matter only after it is useful.
> >> Point proven I think.
> >
> >Some sub-part of some function I do not see usage for proves some point?
> There was enough code to provide us with your amazing alternative to not using
> pointers. After all, you don't need pointers, right?

There was enough code to see that it is for something I don't sell. Raw pointers
I have avoided for a rather long time. There is nothing amazing in it, it is just
a bit easier and safer to program.

Juha Nieminen

Jul 31, 2022, 10:39:47 AM

Manfred <non...@add.invalid> wrote:
> That, and below, is generally true for a generic dynamic container.
> However, std::string is not a generic container. It is a very specific
> and very optimized container of chars.
> So, if you use a decent implementation of std the performance of
> std::string and char* is usually pretty close.

I don't know if you are referring to it, but many people seem to think that
"short string optimization" makes std::string pretty much as efficient
as an array of char (for strings that are short enough).

They fail to take into consideration that short string optimization
requires conditionals in almost all member functions that access the
string data. Conditionals are not free.

mut...@dastardlyhq.com

Jul 31, 2022, 12:37:00 PM

On Sun, 31 Jul 2022 06:07:00 -0700 (PDT)
Öö Tiib <oot...@hot.ee> wrote:
>On Sunday, 31 July 2022 at 10:22:22 UTC+3, Mut...@dastardlyhq.com wrote:
>> On Sat, 30 Jul 2022 19:07:58 -0700 (PDT)
>> Öö Tiib <oot...@hot.ee> wrote:
>> >On Saturday, 30 July 2022 at 17:23:52 UTC+3, Mut...@dastardlyhq.com wrote:
>> >> Inlined functions still have code to be executed. You can't get an API
>> >> translation layer down to zero instructions if you're converting to complex
>> >> types such as std::string.
>> >
>> >Why the code has to convert anything to std::string for to call legacy API
>> >that takes char*? You were arguing with my claim that "Every standard
>> >library template has methods to provide raw pointers if some legacy
>> >API that needs those is used." Those methods do not do anything complex
>> >and so are typically inlined into just passing pointer to legacy API.
>
>> So what's the point then?
>
>Point is to remove possibility to do something stupid by typo. Raw pointer

Example?

>> >Some sub-part of some function I do not see usage for proves some point?
>> There was enough code to provide us with your amazing alternative to not using
>> pointers. After all, you don't need pointers, right?
>
>There was enough code to see that it is for something I don't sell. Raw

You don't sell? What?

>pointers I avoid, for rather long time. There are nothing amazing in it, just bit
>easier and safer to program.

You can't always avoid pointers unless you only do high level programming.

Fred. Zwarts

Jul 31, 2022, 12:49:22 PM

On 31 Jul 2022 at 16:39, Juha Nieminen wrote:

I wonder if that is true. The string object has a pointer to its buffer
and a length. I assume that if the short string optimization is used,
this pointer points to the internal buffer. Only member functions that
want to increase the size of the buffer need those conditionals. I
wonder whether "almost all member functions" modify the buffer size.
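
A minimal sketch of the layout assumed in that paragraph, only to show where the branches end up. `sketch_string` is a hypothetical class written for this post, not the layout of any particular standard library. The pointer always refers to the live characters, whether they sit in the inline buffer or on the heap.

```cpp
#include <cstddef>
#include <cstring>

class sketch_string {
    char* ptr_;        // always points at the live characters
    std::size_t len_;
    char local_[16];   // inline storage: up to 15 chars plus terminator

public:
    explicit sketch_string(const char* s) : len_(std::strlen(s)) {
        // The only "short or long?" decision is made when storage is set up.
        ptr_ = (len_ < sizeof local_) ? local_ : new char[len_ + 1];
        std::memcpy(ptr_, s, len_ + 1);
    }
    ~sketch_string() {
        if (ptr_ != local_) delete[] ptr_;
    }
    // Copying would have to re-point ptr_ at the new object's local_;
    // left out to keep the sketch short.
    sketch_string(const sketch_string&) = delete;
    sketch_string& operator=(const sketch_string&) = delete;

    // Read-only access never has to ask which storage is in use.
    std::size_t size() const { return len_; }
    const char* data() const { return ptr_; }
    char operator[](std::size_t i) const { return ptr_[i]; }

    bool uses_local_buffer() const { return ptr_ == local_; }
};
```

In this design the conditionals live in construction, destruction and (not shown) growth and copying, while `size()`, `data()` and `operator[]` are branch-free. The price is a self-referencing pointer that every copy or move has to fix up.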

Malcolm McLean

Jul 31, 2022, 5:50:02 PM

The std::string is (in C)

    struct string
    {
        char *buff;
        size_t len;
    };

so 16 bytes on a 64-bit machine.
However in reality not all 64 bits of a pointer are wired to memory addresses.
It's likely that the most significant bit has to be clear in a valid address.
So we can exploit this by (in C)

    struct shortstring
    {
        bool flag; // set when the shortstring is valid
        int pad:7; // Maybe use for length or encoding or other stuff
        char data[15]; // Up to 14 characters of ASCII string data.
    };

    union
    {
        struct string s;
        struct shortstring ss;
    } std_string;

Now string::size is implemented as

    if(s->ss.flag)
        return strlen(s->ss.data);
    else
        return s->s.len;

The other string member functions are implemented similarly.

Keith Thompson

Jul 31, 2022, 8:04:09 PM

That's one possible implementation.

> However in reality not all 64 bits of a pointer are wired to memory addresses.
> It's likely that the most significant bit has to be clear in a valid address.
> So we can exploit this by (in C)
> struct shortstring
> {
> bool flag; // set when the shortstring is valid
> int pad:7; // Maybe use for length or encoding or other stuff
> char data[15]; // UP to 14 characters of ASCII string data.
> };
>
> union
> {
> struct string s;
> struct shortstring ss;
> } std_string;
>
> Now string::size is implemented as
> if(s->ss.flag)
> return strlen(s->ss.data);
> else
> return s->s.len;
>
>
> The other string member functions are implemented similarly.

That won't work without extra code to ensure that the short string
optimization isn't used in all cases. The length of a std::string
is not determined by a null character. This program must print 3:

    #include <iostream>
    #include <string>

    int main() {
        std::string s = "a";
        s += '\0';
        s += 'b';
        std::cout << s.size() << '\n';
    }

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Malcolm McLean

Aug 1, 2022, 3:16:38 AM

You're right. std::string::size() counts embedded nuls. So in fact you need to use
four bits of the "pad" field to store the length. Or nul-pad the string and work
backwards until you hit a set byte.
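
A sketch of the first option, storing the length explicitly. `short_rep` is a hypothetical inline-only representation made up for this example, and it spends a whole byte on the length instead of four bits to keep the code simple.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical inline-only representation, not any real library's layout.
struct short_rep {
    unsigned char len = 0;  // 0..15 would fit in four bits; a byte keeps this simple
    char data[15] = {};     // zero-filled, so unused bytes are nuls

    bool push_back(char c) {
        if (len == sizeof data) return false;  // full; a real string would move to the heap
        data[len++] = c;
        return true;
    }

    // Stored length: embedded nuls are counted like any other character.
    std::size_t size() const { return len; }

    // Terminator-based length: stops at the first nul byte in the buffer.
    std::size_t size_by_terminator() const {
        const void* p = std::memchr(data, '\0', sizeof data);
        return p ? static_cast<std::size_t>(static_cast<const char*>(p) - data)
                 : sizeof data;
    }
};
```

Pushing 'a', '\0', 'b' gives `size() == 3`, which is what std::string has to report, while `size_by_terminator()` gives 1.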

Ike Naar

Aug 1, 2022, 3:27:53 AM

On 2022-08-01, Malcolm McLean <malcolm.ar...@gmail.com> wrote:
> You're right. std::string::size() counts embedded nuls. So in fact you need to use
> four bits of the "pad" field to store the length. Or nul-pad the string and work
> backwards until you hit a set byte.

Working backwards will give the wrong result if there are embedded nuls
at the end of the string.
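
A short illustration of that failure case. `length_by_backward_scan` and `with_trailing_nuls` are hypothetical helpers written for this example; the buffer size of 15 follows the struct discussed earlier in the thread.

```cpp
#include <cstddef>
#include <string>

// Length recovered from a nul-padded 15-byte buffer by scanning backwards
// until a non-nul byte is found.
std::size_t length_by_backward_scan(const char (&buf)[15]) {
    std::size_t n = sizeof buf;
    while (n > 0 && buf[n - 1] == '\0') --n;
    return n;
}

// The same four characters as a std::string: 'a', 'b', '\0', '\0'.
std::string with_trailing_nuls() {
    return std::string("ab\0\0", 4);
}
```

For a buffer holding 'a', 'b', '\0', '\0' followed by padding, the backward scan reports 2, while the std::string built from the same four characters has a size of 4. The two trailing nuls are indistinguishable from padding.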

Paavo Helde

Aug 1, 2022, 3:34:45 AM

On 01.08.2022 at 00:49, Malcolm McLean wrote:

> The std::string is (in C)
> struct string
> {
> char *buff;
> size_t len;
> }
> so 16 bytes on a 64-bit machine.

In reality some popular implementations have sizeof(std::string)==32
(e.g. MSVC2019 on Windows, gcc 8.3 on Linux). When trying to study the
internals I see lots of attention to allocators. Indeed, all string
constructors also take an allocator argument which must be stored somewhere.

With 32 bytes, there would be a lot more room for SSO. Unfortunately it
seems the implementations are not very eager to make use of that space;
I think typically only 16 bytes get used for SSO. Presumably they have
the same char* buff and size_t len in an SSO string, it's just that the
char* buff pointer points 16 bytes ahead, into the internal SSO
buffer of 16 bytes. This ensures there would be zero overhead for SSO
and no conditionals involved for non-mutating access.
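
These numbers are easy to probe on a given toolchain. The sketch below uses two hypothetical helpers; the results are implementation-specific, so no particular output is claimed. Printing `sizeof(std::string)` next to `empty_capacity()` shows the object size and, on SSO implementations, the inline capacity.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// True if the character data lives inside the std::string object itself,
// i.e. the string currently uses its inline (SSO) buffer.
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    std::less<const char*> lt;  // gives a total order even for unrelated pointers
    return !lt(s.data(), obj_begin) && lt(s.data(), obj_end);
}

// Capacity of an empty string; on SSO implementations this is the number of
// characters that fit without a heap allocation.
std::size_t empty_capacity() {
    return std::string().capacity();
}
```

Calling `stored_inline` on strings of increasing length shows where a given implementation switches from the inline buffer to the heap.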

Paavo Helde

Aug 1, 2022, 3:47:52 AM

Or check for the presence of embedded nuls and switch off SSO for such
strings. Alas, that would mean non-zero overhead.

Öö Tiib

Aug 1, 2022, 4:13:05 AM

On Sunday, 31 July 2022 at 19:37:00 UTC+3, mut...@dastardlyhq.com wrote:
> On Sun, 31 Jul 2022 06:07:00 -0700 (PDT)
> Öö Tiib <oot...@hot.ee> wrote:
> >On Sunday, 31 July 2022 at 10:22:22 UTC+3, Mut...@dastardlyhq.com wrote:
> >> On Sat, 30 Jul 2022 19:07:58 -0700 (PDT)
> >> Öö Tiib <oot...@hot.ee> wrote:
> >> >On Saturday, 30 July 2022 at 17:23:52 UTC+3, Mut...@dastardlyhq.com wrote:
> >> >> Inlined functions still have code to be executed. You can't get an API
> >> >> translation layer down to zero instructions if you're converting to complex
> >> >> types such as std::string.
> >> >
> >> >Why the code has to convert anything to std::string for to call legacy API
> >> >that takes char*? You were arguing with my claim that "Every standard
> >> >library template has methods to provide raw pointers if some legacy
> >> >API that needs those is used." Those methods do not do anything complex
> >> >and so are typically inlined into just passing pointer to legacy API.
> >
> >> So what's the point then?
> >
> >Point is to remove possibility to do something stupid by typo. Raw pointer
>
> Example?

A good programmer makes at least one typo in every 10 lines of code, a weak
programmer a lot more. If you have ever tried to program something then you
have plenty of examples. Now if the compiler does compile that typo then hopefully
a unit test catches it. If the unit tests lack the check then it goes to the testers
and so on. The price of fixing it grows the farther it reaches and the longer it
takes to discover it. So when the interface of a type provides only operations that
make sense, the resulting program is less expensive, and there is no difference in
performance.

> >> >Some sub-part of some function I do not see usage for proves some point?
> >> There was enough code to provide us with your amazing alternative to not using
> >> pointers. After all, you don't need pointers, right?
> >
> >There was enough code to see that it is for something I don't sell. Raw
>
> You don't sell? What?

I sell software. I have had a share in a company for close to 30 years. Sloppy
code causes more issues than opportunities to raise wealth. The world is already
full to the brim with weak code.

> >pointers I avoid, for rather long time. There are nothing amazing in it, just bit
> >easier and safer to program.
> You can't always avoid pointers unless you only do high level programming.

True. Avoiding them is hard at a very low level, for which the standard library
does not contain good tools. Then we have to write such tools, and so deal with
pointers ourselves and preferably abstract them away ourselves. These are
usually pointers to volatile in such cases.

Juha Nieminen

Aug 1, 2022, 7:30:08 AM

Fred. Zwarts <F.Zw...@kvi.nl> wrote:
>> I don't know if you are referring to it, but many people seem to think that
>> "short string optimization" makes std::string pretty much as efficient
>> as an array of char (for strings that are short enough).
>>
>> They fail to take into consideration that short string optimization
>> requires conditionals in almost all member functions that access the
>> string data. Conditionals are not free.
>
>
> I wonder if that is true. The string object has a pointer to its buffer
> and a length. I assume that if the short string optimization is used,
> this pointer points to the internal buffer. Only member functions that
> want to increase the size of the buffer need those conditionals. I
> wonder whether "almost all member functions" modify the buffer size.

I wonder why they don't just outright add "static" versions of all the
data containers. As in, you specify the maximum size of the container
as a template parameter, and the size of the object itself will be
that much. Just like std::array, but containing all the member
functions of the data container. So for example you could have a
"static" std::string of a maximum length that you specify as a
template parameter, which does no dynamic memory allocations.

(Of course nothing stops me from implementing such a thing
myself, but...)
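
For what it's worth, a rough sketch of what such a thing could look like. `static_string` is a hypothetical name and this covers only a handful of members; it is not a proposal for the real interface. All storage is inside the object and exceeding the maximum length throws.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>

template <std::size_t MaxLen>
class static_string {
    char buf_[MaxLen + 1] = {};  // storage lives inside the object, plus terminator
    std::size_t len_ = 0;

public:
    static_string() = default;
    explicit static_string(const char* s) { append(s, std::strlen(s)); }

    // Appends n characters; throws instead of allocating when they don't fit.
    static_string& append(const char* s, std::size_t n) {
        if (n > MaxLen - len_) throw std::length_error("static_string overflow");
        std::memcpy(buf_ + len_, s, n);
        len_ += n;
        buf_[len_] = '\0';
        return *this;
    }
    static_string& operator+=(char c) { return append(&c, 1); }

    std::size_t size() const { return len_; }
    static constexpr std::size_t max_size() { return MaxLen; }
    const char* c_str() const { return buf_; }
};
```

A `static_string<20>` member then costs a fixed number of bytes per object (at least 21 for the characters, plus the length) and never touches the heap.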

Bo Persson

Aug 1, 2022, 8:39:30 AM

One inconvenience would be that each different length would be a
separate type. That would make many string ops really awkward, if it
changes type halfway through.

You can already reserve() the space for a std::string, at the cost of a
single dynamic allocation.
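
A small sketch of the reserve() approach; `appends_without_reallocation` is a hypothetical helper. After `reserve(n)`, `capacity()` is at least n. The check that the buffer address stays put reflects how mainstream implementations behave, and is an assumption here rather than something the helper can prove in general.

```cpp
#include <cstddef>
#include <string>

// Appends n characters after one up-front reserve() and reports whether the
// character buffer stayed at the same address the whole time.
bool appends_without_reallocation(std::size_t n) {
    std::string s;
    s.reserve(n);                      // the single dynamic allocation
    const char* before = s.data();
    for (std::size_t i = 0; i < n; ++i) s += 'x';
    return s.data() == before && s.size() == n;
}
```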

Juha Nieminen

Aug 2, 2022, 1:43:55 AM

Bo Persson <b...@bo-persson.se> wrote:
> You can already reserve() the space for a std::string, at the cost of a
> single dynamic allocation.

The main idea would be that if it's eg. a member variable of a class,
its maximum length is something like 20 characters, and the class is
instantiated millions of times, it will save a lot of memory, cause
less memory fragmentation and be more efficient.

But yeah, in such situations it would probably not be a huge amount
of trouble to create a custom implementation.

Manfred

Aug 2, 2022, 7:13:33 PM

It is more general than plain SSO.

My point is that std::string is different from a generic container when
it comes to performance, because implementors know two things:
1) std::string is made to contain char's (and, more rarely, wchar_t's) -
it's not made to contain other stuff (well, maybe technically it can,
but then it can happily perform poorly)
2) users are more eager to get performance out of std::string than out
of say std::vector.

So, SSO is /one/ of the technologies that implementors can use to
improve performance, but they (especially the good ones) are known to be
quite creative when it comes to performance. SSO is a good example to
explain my point, though: while it makes sense for strings, I guess very
few, if anyone, would be willing to pay for some sort of SVO (Short
Vector Optimization)

Manfred

Aug 2, 2022, 7:20:32 PM

On most modern architectures, it's even more likely that the /least/
significant bit is zero, which is in fact what your implementation does
on a LE architecture.

Paavo Helde

Aug 3, 2022, 2:35:17 AM

Right, SSO makes more sense for strings, mostly because the character
size is small, compared to a typical vector element size (as long as one
is using ASCII or UTF-8 and not something like std::u32string).

From what I see in common C++ implementations, sizeof(std::string) is
32, of which 16 bytes get reused as the SSO buffer, meaning that SSO
applies to strings up to length 15. That's something.

sizeof(std::vector) seems to be something like 24, of which 8 bytes
could be used as an "SVO" buffer, meaning that this optimization would
apply to vectors of at most 1-2 elements. Not so useful.

Juha Nieminen

Aug 3, 2022, 4:00:02 AM

Manfred <non...@add.invalid> wrote:
>> However in reality not all 64 bits of a pointer are wired to memory addresses.
>> It's likely that the most significant bit has to be clear in a valid address.
>
> On most modern architectures, it's even more likely that the /least/
> significant bit is zero, which is in fact what your implementation does
> on a LE architecture.

That would mean that a pointer can only point to even addresses. I don't
think that's the case. Rather obviously a pointer must be able to point
to any address (else you wouldn't be able to e.g. traverse a string with
a pointer).

IIRC in Linux the kernel reserves the upper half of the memory range to the
kernel space and the lower half to user space (and IIRC user code has no
business using any kernel space pointer for any reason in any situation)
so it may be that at least in Linux the most-significant bit of pointers
in user code is indeed always zero. However, I would never write code
that makes this assumption, because it's not really an assumption you
can make at any level (I don't think even the Linux kernel makes that
absolute guarantee).

Bo Persson

Aug 3, 2022, 6:45:49 AM

On 2022-08-03 at 09:59, Juha Nieminen wrote:
> Manfred <non...@add.invalid> wrote:
>>> However in reality not all 64 bits of a pointer are wired to memory addresses.
>>> It's likely that the most significant bit has to be clear in a valid address.
>>
>> On most modern architectures, it's even more likely that the /least/
>> significant bit is zero, which is in fact what your implementation does
>> on a LE architecture.
>
> That would mean that a pointer can only point to even addresses. I don't
> think that's the case. Rather obviously a pointer must be able to point
> to any address (else you wouldn't be able to e.g. traverse a string with
> a pointer).
>

The observation is about *heap memory* used by the containers, where you
are likely not allowed to allocate a single byte, but will get a
pointer properly aligned at 8 or 16 bytes. It therefore has some zero
bits at the low end.
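
A quick way to see this, as a sketch with hypothetical helper names. It assumes a flat address space, where the integer value of a pointer reflects its alignment. The block size is chosen as `sizeof(std::max_align_t)` so that the allocation has to be suitable for the most strictly aligned fundamental type.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Number of low-order address bits that are zero for an alignment of `a`
// (a power of two): 8 -> 3 bits, 16 -> 4 bits.
constexpr unsigned spare_low_bits(std::size_t a) {
    return a <= 1 ? 0 : 1 + spare_low_bits(a / 2);
}

// Allocates one block from the heap and checks that its address is a
// multiple of alignof(std::max_align_t).
bool heap_block_is_max_aligned() {
    void* p = ::operator new(sizeof(std::max_align_t));
    const bool ok =
        reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) == 0;
    ::operator delete(p);
    return ok;
}
```

On a platform where `alignof(std::max_align_t)` is 16, such a pointer has four zero bits at the low end.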

Juha Nieminen

Aug 3, 2022, 6:53:34 AM

The topic is short-string-optimization of a std::string-like class.
If the internal pointer used in that class is pointing to either allocated
memory or an internal buffer, you can't be sure that the internal buffer
will be aligned to a 16-byte, 8-byte or even a 2-byte boundary. That's
because the class instance itself may not be allocated on the heap.

Malcolm McLean

Aug 3, 2022, 6:58:33 AM

But only if the architecture doesn't require alignment. If a structure has a
member requiring 4 byte alignment, then the whole structure has to be 4-byte
aligned when created on the stack.

Paavo Helde

Aug 3, 2022, 7:50:10 AM

Even when allocated on stack, it will be aligned properly as needed for
the pointer or size_t members, enforcing at least 8-byte alignment in
64-bit.
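
A sketch of that argument. `string_like` is a hypothetical struct with the same kinds of members as the string layout discussed above, and the run-time check again assumes a flat address space.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct with the same kinds of members as the string layout
// discussed in this thread.
struct string_like {
    char* buff;
    std::size_t len;
    char local[16];
};

// A struct is at least as strictly aligned as its most demanding member.
static_assert(alignof(string_like) >= alignof(char*), "aligned for the pointer");
static_assert(alignof(string_like) >= alignof(std::size_t), "aligned for the length");

// The same holds for an instance with automatic storage (on the stack).
bool stack_instance_is_aligned() {
    string_like s{};
    return reinterpret_cast<std::uintptr_t>(&s) % alignof(string_like) == 0;
}
```

Note that this only says something about the start of the object and of its inline buffer. A pointer into the middle of the buffer can of course land on any byte.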

That's not to say SSO ought to use such bit-fiddling; the current SSO
implementations apparently do no such thing (presumably because it would
cause overhead even for non-mutating access).

Öö Tiib

Aug 4, 2022, 2:31:51 AM

Yes, it would be possible to fit a 30-character SSO into a 32-byte std::string,
but most current implementations have a 15-character SSO.
That means that, as it is a performance optimisation, they deliver performance,
not storage.