Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

special string-implementation

59 views
Skip to first unread message

Bonita Montero

unread,
Apr 19, 2019, 10:40:57 AM4/19/19
to
I often asked myself if there is something like basic_string, but with a
template-parameter for the size of an an internal buffer which holds the
string instead of allocating the content on the heap. Optionally this
could include allocating storage for strings which exceed the internal
capaciy (otherwise the string could throw an exception if it will be
grown beyond the limit)
Such a class could have a number of non-member operator-overloads for
adding strings and whatever (thereby yieding a usual basic_string of
course). And a number of member-operators like += which operate on the
internal buffer.
When correctly used, f.e. for strings which are temporary and could
reside on the stack anyway, this type of strings with an internal buffer
could increase the performance of string-handling significantly. On the
other side, there are many standard-library facilities that only accept
a basic_string so this type of string would be incompatible.

Marcel Mueller

unread,
Apr 19, 2019, 2:41:06 PM4/19/19
to
Am 19.04.19 um 16:40 schrieb Bonita Montero:
> I often asked myself if there is something like basic_string, but with a
> template-parameter for the size of an an internal buffer which holds the
> string instead of allocating the content on the heap.

std::array<char,N>?

> Optionally this
> could include allocating storage for strings which exceed the internal
> capaciy (otherwise the string could throw an exception if it will be
> grown beyond the limit)

Some string implementations have handling for fast in place storage for
short strings.

> Such a class could have a number of non-member operator-overloads for
> adding strings and whatever (thereby yieding a usual basic_string of
> course). And a number of member-operators like += which operate on the
> internal buffer.
> When correctly used, f.e. for strings which are temporary and could
> reside on the stack anyway, this type of strings with an internal buffer
> could increase the performance of string-handling significantly.

Have a look at the short string optimization.

> On the
> other side, there are many standard-library facilities that only accept
> a basic_string so this type of string would be incompatible.

Exactly. And converting strings over and over (with allocations) can be
much more impact than the optimization gain.

As long as there is no need to pass /mutable/ strings to library
functions the use of const char* is the least common denominator. It is
quite easy to provide zero copy conversion operators to this type for
any string implementation.


Marcel

Bonita Montero

unread,
Apr 19, 2019, 2:59:03 PM4/19/19
to
>> I often asked myself if there is something like basic_string, but with a
>> template-parameter for the size of an an internal buffer which holds the
>> string instead of allocating the content on the heap.

> std::array<char,N>?

Thats much less than the functionality I have in mind.

>> Optionally this
>> could include allocating storage for strings which exceed the internal
>> capaciy (otherwise the string could throw an exception if it will be
>> grown beyond the limit)

> Some string implementations have handling for fast in place storage for
> short strings.

Short stings woulf fit only for very short strings.

>> On the
>> other side, there are many standard-library facilities that only accept
>> a basic_string so this type of string would be incompatible.

> Exactly. And converting strings over and over (with allocations) can be
> much more impact than the optimization gain.

There would be no additional conversions.

> As long as there is no need to pass /mutable/ strings to library
> functions the use of const char* is the least common denominator.

Maybe, but there are other cases. Memory-allocation is simply slow.

> It is quite easy to provide zero copy conversion operators to this
> type for any string implementation.

An overloaded += and other operatoes also wouldn't copy if the capacity
would be suffient.

Marcel Mueller

unread,
Apr 20, 2019, 4:47:40 AM4/20/19
to
Am 19.04.19 um 20:58 schrieb Bonita Montero:
>> Some string implementations have handling for fast in place storage
>> for short strings.
>
> Short stings woulf fit only for very short strings.

Reserving unnecessarily large storage for every string (for fixed sized
strings) can have a large impact as well. From my experience this is
either counterproductive or just a bad workaround for other problems.

>>> On the
>>> other side, there are many standard-library facilities that only accept
>>> a basic_string so this type of string would be incompatible.
>
>> Exactly. And converting strings over and over (with allocations) can
>> be much more impact than the optimization gain.
>
> There would be no additional conversions.

You just mentioned that library functions will not be compatible to
whatever other string implementation you use. This is an issue.

>> As long as there is no need to pass /mutable/ strings to library
>> functions the use of const char* is the least common denominator.
>
> Maybe, but there are other cases. Memory-allocation is simply slow.

It depends. But indeed, the standard allocators of several platforms are
not exactly brilliant.


>> It is quite easy to provide zero copy conversion operators to this
>> type for any string implementation.
>
> An overloaded += and other operatoes also wouldn't copy if the capacity
> would be suffient.

+= always requires some copying unless have a rope implementation with
immutable fragments.

And no rule says that std::string needs to do an extra allocation for
every += call. In fact it does not. Of course it cannot estimate the
final size of your string. But feel free to call .reserve() to ensure
enough space before the concatenation. This is quite close to your
requirement, especially the one that requires allocation if the buffer
gets too small.

It is true that you cannot simply inject an optimized custom allocator
into basic_string without breaking type compatibility. But you can reuse
an instance for building several strings. The reduces allocations a lot
as well. At least if your platforms string implementation supports COW.


You are right, std::string is not the optimum for every purpose. But
from my point of view it does not mainly lack from an optimized allocator.
The main issue is that there is no concept of *immutable strings* as a
distinct type in the standard library. Carrying the overhead of mutable
strings everywhere in the application is the largest impact. This
especially applies in multi-threaded context, i.e almost any application
nowadays.

A simple base class that consists just of the size and the reference to
a data storage could reduce the need for copying a lot. Even COW
optimizations (that add further complexity) are now superfluous.
Especially different string implementations could provide implicit
conversions to that type as long as they use the compatible storage layout.


In fact I made the best experience with an immutable, strongly thread
safe string class. It uses exactly one storage for every string that
consists of the length and the content and a reference counter.
Instances of the class are just a pointer to a storage object. Storage
objects are shared between instances. Actually these pointers are binary
compatible to const char* so a conversion to this type is just a NOP.

You can get further gain if you deduplicate identical strings in memory
for strings. This can reduce the memory footprint a lot. While saving
some memory might not be of primary interest the side effect of
significantly higher cache hit rate can give a remarkable gain to
application performance. The working set size simply decreases. For
relational database applications with many users with typically large
redundancy the memory gain can be up to one order of magnitude.
The additional effort for deduplication is usually insignificant. It can
also avoid the allocation if the factory function already does the
deduplication when converting from raw storage.
And there is another effect: scalability becomes /below/ linear. The
more data you load the higher is the probability that you already have
some of its content in memory.

But this concept requires /immutability/. It cannot be implemented on
top of std::string.


So from my point of view the problem with allocations of std::string is
mainly a consequence of its mutability.

Of course, you always need a second, mutable string class. But this one
can use completely different allocation strategies. But if your
implementation is smart enough, the storage object behind the string
builder class can be the same than the storage object behind the
immutable string. So extracting an immutable string from the string
builder could be a trivial O(1) operation with no allocation as long as
you accept that the string builder is empty afterwards which is the
typical use case anyway.

On the other side, if you intend to build a large number of strings with
different size it could be more helpful to allocate a shared buffer once
and reuse it for every string. This buffer could also be on the stack.
Now the conversion to an immutable string always requires an allocation
(unless you use deduplication of course) but now the creation of the
string is likely to cause no more allocations which might be more important.
std::string could be used for this purpose if you pass a custom
allocator. But you should always convert this to a more compact form
when you are done rather than carrying the backpack of mutability and
custom allocators everywhere in your application.


I have built at least two larger applications that use the above
concepts. One in C# which already has an immutable string class and
another one in C++ with a custom string class. Both provide very high
performance ate quite low memory usage.
Loading about 10GB data from an RDBMS implodes to something about 1GB
memory. Even several dozen concurrent users create no significant CPU
load. And the application does not wait for I/O either, as it mainly
operates in memory. Only remote calls to other applications can slow
down response times.
There are some other deduplication concepts in the application but the
largest gain is from strings.

Furthermore once you have deduplication other features become quite
easy. E.g. if you want to want to provide full versioning in case of
changes to objects this now takes almost non memory as new versions of
objects likely share most of their properties with the older ones. In
fact keeping 5 years of history of really everything did not cause any
performance issue at all. And access to this data is transparent. Simply
choose date and time and you see an old snapshot of all data including
all work items that were post deadline at this point and whatever.


Marcel

P.S.: There is another thing I do to prevent unnecessary allocations
when building strings: printf-like formatting allows the implementation
to allocate one final buffer of adequate size. This is impossible if you
add compontents with subsequent method calls. So creating an immutable
string from a printf like format string is not that bad either. Of
course this is not always reasonable. Many platforms provide printf
format checking, so UB from the missing type safety could be mostly
eliminated.

Bonita Montero

unread,
Apr 20, 2019, 5:02:32 AM4/20/19
to
> Reserving unnecessarily large storage for every string (for fixed
> sized strings) can have a large impact as well. ...

Yes, but there are other cases like temporary strings on the stack where
my solution could be suitable. Or with strings that usually have a cer-
tain size. Or with strings where performance is more important than the
waste of space.

> You just mentioned that library functions will not be compatible to
> whatever other string implementation you use. This is an issue.

In this case you won't use my idea.

> += always requires some copying unless have a rope implementation with
> immutable fragments.

+= won't have any copying if the capacity of the left string would be
sufficient.

> You are right, std::string is not the optimum for every purpose. But
> from my point of view it does not mainly lack from an optimized allocator.

You can't build an allocator with internal storage to replace my idea.
I tried this and found that some runtime-libraries additionally allocate
storage for debugging-purposes. So allocator can't say which of the
allocations is for the characters of the string.

Marcel Mueller

unread,
Apr 20, 2019, 5:19:48 AM4/20/19
to
Am 20.04.19 um 11:02 schrieb Bonita Montero:
>> You just mentioned that library functions will not be compatible to
>> whatever other string implementation you use. This is an issue.
>
> In this case you won't use my idea.

Your idea is not compatible with std::string or did I miss something?


>> += always requires some copying unless have a rope implementation with
>> immutable fragments.
>
> += won't have any copying if the capacity of the left string would be
> sufficient.

If you compose a string with += /every/ content is copied at least once. ;-)



>> You are right, std::string is not the optimum for every purpose. But
>> from my point of view it does not mainly lack from an optimized
>> allocator.
>
> You can't build an allocator with internal storage to replace my idea.
> I tried this and found that some runtime-libraries additionally allocate
> storage for debugging-purposes. So allocator can't say which of the
> allocations is for the characters of the string.

That's true. But where is the problem? If you provide a fixed size
storage it is sufficient for /some/ strings. Additional information
slightly decrease the maximum string size the storage can hold. no more
no less.

By the way, I never had any platform that allocated additional storage
for debugging purposes /before/ the call to operator new. Of course,
operator new does this behind the scenes.
Are you sure that the additional storage did not come from a vtable or
something like that? You should use POD types for this purpose.


Marcel

Bonita Montero

unread,
Apr 20, 2019, 7:20:31 AM4/20/19
to
>> In this case you won't use my idea.

> Your idea is not compatible with std::string or did I miss something?

It would be partitially compatible if you overload the global operators.

>> You can't build an allocator with internal storage to replace my idea.
>> I tried this and found that some runtime-libraries additionally allocate
>> storage for debugging-purposes. So allocator can't say which of the
>> allocations is for the characters of the string.

> That's true. But where is the problem? If you provide a fixed size
> storage it is sufficient for /some/ strings. Additional information
> slightly decrease the maximum string size the storage can hold. no
> more no less.

I cant recognize if you misunderstood me. There might me more than one
allocation from the string-object so that you can't distinguish between
the one for the characters and those for whatever.

> Are you sure that the additional storage did not come from a vtable
> or something like that? ...

I don't analyzed the of the runtime-libary. I just wrote an allocator
just for fun that allocates in multiples of pages with mmap() of
VirtualAlloc() and I found, that some standard-library-implementations
additionally allocate a second block with a small size so that my
allocator wouldn't fit. But I tested this only with std::vector.

Melzzzzz

unread,
Apr 20, 2019, 12:11:30 PM4/20/19
to
You have heard of small string optimisation invented by Andrei
Alexandrescu?

--
press any key to continue or any other to quit...

Na divljem zapadu i nije bilo tako puno nasilja, upravo zato jer su svi
bili naoruzani. -- Mladen Gogala

Bonita Montero

unread,
Apr 20, 2019, 12:17:20 PM4/20/19
to
> You have heard of small string optimisation invented by Andrei
> Alexandrescu?

This might be called a small string optimizatzion not only because of
it is optimized for small strings but it also has a small effect since
it works only with very small strings. I already said that in the con-
ver-ation with Marcel.
And I think this is not a good idea to have *generally* such an
"optimization" because it generally might have the effect of wasting
space. That's also true for my idea, but as you could have noticed
also from what I told Marcel I don't think this would be a type of
string for special circumstances only.

Melzzzzz

unread,
Apr 20, 2019, 12:20:28 PM4/20/19
to
Most strings are small. Large ones are rare. You don't place large thing
on stack because stack is small usually.

Bonita Montero

unread,
Apr 20, 2019, 12:36:16 PM4/20/19
to
> Most strings are small.

I wouldn't bet on that.
And the small string optimization with the glibc++ on 64-bit-systems
is only 22 characters. But nevertheless this optimization wouldn't make
my idea useless.

> Large ones are rare. You don't place
> large thing on stack because stack is small usually.

The stack is usually extremely large in the user-space. The smallest
default stack size of the common operating systems is macOS with 512kB.
That's possible because stackspace is overcommitted at least on Unix
-systems; on Windows-systems (default stack-size 1MB) the stack pages
aren't allocated on construction but subtracted from the pagefile for
to guarantee that the stack could theoretically grow to its maximum.

Melzzzzz

unread,
Apr 20, 2019, 2:21:58 PM4/20/19
to
On 2019-04-20, Bonita Montero <Bonita....@gmail.com> wrote:
And that's large?

Bonita Montero

unread,
Apr 20, 2019, 3:16:43 PM4/20/19
to
>> The stack is usually extremely large in the user-space. The smallest
>> default stack size of the common operating systems is macOS with 512kB.
>> That's possible because stackspace is overcommitted at least on Unix
>> -systems; on Windows-systems (default stack-size 1MB) the stack pages
>> aren't allocated on construction but subtracted from the pagefile for
>> to guarantee that the stack could theoretically grow to its maximum.

> And that's large?

Yes, 1MB stack-size is very large. That would be consumend
to a significant extent only when havin massive alloca()s.

Melzzzzz

unread,
Apr 20, 2019, 3:50:42 PM4/20/19
to
On 2019-04-20, Bonita Montero <Bonita....@gmail.com> wrote:
Or your strings...

Bonita Montero

unread,
Apr 20, 2019, 4:22:17 PM4/20/19
to
>>> And that's large?

>> Yes, 1MB stack-size is very large. That would be consumend
>> to a significant extent only when havin massive alloca()s.

> Or your strings...

Even if you end up allocating as much stack space through this kind of
"stacticized" strings like in C this wouldn't blow up the stack to its
maximum size.

Melzzzzz

unread,
Apr 20, 2019, 4:32:29 PM4/20/19
to
On 2019-04-20, Bonita Montero <Bonita....@gmail.com> wrote:
Arrays on stack is always bad idea.

Paavo Helde

unread,
Apr 20, 2019, 4:33:05 PM4/20/19
to
You must be kidding. A random cat picture from the intertubes is likely
more than 1 MB nowaydays.

More to the point, there are many problems which could be elegantly
solved by recursive algorithms in C++, but technically they cannot
because the hardware recursion depth is severely limited, maybe already
in the range of thousands or even hundreds of recursions, depending on
the algorithm.

Been there, seen that, and what's worse, I will be the one who will fix
the code (to use heap instead of stack). There is no alloca() around
unfortunately, replacing it with std::vector would be an easy fix.

Bonita Montero

unread,
Apr 20, 2019, 4:37:35 PM4/20/19
to
>> Yes, 1MB stack-size is very large. That would be consumend
>> to a significant extent only when havin massive alloca()s.

> You must be kidding. A random cat picture from the intertubes is likely
> more than 1 MB nowaydays.

You won't store this and everything in the size-magnitude of this on the
stack.

> More to the point, there are many problems which could be elegantly
> solved by recursive algorithms in C++, but technically they cannot
> because the hardware recursion depth is severely limited, ...

This is not a practical boundary since you rarely allocate so much data
per recursion-level on the stack.

Bonita Montero

unread,
Apr 20, 2019, 4:39:37 PM4/20/19
to
>>>> Yes, 1MB stack-size is very large. That would be consumend
>>>> to a significant extent only when havin massive alloca()s.

>>> Or your strings...

>> Even if you end up allocating as much stack space through this kind of
>> "stacticized" strings like in C this wouldn't blow up the stack to its
>> maximum size.

> Arrays on stack is always bad idea.

That depends on the size. And if you don't touch the elements of the
array yourself but use something like += on on my idea there won't be
something like buffer-overflows.

Chris M. Thomasson

unread,
Apr 20, 2019, 5:29:07 PM4/20/19
to
Use a region allocator? Here is an older one I created:

<beware, it uses a hack for alignment...>

https://pastebin.com/f37a23918

https://groups.google.com/d/msg/comp.lang.c/7oaJFWKVCTw/sSWYU9BUS_QJ

One can break a region allocator apart into sub regions that can be
reaped, or even garbage collected...

For Bonita:

The region can be fed with stack based memory... ;^)

Paavo Helde

unread,
Apr 20, 2019, 5:43:01 PM4/20/19
to
On 20.04.2019 23:37, Bonita Montero wrote:
>>> Yes, 1MB stack-size is very large. That would be consumend
>>> to a significant extent only when havin massive alloca()s.
>
>> You must be kidding. A random cat picture from the intertubes is
>> likely more than 1 MB nowaydays.
>
> You won't store this and everything in the size-magnitude of this on the
> stack.

And why not? The only problem is the technical limitation.

>
>> More to the point, there are many problems which could be elegantly
>> solved by recursive algorithms in C++, but technically they cannot
>> because the hardware recursion depth is severely limited, ...
>
> This is not a practical boundary since you rarely allocate so much data
> per recursion-level on the stack.

One rarely allocates much data on stack because one knows there are
technical limitations, and tries to stay clear of them.

Even if there is not much data on stack the issue still remains. Let's
say I have 4 bytes local data per stack frame, and the stack is 1MB.
This means I am limited to ca 250,000 recursions. At the same time my
machine has 16 GB of memory, meaning in principle I could have ca
4,000,000,000 recursions instead. Why am I limited to 0.006 % of the
machine capabilities?



Bonita Montero

unread,
Apr 21, 2019, 3:21:08 AM4/21/19
to
>>> You must be kidding. A random cat picture from the intertubes is
>>> likely more than 1 MB nowaydays.

>> You won't store this and everything in the size-magnitude of this on the
>> stack.

> And why not? The only problem is the technical limitation.

When you do an allocation with alloca() you do it because of the
performance. But for larger allocations the performance-difference
on the heap is negligible.

> Even if there is not much data on stack the issue still remains. Let's
> say I have 4 bytes local data per stack frame, and the stack is 1MB.
> This means I am limited to ca 250,000 recursions.

... very realistic.

> At the same time my machine has 16 GB of memory, meaning in principle
> I could have ca 4,000,000,000 recursions instead. Why am I limited to
> 0.006 % of the machine capabilities?

Because no one will do a recursion with 4e9 levels, even not with 2,5E4
levels.

Thiago Adams

unread,
Apr 22, 2019, 8:08:14 AM4/22/19
to
In C, I use my own version called "LocalStringBuilder"
http://thradams.com/localstrbuilder.htm

It uses a buffer in the stack but if necessary it allocates
memory on heap.

In C++ (maybe?) this can be done changing the allocator of std::string
or creating a new specialized template class.



James Kuyper

unread,
Apr 22, 2019, 9:03:07 AM4/22/19
to
On 4/20/19 4:32 PM, Melzzzzz wrote:
> On 2019-04-20, Bonita Montero <Bonita....@gmail.com> wrote:
>>>>> And that's large?
>>
>>>> Yes, 1MB stack-size is very large. That would be consumend
>>>> to a significant extent only when havin massive alloca()s.
>>
>>> Or your strings...
>>
>> Even if you end up allocating as much stack space through this kind of
>> "stacticized" strings like in C this wouldn't blow up the stack to its
>> maximum size.
>
> Arrays on stack is always bad idea.

Can you give a reason for that judgment that isn't addressed by choosing
a large stack size and allocating only small arrays on the stack (for
suitable values of "large" and "small")?

Öö Tiib

unread,
Apr 22, 2019, 10:22:43 AM4/22/19
to
Sure there are such classes. Fair amount of optimizations can be
done by using std::array or mundane array plus std::string_view.
That combo fits fully to stack. Alternatively (for fans of
compile-time processing) that combo can be fully constexpr.
When there is issue that even the length of string (and its buffer
needed) is uncertain compile-time then unfortunately there are
no standard solutions since neither VLA nor alloca() are in C++
(but one and/or other is usually supported as extension).

However first I always make unoptimized version (that just uses
std::strings) fully ready and covered with tests. Budget for
performance works is lot easier to get when there is something
that is worth to optimize. Otherwise such tinkering wastes too
lot of time first and makes C++ team to look sluggish compared
to Python or Java team.

Bonita Montero

unread,
Apr 23, 2019, 2:38:52 PM4/23/19
to
> Sure there are such classes. Fair amount of optimizations can be
> done by using std::array or mundane array plus std::string_view.

string_view isn't mutable.

Bonita Montero

unread,
Apr 23, 2019, 2:40:00 PM4/23/19
to
> Use a region allocator? Here is an older one I created:

An std::allocator<T> replacement has to have a usual heap-behaviour.

Thiago Adams

unread,
Apr 23, 2019, 2:47:24 PM4/23/19
to
On Tuesday, April 23, 2019 at 3:40:00 PM UTC-3, Bonita Montero wrote:
> > Use a region allocator? Here is an older one I created:
>
> An std::allocator<T> replacement has to have a usual heap-behaviour.

What is your view about using C and making everything
tailored for your needs?

I never wrote a C++ compatible allocator for instance,
but this doesn't matter if you are using C you can
write an much simpler allocator.


Bonita Montero

unread,
Apr 23, 2019, 2:57:06 PM4/23/19
to
> What is your view about using C and making everything
> tailored for your needs?

... less convenient than what I suggest here.

Öö Tiib

unread,
Apr 24, 2019, 7:25:33 AM4/24/19
to
How so? It is not Java String. It has remobe_prefix(),
remove_suffix() and operator=(). I meant string_view is
supported in lot of places of standard library as alternative
to std::string. So you can use it as interface to array (that
isn't accepted in those places).

It is sure, less flexible than std::string but you need it
only as performance optimization. Performance optimizations
are worth to be done only in about 5% of code base.
That 5% is what takes up 95% of run-time when we profile
our product's run. So 95% of our std::strings can remain
std::strings forever without any difference and only about
5% we can consider if there is point to optimize or not.

If you really need full flexibility of appends, erases, inserts
and replaces of string then you can consider combo of
char array, custom allocator for "managing" the array,
basic_string using that allocator for mutating operations
and std::string_view for passing the result to outside world.
That combo can all be enwrapped to class and can nicely
sit in stack as whole but takes some coding and testing
for to combine it nicely together.
0 new messages