"Inside STL: The string" by Raymond Chen

Lynn McGuire

Aug 3, 2023, 8:42:34 PM
"Inside STL: The string" by Raymond Chen
https://devblogs.microsoft.com/oldnewthing/20230803-00/?p=108532

"You might think that a std::string (and all of its friends in the
std::basic_string family) are basically a vector of characters
internally. But strings are organized differently due to specific
optimizations permitted for strings but not for vectors."

I've always thought the internal buffer was a cool idea.

Lynn

Paavo Helde

Aug 4, 2023, 2:17:52 AM
You mean small string optimization? Yes, that's nifty. Still, I think it
could be made better.

Current mainstream (64-bit) implementations use an SSO buffer of 16 bytes.
However, when a string is used inside a union which is larger, it could
well use a larger buffer, but there is no way to set this up.
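
A rough way to see the small string optimization on a given toolchain is to check whether a string's data pointer falls inside the string object itself. The helper names below (stored_inline, print_sso_report) are made up for this sketch, and the pointer comparison relies on implementation behavior, not on anything the standard guarantees. The numbers printed differ between libraries: libstdc++ and MSVC have a 16-byte buffer (15 characters plus terminator), while libc++ fits 22 characters plus terminator in a 24-byte object.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>

// Hypothetical helper: true when s keeps its characters inside the string
// object itself (the SSO buffer) rather than in a separate heap allocation.
// This inspects implementation details and is not guaranteed by the standard.
bool stored_inline(const std::string& s) {
    const auto obj  = reinterpret_cast<std::uintptr_t>(&s);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj && data < obj + sizeof(std::string);
}

// Hypothetical helper: prints what this standard library does.
// No expected output is given because the values are implementation-specific.
void print_sso_report(std::ostream& os) {
    // 100 characters cannot fit inside the string object on mainstream
    // implementations, so this one has to live on the heap.
    assert(!stored_inline(std::string(100, 'x')));

    os << "sizeof(std::string):   " << sizeof(std::string) << '\n'
       << "empty string capacity: " << std::string().capacity() << '\n'
       << std::boolalpha
       << "\"hi\" stored inline:    " << stored_inline(std::string("hi")) << '\n';
}
```

Calling print_sso_report(std::cout) from main() shows the object size and the inline capacity for whichever library the program is built against.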

A polymorphic variant class which I once made is 24 bytes. The last byte
in the class is the variant type tag, which is chosen to be 0 for small
strings, so that I can store zero-terminated small UTF-8 strings of up
to 23 bytes in it. I do not record the string length separately for
small strings as it is cheap to just calculate it by strlen() whenever
needed.
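
The layout described above could look roughly like the following. This is only a sketch under assumptions: the class name SmallVariant, its members, and the choice to reject embedded zero bytes are invented here, and only the small-string alternative is shown, not the other variant types. The key point is that the tag value 0 in the last byte doubles as the terminator of a 23-byte string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical 24-byte variant; only the small-string alternative is shown.
class SmallVariant {
public:
    static constexpr std::size_t kSize = 24;
    static constexpr std::size_t kMaxSmall = kSize - 1;   // 23 bytes of text
    static constexpr unsigned char kSmallStringTag = 0;   // doubles as '\0'

    // Tries to store text as a small string. Returns false (leaving the
    // object unchanged) if the text does not fit or contains a zero byte,
    // since strlen() could not recover the length in that case.
    bool set_small_string(std::string_view text) {
        if (text.size() > kMaxSmall) return false;
        if (!text.empty() &&
            std::memchr(text.data(), 0, text.size()) != nullptr) return false;
        std::memset(bytes_, 0, kSize);  // zero fill sets terminator and tag
        if (!text.empty()) std::memcpy(bytes_, text.data(), text.size());
        return true;
    }

    bool is_small_string() const { return tag() == kSmallStringTag; }

    // The length is not stored; it is recomputed with strlen() on demand.
    std::size_t size() const {
        assert(is_small_string());
        return std::strlen(bytes_);
    }

    const char* c_str() const {
        assert(is_small_string());
        return bytes_;
    }

private:
    unsigned char tag() const {
        return static_cast<unsigned char>(bytes_[kSize - 1]);
    }

    char bytes_[kSize] = {};  // the last byte is the variant type tag
};

static_assert(sizeof(SmallVariant) == 24, "variant should stay 24 bytes");
```

With a full 23-byte string, bytes 0 through 22 hold the text and byte 23 is both the tag and the terminator, so strlen() stops exactly there.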


Lynn McGuire

Aug 4, 2023, 3:56:18 PM
We compress large strings of more than 1,000 bytes so this is
interesting to me. Some of our strings go up to a GB in size.

Lynn


Richard

Aug 8, 2023, 11:33:51 AM
[Please do not mail me a copy of your followup]

Paavo Helde <ees...@osa.pri.ee> spake the secret code
<uai55v$15u7i$1...@dont-email.me> thusly:

>04.08.2023 03:42 Lynn McGuire wrote:
>> I've always thought the internal buffer was a cool idea.
>
>You mean small string optimization? Yes, that's nifty. Still, I think it
>could be made better.

The advantage of string coming from a library and not from the
language is that you can create custom string types specific to your
use case. LLVM/Clang does this for all the internal string
manipulation that they do in order to optimize better than
std::string.
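
As an illustration of what a use-case-specific string type might look like, here is a minimal sketch of a string whose inline capacity is a template parameter, similar in spirit to LLVM's SmallString<N>. The name InlineString and all of its members are hypothetical and are not LLVM's API. To keep the sketch short it carries a std::string member for the overflow case; a tuned implementation would probably overlay the two representations in a union.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical string type whose inline capacity N is chosen by the caller.
// Text longer than N bytes falls back to an ordinary std::string.
template <std::size_t N>
class InlineString {
public:
    InlineString() = default;
    explicit InlineString(std::string_view text) { assign(text); }

    void assign(std::string_view text) {
        if (text.size() <= N) {
            // memmove tolerates text aliasing our own inline buffer.
            if (!text.empty()) std::memmove(inline_, text.data(), text.size());
            inline_[text.size()] = '\0';
            inline_size_ = text.size();
            heap_.clear();
            on_heap_ = false;
        } else {
            heap_.assign(text.data(), text.size());
            on_heap_ = true;
        }
    }

    bool is_inline() const { return !on_heap_; }

    std::size_t size() const {
        return on_heap_ ? heap_.size() : inline_size_;
    }

    std::string_view view() const {
        return on_heap_ ? std::string_view(heap_)
                        : std::string_view(inline_, inline_size_);
    }

private:
    char inline_[N + 1] = {};      // N characters plus a terminator
    std::size_t inline_size_ = 0;
    std::string heap_;             // used only when the text exceeds N bytes
    bool on_heap_ = false;
};
```

An InlineString<40> keeps a 34-character string in its own buffer, where a std::string with a 16-byte SSO buffer would allocate.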
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>