
std::string on "const char *"


Jarek Blakarz

Jan 21, 2013, 4:47:46 AM
Hi

Consider the following program:

const char *str = "my name";
std::string s(str);

std::string allocates the new memory on a heap.
I would like to force the std::string to initially work directly on a string
pointed to by "const char *str" and not allocate anything on a heap.
Of course later on when writing to a string occurs the COW is allowed.

Can I do that ? If so, HOW ?

thanks for answer

Juha Nieminen

Jan 21, 2013, 5:22:30 AM
By implementing your own string class which does that.


Nobody

Jan 21, 2013, 6:28:05 AM
On Mon, 21 Jan 2013 01:47:46 -0800, Jarek Blakarz wrote:

> I would like to force the std::string to initially work directly on a string
> pointed to by "const char *str" and not allocate anything on a heap.
> Of course later on when writing to a string occurs the COW is allowed.
>
> Can I do that ? If so, HOW ?

No; std::string doesn't support that. It always allocates its own memory.

You can write your own string class, but that won't work with functions
which expect a std::string argument. And subclassing std::string won't
work as none of its methods are virtual, including its destructor.

Öö Tiib

Jan 21, 2013, 8:47:29 AM
On Monday, 21 January 2013 11:47:46 UTC+2, Jarek Blakarz wrote:
> Hi
>
> Consider the following program:
>
> const char *str = "my name";
> std::string s(str);

That is not a C++ program; it does not compile with any C++ compiler I know of.
You probably meant:

#include <string>
int main()
{
    const char *str = "my name";
    std::string s(str);
}


> std::string allocates the new memory on a heap.

In a lot (the majority?) of implementations it does not. Many std::string
implementations use the short string optimization, which means that
"my name" is copied into the string object itself, which here lives on the
stack (where s resides). Another reason is that a C++ compiler may optimize
the above code away entirely, since it does nothing externally observable.
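
One rough way to see whether a particular implementation keeps short strings inside the object is to compare the buffer address with the address range of the std::string itself. This is only a sketch: stored_inline is a hypothetical helper, the result for "my name" depends on the standard library in use, and the address comparison is a diagnostic trick rather than anything the standard promises.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: true if the character buffer lies inside the
// std::string object itself (i.e. no separate heap block is used).
bool stored_inline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto buffer = reinterpret_cast<std::uintptr_t>(s.data());
    return buffer >= object_begin && buffer < object_end;
}

void stored_inline_demo() {
    std::string small("my name");
    // Implementation-dependent: true with a short string optimization,
    // false with a library that always allocates.
    bool small_is_inline = stored_inline(small);
    (void)small_is_inline;

    // A string far larger than the object cannot be stored inside it.
    std::string large(1000, 'x');
    assert(!stored_inline(large));
}
```

Only the large-string case is asserted, because it holds for any layout whose object is smaller than the string contents.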

> I would like to force the std::string to initially work directly on a string
> pointed to by "const char *str" and not allocate anything on a heap.

Use str then; why do you need s? A class that depends on external
management of the resources that it "holds" is not worth making.

> Of course later on when writing to a string occurs the COW is allowed.
>
> Can I do that ? If so, HOW ?

Can you tell us why you need such a monster? I have not run into many
performance issues with std::string during the past 15 years or so. My personal
feeling is that you are fixing something that is not broken by creating something
that IS broken.

Marcel Müller

Jan 21, 2013, 11:41:51 AM
On 21.01.2013 10:47, Jarek Blakarz wrote:
> Hi
>
> Consider the following program:
>
> const char *str = "my name";
> std::string s(str);
>
> Can I do that ? If so, HOW ?

As the others suggested you need to write your own string class for this
purpose.

But if you want to go this way, I have some hints. (I have already done
this before.)

You will need *two* separate classes. One for ordinary strings and one
for compile time constant strings. The reason is quite easy: both accept
const char* as source for construction, but only one of them requires
that the lifetime of the storage behind the pointer exceeds the lifetime
of your string and maybe also copies of the string. C++11's constexpr
could be helpful.

Furthermore you need to decide whether you convert your constant strings
to mutable strings at some point, or whether you modify your mutable string
class so that it does not free the storage of your constant
strings. The latter requires that the length information and possibly a
reference count are not allocated in the same chunk of memory as the
string content. Fortunately this is common practice.

In fact you need a good reason to do all that, since it breaks
compatibility with std::string. Of course, you could provide a
conversion to std::string, but this would require a new allocation on
each conversion - a really bad idea.

In my case the reason was a C style plug-in interface that did not allow
dynamic allocations of storage that is shared between plug-in and main
program before an initialization function has been called.
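
A minimal sketch of the two-class idea, assuming std::string plays the role of the ordinary string and a hypothetical literal_string is the non-owning one. The array-reference constructor refuses a bare const char*, but a local char array would still bind to it, so this narrows the misuse rather than ruling it out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning type for compile-time constant strings.
class literal_string {
public:
    // Accepts character arrays such as string literals; a plain
    // const char* does not match this constructor.
    template <std::size_t N>
    constexpr literal_string(const char (&text)[N])
        : data_(text), size_(N - 1) {}

    constexpr const char* data() const { return data_; }
    constexpr std::size_t size() const { return size_; }

    // Converting to an owning string copies the characters on every call.
    std::string to_string() const { return std::string(data_, size_); }

private:
    const char* data_;   // never freed: the storage is not ours
    std::size_t size_;   // kept next to the pointer, not in the character block
};

void literal_string_demo() {
    literal_string name("my name");   // N is deduced as 8, so size() is 7
    assert(name.size() == 7);
    assert(std::strcmp(name.data(), "my name") == 0);
    assert(name.to_string() == "my name");
}
```

Keeping the length in the object rather than in front of the characters is what lets the type point at storage it did not allocate.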


Marcel

Paavo Helde

Jan 21, 2013, 12:37:34 PM
Jarek Blakarz <jumi...@gmail.com> wrote in
news:b281c5c8-1d2e-4cf1...@googlegroups.com:
Not easily as others have pointed out. And you have not presented any
possible reason for this, at the moment it looks like yet another
unjustified premature optimization attempt.

Note that a program needs to convert static strings into std::strings
only once. Static strings are part of the binary image on disk. Reading
the data in from the disk is orders of magnitude slower than performing
the in-memory copy for initializing std::string, so there is hardly any
point in trying to optimize the latter. If loading static data is too
slow you most probably need to get a faster hard drive instead.

hth
Paavo

Seungbeom Kim

Jan 21, 2013, 9:46:54 PM
On 2013-01-21 09:37, Paavo Helde wrote:
>
> Note that a program needs to convert static strings into std::strings
> only once. Static strings are part of the binary image on disk. Reading
> the data in from the disk is orders of magnitude slower than performing
> the in-memory copy for initializing std::string, so there is hardly any
> point in trying to optimize the latter. If loading static data is too
> slow you most probably need to get a faster hard drive instead.

But there's a difference in the memory footprint. With another layer
of dynamic allocation, the process consumes twice the address space
for each such string that could have remained only in the read-only
segment. In low-memory situations, this could cause other pages to be
swapped out to disk.

Of course, it is another story how likely is a program with such a
large amount of static data to affect the overall system performance.

--
Seungbeom Kim

Richard Damon

Jan 21, 2013, 11:46:47 PM
The other side of the issue is that the constructor for this string
needs to know whether its parameter really is a static string that will stay
around "forever", or is a temporary buffer that does need to be copied.

You can't really even count on using the const attribute, as it isn't
too hard to get that applied to a temporary buffer. Take the following
code as an example:


string makestring(const char* data) {
    string s(data);
    return s;
}

...
char buffer[30];
strcpy(buffer, "String 1");

string s1 = makestring(buffer);
strcpy(buffer, "String 2");


If the string constructor relies only on the fact that its parameter has type
const char*, then at the end of the code s1 holds the value
"String 2", since it would have assumed that its input was a const
static string when it wasn't, and thus would not have made a copy of its input.


Also, if it did somehow have a way to really distinguish static strings
from other character buffers, then it would need to store a flag
to determine whether the old data pointer needs to be deleted, which may well
add a cost to every instance of the class.
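
The trade-off can be sketched with a hypothetical flagged_string in which the caller, not the type system, states that the storage is static. The class name, the static_tag marker and the owns() accessor are all invented for illustration; copying is disabled to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class, for illustration only.
class flagged_string {
public:
    struct static_tag {};

    // The caller promises that `text` outlives this object; nothing is copied.
    flagged_string(static_tag, const char* text) : data_(text), owns_(false) {}

    // Default path: copy, because the source may be a temporary buffer.
    explicit flagged_string(const char* text)
        : data_(duplicate(text)), owns_(true) {}

    flagged_string(const flagged_string&) = delete;
    flagged_string& operator=(const flagged_string&) = delete;

    ~flagged_string() { if (owns_) delete[] data_; }

    const char* c_str() const { return data_; }
    bool owns() const { return owns_; }

private:
    static const char* duplicate(const char* text) {
        char* copy = new char[std::strlen(text) + 1];
        std::strcpy(copy, text);
        return copy;
    }

    const char* data_;
    bool owns_;   // the per-object flag, and its cost in size
};

void flagged_string_demo() {
    char buffer[30];
    std::strcpy(buffer, "String 1");

    flagged_string copied(buffer);                                // safe copy
    flagged_string aliased(flagged_string::static_tag{}, buffer); // broken promise

    std::strcpy(buffer, "String 2");

    assert(std::strcmp(copied.c_str(), "String 1") == 0);
    assert(std::strcmp(aliased.c_str(), "String 2") == 0);  // silently changed
    assert(sizeof(flagged_string) > sizeof(const char*));   // flag is not free
}
```

The aliased object shows the failure mode when the promise is false, and the sizeof check shows that the flag enlarges every object.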

88888 Dihedral

Jan 22, 2013, 1:45:42 AM
On Tuesday, January 22, 2013 at 12:46:47 PM UTC+8, Richard Damon wrote:
There are subtle differences between writing C++ programs
that are compiled into a library to be used by others,
and writing a main program where everything is
compiled and optimized aggressively and those constants
are not exposed to others.



Paavo Helde

Jan 22, 2013, 2:16:17 AM
Seungbeom Kim <musi...@bawi.org> wrote in
news:kdkuj2$dio$1@usenet.stanford.edu:

> On 2013-01-21 09:37, Paavo Helde wrote:
>>
>> Note that a program needs to convert static strings into std::strings
>> only once. Static strings are part of the binary image on disk. Reading
>> the data in from the disk is orders of magnitude slower than performing
>> the in-memory copy for initializing std::string, so there is hardly any
>> point in trying to optimize the latter. If loading static data is too
>> slow you most probably need to get a faster hard drive instead.
>
> But there's a difference in the memory footprint. With another layer
> of dynamic allocation, the process consumes twice the address space
> for each such string that could have remained only in the read-only
> segment. In low-memory situations, this could cause other pages to be
> swapped out to disk.

Yes, the memory footprint may be a problem in case of some programs.
However, this depends on the amount of static data and its usage. If a
program uses gigabytes of static data which goes mostly unmodified, then
yes, it might be appropriate to avoid copying it into std::strings.
Somehow I suspect though that OP was not talking about gigabytes.

Even if he was (talking about gigabytes), the key point would be not to
access the data at all until needed. This can be done with std::strings
as well.

Note that once the data has been copied, the read-only pages can be
swapped out from the working set again (i.e. discarded). This is done
automatically by the OS in case of low memory AFAIK. So the memory
consumption is not doubled in principle; it is just that read-write
pages are more expensive to deal with in case of memory exhaustion. IOW,
if the memory is exhausted and all programs start thrashing, a program
using dynamic memory will thrash worse than one using static memory.
Not sure if this scenario is worth optimizing.


> Of course, it is another story how likely is a program with such a
> large amount of static data to affect the overall system performance.

If most of the static data is never accessed, it should be OK with
current OSes; the unused parts of the executable file ought never to be
read into physical RAM.

Cheers
Paavo

Juha Nieminen

Jan 22, 2013, 3:33:26 AM
Öö Tiib <oot...@hot.ee> wrote:
> On Monday, 21 January 2013 11:47:46 UTC+2, Jarek Blakarz wrote:
>> Hi
>>
>> Consider the following program:
>>
>> const char *str = "my name";
>> std::string s(str);
>
> That is not a C++ program; it does not compile with any C++ compiler I know of.
> You probably meant:

Are you seriously being that nitpicky?

Jorgen Grahn

Jan 22, 2013, 5:36:48 AM
On Tue, 2013-01-22, Richard Damon wrote:
...
> The other side of the issue is that the constructor for this string
> needs to know whether its parameter really is a static string that will stay
> around "forever", or is a temporary buffer that does need to be copied.
>
> You can't really even count on using the const attribute, as it isn't
> too hard to get that applied to a temporary buffer.
[snip]

It's simply the usual old semantics: 'const Foo* foo;' doesn't in any
way guarantee that '*foo' cannot legally change. It just says you
cannot legally change it by using just 'foo'.
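
A small sketch of that rule, reusing the buffer from the earlier example; const_view_sees_change is a hypothetical name. The pointer is const-qualified, yet the characters it refers to change because the underlying array is not const.

```cpp
#include <cassert>
#include <cstring>

// Returns true if a write made through `buffer` is visible through
// the const-qualified pointer to the same storage.
bool const_view_sees_change() {
    char buffer[30];
    std::strcpy(buffer, "String 1");

    const char* view = buffer;         // const access path, non-const storage
    std::strcpy(buffer, "String 2");   // legal: the array itself is mutable

    return std::strcmp(view, "String 2") == 0;
}

void const_view_demo() {
    assert(const_view_sees_change());
}
```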

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Richard Damon

Jan 22, 2013, 8:48:05 AM
On 1/22/13 2:16 AM, Paavo Helde wrote:
>
> Note that once the data has been copied, the read-only pages can be
> swapped out from the working set again (i.e. discarded). This is done
> automatically by the OS in case of low memory AFAIK. So the memory
> consumption is not doubled in principle; it is just that read-write
> pages are more expensive to deal with in case of memory exhaustion. IOW,
> if the memory is exhausted and all programs start thrashing, a program
> using dynamic memory will thrash worse than one using static memory.
> Not sure if this scenario is worth optimizing.
>
> Cheers
> Paavo
>

It is worth pointing out that not all machines work this way. Most of
the programs I write will never meet a hard disk, and memory
availability is limited.

In this environment, I do strongly try to avoid making a std::string out
of static strings. If I have a string member that is always getting
initialized to a static string, it may be better to make that member a
char const*.

If only most cases are static strings, then it may be worth
looking at ways to hold the result for the few dynamic cases, to allow
the use of char const*.

Jeff Flinn

Jan 22, 2013, 9:15:13 AM
Not sure it helps in your case, but:

Google "string_ref". There is a standard proposal:
www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3442.html, and a
recently added implementation in Boost.

string_ref is separate from std::string, though IIUC it has a common
interface and will be usable by templated code where a const
std::string& could be used.
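
The string_ref idea later reached the standard as std::string_view in C++17. The sketch below assumes a C++17 compiler and uses that standard type rather than the string_ref from the proposal or from Boost; count_spaces is a hypothetical helper. A view refers to existing characters without copying them, and never owns them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: counts spaces without caring who owns the characters.
std::size_t count_spaces(std::string_view text) {
    std::size_t spaces = 0;
    for (char c : text) {
        if (c == ' ') ++spaces;
    }
    return spaces;
}

void string_view_demo() {
    const char* str = "my name";
    std::string_view view(str);    // refers to the literal; no copy is made
    assert(view.data() == str);    // same characters, not a duplicate
    assert(count_spaces(view) == 1);

    std::string owned("a b c");
    assert(count_spaces(owned) == 2);   // std::string converts to a view
}
```

As with any non-owning type, the view is only valid while the characters it refers to stay alive and unchanged.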

Jeff

Gerald Breuer

Jan 22, 2013, 10:24:51 AM
It is possible when you write your own allocator; but it's not
just three lines of code.

Bo Persson

Jan 22, 2013, 2:20:56 PM
Jarek Blakarz skrev 2013-01-21 10:47:
> Hi
>
> Consider the following program:
>
> const char *str = "my name";
> std::string s(str);
>
> std::string allocates the new memory on a heap.
> I would like to force the std::string to initially work directly on a string
> pointed to by "const char *str" and not allocate anything on a heap.
> Of course later on when writing to a string occurs the COW is allowed.
>

For such a short string there is generally no heap allocation. Most
string implementations use a small-string-optimization where short
strings are stored inside the std::string object.

Here is a post in another forum, showing that constructing or copying a
small string only uses 4-5 machine instructions, and executes in
nanoseconds.

http://stackoverflow.com/a/11639305/597607



Bo Persson


goran...@gmail.com

Jan 23, 2013, 5:48:41 AM
On Tuesday, January 22, 2013 4:24:51 PM UTC+1, Gerald Breuer wrote:
> It is possible when you write your own allocator; but it's not
> just three lines of code.

It's not an std::string anymore though ;-)

Goran.

Nobody

Jan 23, 2013, 1:33:55 PM
On Tue, 22 Jan 2013 20:20:56 +0100, Bo Persson wrote:

> For such a short string there is generally no heap allocation. Most string
> implementations use a small-string-optimization where short strings are
> stored inside the std::string object.

For a counterpoint, GNU libstdc++ always allocates. Specifically, a
std::string consists of a single pointer which points to the first byte of
the string data, which is preceded by a 3-word header containing the
length, the capacity and a reference count.

The advantages are that a std::string is only as large as a pointer, and
can in fact be cast to a char*, so if you have a pointer to an array of
std::string you can pass it to a function expecting a "const char * const *".

Jarek Blakarz

Jan 29, 2013, 5:46:24 AM
Thanks to all of you for pointing out a lot of interesting details.
Currently I'm refactoring C-style C++ code.
The code contains a lot of static strings that are pointed to by class
"const char*" members. There are a lot of such static strings, but not megabytes
of them. Now, thanks to the information you provided, I'm
considering not changing those "const char*" members at all.

Öö Tiib

Jan 29, 2013, 12:45:46 PM
On Tuesday, 22 January 2013 10:33:26 UTC+2, Juha Nieminen wrote:
> Öö Tiib <oot...@hot.ee> wrote:
> > On Monday, 21 January 2013 11:47:46 UTC+2, Jarek Blakarz wrote:
> >> Hi
> >>
> >> Consider the following program:
> >>
> >> const char *str = "my name";
> >> std::string s(str);
> >
> > That is not a C++ program; it does not compile with any C++ compiler I know of.
> > You probably meant:
>
> Are you seriously being that nitpicky?

Nay, I meant it jokingly ... the major reason was that a piece of real code was needed to
support the other points (stack? optimizations?) that I made in my answer. Two lines
without any context didn't form enough substance to discuss what compilers
do with them.

Öö Tiib

Jan 29, 2013, 1:43:21 PM
It feels terrible if someone really casts a std::string into 'char const*'.
That would not pass my review: "use c_str()".

If someone casts a pointer to the first element of a vector of strings into
'char const* const*' and it works thanks to an extension ... then I
would require commented static_asserts close by that detect that it is
indeed an implementation with such an extension.
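
For comparison, a portable route that needs no cast is to build a table of c_str() pointers and pass that instead. This is a sketch only: total_length stands in for some C-style consumer and c_str_table is a hypothetical helper, neither taken from a real API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical C-style consumer: sums the lengths of `count` C strings.
std::size_t total_length(const char* const* items, std::size_t count) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += std::strlen(items[i]);
    }
    return total;
}

// Builds the pointer table. The pointers stay valid only while `strings`
// is alive and its elements are not modified.
std::vector<const char*> c_str_table(const std::vector<std::string>& strings) {
    std::vector<const char*> table;
    table.reserve(strings.size());
    for (const std::string& s : strings) {
        table.push_back(s.c_str());
    }
    return table;
}

void c_str_table_demo() {
    std::vector<std::string> names{"my", "name"};
    std::vector<const char*> table = c_str_table(names);
    assert(total_length(table.data(), table.size()) == 6);
}
```

The cost is one extra pointer per string, in exchange for not depending on how a given library lays out std::string.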

Also, I haven't seen useful functions that take 'char const* const*'
as parameters for a decade or so. The length information readily available
in std::string is in most cases worth its price, and so strings commonly
outperform raw char const*. Can you give an example where
a lot of strings are used but their size does not matter?