std::basic_string<> contiguous data?

Adam White

Jan 15, 2007, 11:55:52 AM

Hello,

I've been writing a NetString class for network transmissions, which has
been internally implemented using a std::string.

To read data into the NetString from a network socket, I'd planned to
pre-allocate a large std::string (to use as a buffer), then slurp data
directly from the socket into the string via pointer indirection or by
using the std::string::data() array.

But is std::basic_string<>'s data guaranteed to be in a contiguous array?
I believe that TR1 clarified the intention that std::vector<> data be
stored contiguously - is that also true of std::basic_string<> ?

Thanks,
Adam

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

James Kanze

Jan 15, 2007, 6:14:48 PM

Adam White wrote:

> I've been writing a NetString class for network transmissions, which has
> been internally implemented using a std::string.

> To read data into the NetString from a network socket, I'd planned to
> pre-allocate a large std::string (to use as a buffer), then slurp data
> directly from the socket into the string via pointer indirection or by
> using the std::string::data() array.

The pointer returned by std::string::data() is const; you can't
modify the memory it points to.

> But is std::basic_string<>'s data guaranteed to be in a contiguous array?

No.

> I believe that TR1 clarified the intention that std::vector<> data be
> stored contiguously - is that also true of std::basic_string<> ?

No.

Even if it is contiguous, there is nothing which guarantees that
modifications through the pointer returned by data() will
affect the value of the string itself.

(In practice, of course, I think you'll probably be able to get
away with it in all real implementations:-).)

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Howard Hinnant

Jan 15, 2007, 7:24:22 PM

In article <pan.2007.01.15....@iinet.net.au>,
Adam White <spu...@iinet.net.au> wrote:

> Hello,
>
> I've been writing a NetString class for network transmissions, which has
> been internally implemented using a std::string.
>
> To read data into the NetString from a network socket, I'd planned to
> pre-allocate a large std::string (to use as a buffer), then slurp data
> directly from the socket into the string via pointer indirection or by
> using the std::string::data() array.
>
> But is std::basic_string<>'s data guaranteed to be in a contiguous array?
> I believe that TR1 clarified the intention that std::vector<> data be
> stored contiguously - is that also true of std::basic_string<> ?

The current standard does not guarantee a contiguous std::basic_string.
However from a practical viewpoint, there are no commercial
implementations of a non-contiguous basic_string. And C++0X is set to
standardize that fact:

http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530

So I think you are good to go.

-Howard

Ben Craig

Jan 15, 2007, 7:23:03 PM

Adam White wrote:
> To read data into the NetString from a network socket, I'd planned to
> pre-allocate a large std::string (to use as a buffer), then slurp data
> directly from the socket into the string via pointer indirection or by
> using the std::string::data() array.
>
> But is std::basic_string<>'s data guaranteed to be in a contiguous array?
> I believe that TR1 clarified the intention that std::vector<> data be
> stored contiguously - is that also true of std::basic_string<> ?

Yes, the data stored in a std::basic_string is guaranteed to be
contiguous. However, std::basic_string is still not suited to your
task. You can read data from a string using .data() or .c_str() and
nothing bad will happen, but you are not allowed to write into the
buffer directly because the underlying string could be a copy-on-write
implementation. This is enforced by making the .data() and .c_str()
return types const T *.

Here are a couple of ways to solve your problem, each with its
disadvantages. You could use temporary buffers and
std::basic_string.assign() to maintain your current design, but with a
performance penalty. You could also retrofit your design to hold a
std::vector<char> instead of a std::string. Writing directly into a
std::vector (&v[0] or &v.front()) works just fine, but you lose some of
the convenience std::string functions. Be sure to use resize and not
reserve for this use of a vector though, as the direct access would not
correctly update the size.
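
A minimal sketch of the std::vector<char> approach described above. The function fake_read is a hypothetical stand-in for a real socket read()/recv() call (it just copies from an in-memory string so the example is self-contained), and slurp is an illustrative name, not something from the original post. Note the use of resize() before each write, so that the bytes written through &data[used] really are elements of the vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a socket read(): copies up to `len` bytes of
// `source`, starting at `offset`, into `buf` and returns the byte count.
// Returns 0 once the source is exhausted ("end of stream").
std::size_t fake_read(const std::string& source, std::size_t offset,
                      char* buf, std::size_t len)
{
    if (offset >= source.size()) return 0;
    std::size_t n = std::min(len, source.size() - offset);
    std::memcpy(buf, source.data() + offset, n);
    return n;
}

// Reads everything from `source` into a vector, `chunk` bytes at a time.
std::vector<char> slurp(const std::string& source, std::size_t chunk)
{
    assert(chunk > 0);
    std::vector<char> data;
    std::size_t used = 0;
    for (;;) {
        data.resize(used + chunk);   // resize, not reserve: size() must grow
        std::size_t n = fake_read(source, used, &data[used], chunk);
        used += n;
        if (n == 0) break;           // nothing more to read
    }
    data.resize(used);               // trim the unused tail
    return data;
}
```

With a real socket the loop would also have to handle error returns, which this sketch leaves out.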

Item 16 (How to pass vector and string data to legacy APIs) of Scott
Meyers' Effective STL has a good treatment of this topic.

Adam White

Jan 15, 2007, 7:54:15 PM

On Mon, 15 Jan 2007 18:14:48 -0500, James Kanze wrote:


>> I believe that TR1 clarified the intention that std::vector<> data be
>> stored contiguously - is that also true of std::basic_string<> ?
>
> No.

Darn. So *theoretically* there's not much stopping the DeathStation 9000
implementation from using a data structure something like SGI's rope?

> Even if it is contiguous, there is nothing which guarantees that
> modifications through the pointer returned by data() will affect the
> value of the string itself.

Whoops! I've never used the data() function, so wasn't aware of that.
Sloppy reading on my part.
I guess I'll have to use std::vector<char> instead.

Thanks James,
Adam

shablool

Jan 16, 2007, 4:34:37 AM

Adam White wrote:
> Hello,
>
> I've been writing a NetString class for network transmissions, which has
> been internally implemented using a std::string.
>
> To read data into the NetString from a network socket, I'd planned to
> pre-allocate a large std::string (to use as a buffer), then slurp data
> directly from the socket into the string via pointer indirection or by
> using the std::string::data() array.
>
> But is std::basic_string<>'s data guaranteed to be in a contiguous array?
> I believe that TR1 clarified the intention that std::vector<> data be
> stored contiguously - is that also true of std::basic_string<> ?
>
> Thanks,
> Adam
>
> { clc++m banner removed -mod }


Do not use std::string; it's inadequate for this kind of task. The best
solution for socket reading is to use a low-level char array. I would
also recommend that you take a look at the Strinx library, which is more
adequate for low-level character manipulation than std::string (see:
http://strinx.sourceforge.net/)

Alternatively you could use stream buffers for socket reading. Chapter
13 of N. Josuttis' "The C++ Standard Library - A Tutorial and
Reference" describes in detail I/O using stream classes, including
user-defined stream buffers.

S.

Alf P. Steinbach

Jan 16, 2007, 10:53:33 AM

* Adam White:

>
> I've been writing a NetString class for network transmissions, which has
> been internally implemented using a std::string.
>
> To read data into the NetString from a network socket, I'd planned to
> pre-allocate a large std::string (to use as a buffer), then slurp data
> directly from the socket into the string via pointer indirection or by
> using the std::string::data() array.
>
> But is std::basic_string<>'s data guaranteed to be in a contiguous array?
> I believe that TR1 clarified the intention that std::vector<> data be
> stored contiguously - is that also true of std::basic_string<> ?

As Howard Hinnant remarked else-thread, and as I've mentioned in an
earlier thread in this group, yes in practice: for all widely used
compilers, and slated for adoption in C++0x standard, voted in by the
library working group early this year.

I'm repeating and elaborating on this because follow-ups after Howard's
article have seemingly ignored the fact that only for the purpose of
writing an academic article or book on what the current standard
formally guarantees, ignoring actual compilers and the next standard, is
there any point in allowing the possibility of a non-contiguous buffer.

You're OK with assuming contiguous string buffer storage.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Adam White

Jan 17, 2007, 12:26:26 AM

On Mon, 15 Jan 2007 19:24:22 -0500, Howard Hinnant wrote:

[snippery]

> The current standard does not guarantee a contiguous std::basic_string.
> However from a practical viewpoint, there are no commercial
> implementations of a non-contiguous basic_string. And C++0X is set to
> standardize that fact:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530

Thanks Howard and Alf. For my problem, I decided to refactor using a
std::vector<char> - in the end I didn't really need the "convenience"
functions of std::string.

Plus using std::vector<>::reserve(100) might be a little more efficient
for initialisation than a std::string(100, '\0') when I'm planning on
replacing the std::string contents anyway.

Cheers,
Adam

peter koch larsen

Jan 17, 2007, 1:15:34 PM

Adam White skrev:

> On Mon, 15 Jan 2007 19:24:22 -0500, Howard Hinnant wrote:
>
> [snippery]
>
> > The current standard does not guarantee a contiguous std::basic_string.
> > However from a practical viewpoint, there are no commercial
> > implementations of a non-contiguous basic_string. And C++0X is set to
> > standardize that fact:
> >
> > http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530
>
> Thanks Howard and Alf. For my problem, I decided to refactor using a
> std::vector<char> - in the end I didn't really need the "convenience"
> functions of std::string.
>
> Plus using std::vector<>::reserve(100) might be a little more efficient
> for initialisation than a std::string(100, '\0') when I'm planning on
> replacing the std::string contents anyway.

Just remember that you cannot use the memory just because you reserve
it:

std::vector<char> v;
v.reserve(100);
v[0] = '?'; //bug!

This is a (small) problem when you need a buffer. You will have to
either use raw operator new[] or use std::vector and live with a
default initialisation.
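
A small sketch of the difference described above, assuming two hypothetical helper functions (state_after_reserve and state_after_resize) that report what a vector looks like after each call. After reserve() the capacity has grown but size() is still 0, so no element may be accessed; after resize() the elements exist and have been value-initialised.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Snapshot of a vector's size and capacity.
struct BufferState {
    std::size_t size;
    std::size_t capacity;
};

// reserve() only allocates: no elements are created, so v[0] is not valid.
BufferState state_after_reserve(std::size_t n)
{
    std::vector<char> v;
    v.reserve(n);
    return BufferState{v.size(), v.capacity()};
}

// resize() creates n elements (each '\0'), so writing to them is fine.
BufferState state_after_resize(std::size_t n)
{
    std::vector<char> v;
    v.resize(n);
    if (n > 0) v[0] = '?';   // legal here, unlike after reserve()
    return BufferState{v.size(), v.capacity()};
}
```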

/Peter

Steven E. Harris

Jan 17, 2007, 3:47:58 PM

Adam White <spu...@iinet.net.au> writes:

> Plus using std::vector<>::reserve(100) might be a little more
> efficient for initialisation than a std::string(100, '\0') when I'm
> planning on replacing the std::string contents anyway.

std::basic_string also has a reserve() member function.¹


Footnotes:
¹ http://www.dinkumware.com/manuals/default.aspx?page=string2.html#basic_string::reserve

--
Steven E. Harris

Gennaro Prota

Jan 17, 2007, 7:26:23 PM

On 15 Jan 2007 19:24:22 -0500, Howard Hinnant wrote:

>The current standard does not guarantee a contiguous std::basic_string.
>However from a practical viewpoint, there are no commercial
>implementations of a non-contiguous basic_string. And C++0X is set to
>standardize that fact:
>
>http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530

I'm a little perplexed, by two things:

* the DR wording "I don't believe it's possible to write a useful and
standard-conforming basic_string that isn't contiguous". That's not a
proof; any more details which can convince (me, but possibly others as
well) that a non-contiguous implementation is impossible?

* what's the real usefulness of mandating contiguity? I know of many
good uses for a contiguous vector< T > (particularly vector< unsigned
char >), but I have a hard time figuring out why one would want a
contiguous basic_string< T >.

Adam White

Jan 18, 2007, 2:39:31 AM

On Wed, 17 Jan 2007 13:15:34 -0500, peter koch larsen wrote:

> Just remember that you can not use the memory just because you reserve
> it:
>
> std::vector<char> v;
> v.reserve(100);
> v[0] = '?'; //bug!

Thanks Peter,

Yeah... I'm all over that one. I haven't had a chance to write the
slurpification(tm) procedure yet, but I had planned to work in a
std::back_inserter<>. But on reflection, that probably won't work:
the POSIX/Winsock2 socket API read() works on raw buffers, not via
iterators.

> This is a (small) problem when you need a buffer. You will have to
> either use raw operator new[] or use std::vector and live with a
> default initialisation.

I think you're right. I'll have to live with the bogus default
initialisation. The extra work to use operator new[] wrapped in a
boost::shared_array<> would definitely hurt code clarity.

Adam

Timo Geusch

Jan 19, 2007, 1:24:31 AM

Gennaro Prota wrote:

> On 15 Jan 2007 19:24:22 -0500, Howard Hinnant wrote:
>
> > The current standard does not guarantee a contiguous
> > std::basic_string. However from a practical viewpoint, there are
> > no commercial implementations of a non-contiguous basic_string.
> > And C++0X is set to standardize that fact:
> >
> > http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530
>
> I'm a little perplexed, by two things:
>
> * the DR wording "I don't believe it's possible to write a useful and
> standard-conforming basic_string that isn't contiguous". That's not a
> proof; any more details which can convince (me, but possibly others as
> well) that a non-contiguous implementation is impossible?
>
> * what's the real usefulness of mandating contiguity? I know of many
> good uses for a contiguous vector< T > (particularly vector< unsigned
> char >), but I have a hard time figuring out why one would want a
> contiguous basic_string< T >.

I think we're back to codifying existing practise here - my guess (as I
don't have enough C++ compilers on this here laptop to verify my
assumptions) is that most current implementations are using contiguous
memory for the simple reason that it makes certain operations like
copying, assignment, searching and of course c_str() a lot easier to
implement. However that would be a trade-off inasmuch as you'd pay for
it due to the more expensive growing and shrinking operations.

Using non-contiguous memory would certainly make any operation that
involves growing or shrinking the string a lot more efficient (but not
necessarily easier to implement) but you'd end up complicating all
operations that require string traversal; even simple ones like
some_str[10] may suddenly require traversal of several chunks of memory.

--
The lone C++ coder's blog: http://www.bsdninjas.co.uk/codeblog/

Gennaro Prota

Jan 19, 2007, 11:27:09 AM

On 19 Jan 2007 01:24:31 -0500, Timo Geusch wrote:

>Gennaro Prota wrote:
>
>> > http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#530
>>
>> I'm a little perplexed, by two things:
>>
>> * the DR wording "I don't believe it's possible to write a useful and
>> standard-conforming basic_string that isn't contiguous". That's not a
>> proof; any more details which can convince (me, but possibly others as
>> well) that a non-contiguous implementation is impossible?
>>
>> * what's the real usefulness of mandating contiguity? I know of many
>> good uses for a contiguous vector< T > (particularly vector< unsigned
>> char >), but I have a hard time figuring out why one would want a
>> contiguous basic_string< T >.
>
>I think we're back to codifying existing practise here

But that's not codifying existing practice; or we would codify that
virtual function calls have to be implemented using vtables, or that
the one-argument isalpha(), isspace() etc. have to use a lookup table.

The point is IMHO: if contiguity is already implied (by consequence of
other requirements) then it is a good thing to spell that out; if it
isn't I don't see why it should be, given that we already have vector<
unsigned char >.

Genny.

Carlos Moreno

Jan 21, 2007, 10:54:22 AM

Alf P. Steinbach wrote:

> is
> there any point in allowing the possibility of a non-contiguous buffer.

Well yes! Independently of whether or not the advantages of allowing
non-contiguous storage outweigh the disadvantages, there are certain
situations where the string could do really better if it were allowed
to store data non-contiguously.

string s (" ..... .... .... ");

s.replace (0, 1, "xx");

This operation takes linear time on the length of the string --- it
would be done in constant time if the string class were allowed to
keep a sequence of chunks; the complexity of most operations
(including those that require the "value" of the string as a whole)
would remain approximately the same (for instance, string::str()
would take now O(N + Nc) instead of O(N) --- where N is the number
of characters, and Nc is the number of chunks; but presumably, Nc
should be much lower than N, so it's still close to O(N) --- slower,
yes, but *much* slower??).

I know, I know --- most likely, this advantage is *far* outweighed
by the disadvantages and by the advantages of having everything in
contiguous storage. What I'm trying to say is that it's not like
there is no single advantage in allowing non-contiguous storage.
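
To make the complexity argument above concrete, here is a rough sketch of a string kept as a sequence of chunks. ChunkedString, replace_prefix and chunk_count are hypothetical names invented for this illustration; this is not how any real std::string is implemented. Replacing a prefix only touches the leading chunks, so its cost does not depend on the total length N, while flattening the value walks every chunk, roughly O(N + Nc).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical chunked string: its value is the concatenation of all chunks.
class ChunkedString {
public:
    explicit ChunkedString(const std::string& s, std::size_t chunk_size = 16)
    {
        assert(chunk_size > 0);
        for (std::size_t i = 0; i < s.size(); i += chunk_size)
            chunks_.push_back(s.substr(i, chunk_size));
    }

    // Replaces the first `count` characters with `text`. Only the leading
    // chunks are modified; the chunks behind them are never moved.
    void replace_prefix(std::size_t count, const std::string& text)
    {
        while (count > 0 && !chunks_.empty()) {
            std::string& front = chunks_.front();
            if (count < front.size()) {
                front.erase(0, count);   // bounded by the chunk size
                count = 0;
            } else {
                count -= front.size();
                chunks_.pop_front();
            }
        }
        if (!text.empty()) chunks_.push_front(text);
    }

    // Flattens the chunks into one contiguous std::string.
    std::string str() const
    {
        std::string out;
        for (const std::string& c : chunks_) out += c;
        return out;
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::list<std::string> chunks_;
};
```

The price, as noted elsewhere in the thread, is that indexing and c_str()-style access need extra work that this sketch does not attempt.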

Carlos

Timo Geusch

Jan 22, 2007, 9:57:47 PM

Gennaro Prota wrote:

> But that's not codifying existing practice; or we would codify that
> virtual function calls have to be implemented using vtables, or that
> the one-argument isalpha(), isspace() etc. have to use a lookup table.

Hmm...

I can see what you're saying here and I guess that from this
perspective, "codifying existing practise" is probably the wrong
terminology.

Maybe "codifying existing practise" would have been better but I guess
that would have someone point me at a std::string which is actually
implemented using non-contiguous memory.

> The point is IMHO: if contiguity is already implied (by consequence of
> other requirements) then it is a good thing to spell that out; if it
> isn't I don't see why it should be, given that we already have vector<
> unsigned char >.

But vector< unsigned char > isn't exactly a drop-in replacement for
std::string, is it? Having a std::string laid out in pretty much the
same way as a C char[] probably has additional advantages in
interaction with functions that expect C-style strings (because the
cost of converting the string is minimal) but of course this doesn't
work for strings that are updated inside these functions.

I'm just trying to come up with a scenario which would absolutely require
the contiguous array approach for a string but apart from the
aforementioned string::c_str() function and potentially a couple of
string matching algorithms that most people have forgotten a long time
ago, I really can't come up with a good reason. Actually, if you're
doing a lot of string chopping and splicing, having non-contiguous
strings is most likely beneficial from a performance perspective.


--
The lone C++ coder's blog: http://www.bsdninjas.co.uk/codeblog/


Daniel T.

Jan 23, 2007, 1:03:51 AM

"Timo Geusch" <tn...@unixconsult.co.uk> wrote:

> I'm just trying to come up with a scenario which would absolutely require
> the contiguous array approach for a string but apart from the
> aforementioned string::c_str() function and potentially a couple of
> string matching algorithms that most people have forgotten a long time
> ago, I really can't come up with a good reason. Actually, if you're
> doing a lot of string chopping and splicing, having non-contiguous
> strings is most likely beneficial from a performance perspective.

string::c_str() and string::data() do not require the storage to be
contiguous. One could say that reserve and capacity imply a contiguous
block of memory, or at least imply that string and vector have some sort
of implementation in common.

I have implemented a string in terms of std::deque<char> before. I think
it works great when you are dealing with large strings in a potentially
fragmented heap for a system that doesn't have virtual memory.

James Kanze

Jan 23, 2007, 11:40:42 AM

Timo Geusch wrote:
> Gennaro Prota wrote:

> > But that's not codifying existing practice; or we would codify that
> > virtual function calls have to be implemented using vtables, or that
> > the one-argument isalpha(), isspace() etc. have to use a lookup table.

> Hmm...

> I can see what you're saying here and I guess that from this
> perspective, "codifying existing practise" is probably the wrong
> terminology.

> Maybe "codifying existing practise" would have been better but I guess
> that would have someone point me at a std::string which is actually
> implemented using non-contiguous memory.

The SGI rope class is very close to std::string, and uses
non-contiguous memory. Unlike the case with vector, this
possibility was actually discussed at the time; it was an
intentional decision not to require contiguous memory, because
requiring contiguous memory imposed certain trade-offs that the
committee didn't want to impose.

> > The point is IMHO: if contiguity is already implied (by consequence of
> > other requirements) then it is a good thing to spell that out; if it
> > isn't I don't see why it should be, given that we already have vector<
> > unsigned char >.

> But vector< unsigned char > isn't exactly a drop-in replacement for
> std::string, is it?

No, but if you want contiguous memory, it's a possible
replacement. (I use std::vector< char >, rather than
std::string, a lot.)

> Having a std::string laid out in pretty much the
> same way as a C char[] probably has additional advantages in
> interaction with functions that expect C-style strings (because the
> cost of converting the string is minimal) but of course this doesn't
> work for strings that are updated inside these functions.

> I'm just trying to come up with a scenario which would absolutely require
> the contiguous array approach for a string but apart from the
> aforementioned string::c_str() function and potentially a couple of
> string matching algorithms that most people have forgotten a long time
> ago, I really can't come up with a good reason. Actually, if you're
> doing a lot of string chopping and splicing, having non-contiguous
> strings is most likely beneficial from a performance perspective.

The idea, doubtlessly, is to allow using std::string for such
things as std::strftime, e.g.:

std::string results( 100, ' ' ) ;
strftime( &results[ 0 ], results.size(), "%H:%M:%S", &timestruct ) ;

Presumably, they will also add a non-const data() function as
well, to replace the use of &s[0].
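
A self-contained sketch of the strftime idiom above. The function name format_hms is an assumption made for illustration. It relies on the string's storage being contiguous, which was not guaranteed by the standard in force when this thread was written; C++11 later made it a requirement, and C++17 added a non-const data().

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Formats the time of day in `t` as HH:MM:SS by letting strftime write
// straight into the string's own storage (assumed contiguous).
std::string format_hms(const std::tm& t)
{
    std::string result(100, ' ');
    std::size_t n = std::strftime(&result[0], result.size(), "%H:%M:%S", &t);
    result.resize(n);   // strftime returns the length written, excluding '\0'
    return result;
}
```

The final resize() matters: without it the string would keep its original length of 100, with the formatted text followed by a '\0' and padding.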

--
James Kanze (GABI Software) email:james...@gmail.com

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Timo Geusch

Jan 24, 2007, 6:14:11 PM

Daniel T. wrote:

> "Timo Geusch" <tn...@unixconsult.co.uk> wrote:
>
> > I'm just trying to come up with a scenario which would absolutely require
> > the contiguous array approach for a string but apart from the
> > aforementioned string::c_str() function and potentially a couple of
> > string matching algorithms that most people have forgotten a long
> > time ago, I really can't come up with a good reason. Actually, if
> > you're doing a lot of string chopping and splicing, having
> > non-contiguous strings is most likely beneficial from a performance
> > perspective.
>
> string::c_str() and string::data() do not require the storage to be
> contiguous.

But that wasn't really what I was trying to say either - what I said
was that they're potentially easier to implement and faster if your
string is actually already stored in contiguous memory as you could
potentially just return a pointer to the internal buffer in response to
the c_str()/data() requests as the internal representation matches the
expected externally visible representation. If you're going for
non-contiguous memory you'll have to do some copying or shuffling
around before you can return the pointer to the data.

> One could say that reserve and capacity imply a contiguous
> block of memory, or at least imply that string and vector have some
> sort of implementation in common.

I actually hadn't thought of that - but then again these idioms are
equally valid in the context of a string that's implemented in, say,
chunks of 64 char fragments...

> I have implemented a string in terms of std::deque<char> before. I
> think it works great when you are dealing with large strings in a
> potentially fragmented heap for a system that doesn't have virtual
> memory.

I can see that and as I mentioned in other posts of this thread I can
also see the non-contiguous approach yielding benefits if you're doing
a lot of string manipulation as I'd expect that you'll gain more on the
concatenation/deletion/splicing than you would lose on the string
reads. And that's on the assumption that read access to a
non-contiguous string using the string member functions (with the
potential exceptions of c_str and data) would actually be a tad
slower in the first place.

Hmm. I'm suddenly tempted to try and implement such a string class.

--
The lone C++ coder's blog: http://www.bsdninjas.co.uk/codeblog/


Timo Geusch

Jan 24, 2007, 6:18:05 PM

James Kanze wrote:

> Timo Geusch wrote:
> > But vector< unsigned char > isn't exactly a drop-in replacement for
> > std::string, is it?
>
> No, but if you want contiguous memory, it's a possible
> replacement. (I use std::vector< char >, rather than
> std::string, a lot.)

Agreed, it's certainly preferable over some of the more 'interesting'
bodges I've seen that abused std::string...

> > Having a std::string laid out in pretty much the
> > same way as a C char[] probably has additional advantages in
> > interaction with functions that expect C-style strings (because the
> > cost of converting the string is minimal) but of course this doesn't
> > work for strings that are updated inside these functions.
>
> > I'm just trying to come up with a scenario which would absolutely require
> > the contiguous array approach for a string but apart from the
> > aforementioned string::c_str() function and potentially a couple of
> > string matching algorithms that most people have forgotten a long
> > time ago, I really can't come up with a good reason. Actually, if
> > you're doing a lot of string chopping and splicing, having
> > non-contiguous strings is most likely beneficial from a performance
> > perspective.
>
> The idea, doubtlessly, is to allow using std::string for such
> things as std::strftime, e.g.:
>
> std::string results( 100, ' ' ) ;
> strftime( &results[ 0 ], results.size(), "%H:%M:%S", &timestruct ) ;
>
> Presumably, they will also add a non-const data() function as
> well, to replace the use of &s[0].

That, plus passing them into functions exposed by a C API that expects
char */const char * as a parameter. Although I must say I'm a tad
dubious about functions like that having direct interactions with
std::string. I'd feel more comfortable using the std::vector< char >
approach in that case.

--
The lone C++ coder's blog: http://www.bsdninjas.co.uk/codeblog/


Eugene Gershnik

Jan 26, 2007, 5:08:37 AM

On Jan 17, 11:39 pm, Adam White <spud...@iinet.net.au> wrote:
> I'll have to live with the bogus default
> initialisation. The extra work to use operator new[] wrapped in a
> boost::shared_array<> would definitely hurt code clarity.

Why can't you just write a Buffer class that will do what you need? All
you have to do is to tweak any good vector implementation a bit. In
particular you can assume that T parameter is a POD and use realloc().
At the same time remove the default initialization everywhere. This is
a pretty trivial task and I don't understand why you have to use a
standard class. This reminds me of an anecdotal VB programmer that used
a GUI Listbox widget with a 'sorted' flag set to sort a list of strings
;-)
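
A rough sketch of the kind of Buffer class suggested above, restricted to char for brevity. The class and its member names are hypothetical; it only illustrates the two ideas mentioned: growing with realloc() (acceptable because the contents are POD) and skipping any default initialisation of new bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical uninitialised byte buffer for POD data, grown with realloc().
class Buffer {
public:
    Buffer() : data_(nullptr), size_(0) {}
    ~Buffer() { std::free(data_); }

    Buffer(const Buffer&) = delete;              // copying omitted to keep
    Buffer& operator=(const Buffer&) = delete;   // the sketch short

    // Changes the size. Existing bytes are preserved; any new bytes are
    // left uninitialised (there is no default-initialisation pass).
    void resize(std::size_t n)
    {
        if (n == 0) {
            std::free(data_);
            data_ = nullptr;
            size_ = 0;
            return;
        }
        void* p = std::realloc(data_, n);
        if (!p) throw std::bad_alloc();          // the old block is still valid
        data_ = static_cast<char*>(p);
        size_ = n;
    }

    // Appends `n` bytes from `src`, growing the block as needed.
    void append(const char* src, std::size_t n)
    {
        std::size_t old = size_;
        resize(old + n);
        if (n > 0) std::memcpy(data_ + old, src, n);
    }

    char* data() { return data_; }
    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    char* data_;
    std::size_t size_;
};
```

A socket read could then write directly into data() after a resize(), with a second resize() to trim to the number of bytes actually received.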

--
Eugene

Gennaro Prota

Jan 30, 2007, 4:09:04 AM

On Tue, 23 Jan 2007 10:40:42 CST, James Kanze wrote:

>[about contiguous basic_strings]


>
>The SGI rope class is very close to std::string, and uses
>non-contiguous memory. Unlike the case with vector, this
>possibility was actually discussed at the time; it was an
>intentional decision not to require contiguous memory, because
>requiring contiguous memory imposed certain trade-offs that the
>committee didn't want to impose.

It is a great fortune that you recall this (if only someone had
volunteered to write a C++ rationale...) Perhaps you have access to
the committee reflectors to point this out? I'm sure that would be
carefully considered. Alternatively I think you can send a note to
Howard Hinnant.

>[...]


>
>The idea, doubtlessly, is to allow using std::string for such
>things as std::strftime, e.g.:
>
> std::string results( 100, ' ' ) ;
> strftime( &results[ 0 ], results.size(), "%H:%M:%S", &timestruct ) ;

Well, that's still easy without requiring contiguity. FWIW, I wrote
something to that effect for Boost.Inspect (possibly to be shared with
other Boost tools):


<http://boost.cvs.sourceforge.net/boost/boost/tools/common/time_string.hpp>

I was also experimenting with a refinement of this with a fallible<
std::string > return type; very low on my todo list, though :-/

--
HELP: many of this group's participants might know that I'm the legit
~~~~ owner of the yahoo account with id 'gennaro_prota'; I would be
immensely grateful to anyone who might help me gaining access
to it again (note that I'm now using the id 'gennaro.prota').
Thanks!

Genny.

Howard Hinnant

Jan 30, 2007, 9:54:39 PM

In article <127tr2pjlnhia6mgi...@4ax.com>,
Gennaro Prota <clcppm...@this.is.invalid> wrote:

> On Tue, 23 Jan 2007 10:40:42 CST, James Kanze wrote:
>
> >[about contiguous basic_strings]
> >
> >The SGI rope class is very close to std::string, and uses
> >non-contiguous memory. Unlike the case with vector, this
> >possibility was actually discussed at the time; it was an
> >intentional decision not to require contiguous memory, because
> >requiring contiguous memory imposed certain trade-offs that the
> >committee didn't want to impose.
>
> It is a great fortune that you recall this (if only someone had
> volunteered to write a C++ rationale...) Perhaps you have access to
> the committee reflectors to point this out? I'm sure that would be
> carefully considered. Alternatively I think you can send a note to
> Howard Hinnant.

Fwiw the LWG is aware of the existence of rope, and the history and
rationale for std::string. Though no doubt we welcome input from both
Gennaro and James.

A decade ago we didn't have the experience that we do now with
std::string. After a decade, no vendor has shipped a non-contiguous
string. Vendors have in that time period, done complete rewrites of
string (I've personally rewritten it 3 or 4 times). The spec all but
requires contiguous memory. No vendor has shipped, or has any plans to
ship, a non-contiguous string. Given this existing practice, it makes
sense to just standardize existing practice so that clients can take
advantage of the de-facto standard with more confidence in the future of
their code.

That isn't to say that containers such as rope are not valuable and have
good applications. They do. This is only to say that std::string is
not a rope, or a deque. It is contiguous. It is also different from a
vector. Optimizations such as reference counting or
short-string-optimization are not allowed with vector.

> >[...]
> >
> >The idea, doubtlessly, is to allow using std::string for such
> >things as std::strftime, e.g.:
> >
> > std::string results( 100, ' ' ) ;
> > strftime( &results[ 0 ], results.size(), "%H:%M:%S", &timestruct ) ;
>
> Well, that's still easy without requiring contiguity. FWIW, I wrote
> something to that effect for Boost.Inspect (possibly to be shared with
> other Boost tools):
>
>
> <http://boost.cvs.sourceforge.net/boost/boost/tools/common/time_string.hpp>

Thank you for the link. With a contiguous string one can now write this
function so that it is higher performance (avoiding a copy) and/or more
robust (reallocating if the pre-guessed const sz is not large enough).
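
As a rough sketch of that idea (not the Boost code behind the link, whose interface is not shown here), one could format straight into the string's own storage and grow it on failure. The name format_time and the parameters initial_size and max_size are assumptions made up for this illustration.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Formats tm according to fmt directly into the std::string's own
// storage (no intermediate char array, so no extra copy), doubling the
// buffer each time strftime reports that it did not fit.
// Caveat: strftime also returns 0 when the formatted result is
// legitimately empty, so this sketch stops growing at max_size and
// returns an empty string in that case.
std::string format_time(const char* fmt, const std::tm& tm,
                        std::size_t initial_size = 64,
                        std::size_t max_size = 4096)
{
    std::string result(initial_size ? initial_size : 1, '\0');
    for (;;) {
        std::size_t n = std::strftime(&result[0], result.size(), fmt, &tm);
        if (n != 0) {
            result.resize(n);   // drop the unused tail
            return result;
        }
        if (result.size() >= max_size) {
            return std::string();
        }
        result.resize(result.size() * 2);
    }
}
```

Passing &result[0] to strftime relies on the string being contiguous, which is exactly the guarantee under discussion.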

This alternative may have some disadvantages compared to your design.
But it also has some advantages. Standardizing the existing practice of
a contiguous string puts more design trade-off choices for "time_string"
in the hands of the "time_string" author.

Not standardizing this existing practice does not give implementors more
freedom. We thought it might a decade ago. But the vendors have spoken
both with their implementations, and with their voice at the LWG
meetings. The implementations are going to be contiguous no matter what
the standard says. So the standard might as well give you, the author
of "time_string", choices.

-Howard

James Kanze

Jan 31, 2007, 8:12:02 AM
Howard Hinnant wrote:
> In article <127tr2pjlnhia6mgi...@4ax.com>,
> Gennaro Prota <clcppm...@this.is.invalid> wrote:

> > On Tue, 23 Jan 2007 10:40:42 CST, James Kanze wrote:

> > >[about contiguous basic_strings]

> > >The SGI rope class is very close to std::string, and uses
> > >non-contiguous memory. Unlike the case with vector, this
> > >possibility was actually discussed at the time; it was an
> > >intentional decision not to require contiguous memory, because
> > >requiring contiguous memory imposed certain trade-offs that the
> > >committee didn't want to impose.

> > It is a great fortune that you recall this (if only someone had
> > volunteered to write a C++ rationale...) Perhaps you have access to
> > the committee reflectors to point this out? I'm sure that would be
> > carefully considered. Alternatively I think you can send a note to
> > Howard Hinnant.

> Fwiw the LWG is aware of the existence of rope, and the history and
> rationale for std::string. Though no doubt we welcome input from both
> Gennaro and James.

Well, I really don't have any strong opinion vis-a-vis the
contiguous memory requirement. My own feelings concerning
std::string are:
-- I don't like the interface (I definitely would have done it
   differently), but
-- it's quite usable anyway, and
-- special requirements will require a special, user-defined
   class, but that would be true for any general-purpose string
   class.

I know that the committee was aware of rope when the standard
was first adopted. That's what I meant when I said that the
decision not to require contiguity was intentional (unlike the
case with std::vector).

> A decade ago we didn't have the experience that we do now with
> std::string. After a decade, no vendor has shipped a non-contiguous
> string. Vendors have in that time period, done complete rewrites of
> string (I've personally rewritten it 3 or 4 times). The spec all but
> requires contiguous memory. No vendor has shipped, or has any plans to
> ship, a non-contiguous string. Given this existing practice, it makes
> sense to just standardize existing practice so that clients can take
> advantage of the de-facto standard with more confidence in the future of
> their code.

Certainly. What good would experience be if we didn't learn
from it? As I say, this one doesn't seem that important to me,
but that's not a valid argument against it.

--
James Kanze (GABI Software) email:james...@gmail.com

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Gennaro Prota

Jan 31, 2007, 4:31:14 PM
On Tue, 30 Jan 2007 20:54:39 CST, Howard Hinnant wrote:

>In article <127tr2pjlnhia6mgi...@4ax.com>,
> Gennaro Prota <clcppm...@this.is.invalid> wrote:
>
>> On Tue, 23 Jan 2007 10:40:42 CST, James Kanze wrote:
>>
>> >[about contiguous basic_strings]
>> >
>> >The SGI rope class is very close to std::string, and uses
>> >non-contiguous memory. Unlike the case with vector, this
>> >possibility was actually discussed at the time; it was an
>> >intentional decision not to require contiguous memory, because
>> >requiring contiguous memory imposed certain trade-offs that the
>> >committee didn't want to impose.
>>
>> It is a great fortune that you recall this (if only someone had
>> volunteered to write a C++ rationale...) Perhaps you have access to
>> the committee reflectors to point this out? I'm sure that would be
>> carefully considered. Alternatively I think you can send a note to
>> Howard Hinnant.
>
>Fwiw the LWG is aware of the existence of rope, and the history and
>rationale for std::string. Though no doubt we welcome input from both
>Gennaro and James.

Thanks. Apologies if my comment came across as offensive,
by the way. It certainly wasn't meant as such: I was just thinking
that it would have been easy to forget some good point which was
raised in the original discussions and led to non-contiguity being
allowed. As Beman Dawes expressed it (referring to software, but I
believe the standard document is very similar to software in this and
other respects):

Rationale is fairly easy to provide at the time decisions are
made, but very hard to accurately recover even a short time later.

--from http://www.boost.org/more/lib_guide.htm

More on this later :-)

> [...]


>> <http://boost.cvs.sourceforge.net/boost/boost/tools/common/time_string.hpp>
>
>Thank you for the link. With a contiguous string one can now write this
>function so that it is higher performance (avoiding a copy) and/or more
>robust (reallocating if the pre-guessed const sz is not large enough).
>
>This alternative may have some disadvantages compared to your design.
>But it also has some advantages. Standardizing the existing practice of
>a contiguous string puts more design trade-off choices for "time_string"
>in the hands of the "time_string" author.
>
>Not standardizing this existing practice does not give implementors more
>freedom. We thought it might a decade ago. But the vendors have spoken
>both with their implementations, and with their voice at the LWG
>meetings. The implementations are going to be contiguous no matter what
>the standard says. So the standard might as well give you, the author
>of "time_string" choices.

Makes a lot of sense, thanks for the nice summary (sorry for some
overquoting but this was so beautifully stated that I didn't feel like
making cuts). It helps to consider real examples :-) (Needless to say,
we know that the committee *does* consider real examples; it's just
that we (non-WG21 members) don't usually see them. And in this case
the summary in the public issues list is really very different from
yours: it's basically a decorated form of "Perhaps it's already
implied. We should just require it", which led me to suppose that the
rationale could have been lost along the way.)

--
HELP: many of this group's participants might know that I'm the legit
owner of the yahoo account with id 'gennaro_prota'; I would be
immensely grateful to anyone who might help me regain access
to it (note that I'm now using the id 'gennaro.prota').
Thanks!

Genny.

--

Howard Hinnant

Feb 1, 2007, 4:10:34 AM
In article <07u1s2hd57rp05h80...@4ax.com>,
Gennaro Prota <clcppm...@this.is.invalid> wrote:

> Thanks. Apologies if my comment eventually came through as offensive,

No apologies necessary. I did not interpret anything you said as
offensive.

> >Not standardizing this existing practice does not give implementors more
> >freedom. We thought it might a decade ago. But the vendors have spoken
> >both with their implementations, and with their voice at the LWG
> >meetings. The implementations are going to be contiguous no matter what
> >the standard says. So the standard might as well give you, the author
> >of "time_string" choices.
>
> Makes a lot of sense, thanks for the nice summary (sorry for some
> overquoting but this was so beautifully stated that I didn't feel like
> making cuts). It helps considering real examples :-) (Needless to say,
> we know that the committee *does* consider real examples; it's just
> that we (non wg21 members) don't usually see them. And in this case
> the summary in the public issues list is really very different from
> yours: it's basically a decorated form of "Perhaps it's already
> implied. We should just require it", which led me to suppose that the
> rationale could have been lost along the way)

You make a very good point, and I believe it is one that I can try to
improve our process on. I've recently taken to previewing unofficial
issues lists between meetings in the hopes of getting it right for the
official meetings. Your post encouraged me to add rationale to this
issue (and redouble my efforts to do so for other issues). Hope I got
it right. :-)

http://home.twcny.rr.com/hinnant/cpp_extensions/issues_preview/lwg-defects.html#530

-Howard

wa...@stoner.com

Feb 2, 2007, 5:58:58 PM
On Jan 23, 10:40 am, "James Kanze" <james.ka...@gmail.com> wrote:

> Presumably, they will also add a non-const data() function as
> well, to replace the use of &s[0].

That's not what the proposed resolution to 530 says (but I think it
would be nice).

If you are going to provide non-const data(), you should add it to the
list of functions that make the string unsharable (4th and 5th bullets
of 21.3/5). The non-const data() should certainly be excluded from the
third bullet of 21.3/5.

Note that if strings are required to be contiguous there is no need
for the const data() to invalidate existing iterators and references
(and in practice you could extend this to the const c_str(), unless
there is some implementation which is lazy about appending the trailing
zero). This means the third bullet of 21.3/5 could go away, or be
modified.

James Kanze

Feb 3, 2007, 6:56:27 PM
wa...@stoner.com wrote:
> On Jan 23, 10:40 am, "James Kanze" <james.ka...@gmail.com> wrote:

> > Presumably, they will also add a non-const data() function as
> > well, to replace the use of &s[0].

> That's not what the proposed resolution to 530 says (but I think it
> would be nice).

Presumably an oversight. It's been added to std::vector, in
any case.
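
For comparison, here is a small sketch of how the non-const data() on std::vector can serve as the target of a C-style "fill this buffer" call, which is the pattern the original question wanted for socket reads. The functions fill_buffer and read_into_vector are hypothetical stand-ins invented for this illustration; a real program would call its actual socket API instead.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a C-style API (such as a socket read) that
// fills a caller-supplied buffer and returns the number of bytes written.
std::size_t fill_buffer(char* dst, std::size_t capacity)
{
    static const char payload[] = "hello";
    std::size_t n = sizeof(payload) - 1;   // exclude the trailing '\0'
    if (n > capacity) {
        n = capacity;
    }
    if (n != 0) {
        std::memcpy(dst, payload, n);
    }
    return n;
}

// Pre-allocates the vector, lets the C-style call write through the
// pointer returned by the non-const data(), then trims to what arrived.
std::vector<char> read_into_vector(std::size_t capacity)
{
    std::vector<char> buf(capacity);
    std::size_t n = fill_buffer(buf.data(), buf.size());
    buf.resize(n);
    return buf;
}
```

With a contiguous std::string, the same shape works using &s[0] in place of buf.data().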

> If you are going to provide non-const data(), you should add it to the
> list of functions that make the string unsharable (4th and 5th bullets
> of 21.3/5). data()nonconst should certainly be excluded from the
> third bullet of 21.3/5.

> Note that if strings are required to be contiguous there is no need
> for data()const to invalidate existing iterators and references (and
> in practice you could extend this to c_str()const, unless there is
> some implementation which is lazy about appending the trailing
> zero.). Meaning the third bullet of 21.3/5 could go away, or be
> modified.

That's a good point. With regard to c_str(), being lazy about
appending the '\0' is not a problem, as long as the
implementation isn't lazy about allocating space for it.

--
James Kanze (Gabi Software) email: james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
