Using <string> and c_str

Ron

Jan 13, 2012, 9:39:43 AM

Hello,

I am working with the <string> class and thinking about
workarounds for the unfinished I/O functionality. I was
reading some older posts that mentioned the use of c_str
to convert string objects, but I cannot find the c_str function
in the help files or the related header files. If anyone can
point me in the right direction, it would be greatly appreciated.

Best regards,
Ron

Paul S. Person

Jan 13, 2012, 2:17:41 PM

In N3092, it is part of the basic_string template (what, it's not in
ours?) and is discussed in 21.4.7:

21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const;
const charT* data() const;

1 Returns: a pointer p such that p + i == &operator[](i) for each i in
[0,size()].

2 Throws: nothing.

3 Complexity: constant time.

4 Requires: The program shall not alter any of the values stored in
the character array.
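As a quick check of the accessor requirements quoted above, here is a small sketch assuming a C++11 or later standard library (which adopted this wording). Both helper names are hypothetical; the second simply hands c_str() to a C function that expects a read-only, null-terminated array.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: checks that c_str() + i == &s[i] for every
// stored character, and that a null sits just past the data.
bool accessor_contract_holds(const std::string& s)
{
    const char* p = s.c_str();
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (p + i != &s[i]) {
            return false;
        }
    }
    return p[s.size()] == '\0';
}

// Hypothetical helper: passes the string to a C library function
// that only reads a null-terminated array.
std::size_t c_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

Note that the pointer is only read here; writing through it would break requirement 4.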

A very long time ago (the source file is dated 12/17/00 -- yes, that's
over 11 years ago!), I implemented a /non-template version/ of class
string (IIRC, this was because Watcom did not, at the time I started
it, provide one).

FWIW, my implementation /copied/ the current content of the string
into a separate buffer and returned a pointer to that buffer.

NOTE (important): since the data in a string can be anything, it can
both include null bytes and not be terminated by one. At least, the
string class I wrote would allow this. I copied all the bytes
controlled by the string and then appended a null byte if one was not
already present.
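To see why embedded nulls matter, here is a sketch using a modern std::string (the function names are hypothetical): size() counts every stored byte, while a C function reading through c_str() stops at the first null.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds "fred<0>sam" as eight stored bytes.
// The (pointer, count) constructor is required; the plain const char*
// constructor would stop copying at the first '\0'.
std::string with_embedded_null()
{
    return std::string("fred\0sam", 8);
}

// Number of characters a C function sees through c_str().
std::size_t visible_to_c(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

Here with_embedded_null().size() is 8, but visible_to_c() reports only 4.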

This, of course, is one reason I returned a pointer to a copy: I
didn't want to insert null bytes into the actual data, since I would
then have to remove them when adding new text (I think I tried that
originally but, trust me on this, it wasn't worth the effort,
particularly when it is kept in mind that the data in a string can
include null bytes).

And, of course, the fourth condition would be violated if a null were
added at the end of the actual data.

This means that each value returned by c_str() is a snapshot, as it
were, of the value of the string object at the time it is invoked.
This copy will not be affected by subsequent use of the string, nor will
modifying this copy affect the string object in any way, which may or
may not be what is needed.
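A minimal sketch of that snapshot behaviour in modern C++, assuming a hypothetical helper that copies every byte (embedded nulls included) plus a trailing terminator into its own buffer:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: takes an independent copy of the string's
// contents and appends a terminator, so the result is a C string.
std::vector<char> snapshot(const std::string& s)
{
    std::vector<char> copy(s.begin(), s.end());
    copy.push_back('\0');
    return copy;
}
```

After `snap = snapshot(s); s += "sam";` the snapshot still reads "fred", and changing `snap` never touches `s`.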

However, if you simply malloc a buffer and return its address and use
that in a C++ output sequence, memory will be leaked. This is probably
why I used this dodge at the top of the string class:

private:

mutable vector<char> cStringData; // Used by c_str() only
vector<char> theData;             // Contains the data

That is, on the one hand, each time c_str() was invoked, the content
potentially overwrote the prior content (so that, if the programmer
wanted to save the state of a string object, it would be necessary to
make another copy) but, on the other hand, any memory allocated was
released when the string object was destructed, thus allowing c_str()
to be used in C++ output sequences.
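For comparison, modern C++ offers another way around the malloc leak: return an owning smart pointer, so a temporary copy used in an output expression is freed automatically. This is a sketch with a hypothetical helper name, not what Paul's class or Open Watcom does.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper: returns a caller-owned, null-terminated copy.
// The unique_ptr releases the buffer when it goes out of scope.
std::unique_ptr<char[]> make_c_copy(const std::string& s)
{
    std::unique_ptr<char[]> buf(new char[s.size() + 1]);
    std::memcpy(buf.get(), s.data(), s.size());
    buf[s.size()] = '\0';
    return buf;
}
```

In `std::cout << make_c_copy(s).get();` the temporary lives until the end of the full expression, so the output is safe and nothing leaks.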

Since cStringData was initialized by
cStringData = theData;
this also requires the assignment operator for vector<char> to do a
deep copy, but perhaps that is a given (it has been a long long time
since I knew this stuff).
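That is indeed a given for std::vector: copy assignment copies the elements. A small sketch (the function name is hypothetical; the vector names mirror the post) showing the two vectors stay independent afterwards:

```cpp
#include <vector>

// Hypothetical demonstration: after copy assignment the target owns
// its own elements, so changing the source leaves it untouched.
bool assignment_is_deep()
{
    std::vector<char> theData = {'f', 'r', 'e', 'd'};
    std::vector<char> cStringData;
    cStringData = theData;   // element-by-element copy
    theData[0] = 'F';        // modify the source only
    return cStringData[0] == 'f' && cStringData.data() != theData.data();
}
```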

In fact, you might as well see the rest of the body:

vector<char>::iterator theEnd(cStringData.end());

if(empty() || (*(--theEnd) != '\0'))
{
cStringData.insert(cStringData.end(), 1, '\0');
}
return(&cStringData[0]); // a pointer, not an iterator

since it might still work even when embedded in the basic_string
template. That is a decrement operator in front of theEnd, by the way
(in the font I am using, it looks like a single long line, which might
be unclear if your reader does the same thing.) Well, you'd need to
add some spaces here and there to comply with the OW style guide, but
that is relatively minor. As shown, if there is no data, a pointer to
an empty string is returned.
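For readers on a current compiler, here is a sketch of the same copy-on-c_str() idea in C++11. Only theData and cStringData come from the post; the class name and the append/size members are hypothetical, and back() and data() stand in for the decremented iterator and begin().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class modelling the copy-on-c_str() approach.
class snapshot_string {
public:
    explicit snapshot_string(const std::string& init)
        : theData(init.begin(), init.end()) {}

    void append(const std::string& more)
    {
        theData.insert(theData.end(), more.begin(), more.end());
    }

    std::size_t size() const { return theData.size(); }
    bool empty() const { return theData.empty(); }

    // Copies the data into cStringData and adds a null unless the data
    // already ends with one. Each call overwrites the previous copy.
    const char* c_str() const
    {
        cStringData = theData;
        if (empty() || cStringData.back() != '\0') {
            cStringData.push_back('\0');
        }
        return cStringData.data();
    }

private:
    mutable std::vector<char> cStringData;  // used by c_str() only
    std::vector<char> theData;              // contains the data
};
```

The trade-off is the one described above: the stored data never gains a stray null, but each call costs a full copy.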
--
"Nature must be explained in
her own terms through
the experience of our senses."

Ron

Jan 16, 2012, 8:57:23 AM

Thanks Paul,
Plenty of information to go on here.

By the way (BTW) I am not familiar with FWIW or IIRC, would you expand? :)

E. S. Fabian

Jan 16, 2012, 10:06:21 AM

| By the way (BTW) I am not familiar with FWIW or IIRC, would you
| expand? :)

Search for them on the internet - you will find them on sites which list a
whole lot more.

I do recall IIRC - If I Remember Correctly.

And before www days people used to use NB instead of BTW, from the Latin
phrase "Nota bene", literally "note well", short for "note this as well".
Likewise, to this day, in English as well as many other languages one uses
"etc" from the Latin "aet cetera", where "aet" is pronounced nearly the same
as "et" in English, the words meaning "and others".
--
HTH (Hope This Helps), Steve

Paul S. Person

Jan 16, 2012, 12:50:43 PM

For What It's Worth

If I Recall Correctly

Both are generally intended to help avoid sounding like an
authoritative know-it-all, even if one happens to be a know-it-all
(authoritative or not).

Paul S. Person

Jan 16, 2012, 1:03:09 PM

Quibbles. Yes, quibbles.

Although Latin contains a great many words starting with "AE" (which
White's dictionary, at least, treats as a single letter, not a
diphthong, at the start of a word), "et" is not one of them. Of
course, my White's dictionary was written for students and so may not
be entirely complete; one indication of this is that it wastes half
the volume with an English to Latin section when it could be all Latin
to English and contain twice as many Latin words. (My experience is
that, if you know the language, you don't need an "English to" section
and, if you don't, an "English to" section is worse than useless
because it does not distinguish clearly enough between the actual
meanings of the words given, a recipe for translational disaster.)

And, BTW, I have always thought of "NB" as being a bit more serious
than "BTW". That is, this:

NB: you are discussing this with someone who took some Latin a long
time ago

is different from this:

BTW, you are <etc>.

in that (in my experience and understanding) the first is pointing out
something to which attention should be paid (in the opinion of the
person writing it, of course), while the second is introducing an
off-hand remark which can be safely ignored.

Peter C. Chapin

Jan 26, 2012, 7:35:07 AM

On 2012-01-13 14:17, Paul S. Person wrote:

> 21.4.7.1 basic_string accessors [string.accessors]
> const charT* c_str() const;
> const charT* data() const;
>
> 1 Returns: a pointer p such that p + i == &operator[](i) for each i in
> [0,size()].
>
> 2 Throws: nothing.
>
> 3 Complexity: constant time.
>
> 4 Requires: The program shall not alter any of the values stored in
> the character array.

[snip]

> NOTE (important): since the data in a string can be anything, it can
> both include null bytes and not be terminated by one. At least, the
> string class I wrote would allow this. I copied all the bytes
> controlled by the string and then appended a null byte if one was not
> already present.

The standard is fairly clear, I think, that std::string objects can
contain embedded null characters.

> This, of course, is one reason I returned a pointer to a copy: I
> didn't want to insert null bytes into the actual data, since I would
> then have to remove them when adding new text

The implementation of std::string in Open Watcom reserves an extra byte
of space at the end of the data. When c_str() is called, it drops a null
character into that byte and then returns a pointer to the internal
buffer. It does not attempt to keep a null character at the end of the
data; one is only added when c_str() is called. Modifications to the
string after c_str() can invalidate the pointer c_str() returned, but
that is allowed by the standard.
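A minimal sketch of the technique described here: keep a separate character count, always reserve one spare byte, and write the null only when c_str() is called. This illustrates the idea under those assumptions and is not Open Watcom's source; the class and member names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical counted string: `count` characters live in `buf`,
// which always has room for one extra byte past the data.
class counted_string {
public:
    explicit counted_string(const char* s)
        : count(std::strlen(s)), cap(count + 1), buf(new char[cap])
    {
        std::memcpy(buf, s, count);
    }
    ~counted_string() { delete[] buf; }
    counted_string(const counted_string&) = delete;
    counted_string& operator=(const counted_string&) = delete;

    void append(const char* s)
    {
        std::size_t n = std::strlen(s);
        if (count + n + 1 > cap) {           // keep the spare byte
            std::size_t newCap = count + n + 1;
            char* bigger = new char[newCap];
            std::memcpy(bigger, buf, count);
            delete[] buf;
            buf = bigger;
            cap = newCap;
        }
        std::memcpy(buf + count, s, n);      // overwrites any old '\0'
        count += n;
    }

    // Constant time: drop a null into the spare byte; count is unchanged.
    const char* c_str() const
    {
        buf[count] = '\0';
        return buf;
    }

    std::size_t size() const { return count; }

private:
    std::size_t count;  // number of characters in the string
    std::size_t cap;    // allocated bytes, always >= count + 1
    char* buf;          // pointee is writable even from a const member
};
```

Because append() may reallocate, a pointer obtained from an earlier c_str() can dangle afterwards, which is the invalidation the standard permits.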

I don't think this violates condition (4) above. At least that's not my
interpretation. When condition (4) talks about "the program" I think it
is talking about the application. That is, the user of c_str() is not
allowed to modify the array... but the implementation can do what it
likes as long as it obeys the other rules.

Indeed... copying the data violates the complexity requirement because
making such a copy is O(n), not O(1) as required. [NOTE: sharing the
representation of two different strings with some kind of reference (as
g++'s implementation does... or did) also violates that requirement
because of the need to unshare the representation when calling c_str().]

Peter

Ron

Jan 26, 2012, 9:17:43 AM

Thanks Peter,

I found the definition in string.hpp.

Best regards,
Ron

Paul S. Person

Jan 26, 2012, 1:15:05 PM

On Thu, 26 Jan 2012 07:35:07 -0500, "Peter C. Chapin"
<PCh...@vtc.vsc.edu> wrote:

>On 2012-01-13 14:17, Paul S. Person wrote:

>> NOTE (important): since the data in a string can be anything, it can
>> both include null bytes and not be terminated by one. At least, the
>> string class I wrote would allow this. I copied all the bytes
>> controlled by the string and then appended a null byte if one was not
>> already present.
>
>The standard is fairly clear, I think, that std::string objects can
>contain embedded null characters.
>
>> This, of course, is one reason I returned a pointer to a copy: I
>> didn't want to insert null bytes into the actual data, since I would
>> then have to remove them when adding new text
>
>The implementation of std::string in Open Watcom reserves an extra byte
>of space at the end of the data. When c_str() is called, it drops a null
>character into that byte and then returns a pointer to the internal
>buffer. It does not attempt to keep a null character at the end of the
>data; one is only added when c_str() is called. Modifications to the
>string after c_str() can invalidate the pointer c_str() returned, but
>that is allowed by the standard.

That's what I did originally and, if that's what you did in Open
Watcom, that's fine with me. I clearly misunderstood the OP's question.

The problem I was trying to avoid was this:
1) you store "fred" into a string, which now holds 4 characters
2) you use c_str(); now string contains "fred<0>" and holds 5
characters
3) you add "sam" to the same string, which now holds 8 characters:
"fred<0>sam" when what is wanted is, of course, for string to hold 7
characters: "fredsam".
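The failure in steps 1) to 3) can be reproduced with a deliberately naive sketch in which c_str() appends the null to the stored data itself. Both function names are hypothetical and no real string class is implied.

```cpp
#include <string>
#include <vector>

// Hypothetical naive c_str(): stores the terminator as part of the
// data instead of keeping it outside the character count.
const char* naive_c_str(std::vector<char>& data)
{
    data.push_back('\0');
    return data.data();
}

// Replays the steps: store "fred", call c_str(), append "sam".
std::vector<char> replay_fred_sam()
{
    std::vector<char> data = {'f', 'r', 'e', 'd'};    // 4 characters
    naive_c_str(data);                                 // 5: "fred<0>"
    const std::string sam = "sam";
    data.insert(data.end(), sam.begin(), sam.end());   // 8: "fred<0>sam"
    return data;
}
```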

If you are thinking "the code can check the last byte and remove it if
it is a null" that is /exactly/ the problem: it can't. It can't
because a null byte is a valid value that can be stored in a string:
"fred<0>" could be inserted deliberately. How do you tell the
difference between a null that is supposed to be part of the data and
one that was inserted when c_str() was invoked?

But that's OK with me if it's OK with you. I'm doing C these days
anyway!

>I don't think this violates condition (4) above. At least that's not my
>interpretation. When condition (4) talks about "the program" I think it
>is talking about the application. That is, the user of c_str() is not
>allowed to modify the array... but the implementation can do what it
>likes as long as it obeys the other rules.

I think I will concede the point. The alternative, arguing about what
the standard means by "the program", is not something I feel like
doing at the moment.

One question: how do you prevent something like this, given the
existence of helpers like const_cast<> to remove const-ness:
1) you store "fred" into a string
2) you copy the pointer returned by c_str() into a char * variable
(this is /not/ a deep copy)
3) you then modify the contents pointed to by that variable

Can the compiler even detect this? If not, how can it possibly enforce
the rule? Does it now have to mark every single non-const char *
variable (with or without "signed" or "unsigned") as pointing or
not pointing to memory returned by c_str() and so not modifiable
under any conditions? Or must the compiler simply refuse to allow the
return value of c_str() to be used except with output streams?

>Indeed... copying the data violates the complexity requirement because
>making such a copy is O(n) not O(1) as required.

That is quite true. And an indication that the Open Watcom method is
what was intended.

Peter C. Chapin

Jan 27, 2012, 9:12:06 AM

On 2012-01-26 13:15, Paul S. Person wrote:

> The problem I was trying to avoid was this:
> 1) you store "fred" into a string, which now holds 4 characters
> 2) you use c_str(); now string contains "fred<0>" and holds 5
> characters
> 3) you add "sam" to the same string, which now holds 8 characters:
> "fred<0>sam" when what is wanted is, of course, for string to hold 7
> characters: "fredsam".

Open Watcom's std::string maintains a count of the number of characters
in the string. So for "fred" the count is four. When c_str() is called
the array is changed to "fred<0>" but the count remains at four. That's
how it can tell that the trailing <0> is not part of the string's data.
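The same point can be checked against a modern std::string (C++11 or later), which also keeps a separate length. The function name is hypothetical.

```cpp
#include <string>

// With a character count, calling c_str() never changes size(), and a
// null the user stores on purpose is still counted as data.
bool count_tells_them_apart()
{
    std::string s("fred");
    (void)s.c_str();                 // terminator lives outside the count
    s += "sam";
    std::string deliberate("fred");
    deliberate.push_back('\0');      // a null stored on purpose
    return s == "fredsam" && s.size() == 7 && deliberate.size() == 5;
}
```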

> One question: how do you prevent something like this, given the
> existence of helpers like const_cast<> to remove const-ness:
> 1) you store "fred" into a string
> 2) you copy the pointer returned by c_str() into a char * variable
> (this is /not/ a deep copy)
> 3) you then modify the contents pointed to by that variable
>
> Can the compiler even detect this? If not, how can it possibly enforce
> the rule?

This situation can't be easily detected. The compiler does not have to
enforce the rule. However, the standard is full of rules that the compiler
doesn't have to enforce. Nevertheless, the programmer must obey them if
he/she expects the program to work.
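When a C function genuinely needs a writable char*, one way to stay within the rule is to give it a private copy instead of applying const_cast to the result of c_str(). A sketch; upcase_in_place is a hypothetical stand-in for any C API that edits its buffer in place.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a C API that modifies its argument.
void upcase_in_place(char* text)
{
    for (; *text != '\0'; ++text) {
        *text = static_cast<char>(std::toupper(static_cast<unsigned char>(*text)));
    }
}

// Copies the characters (and the terminator) into a writable buffer,
// so the array behind c_str() is never modified.
std::string upcase_copy(const std::string& s)
{
    std::vector<char> buf(s.c_str(), s.c_str() + s.size() + 1);
    upcase_in_place(buf.data());
    return std::string(buf.data());
}
```

The original string is left unchanged: upcase_copy(s) returns "FRED" while s still holds "fred".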

Peter
