Why string's c_str()? [Overloading const char *()]

DSF

Oct 30, 2013, 3:00:26 PM
Hello group!

I have been using/writing my own string class for some time now. A
while back, I discovered the wonderful ability to overload const char
*. I was now able to use my string anywhere I could use const char *.
But... I had always wondered why the STL string class uses c_str()
instead of overloading const char *(). My first thought on the
subject is that it frees the string class to store the string in any
manner the coder chooses. One option would be to use the format of
[length of string][string]. But then I started reading articles
online stating that overloading const char *() is a bad idea because
it can allow unintended implicit conversions. Of course, most of
these articles used the typical overkill terms such as "dangerous" and
"evil", without going into detail about the specific dangers and
evils.

I came upon a discussion at the site below that intrigued me.

http://www.dreamincode.net/forums/topic/260662-operator-const-char-and-operator-caveats-with-overloads/

Since the site thread is fairly old, I decided to ask here.

I've reposted a small section here. (I hope they don't mind.)

//start
D.I.C Lover
Re: operator const char* and operator[] - caveats with overloads?

Posted 21 December 2011 - 02:18 PM
The problems will arise from ambiguous conversions. In C++, if you
have a statement with two different types, the compiler will try to
find a cast for one or the other so that the statement can be
evaluated. Now when you start adding conversions like this, you can't
prepare for every possible use of your class. Because of this, someone
who doesn't fully understand the language might try to use your class
in such a way that ambiguous conversions occur. The fact that the
designers of the std::string class left out a conversion operator, is
enough of an explanation to tell me that it should be avoided.

View Postmonkey_05_06, on 21 December 2011 - 03:52 PM, said:
What would be considered "accidental" or "unintended" conversions to
char*?

Things that come to mind are.

01 String a, b;
02 if (a==b) {}
03 if (a=="str") {}
04 if ("str"==a) {}
05 if ("str"!=a) {}
06 if (a > b) {}
07 if (a <= b) {}
08 if ("str" < a) {}
09 *a;
10 (a + 5);

All these things will suddenly just compile, but none of these will do
what you expect from a String class.

What Karel-Lodewijk said is exactly what I am talking about. Without a
conversion operator, most of those statements will not compile. With a
conversion operator they WILL compile, but not do what you expected.
If you feel like chasing around hard to find bugs, then leave it in
there, if not just add a function like the std::string::c_str().
//end


The examples above along with "none of these will do what you expect
from a String class" aroused my curiosity. Except for 09 and 10, I
was pretty certain the rest would do what I expect them to.

So I compiled the above with my string class, adding the harmless
getch() (wait for a keypress) within all of the {} so the whole thing
wouldn't be optimized away and to test the result of the if
statements. I also ran it with string 'a' initialized to "str".

Every single one of the first 8 did exactly what I expected.

9 did nothing (of course), but walking through the assembly code
confirmed it returned a pointer to the first character of string 'a'.
I added a char c; statement, then c = *a; then a printf using 'c' and
it worked. Funny, the compiler won't optimize *a; away, but with the
statement c=*a; it will optimize away the c= if you don't use the 'c'
leaving the *a code intact.

10 I wasn't sure about. Of course, it also did nothing on the
surface, but below it got a pointer to 'a', added 5 to it and returned
the resulting pointer. An overrun with a = "" or a = "str", but
that's not the point.

So what is the "danger" of overloading * (or in the case of above *
and [])? Everything I've written so far has worked as I expected, and
it's very convenient and looks more elegant than the alternatives when
one needs to pass a const char * to an API call, etc.
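
For readers following along, here is a minimal sketch of the kind of class under discussion. The name MyString and all of its members are assumptions made for illustration (the poster's actual class is not shown in the thread); the sketch simply wraps a std::string and offers both the implicit conversion and a c_str()-style accessor, so the two call styles can be compared.

```cpp
#include <cstring>
#include <string>

// Hypothetical stand-in for the poster's class; the real one is not shown.
class MyString {
public:
    MyString(const char* s = "") : data_(s) {}

    // Implicit conversion: lets a MyString be used where const char* is expected.
    operator const char*() const { return data_.c_str(); }

    // The std::string-style alternative: the caller has to ask explicitly.
    const char* c_str() const { return data_.c_str(); }

private:
    std::string data_;
};

// Illustration helpers: the same C API called with and without the explicit call.
inline std::size_t length_implicit(const MyString& s) { return std::strlen(s); }
inline std::size_t length_explicit(const MyString& s) { return std::strlen(s.c_str()); }
```

Both helpers do the same work; the only difference is whether the conversion is visible at the call site.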

"'Later' is the beginning of what's not to be."
D.S. Fiscus

Paavo Helde

Oct 30, 2013, 4:34:33 PM
DSF <nota...@address.here> wrote in
news:iqh279hter1j3kvtm...@4ax.com:
> But then I started reading articles
> online stating that overloading const char *() is a bad idea because
> it can allow unintended implicit conversions. Of course, most of
> these articles used the typical overkill terms such as "dangerous" and
> "evil", etc. Without going into detail of the specific dangers and
> evils.

Nitpicking: it is called 'operator const char*', not just 'const char*'.

One failing scenario (there are others no doubt): say you have a class
storing a const char* pointer for a later use, assuming it is a static or
otherwise living-forever pointer:

#include <iostream>

class A {
public:
    void f(const char* s) {s_=s;}
    void g() {std::cout << s_;}
private:
    const char* s_;
};

Now this is set by e.g.

A x;
void h(const char* s) {
    x.f(s);
}

int main() {
    h("foo");
    x.g();
}

Now let's say somebody comes along and cleans up some code to use your
string class instead of const char* in h():

void h(const MyString& s) {
    x.f(s);
}

int main() {
    h("foo");
    x.g(); // BOOM!
}

Everything compiles fine and may even run sometimes as expected, but
nevertheless the pointer which is stored is temporary and turns invalid
immediately after h() returns. Some innocent-looking code cleanup has
just created a dormant bug.
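
The lifetime problem can be made visible without dereferencing the stale pointer. The following is only a sketch under stated assumptions: MyString here is a hypothetical class that counts its live instances, and A gains a has_pointer() flag that is not in the original example. It shows that after h("foo") returns, A still holds a pointer although no MyString object exists any more.

```cpp
#include <string>

// Hypothetical string class with an implicit conversion (not from the thread).
class MyString {
public:
    MyString(const char* s) : data_(s) { ++live_count; }
    MyString(const MyString& other) : data_(other.data_) { ++live_count; }
    ~MyString() { --live_count; }

    operator const char*() const { return data_.c_str(); }

    static int live_count;  // number of MyString objects currently alive
private:
    std::string data_;
};
int MyString::live_count = 0;

// Same shape as class A above: it keeps the pointer it is given.
class A {
public:
    void f(const char* s) { s_ = s; has_pointer_ = true; }
    bool has_pointer() const { return has_pointer_; }
private:
    const char* s_ = nullptr;
    bool has_pointer_ = false;
};

A x;

// The "cleaned up" h(): x.f(s) compiles only because of the implicit conversion.
void h(const MyString& s) { x.f(s); }

// True when x holds a pointer although no MyString is alive any more.
bool pointer_outlived_its_string() {
    h("foo");  // a temporary MyString is created here and destroyed after the call
    return x.has_pointer() && MyString::live_count == 0;
}
```

If the class offered only a c_str()-style member, the call x.f(s) would not compile and the author of h() would have to write x.f(s.c_str()), which at least puts the pointer hand-off in plain sight.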

Now imagine this is a million-line system, functions are hundreds of
lines long, half of the code from 20 years back, class A and function h
managed by different teams, etc., and you should start to see why such
automatic conversions are evil. If there is a compile error when calling
x.f(), then somebody has to check the code and at least has a chance to
notice that a temporary pointer must not be passed to class A as it wants
to store it.

Yes, in case of perfect worlds and omnipotent programmers such automatic
conversions are fine. However, in that case C++ is not needed at all,
omnipotent programmers can write machine code directly ;-)

Cheers
Paavo

Öö Tiib

Oct 30, 2013, 8:38:07 PM
On Wednesday, 30 October 2013 21:00:26 UTC+2, DSF wrote:
> But... I had always wondered why the STL string class uses c_str()
> instead of overloading const char *().

Most novices to C or C++ are confused by the tendency of raw arrays to
decay into raw pointers in most contexts. So the standard library
designers decided not to mimic that confusing behavior.

Generally, all implicit conversions are evil, and raw pointers are rarely
needed in modern C++. So why do you care?

Alf P. Steinbach

Oct 30, 2013, 11:45:25 PM
On 30.10.2013 21:34, Paavo Helde wrote:
>
> Now let's say somebody comes along and cleans up some code to use your
> string class instead of const char* in h():
>
> void h(const MyString& s) {
> x.f(s);
> }
>
> int main() {
> h("foo");
> x.g(); // BOOM!
> }
>
> Everything compiles fine and may even run sometimes as expected, but
> nevertheless the pointer which is stored is temporary and turns invalid
> immediately after h() returns. Some innocent-looking code cleanup has
> just created a dormant bug.

Well that doesn't happen with MY string class. And so, lacking
information about the OP's string class, it does not necessarily happen
with that class either (although I suspect it would). In short, while
the failure mode idea is in the right direction, you're making some
assumptions here that do not necessarily hold, leading to an invalid
conclusion about inherent danger of implicit conversion.

In other words, a little fallacy. ;-)

Equally important for discussing goodness and badness of something like
this conversion, one must consider advantages and problems. The above
scenario, when it happens, shows that implicit conversion can expose
problems in bad code. This can be considered an advantage -- on top of
the sheer convenience of the implicit conversion, when done properly.


Cheers & hth.,

- Alf

Alf P. Steinbach

Oct 30, 2013, 11:52:28 PM
I think, for high level programming one would be better off using a more
purely high level language.

C++, while supporting abstraction, deals with more low level stuff such
as calling OS API functions, and doing that very efficiently.

I find it very annoying both to write and to read all those .c_str()
explicit conversion operations. It would be okay if it was a rarely
invoked operation, full of dangers. But it's commonplace and harmless
(well mostly, but then anything /can/ be dangerous in C++, and one can't
avoid using integers or whatever).

Tobias Müller

Oct 31, 2013, 2:47:58 AM
DSF <nota...@address.here> wrote:

[...]

> I've reposted a small section here. (I hope they don't mind.)
>
> //start

[...]

> 01 String a, b;
> 02 if (a==b) {}
> 03 if (a=="str") {}
> 04 if ("str"==a) {}
> 05 if ("str"!=a) {}
> 06 if (a > b) {}
> 07 if (a <= b) {}
> 08 if ("str" < a) {}
> 09 *a;
> 10 (a + 5);
>
> All these things will suddenly just compile, but none of these will do
> what you expect from a String class.

[...]

> The examples above along with "none of these will do what you expect
> from a String class" aroused my curiosity. Except for 09 and 10, I
> was pretty certain the rest would do what I expect them to.
>
> So I compiled the above with my string class, adding the harmless
> getch() (wait for a keypress) within all of the {} so the whole thing
> wouldn't be optimized away and to test the result of the if
> statements. I also ran it with string 'a' initialized to "str".
>
> Every single one of the first 8 did exactly what I expected.

If your class defines all those operators, everything should be fine. And
IMO a string class _without_ those operators is incomplete.
But in the case it doesn't, all those operators will work on raw pointers.

So in fact those examples are a bit unfortunate, because most of the
expressions are usually considered valid and meaningful for strings.
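
A small sketch of the difference described above. Both classes are hypothetical and exist only for illustration: PtrOnlyString has the conversion operator but no comparison operators, while ComparableString adds a content-based operator==. With the first, a == b still compiles, but it compares the two converted pointers rather than the characters.

```cpp
#include <string>

// Hypothetical class with the conversion operator but no comparison operators.
class PtrOnlyString {
public:
    PtrOnlyString(const char* s) : data_(s) {}
    operator const char*() const { return data_.c_str(); }
private:
    std::string data_;
};

// Same idea plus a content-based operator==.
class ComparableString {
public:
    ComparableString(const char* s) : data_(s) {}
    operator const char*() const { return data_.c_str(); }
    friend bool operator==(const ComparableString& l, const ComparableString& r) {
        return l.data_ == r.data_;
    }
private:
    std::string data_;
};

inline bool same_text_equal_ptr_only() {
    PtrOnlyString a("str"), b("str");
    return a == b;  // both sides convert to const char*; the addresses are compared
}

inline bool same_text_equal_with_operator() {
    ComparableString a("str"), b("str");
    return a == b;  // the class's own operator== compares the characters
}
```

The first function returns false because the two objects keep their text in different buffers; the second returns true.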

[...]

> So what is the "danger" of overloading * (or in the case of above *
> and [])? Everything I've written so far has worked as I expected, and
> it's very convenient and looks more elegant than the alternatives when
> one needs to pass a const char * to an API call, etc.

The real dangers lie in expressions that are usually not considered valid
for strings or that are controversial.

Most dangerous is IMO the implicit conversion from const char* to bool:
MyString a;
if (a) // always true
{...}

This one is easy to spot, but it could also be a more complex boolean
expression with a subtle error.
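
As a concrete sketch of the bool case (MyString and is_empty() are hypothetical names chosen for illustration): the condition in if (a) converts the object to const char* and then tests that pointer against null, so even an empty string passes the test.

```cpp
#include <string>

// Hypothetical class, for illustration only.
class MyString {
public:
    MyString(const char* s = "") : data_(s) {}
    operator const char*() const { return data_.c_str(); }
    bool is_empty() const { return data_.empty(); }
private:
    std::string data_;
};

// What "if (a)" actually tests: whether the converted pointer is non-null.
// c_str() never returns a null pointer here, so the test cannot fail.
inline bool passes_if_test(const MyString& a) {
    if (a) {
        return true;
    }
    return false;
}
```

A reader who meant to write if (!a.is_empty()) and typed if (a) gets no diagnostic from the language rules themselves.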

Tobi

Öö Tiib

Oct 31, 2013, 1:26:32 PM
On Thursday, 31 October 2013 05:52:28 UTC+2, Alf P. Steinbach wrote:
> On 31.10.2013 01:38, Öö Tiib wrote:
> > On Wednesday, 30 October 2013 21:00:26 UTC+2, DSF wrote:
> >> But... I had always wondered why the STL string class uses c_str()
> >> instead of overloading const char *().
> >
> > Most of novices to C or C++ are confused by tendency of raw arrays to
> > transform into raw pointers on most cases of usage. So standard library
> > designers decided not to mimic that confusing behavior.
> >
> > Generally all implicit conversions are evil and raw pointers are rarely
> > needed in modern C++. So why you care?
>
> I think, for high level programming one would be better off using a more
> purely high level language.

It is difficult to find any higher-level and better scalable general-purpose
language (IOW no limits) than C++. High-level solutions should indeed be built
with problem-oriented tools (like SAP Business Objects) if your problem domain
has such a thing and the problem is within its limits.

> C++, while supporting abstraction, deals with more low level stuff such
> as calling OS API functions, and doing that very efficiently.

Take the Windows API function CreateFileW. It has 7 parameters and at least 10
different responsibilities. If we call such monsters raw and unwrapped,
then the conversions are the least of our problems, I suppose.
So we write a particular OS API call only once, in a wrapper, and the compiler
mostly optimises the wrapper call away anyway.

> I find it very annoying both to write and to read all those .c_str()
> explicit conversion operations. It would be okay if it was a rarely
> invoked operation, full of dangers. But it's commonplace and harmless
> (well mostly, but then anything /can/ be dangerous in C++, and one can't
> avoid using integers or whatever).

I see c_str() used rather rarely, basically only at interfaces with
foreign languages, say when converting to or wrapping a C interface. Interfaces
are rarely maintained. Where else do you see masses of c_str() calls?

Alf P. Steinbach

Oct 31, 2013, 2:03:12 PM
On 31.10.2013 18:26, Öö Tiib wrote:
> On Thursday, 31 October 2013 05:52:28 UTC+2, Alf P. Steinbach wrote:
>> On 31.10.2013 01:38, Öö Tiib wrote:
>>> On Wednesday, 30 October 2013 21:00:26 UTC+2, DSF wrote:
>>>> But... I had always wondered why the STL string class uses c_str()
>>>> instead of overloading const char *().
>>>
>>> Most of novices to C or C++ are confused by tendency of raw arrays to
>>> transform into raw pointers on most cases of usage. So standard library
>>> designers decided not to mimic that confusing behavior.
>>>
>>> Generally all implicit conversions are evil and raw pointers are rarely
>>> needed in modern C++. So why you care?
>>
>> I think, for high level programming one would be better off using a more
>> purely high level language.
>
> It is difficult to find any higher and better scalable general purpose language
> (IOW no limits) than C++.

For the "better" you would have to define what you mean by that.

But re higher level general purpose languages, C# and Java come to mind.
These languages have module support, currently lacking in C++ (I don't
know the status of Daveed's proposal). I think these languages scale
rather well, probably better than C++ currently does, due to the lack of
module support in C++.

Even higher level than that you have Python, which works nicely with
C++, but doesn't really scale (as was discovered with YouTube, IIRC).


> High level solutions should be indeed made with
> problem-oriented stuff (like SAP Business Objects) if your problem domain
> has such thing and the problem is within limits of it.
>
>> C++, while supporting abstraction, deals with more low level stuff such
>> as calling OS API functions, and doing that very efficiently.
>
> Lets say Windows API CreateFileW. It has 7 parameters and at least 10
> different responsibilities. If we call such monsters raw and unenwrapped
> then the conversions are least of our problems I suppose.
> So we write a particular OS API call only once in wrapper and compiler
> optimises the wrapper call mostly away anyway.

One cannot wrap all Windows API functions.

And it gets worse when you add in further 3rd party libraries, such as
OpenCV.

Not to mention the C++ standard library itself... ;-)


>> I find it very annoying both to write and to read all those .c_str()
>> explicit conversion operations. It would be okay if it was a rarely
>> invoked operation, full of dangers. But it's commonplace and harmless
>> (well mostly, but then anything /can/ be dangerous in C++, and one can't
>> avoid using integers or whatever).
>
> I see that c_str() used rather rarely. Basically only in interfaces with
> alien languages. Say converting to or wrapping C interface. Interfaces are
> rarely maintained. Where else you see masses of that c_str() used?

In C++03 even std::ofstream constructors required raw C string pointers
for the filenames.

In other words, library functions taking raw C string pointers is a
widespread practice, so "natural" that the std::ofstream design without
std::string argument constructor was adopted.

Regarding the rationale for that, it removes a header dependency and it
generally does not add any conversion, since the conversion generally
has to be done at some level anyway, but I suspect that often it's done
simply because programmers have become used to interfaces like that.

Öö Tiib

Nov 1, 2013, 11:53:55 AM
On Thursday, 31 October 2013 20:03:12 UTC+2, Alf P. Steinbach wrote:
> On 31.10.2013 18:26, Öö Tiib wrote:
> > On Thursday, 31 October 2013 05:52:28 UTC+2, Alf P. Steinbach wrote:
> >> On 31.10.2013 01:38, Öö Tiib wrote:
> >>> On Wednesday, 30 October 2013 21:00:26 UTC+2, DSF wrote:
> >>>> But... I had always wondered why the STL string class uses c_str()
> >>>> instead of overloading const char *().
> >>>
> >>> Most of novices to C or C++ are confused by tendency of raw arrays to
> >>> transform into raw pointers on most cases of usage. So standard library
> >>> designers decided not to mimic that confusing behavior.
> >>>
> >>> Generally all implicit conversions are evil and raw pointers are rarely
> >>> needed in modern C++. So why you care?
> >>
> >> I think, for high level programming one would be better off using a more
> >> purely high level language.
> >
> > It is difficult to find any higher and better scalable general purpose language
> > (IOW no limits) than C++.
>
> For the "better" you would have to define what you mean by that.

"Better scalable" in the sense that C++ is better at multithreading,
multiprocessing, multi-computing and so on. While the language itself does not
contain anything to support this, processes start, fork and shut down faster in practice.

> But re higher level general purpose languages, C# and Java come to mind.
> These languages have module support, currently lacking in C++ (I don't
> know the status of Daveed's proposal). I think these languages scale
> rather well, probably better than C++ currently does, due to the lack of
> module support in C++.

That is a true defect of our legal system (IOW standard C++). In actual reality
we have DLLs, libraries and executables like everybody else. Ours work
even better in practice than the modules of others. Every "oh so high" language
out there uses C or C++ modules. It is a pity that our standard avoids legalising
that reality. However, this can't be used as an argument: if everybody uses
our modules and processes, how come we don't have them? ;-)

> Even higher level than that you have Python, which works nicely with
> C++, but doesn't really scale (as was discovered with YouTube, IIRC).

We have always used such sidekick scripting languages in real projects. Python
integrates more simply than Lisp and is more readable than Perl. Something that
does not scale up cannot be considered "higher", more like a "servant". ;-)

> > High level solutions should be indeed made with
> > problem-oriented stuff (like SAP Business Objects) if your problem domain
> > has such thing and the problem is within limits of it.
> >
> >> C++, while supporting abstraction, deals with more low level stuff such
> >> as calling OS API functions, and doing that very efficiently.
> >
> > Lets say Windows API CreateFileW. It has 7 parameters and at least 10
> > different responsibilities. If we call such monsters raw and unenwrapped
> > then the conversions are least of our problems I suppose.
> > So we write a particular OS API call only once in wrapper and compiler
> > optimises the wrapper call mostly away anyway.
>
> One cannot wrap all Windows API functions.

One does not /have/ to. In practice it is often advisable not to. For the majority of
common things one has the option of taking a library that already does it
(like boost::program_options, boost::filesystem, boost::asio or boost::interprocess).
If one needs a lot of things then there are whole frameworks too (like Qt). So what
remains are the very few "special" calls that the libraries do not wrap but that the
requirements demand.

> And it gets worse when you add in further 3rd party libraries, such as
> OpenCV.

Regretfully I have had no time to work with OpenCV, but my impression was that it
has a C++ API and that its C 'char*' API is deprecated?

> Not to mention the C++ standard library itself... ;-)

There are, yes, plenty of things in the standard library that are good for nothing.
Decades of legacy result in stuff like that. Just recently I discovered
(someone asked in comp.lang.c++.moderated) that the standard even describes
some things as possibly doing nothing whatsoever, like:

std::cin.rdbuf()->pubsetbuf(buffer, sizeof(buffer));

> >> I find it very annoying both to write and to read all those .c_str()
> >> explicit conversion operations. It would be okay if it was a rarely
> >> invoked operation, full of dangers. But it's commonplace and harmless
>
> >> (well mostly, but then anything /can/ be dangerous in C++, and one can't
> >> avoid using integers or whatever).
> >
> > I see that c_str() used rather rarely. Basically only in interfaces with
> > alien languages. Say converting to or wrapping C interface. Interfaces are
> > rarely maintained. Where else you see masses of that c_str() used?
>
> In C++03 even std::ofstream constructors required raw C string pointers
> for the filenames.

Since the contents of those raw byte buffers passed to fstream constructors as
a "file name" are platform specific anyway, we typically need something
(like boost::filesystem) to handle that case in a sane manner.

> In other words, library functions taking raw C string pointers is a
> widespread practice, so "natural" that the std::ofstream design without
> std::string argument constructor was adopted.
>
> Regarding the rationale for that, it removes a header dependency and it
> generally does not add any conversion, since the conversion generally
> has to be done at some level anyway, but I suspect that often it's done
> simply because programmers have become used to interfaces like that.

I think the I/O library is a bad example anyway. Bjarne wrote it ages ago; 2 to 4
other guys "repaired" it and now it is what it is. It gets things done but
nothing about it is pretty. In reality we have abstracted it (or some other I/O)
far away under things like "database", "client", "configuration" etc.

DSF

Nov 1, 2013, 10:45:06 PM
That's why the examples struck me as being odd. A string class that
couldn't handle comparisons wouldn't be very useful for most
situations.

>
>> So what is the "danger" of overloading * (or in the case of above *
>> and [])? Everything I've written so far has worked as I expected, and
>> it's very convenient and looks more elegant than the alternatives when
>> one needs to pass a const char * to an API call, etc.
>
>The real dangers lie in expressions that are usually not considered valid
>for strings or that are controversial.
>
>Most dangerous is IMO the implicit conversion from const char* to bool:
>MyString a;
>if (a) // always true
>{...}
>
>This one is easy to spot, but it could also be a more complex boolean
>expression with a subtle error.
>
>Tobi

I'm not quite sure I understand the danger here. Is it that someone
is testing for a NULL char pointer before acting on 'a' in an original
design that has since been converted to use a string object, so the test
may* always return true?

*It is possible to design a string class that would return a null
pointer under conditions such as being uninitialized or containing
only a zero (an empty string), but I didn't take that route.

Tobias Müller

Nov 2, 2013, 7:57:24 AM
DSF <nota...@address.here> wrote:
> On Thu, 31 Oct 2013 06:47:58 +0000 (UTC), Tobias Müller
> <tro...@bluewin.ch> wrote:

[...]

>> The real dangers lie in expressions that are usually not considered valid
>> for strings or that are controversial.
>>
>> Most dangerous is IMO the implicit conversion from const char* to bool:
>> MyString a;
>> if (a) // always true
>> {...}
>>
>> This one is easy to spot, but it could also be a more complex boolean
>> expression with a subtle error.
>>
>> Tobi
>
> I'm not quite sure I understand the danger here. Is it that someone
> is testing for a NULL char pointer before acting on 'a' in an original
> design that's now it's been converted to a string object and may*
> always return true?

I've seen string classes with operator bool() defined as test for non-empty
strings. I wouldn't consider it good design though.

The other thing is just typos. If you construct a boolean expression and
forget e.g. to actually invoke your is_empty() method the compiler will not
complain.

Anyway, if I'd write a string class from scratch it would probably also
have an operator const char*().
I tend to write my functions such that they take the most general type as
parameter and return the most specific type. In case of strings const char*
seems to be most general, you can use the function with most existing
string classes.

> *It is possible to design a string class that would return a null
> pointer under conditions such as being uninitialized or containing
> only a zero (an empty string), but I didn't take that route.

IMO that's even worse than defining operator bool(). That would mean you
couldn't get an empty C string from your string class and you would have to
insert checks everywhere.

[...]

Tobi

Öö Tiib

Nov 2, 2013, 10:08:51 AM
On Saturday, 2 November 2013 13:57:24 UTC+2, Tobias Müller wrote:
> DSF <nota...@address.here> wrote:
> > On Thu, 31 Oct 2013 06:47:58 +0000 (UTC), Tobias Müller
> > <tro...@bluewin.ch> wrote:
>
> [...]
>
> >> The real dangers lie in expressions that are usually not considered valid
> >> for strings or that are controversial.
> >>
> >> Most dangerous is IMO the implicit conversion from const char* to bool:
> >> MyString a;
> >> if (a) // always true
> >> {...}
> >>
> >> This one is easy to spot, but it could also be a more complex boolean
> >> expression with a subtle error.
> >
> > I'm not quite sure I understand the danger here. Is it that someone
> > is testing for a NULL char pointer before acting on 'a' in an original
> > design that's now it's been converted to a string object and may*
> > always return true?
>
> I've seen string classes with operator bool() defined as test for non-empty
> strings. I wouldn't consider it good design though.

Yes. Besides, there is often a difference between empty and missing data, so
the C++ practice of silently converting to bool is confusing as a rule.
Unintuitive:

float x = get_from_somewhere();
if ( x ) // Q: Does it check for 0 or for NaN or both? A: RTFM.
{
// ...
}
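
For reference, a short sketch that answers the question in the comment. The helper names are made up for illustration. Converting a float to bool yields false only for zero (positive or negative); every other value, NaN included, converts to true.

```cpp
#include <limits>

// Mirrors "if ( x )" for a float: the conversion to bool is equivalent to "x != 0".
inline bool passes_if_test(float x) {
    if (x) {
        return true;
    }
    return false;
}

// Convenience helper producing a NaN value to feed into the test.
inline float quiet_nan() { return std::numeric_limits<float>::quiet_NaN(); }
```

So the test checks for zero only; detecting NaN needs an explicit check such as std::isnan from <cmath>.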

> The other thing is just typos. If you construct a boolean expression and
> forget e.g. to actually invoke your is_empty() method the compiler will not
> complain.

Yes. While most compilers can be set to complain about built-in implicit
conversions, they usually treat a library's implicit conversions as "user-made",
so to get warnings about those one usually needs to write a tool oneself.

> Anyway, if I'd write a string class from scratch it would probably also
> have an operator const char*().

Why? If you wrote your own 'vector', would you make it convert implicitly
to its 'begin()' iterator?

> I tend to write my functions such that they take the most general type as
> parameter and return the most specific type. In case of strings const char*
> seems to be most general, you can use the function with most existing
> string classes.

Most general type? 'boost::any' or 'void*' if possible? 'char*' is an inefficient and
bug-prone string. It loses the length, even though all string operations perform
better when the length is known ahead. Therefore in most code bases
there are several places where 'std::string const&' performs twice as well
as 'char const*'.

Paavo Helde

Nov 2, 2013, 10:52:42 AM
Öö Tiib <oot...@hot.ee> wrote in
news:fff9d975-eb7e-4459...@googlegroups.com:

> On Saturday, 2 November 2013 13:57:24 UTC+2, Tobias Müller wrote:
>
>> I tend to write my functions such that they take the most general
>> type as parameter and return the most specific type. In case of
>> strings const char*
>> seems to be most general, you can use the function with most existing
>> string classes.
>
> Most general type? 'boost::any' or 'void*' if possible? 'char*' is
> inefficient and buggy string. It loses the length despite all
> operations with string perform better when knowing length of it ahead.
> Therefore in most code bases there are several places where
> 'std::string const&' performs twice better than 'char const*'.

One can pass the string length as another argument. However, this would
make the call more verbose, which kind of contradicts the main motivation
of providing an automatic conversion operator, and it also makes chaining
of calls impossible, causing more verbosity again.

On top of that, if one wants to implement the function contents in C++
(as opposed to C), then one has to immediately reconstruct std::string or
some other C++ object from the const char* pointer, which potentially
involves a dynamic memory allocation and a content copy, resulting
in even larger performance hits than strlen().

So, if the desire is that the function works with a broad range of C++
string classes, then instead of falling back to char* pointers I would
suggest using templates, assuming a std::string-compatible
interface. This way the function can even return the same string type
which is passed in, which is certainly much more convenient for the
caller than dealing with the "most specific type" hardcoded by the
function.
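
A minimal sketch of that suggestion, assuming the string type offers the std::string members find_last_not_of(), substr(), npos, value_type and size_type. The function name without_trailing_spaces is made up for illustration.

```cpp
#include <string>

// Works for any type with a std::string-like interface and returns that same type.
template <class Str>
Str without_trailing_spaces(const Str& s) {
    typedef typename Str::value_type Char;
    const typename Str::size_type last = s.find_last_not_of(Char(' '));
    if (last == Str::npos) {
        return Str();  // the input was empty or consisted of spaces only
    }
    return s.substr(0, last + 1);
}
```

Called with a std::string it returns a std::string, and called with a std::wstring it returns a std::wstring, with no detour through const char*.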

Cheers
Paavo



Öö Tiib

Nov 2, 2013, 1:01:42 PM
On Saturday, 2 November 2013 16:52:42 UTC+2, Paavo Helde wrote:
> Öö Tiib <oot...@hot.ee> wrote in
> news:fff9d975-eb7e-4459...@googlegroups.com:
> > On Saturday, 2 November 2013 13:57:24 UTC+2, Tobias Müller wrote:
> >
> >> I tend to write my functions such that they take the most general
> >> type as parameter and return the most specific type. In case of
> >> strings const char*
> >> seems to be most general, you can use the function with most existing
> >> string classes.
> >
> > Most general type? 'boost::any' or 'void*' if possible? 'char*' is
> > inefficient and buggy string. It loses the length despite all
> > operations with string perform better when knowing length of it ahead.
> > Therefore in most code bases there are several places where
> > 'std::string const&' performs twice better than 'char const*'.
>
> One can pass the string length as another argument. However, this would
> make the call more verbose, which kind of contradicts the main motivation
> of providing an automatic conversion operator, and it also makes chaining
> of calls impossible, causing more verbosity again.

Yes, splitting a fully functional object into a number of parameters is usually
a bad idea.

> On top of that, if one wants to implement the function contents in C++
> (as opposed to C), then one has to immediately reconstruct std::string or
> some other C++ object from the const char* pointer, which potentialy
> involves a dynamic memory allocation operator and content copy, resulting
> in even larger performance hits than strlen().

Fortunately not everything in C++ has such hits (say Boost.Range).
Unfortunately there are no range-based string abstractions.
Even if there were, refactoring would be painful because the interface of
'std::string' is too position-based (in contrast to iterator-based).

> So, if the desire is that the function works with a broad range of C++
> string classes, then instead of falling back to char* pointers I would
> suggest to use templates instead, assuming std::string-compatible
> interface. This way the function can even return the same string type
> which is passed in, which is certainly much more convenient for the
> caller than dealing with the "most specific type" hardcoded by the
> function.

Agreed. Also I feel that 'namespace_of_T::begin( T )' and
'namespace_of_T::end( T )' returning random access iterators, plus
T's constructor from a pair of input iterators, are perhaps plenty
for a "std::string-compatible interface" in most cases. 'std::string'
has pointlessly large interface to mimic it.
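
A minimal sketch of such a template interface, assuming only begin()/end() and an iterator-pair constructor on the string type. The names `TinyString` and `upper_copy` are made up for illustration and do not come from any post in this thread.

```cpp
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical minimal string type: only begin()/end() and an
// iterator-pair constructor, nothing else from std::string's interface.
class TinyString {
    std::vector<char> data_;
public:
    template <class It>
    TinyString(It first, It last) : data_(first, last) {}
    const char* begin() const { return data_.data(); }
    const char* end() const { return data_.data() + data_.size(); }
};

// Returns an upper-cased copy in the caller's own string type.
template <class S>
S upper_copy(const S& text) {
    using std::begin;  // fallback when the namespace of S offers no begin()
    using std::end;
    std::vector<char> buffer(begin(text), end(text));
    for (char& c : buffer) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return S(buffer.begin(), buffer.end());
}

// Usage check: the same template serves two unrelated string types.
inline void upper_copy_demo() {
    assert(upper_copy(std::string("abc")) == "ABC");
    const char src[] = "xyz";
    const TinyString tiny(src, src + 3);
    const TinyString up = upper_copy(tiny);
    assert(std::string(up.begin(), up.end()) == "XYZ");
}
```

The caller gets back the same type that was passed in, and no conversion to const char* or std::string is needed at the call site.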

Paavo Helde

Nov 2, 2013, 3:05:28 PM11/2/13
Öö Tiib <oot...@hot.ee> wrote in
news:04c69bd3-eecb-4b3a...@googlegroups.com:
> 'std::string' has pointlessly large interface to mimic it.

I have no problems with the large interface of std::string. There are some
convenience functions like find_(first|last)(_not|)_of() which I use quite
often. And mimicking std::string interface is quite easy when using
specializations of std::basic_string for another char type, traits or
allocator.
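
As an illustration only, here is a small sketch of a helper built on those convenience functions and written against std::basic_string, so it works for any char type, traits or allocator. The function name `trim` and its second parameter are assumptions of this sketch.

```cpp
#include <cassert>
#include <string>

// Strips leading and trailing characters found in `blanks`.
// Works for every specialization of std::basic_string.
template <class CharT, class Traits, class Alloc>
std::basic_string<CharT, Traits, Alloc>
trim(const std::basic_string<CharT, Traits, Alloc>& text,
     const std::basic_string<CharT, Traits, Alloc>& blanks) {
    typedef std::basic_string<CharT, Traits, Alloc> String;
    const typename String::size_type first = text.find_first_not_of(blanks);
    if (first == String::npos) {
        return String();  // nothing but blanks (or empty input)
    }
    const typename String::size_type last = text.find_last_not_of(blanks);
    return text.substr(first, last - first + 1);
}

// Usage check with two different character types.
inline void trim_demo() {
    assert(trim(std::string("  hi \t"), std::string(" \t")) == "hi");
    assert(trim(std::wstring(L" x "), std::wstring(L" ")) == L"x");
}
```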

Cheers
Paavo


Öö Tiib

Nov 2, 2013, 4:24:57 PM11/2/13
On Saturday, 2 November 2013 21:05:28 UTC+2, Paavo Helde wrote:
> Öö Tiib <oot...@hot.ee> wrote in
> news:04c69bd3-eecb-4b3a...@googlegroups.com:
> > 'std::string' has pointlessly large interface to mimic it.
>
> I have no problems with the large interface of std::string. There are some
> convenience functions like find_(first|last)(_not|)_of() which I use quite
> often.

Why cut it out of context? I described the bare minimum to mimic for passing
to a template interface and then said the above. I nowhere meant that
you should not use what you need from the interface of std::string (or the even
bigger interface of QString, for example) in your own code.

> And mimicking std::string interface is quite easy when using
> specializations of std::basic_string for another char type, traits or
> allocator.

That is still std::basic_string. In reality the potential user of your module's
interface might have some 'QString', CString, wxString, NSString or some
self-made original_posters::string in her hands. Requiring an adaptor from
her that gives it an interface matching std::string is sort of asking
for trouble, I suppose, because std::basic_string has a lengthy interface.

DSF

Nov 2, 2013, 6:40:04 PM11/2/13
On Thu, 31 Oct 2013 04:45:25 +0100, "Alf P. Steinbach"
<alf.p.stein...@gmail.com> wrote:

>On 30.10.2013 21:34, Paavo Helde wrote:
>>
>> Now let's say somebody comes along and cleans up some code to use your
>> string class instead of const char* in h():
>>
>> void h(const MyString& s) {
>> x.f(s);
>> }
>>
>> int main() {
>> h("foo");
>> x.g(); // BOOM!
>> }
>>
>> Everything compiles fine and may even run sometimes as expected, but
>> nevertheless the pointer which is stored is temporary and turns invalid
>> immediately after h() returns. Some innocent-looking code cleanup has
>> just created a dormant bug.
>
>Well that doesn't happen with MY string class. And so, lacking
>information about the OP's string class, it does not necessarily happen
>with that class either (although I suspect it would). In short, while
>the failure mode idea is in the right direction, you're making some
>assumptions here that do not necessarily hold, leading to an invalid
>conclusion about inherent danger of implicit conversion.

My string class does indeed produce the wrong results. I would
expect it to, considering the hypothetical class A is expecting a
pointer to an object with a substantially longer life. And I agree
that such information could get lost over many years and through many
hands.

Fortunately, I doubt my string class will ever see the light of day,
so that is not really a problem for me.

What I would like to ask you, Mr. Steinbach, is how did you get around
the problem? The string doesn't know it's a temporary, and it doesn't
know how the pointer it passes is going to be used, other than it
can't be used to write to the string. I don't see any way, without
altering the original code, that the original code won't receive an
orphaned pointer.
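
A hedged sketch of the failure mode, for readers who want to see it without triggering undefined behavior. `MyString`, `A` and `SafeA` are stand-ins invented here (the real classes were not posted); the instance counter only shows that the temporary is already gone when h() returns, and the stored pointer is never dereferenced.

```cpp
#include <cassert>
#include <string>

// Hypothetical string class with the implicit conversion under discussion.
class MyString {
    std::string text_;
public:
    static int& alive() { static int n = 0; return n; }  // live instances
    MyString(const char* s) : text_(s) { ++alive(); }
    MyString(const MyString& o) : text_(o.text_) { ++alive(); }
    ~MyString() { --alive(); }
    operator const char*() const { return text_.c_str(); }
};

// Hypothetical class that remembers the pointer it is given.
class A {
    const char* p_ = nullptr;
public:
    void f(const char* p) { p_ = p; }
    bool has_pointer() const { return p_ != nullptr; }
};

A x;
void h(const MyString& s) { x.f(s); }  // compiles silently via operator const char*

// One possible change on the receiving side: keep a copy, not the pointer.
class SafeA {
    std::string copy_;
public:
    void f(const char* p) { copy_ = p; }
    const std::string& g() const { return copy_; }
};

inline void lifetime_demo() {
    h("foo");                        // a temporary MyString is built and destroyed
    assert(MyString::alive() == 0);  // so x now holds a pointer to freed storage
}
```

Replacing the conversion operator with a c_str() member would not change the lifetime rules, but it would make the hand-off of a raw pointer visible at the call site `x.f(s.c_str())`.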


{snip}

>Cheers & hth.,
>
>- Alf

DSF

Nov 2, 2013, 7:21:35 PM11/2/13
On Wed, 30 Oct 2013 17:38:07 -0700 (PDT), Öö Tiib <oot...@hot.ee>
wrote:

>On Wednesday, 30 October 2013 21:00:26 UTC+2, DSF wrote:
>> But... I had always wondered why the STL string class uses c_str()
>> instead of overloading const char *().
>
>Most of novices to C or C++ are confused by tendency of raw arrays to
>transform into raw pointers on most cases of usage. So standard library
>designers decided not to mimic that confusing behavior.

I was familiar with pointers before I'd ever heard of C. Still I
had difficulty with them at the start, until it hit me that a pointer
is another name for indirect addressing.

>Generally all implicit conversions are evil and raw pointers are rarely
>needed in modern C++. So why you care?

First, a pet peeve of mine, the term "evil" is overkill. Some
operations may be more subject to purposeful or accidental abuse, and
some may be difficult to learn, but none are truly "evil."

I assume "raw pointers" refer to pointers that are not managed by
the STL, in other words, a plain, simple pointer.

Why do I care? Because I use raw pointers all the time. I write
Windows code, and 70% of the API calls involve a pointer to a
character string, pointer to a structure, pointer to a buffer, etc.
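
For comparison, a small sketch of how std::string is usually handed to and from pointer-based C functions. The two `c_api_*` functions are hypothetical stand-ins for real API calls, chosen so the sketch runs on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a C API that reads a zero-terminated string.
inline std::size_t c_api_length(const char* text) { return std::strlen(text); }

// Hypothetical stand-in for a C API that fills a caller-supplied buffer and
// returns the number of characters needed (excluding the terminator).
inline int c_api_describe(int value, char* buffer, std::size_t size) {
    return std::snprintf(buffer, size, "value=%d", value);
}

// Passing a string in: the conversion is spelled out with c_str().
inline std::size_t length_of(const std::string& s) {
    return c_api_length(s.c_str());
}

// Getting a string out: a local buffer, then one copy into std::string.
inline std::string describe(int value) {
    std::vector<char> buffer(64);
    const int n = c_api_describe(value, buffer.data(), buffer.size());
    if (n < 0 || static_cast<std::size_t>(n) >= buffer.size()) {
        throw std::runtime_error("describe failed");
    }
    return std::string(buffer.data(), static_cast<std::size_t>(n));
}

inline void c_api_demo() {
    assert(length_of("hello") == 5);
    assert(describe(42) == "value=42");
}
```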

I find pointer manipulation fascinating. They are the mainstays of
my string and container class templates. Why write these when they
exist in the STL? Two reasons:

1. To learn. Writing a string class provided most of my education on
class design. Not only do I learn faster by doing than just reading
how to, but most of my work lies in the manipulation of character
strings. 80% of the books I've read teach based on mathematical
examples. I beat my head against the brick wall of templates, with
occasional rest periods of months, for a long time. I just couldn't
grasp any more than the basics, probably because most books start off
with the old "add ints, floats, etc. using a template" (which I
understood) and then lead off in mostly mathematical directions I
wasn't interested in. I would learn later that was probably because
the things I was interested in were in the STL, so why teach them. So
I decided to write an array-type template. By the time I had it up
and running, I had a pretty good grasp on the more complex aspects of
templates.

2. When I started with C++, I'd never heard of the STL. Once I had,
I found some of its terms odd, and the STL counter-intuitive.

Alf P. Steinbach

Nov 2, 2013, 9:00:26 PM11/2/13
On 02.11.2013 23:40, DSF wrote:
>
> What I would like to ask you, Mr. Steinbach, is how did you get around
> the problem? The string doesn't know it's a temporary

Well that's what it does. ;-)

For the main string class this isn't a problem at all since every
literal for that class is wrapped in a `Literal_string_<T, n>` object.

But for each platform's derived class it's desirable to give the same
treatment to plain `char` or (depending on the class) `wchar_t` string
literals, and although they could be distinguished as literals in C++03,
that involved passing them through a macro[1]. For C++11 a macro can't
detect, since there's no distinguishing feature of a literal in C++11,
but it can enforce[2]. However, I decided that convenience was important
so instead of that enforcement macro I use a convention where any array
of `char const` or `wchar_t const` is regarded as a literal.

My convenience choice isn't quite as 100% safe as the enforcement macro
route would be, but when one is aware of that convention then there's no
real problem, since such arrays[3] simply do not occur naturally.


Cheers & hth.,

- Alf

Notes:
[1] In C++03 and C++98 a special rule allowed a string literal to decay
to pointer to non-const, which constituted a distinguishing feature of
string literals.
[2] E.g. the expression `"" middle ""` is fine when `middle` is a
literal narrow string, and otherwise it's generally a syntax error.
[3] An array is not a pointer.
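
The following is only a guess at how such a convention could be expressed, not the actual class described above. The names `Str` and `ENFORCE_LITERAL` are invented. The array overload keeps the pointer without copying, while the pointer overload copies because the lifetime is unknown. As noted in the post, a named array of `char const` would also be treated as a literal, which is the part that is not 100% safe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical enforcement macro in the spirit of note [2]: it only
// compiles when `s` is a narrow string literal.
#define ENFORCE_LITERAL(s) ("" s "")

class Str {
    std::string owned_;    // used only when the text had to be copied
    const char* literal_;  // non-null when the text is treated as a literal
public:
    // Convention: any array of `char const` is regarded as a literal,
    // so the pointer is kept and nothing is copied.
    template <std::size_t N>
    Str(const char (&text)[N]) : owned_(), literal_(text) {}

    // Plain pointers have unknown lifetime, so the text is copied.
    // This is a template so that it cannot outrank the array overload.
    template <class P,
              class = typename std::enable_if<
                  std::is_same<P, const char*>::value ||
                  std::is_same<P, char*>::value>::type>
    Str(const P& text) : owned_(text), literal_(nullptr) {}

    bool is_literal() const { return literal_ != nullptr; }
    const char* c_str() const { return literal_ ? literal_ : owned_.c_str(); }
};

inline void literal_demo() {
    Str a("foo");
    assert(a.is_literal());
    const char* p = "bar";
    Str b(p);
    assert(!b.is_literal());
}
```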

Paavo Helde

Nov 3, 2013, 3:33:16 AM11/3/13
DSF <nota...@address.here> wrote in
news:2rva79leu30dpkocg...@4ax.com:
>
> Why do I care? Because I use raw pointers all the time. I write
> Windows code, and 70% of the API calls involve a pointer to a
> character string, pointer to a structure, pointer to a buffer, etc.

That's because Windows API is defined in terms of C and not C++.

In C++ one usually writes wrappers or uses other C++ libraries in order
to encapsulate a C API, so the rest of the code can use normal C++ style.
In the wrapper code you need pointers and buffers indeed, but this is a
localized one-time activity.

Example: encapsulate getcwd():

std::string My_getcwd() {
    wchar_t buff[MAX_PATH];
    // Returns the number of characters written, or 0 on failure; a value
    // above MAX_PATH means the buffer was too small.
    DWORD n = ::GetCurrentDirectoryW(MAX_PATH, buff);
    if (n==0 || n>MAX_PATH) {
        throw MyException("my_getcwd failed: " + MyGetLastErrorString());
    }
    return Win2UtfFileName(std::wstring(buff, n));
}

Here, Win2UtfFileName() is another wrapper function wrapping
WideCharToMultiByte() on Windows and converting Windows UTF-16 to more
portable UTF-8, but that's not the main point here.

Third-party libraries like boost::filesystem probably do this better. But
in any case you should use C++ interfaces in the bulk of your codebase
and not struggle with C pointer-and-buffer madness all the time.

Cheers
Paavo



Alf P. Steinbach

Nov 3, 2013, 7:22:22 AM11/3/13
On 03.11.2013 09:33, Paavo Helde wrote:
> DSF <nota...@address.here> wrote in
> news:2rva79leu30dpkocg...@4ax.com:
>>
>> Why do I care? Because I use raw pointers all the time. I write
>> Windows code, and 70% of the API calls involve a pointer to a
>> character string, pointer to a structure, pointer to a buffer, etc.
>
> That's because Windows API is defined in terms of C and not C++.
>
> In C++ one usually writes wrappers or uses other C++ libraries in order
> to encapsulate a C API, so the rest of the code can use normal C++ style.
> In the wrapper code you need pointers and buffers indeed, but this is a
> localized one-time activity.

IMHO it's generally a good idea to wrap, but it's not practical to wrap
everything in the APIs and libraries one uses.

For high level programming where most everything low level is already
wrapped up, I would use e.g. C# or Java.


> Example: encapsulate getcwd():
>
> std::string My_getcwd() {
> wchar_t buff[MAX_PATH];
> DWORD n = ::GetCurrentDirectoryW(MAX_PATH, buff);
> if (n==0 || n>MAX_PATH) {
> throw MyException("my_getcwd failed: " + MyGetLastErrorString());
> }
> return Win2UtfFileName(std::wstring(buff, n));
> }

Demonstrates two needless string copying operations, one needless
dynamic allocation, introduction of a needless possible failure mode
(translation) and a choice of representation that makes further
operations with the string inefficient on this platform, and that even
makes display of that string impractical for debugging.

Probably this is all a trade-off for easy cross platform development
with types dictated by the original platform.

As such it's not necessarily "wrong", but it sure ain't perfect. ;-)


> Here, Win2UtfFileName() is another wrapper function wrapping
> WideCharToMultiByte() on Windows and converting Windows UTF-16 to more
> portable UTF-8, but that's not the main point here.

Oh. I think, on the contrary, that it's a pretty important point, as far
as we're discussing practical programming methodology.

For, the above fundamental code needlessly MIXES RESPONSIBILITIES.

Mixing responsibilities can be fine at higher levels (when they have to
be mixed) and/or when everything works perfectly, but in the above code
we're down at fundamental level that's invoked by all higher level code,
and here at bottom there is the silly extra work done, at the cost of
efficiency and some reliability, to create an unsuitable string
representation for the platform, at further cost.

Probably the perceived need to use UTF-8 representation internally in
the program, is great, and probably most all of the code is based on
this choice.

So as a practical matter I advise (at least at first) merely to SEPARATE
RESPONSIBILITIES, at least in any new wrappers.

Calling an API function in a safe way with errors translated to
exceptions, that's one thing. Adapting the function to the existing code
environment, that's another thing. They are better separate.

Like, say yes to both.

One doesn't have to choose one.
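
One way to picture that separation, as a rough sketch only. `fake_get_dir` is a made-up stand-in for the real API call so that the code runs anywhere, and `native_current_dir`, `to_utf8` and `current_dir_utf8` are names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a C-style API such as GetCurrentDirectoryW.
// Returns the length written, or 0 when the buffer is too small.
inline unsigned fake_get_dir(unsigned size, wchar_t* buffer) {
    static const wchar_t dir[] = L"C:\\work";
    const unsigned length = static_cast<unsigned>(sizeof(dir) / sizeof(dir[0])) - 1;
    if (size <= length) return 0;
    for (unsigned i = 0; i <= length; ++i) buffer[i] = dir[i];
    return length;
}

// Responsibility 1: call the API safely, turn errors into exceptions,
// and return the platform's own string type.
inline std::wstring native_current_dir() {
    wchar_t buffer[260];
    const unsigned n = fake_get_dir(260, buffer);
    if (n == 0) throw std::runtime_error("native_current_dir failed");
    return std::wstring(buffer, n);
}

// Responsibility 2: adapt to the code base's convention (UTF-8 here).
inline std::string to_utf8(const std::wstring& w) {
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(w[i]);
        // Combine a UTF-16 surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            const unsigned long lo = static_cast<unsigned long>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// The combined convenience function is a one-liner on top of the two.
inline std::string current_dir_utf8() { return to_utf8(native_current_dir()); }

inline void separation_demo() {
    assert(native_current_dir() == L"C:\\work");
    assert(current_dir_utf8() == "C:\\work");
}
```

Code that wants the native representation calls the first function directly, and code that follows the UTF-8 convention calls the last one.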


> Third-party libraries like boost::filesystem probably do this better.

Apparently boost::filesystem was fine in version 2.

Currently, version 3, it's unable to handle Windows filenames in general
when used with the g++ compiler (it does handle them OK with Visual C++,
by using a Visual C++ extension of the standard library).

Considering the very large effort that has gone into developing and
quality checking Boost filesystem, this is a good demonstration that
it's difficult to do "wrappers" right -- and then, relying on the
ungood wrapper the bugs and limitations are propagated to all the code.
That's very much worth having in mind as one continues to write
wrappers. And yes I do write them, all the time, but very carefully.


> But
> in any case you should use C++ interfaces in the bulk of your codebase
> and not struggle with C pointer-and-buffer madness all the time.

A good ideal to aim for. :-)

Öö Tiib

Nov 3, 2013, 7:41:33 AM11/3/13
On Sunday, 3 November 2013 14:22:22 UTC+2, Alf P. Steinbach wrote:
> On 03.11.2013 09:33, Paavo Helde wrote:
> > In C++ one usually writes wrappers or uses other C++ libraries in order
> > to encapsulate a C API, so the rest of the code can use normal C++ style.
> > In the wrapper code you need pointers and buffers indeed, but this is a
> > localized one-time activity.
>
> IMHO it's generally a good idea to wrap, but it's not practical to wrap
> everything in the APIs and libraries one uses.

It is. If for nothing else then for adding sanity checks and for RAII.

> For high level programming where most everything low level is already
> wrapped up, I would use e.g. C# or Java.

Matter of taste. Note that in C# and Java RAII does not work, so you cannot
elegantly encapsulate truly precious resources (which you apparently deal
with if you discuss exotic parts of Windows APIs).
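
A minimal sketch of what wrapping for sanity checks and RAII can look like. The `fake_c_api` namespace is a made-up stand-in for a handle-based C API, and `Handle` is a name invented for this example.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical handle-based C API, with a counter of open handles.
namespace fake_c_api {
    inline int& open_count() { static int n = 0; return n; }
    inline int* acquire(bool ok) {
        if (!ok) return nullptr;  // simulated failure
        ++open_count();
        return new int(42);
    }
    inline void release(int* h) { --open_count(); delete h; }
}

// RAII wrapper: checks the result once, releases in the destructor.
class Handle {
    int* h_;
public:
    explicit Handle(bool ok = true) : h_(fake_c_api::acquire(ok)) {
        if (!h_) throw std::runtime_error("acquire failed");  // sanity check
    }
    ~Handle() { fake_c_api::release(h_); }
    Handle(const Handle&) = delete;             // one owner per handle
    Handle& operator=(const Handle&) = delete;
    int value() const { return *h_; }
};

inline void raii_demo() {
    {
        Handle h;
        assert(fake_c_api::open_count() == 1);
        assert(h.value() == 42);
    }  // released here, even if an exception had been thrown inside the scope
    assert(fake_c_api::open_count() == 0);
}
```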

> > Example: encapsulate getcwd():
> >
> > std::string My_getcwd() {
> > wchar_t buff[MAX_PATH];
> > DWORD n = ::GetCurrentDirectoryW(MAX_PATH, buff);
> > if (n==0 || n>MAX_PATH) {
> > throw MyException("my_getcwd failed: " + MyGetLastErrorString());
> > }
> > return Win2UtfFileName(std::wstring(buff, n));
> > }
>
> Demonstrates two needless string copying operations, one needless
> dynamic allocation, introduction of a needless possible failure mode
> (translation) and a choice of representation that makes further
> operations with the string inefficient on this platform, and that even
> makes display of that string impractical for debugging.

Maybe. Unless the profiler tells me that it matters, I don't care. But that is
the most mundane part of the Windows API, so I myself prefer:

boost::filesystem::path cwd( boost::filesystem::current_path() );

I do not care if it calls '::GetCurrentDirectoryW()' or what it does.
I use "high level programming language C++" unless forced to use "low
level programming language C++".

Paavo Helde

Nov 3, 2013, 8:04:52 AM11/3/13
"Alf P. Steinbach" <alf.p.stein...@gmail.com> wrote in
news:l55f6i$opk$1...@dont-email.me:
> On 03.11.2013 09:33, Paavo Helde wrote:
>> std::string My_getcwd() {
>> wchar_t buff[MAX_PATH];
>> DWORD n = ::GetCurrentDirectoryW(MAX_PATH, buff);
>> if (n==0 || n>MAX_PATH) {
>> throw MyException("my_getcwd failed: " +
>> MyGetLastErrorString());
>> }
>> return Win2UtfFileName(std::wstring(buff, n));
>> }
>
> Demonstrates two needless string copying operations, one needless
> dynamic allocation, introduction of a needless possible failure mode
> (translation) and a choice of representation that makes further
> operations with the string inefficient on this platform, and that even
> makes display of that string impractical for debugging.
>
> Probably this is all a trade-off for easy cross platform development
> with types dictated by the original platform.
>
> As such it's not necessarily "wrong", but it sure ain't perfect. ;-)

Agreed.

>> Here, Win2UtfFileName() is another wrapper function wrapping
>> WideCharToMultiByte() on Windows and converting Windows UTF-16 to more
>> portable UTF-8, but that's not the main point here.
>
> Oh. I think, on the contrary, that it's a pretty important point, as
> far as we're discussing practical programming methodology.
>
> For, the above fundamental code needlessly MIXES RESPONSIBILITIES.
>
> Mixing responsibilities can be fine at higher levels (when they have
> to be mixed) and/or when everything works perfectly, but in the above
> code we're down at fundamental level that's invoked by all higher
> level code, and here at bottom there is the silly extra work done, at
> the cost of efficiency and some reliability, to create an unsuitable
> string representation for the platform, at further cost.

By our convention, all of the codebase uses UTF-8 for string handling. It
kind of makes sense because there is a lot of XML and HTML text
processing involved, all working in UTF-8. Yes, UTF-8 might be suboptimal
when using Windows API-s, but then most of the times these calls are
performing actual disk or registry access which tends to be slow anyway
(not in this example, but getcwd() is next to useless anyway, I only
chose it as a minimal example returning a string).

The responsibilities could be separated, but this would needlessly
increase the complexity because it would introduce usage of another
string type at the application level which must be handled differently on
different platforms (starting from the slash-backslash confusion). So we
have decided to process all string data as UTF-8, even if it might cause
slight performance loss when using Windows API-s.

Yes, this is a compromise between speed, usability and maintainability,
and in our projects we have chosen to settle on this point. For other
projects the demands may be different and require different compromises.

Cheers
Paavo