#include <stdlib.h>
#include <stdio.h>
int compare(const void *p, const void *q)
{
register double p0 = * (double *)p;
register double q0 = * (double *)q;
if (p0 > q0) return 1;
if (p0 < q0) return -1;
return 0;
}
void quit()
{
fprintf(stder, "memory exhausted\n");
exit(1);
}
int main(int argc, char *argv[])
{
int res = 1000;
char * file = argv[2];
double *buf = (double *) malloc(sizeof (double) *res);
if (buf==0) quit();
double median = 0;
double mean = 0;
int n = 0;
FILE *fin = fopen(file, "r");
double d;
while (fscanf(fin, "%lg", &d) == 1) {
if (n==res) {
res += res;
buf = (double *)realloc(buf, sizeof(double) *res);
if (buf==0) quit();
}
buf[n++] = d;
mean = (n == 1) ? d : mean + (d - mean) / n;
}
qsort(buf, n, sizeof(double), compare);
if (n) {
int mid = n/2;
median = (n%2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
}
printf("number of elements = %d, median = %g, mean = %g\n", n, median,
mean);
free(buf);
}
To compare, here is an idiomatic C++ solution:
#include<vector>
#include<fstream>
#include<algorithm>
using namespace std;
int main(int argc, char * argv[])
{
char *file = argv[2];
vector<double>buf;
double median = 0;
double mean = 0;
fstream fin(file, ios::in);
double d;
while (fin >> d) {
buf.push_back(d);
mean = (buf.size() == 1) ? d : mean + (d - mean) / buf.size();
}
sort(buf.begin(), buf.end());
if (buf.size()) {
int mid = buf.size() / 2;
median = (buf.size() %2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
}
cout << "number of elements = " << buf.size()
<< " , median = " << median << ", mean = " << mean << '\n';
}
======================
He goes on to claim that, using a sample set of 5,000,000 elements, the C++
code is more than 4 times faster than the C code. Further examples follow to
prove the superiority of C++ even in the realm of efficiency.
So C coders who care about efficiency should switch to C++?
Are there any counter-arguments against Stroustrup's claim?
In some instances (typically where inlined function templates can
replace functions with void* parameters) C++ will be faster. Because
the C++ standard library incorporates the C standard library, C++ code
should never have to be slower.
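For those following along, the pattern being contrasted is worth spelling out. With qsort the comparison reaches the sort only as a pointer to a function taking void * arguments, so the compiler generally cannot inline it into the sorting loop. A minimal sketch (names are illustrative):

```c
#include <stdlib.h>

/* qsort's interface: the element type is erased to void *, and every
 * comparison during the sort is an indirect call through this pointer. */
static int cmp_double(const void *p, const void *q)
{
    double a = *(const double *) p;
    double b = *(const double *) q;
    return (a > b) - (a < b);   /* -1, 0, or 1 */
}

static void sort_doubles(double *buf, size_t n)
{
    qsort(buf, n, sizeof *buf, cmp_double);
}
```

A C++ std::sort over a vector<double> sees the comparison (the built-in < on double) directly and can inline it, which is the advantage being described.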
--
Ian Collins.
I copied the examples by hand, and two typos sneaked in: in the C example it
should be "stderr" instead of "stder", and the C++ example lacks a final
closing bracket.
No, if that were my intention, I would have crossposted this to
comp.lang.c++ for maximum effect! ;)
> In some instances (typically where inlined function templates can
> replace functions with void* parameters) C++ will be faster. Because
> the C++ standard library incorporates the C standard library, C++ code
> should never have to be slower.
Of course, but the interesting claim here is that you should NOT use C style
code but modern C++ abstractions, because they are not only more readable
and expressive, but also more efficient. The second part (they actually
being more efficient) was news to me.
Well, the inlined function templates superior to void * functions thing
certainly makes sense. Maybe I should finally "move on" after all.
However, I compiled both examples with the current version of GCC (on
Win32-x86), and there is at least one area where C won: code size. The size
difference between the resulting executables was grotesque: C code => 5KB,
C++ code => over 400KB! (both binaries optimized for size and stripped!)
> In "Learning Standard C++ as a New Language" Bjarne Stroustrup claims
> that properly written C++ outperforms C code. I will just copy his first
> example here, which is supposed to demonstrate how C++ abstractions do
> not only make code easier to understand but also make it more efficient:
> ===
> The simplest specific example I can think of is a program to find the
> mean and median of a sequence of double precision floating-point numbers
> read from input. A conventional C-style solution would be:
>
<snip>
The C-style solution did not compile. When I fixed it minimally, it
segfaulted. So I wrote my own.
>
> To compare, here is an idiomatic C++ solution:
>
<snip>
Results on my system:
C:
number of elements = 5000000, median = -9.10326e+08, mean = -1.40759e+10
real 0m29.955s
user 0m25.840s
sys 0m3.500s
C++:
number of elements = 5000000 , median = -9.10326e+08, mean = -1.40759e+10
real 0m22.701s
user 0m12.260s
sys 0m3.970s
> He goes on to claim that, using a sample set of 5,000,000 elements, the
> C++ code is more than 4 times faster than the C code.
<shrug> My very, very simplistic C code doesn't quite manage to keep up
with the C++ code, but it certainly isn't four times slower. The
difference between my code and the C++ code is likely to be caused by
better memory management on the part of gcc's STL writer.
> follow to
> prove the superiority of C++ even in the realm of efficiency.
<shrug> You picked a rather artificial example. There is more to efficiency
than being able to store and sort five million numbers.
> So C coders who care about efficiency should switch to C++?
C coders who care about efficiency enough to consider switching languages
should first ensure that the language switch really does give sufficient
efficiency gains to justify the switch. They should also bear in mind the
costs in terms of simplicity, readability, maintainability, and
portability.
> Are there any counter-arguments against Stroustrup's claim?
My favourite example is a 450-line C++ program I was shown, about eight
years ago, which had taken four man-years to write and took about 10
minutes to process a fairly trivial amount of data. I rewrote it from spec
(because I saw no point in rewriting it from the code) in a single day, in
C. My version took considerably less than a second to process the same
amount of data. There is more to efficiency than mere language choice.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Maybe because you passed the test data file as the first argument? For some
strange reason the program expects the test file to be the second argument.
So you have to invoke it like: prog ignored test.dat
>>
>> To compare, here is an idiomatic C++ solution:
>>
> <snip>
>
> Results on my system:
> C:
> number of elements = 5000000, median = -9.10326e+08, mean = -1.40759e+10
>
> real 0m29.955s
> user 0m25.840s
> sys 0m3.500s
>
> C++:
> number of elements = 5000000 , median = -9.10326e+08, mean = -1.40759e+10
>
> real 0m22.701s
> user 0m12.260s
> sys 0m3.970s
>
>> He goes on to claim that, using a sample set of 5,000,000 elements, the
>> C++ code is more than 4 times faster than the C code.
>
> <shrug> My very, very simplistic C code doesn't quite manage to keep up
> with the C++ code, but it certainly isn't four times slower.
I am not that surprised, Stroustrup's C code sucked, even I could do better!
;)
However, the basic claim - that the higher abstraction level of C++ does not
hurt code efficiency but actually improves it - seems to be supported by
your results.
> The difference between my code and the C++ code is likely to be caused by
> better memory management on the part of gcc's STL writer.
>
>> follow to
>> prove the superiority of C++ even in the realm of efficiency.
>
> <shrug> You picked a rather artificial example. There is more to
> efficiency
> than being able to store and sort five million numbers.
I didn't pick that example, Stroustrup did! :P
However, I think the example is quite good, because in the end a complex
application is nothing more than a collection of simple functions/data
structures (or "objects" for the OO crowd) which do things like reading
files, sorting numbers etc.
>> So C coders who care about efficiency should switch to C++?
>
> C coders who care about efficiency enough to consider switching languages
> should first ensure that the language switch really does give sufficient
> efficiency gains to justify the switch. They should also bear in mind the
> costs in terms of simplicity, readability, maintainability, and
> portability.
>
>> Are there any counter-arguments against Stroustrup's claim?
>
> My favourite example is a 450-line C++ program I was shown, about eight
> years ago, which had taken four man-years to write and took about 10
> minutes to process a fairly trivial amount of data. I rewrote it from spec
> (because I saw no point in rewriting it from the code) in a single day, in
> C. My version took considerably less than a second to process the same
> amount of data. There is more to efficiency than mere language choice.
Of course, a bad programmer will write worse code in any language compared
to a good programmer. However, the question here is - assuming programmers
of equal skill - who would produce more efficient code if one used C and the
other used C++. So far the answer seems to be the C++ guy would win.
Got to be argv[1] to take the 1st argument
Bye, Jojo
Assuming two programmers of equal skill, on the whole the one writing
in C will write far nicer code than the one in C++ (nicer in terms of
ease of readability, maintainability, and lack of bugs). And he will
write it quicker.
Assuming the resulting C++ program actually works as intended, it will
run at pretty much the same speed as the C one.
Assuming a team of C++ programmers, the useful subset of the bloated
nightmare of C++ that they know in common is called C.
C++ is a virus. There will come a day when it is wiped from the Earth.
>> In some instances (typically where inlined function templates can
>> replace functions with void* parameters) C++ will be faster. Because
>> the C++ standard library incorporates the C standard library, C++ code
>> should never have to be slower.
>
> Of course, but the interesting claim here is that you should NOT use C style
> code but modern C++ abstractions, because they are not only more readable
> and expressive, but also more efficient. The second part (they actually
> being more efficient) was news to me.
The only place C++ will win the performance race is where the *language*
offers an advantage, which is why sorting is often cited as an example.
A lot of C++ code (including libraries) is written either in the C
subset of C++ or using constructs that can be implemented equally well in C.
> Well, the inlined function templates superior to void * functions thing
> certainly makes sense. Maybe I should finally "move on" after all.
> However, I compiled both examples with the current version of GCC (on
> Win32-x86), and there is at least one area where C won: code size. The size
> difference between the resulting executables was grotesque: C code => 5KB,
> C++ code => over 400KB! (both binaries optimized for size and stripped!)
>
That'll be all those inlined templates! Seriously, there is nearly
always a performance/size trade-off. Not always this big though.
--
Ian Collins.
>
> "Richard Heathfield" <r...@see.sig.invalid> schrieb im Newsbeitrag
> news:NKOdnfOFf_8s8WXa...@bt.com...
>> copx said:
>>
>>> In "Learning Standard C++ as a New Language" Bjarne Stroustrup claims
>>> that properly written C++ outperforms C code. I will just copy his
>>> first example here, which is supposed to demonstrate how C++
>>> abstractions do not only make code easier to understand but also make
>>> it more efficient: ===
>>> The simplest specific example I can think of is a program to find the
>>> mean and median of a sequence of double precision floating-point
>>> numbers read from input. A conventional C-style solution would be:
>>>
>> <snip>
>>
>> The C-style solution did not compile. When I fixed it minimally, it
>> segfaulted.
>
> Maybe because you passed the test data file as the first argument?
I had spotted that bug, but I thought I'd allowed for it. (I did so
successfully for the C++ version, which had the same bug.) On re-testing,
it appears that I had not allowed for it, so I withdraw that complaint, at
least. Having said that, the C version ran at about half the speed of my
own C version.
<snip>
>>> He goes on to claim that, using a sample set of 5,000,000 elements, the
>>> C++ code is more than 4 times faster than the C code.
>>
>> <shrug> My very, very simplistic C code doesn't quite manage to keep up
>> with the C++ code, but it certainly isn't four times slower.
>
> I am not that surprised, Stroustrup's C code sucked, even I could do
> better! ;)
> However, the basic claim - that the higher abstraction level of C++ does
> not hurt code efficiency but actually improves it - seems to be supported
> by your results.
Rather, it suggests that, *in this one particular example*, C++'s sort
works faster than qsort. Before getting too excited about that, I'd like
to see comparative performance stats for sorts where comparisons are
considerably more expensive than for mere doubles...
>> The difference between my code and the C++ code is likely to be caused
>> by better memory management on the part of gcc's STL writer.
...and don't forget this bit.
>>
>>> follow to
>>> prove the superiority of C++ even in the realm of efficiency.
>>
>> <shrug> You picked a rather artificial example. There is more to
>> efficiency
>> than being able to store and sort five million numbers.
>
> I didn't pick that example, Stroustrup did! :P
By posting it here, you picked it.
> However, I think the example is quite good, because in the end a complex
> application is nothing more than a collection of simple functions/data
> structures (or "objects" for the OO crowd) which do things like reading
> files, sorting numbers etc.
That's perhaps a rather naive view. It ain't so much what building blocks
you has, as how you puts them together.
<snip>
>>> Are there any counter-arguments against Stroustrup's claim?
>>
>> My favourite example is a 450-line C++ program I was shown, about eight
>> years ago, which had taken four man-years to write and took about 10
>> minutes to process a fairly trivial amount of data. I rewrote it from
>> spec (because I saw no point in rewriting it from the code) in a single
>> day, in C. My version took considerably less than a second to process
>> the same amount of data. There is more to efficiency than mere language
>> choice.
>
> Of course, a bad programmer will write worse code in any language
> compared to a good programmer.
Right - and take a look at some of the C++ applications out there if you
want to see some fine examples of bad programming. :-)
> However, the question here is - assuming
> programmers of equal skill - who would produce more efficient code if one
> used C and the other used C++. So far the answer seems to be the C++ guy
> would win.
I think there are many more factors than you're allowing for here, and
language choice (between C and C++, at least) is a relatively
insignificant factor. As for programmers of equal skill, how would you
measure that?
Measure, measure, measure. If you want a real comparison, get Bjarne
Stroustrup to write a real-time transaction processing system in C++ for a
major bank, *to spec*. And get Dennis Ritchie to implement the same spec
in C. *Then* you have something worth testing and measuring. Don't forget
to measure bugs as well as performance - if the answer doesn't have to be
right, I can make both the C and the C++ versions run very very fast
indeed.
<snip>
> However, I compiled both examples with the current version of GCC (on
> Win32-x86), and there is at least one area where C won: code size. The
> size difference between the resulting executables was grotesque: C code
> => 5KB, C++ code => over 400KB! (both binaries optimized for size and
> stripped!)
I didn't find the difference to be quite so remarkable. I didn't bother
with size optimisation, and I got these results:
Your C version: 37519
My C version: 60818
Your C++ version: 101287
Stripping yielded these figures:
Your C version: 4752
My C version: 9984
Your C++ version: 12172
Whilst the C++ version is significantly larger, it's not that big a deal
really, is it?
> C++ is a virus. There will come a day when it is wiped from the Earth.
Ouch! Sometimes I wonder why people give such titles to C++.
:: Is C++ powerful ?
yes, a lot.
:: Is C++ complex ?
of course.
:: Is C++ monstrous and efficient ?
yes, of course it is like this heck a lot :O)
:: Is programming in C++ hard ?
Yes, it is a *lot* harder than what you do with other languages.
I am still looking for the day when more and more people will give titles
to VB, .NET or Java rather than yanking their brains around C vs C++
flame-wars.
You may be young, and this young time of yours is too precious to waste on
C vs C++. If you are really interested, please use it on "VB vs any Good
Language" (Python, Common Lisp, Ruby, Ada, etc.)
I give the same advice to "copx" (the OP)
--
http://lispmachine.wordpress.com/
Please remove capital 'V's when you reply
to me via e-mail.
> :: Is C++ powerful ?
>
> yes, a lot.
> ....[SNIP]....
> I give the same advice to "copx" (the OP)
Oh.. no. I unintentionally f*cked-up the thread from another NG. The
negligence is on my part.
I apologize for the mistake & inconvenience I have caused.
> Oh.. no. I unintentionally f*cked-up the thread from other NG. Negligence
> is on my part.
>
> I apologize for the mistake & inconvenience I have caused.
That apology was for comp.lang.c++. My PAN settings were wrong. I just
corrected them.
1. Please, do not call it "my C version". I did not write that code.
2. Given that we use the same compiler, I assume the large difference can be
explained by the fact that you use a setup which dynamically links the
executable against the C++ standard library installed on your system i.e.
you do not see the true complexity of your code, and it would not run on any
system without a compatible library. I am surprised that the principle of
dynamic linking is news to you... GCC does that on Linux (your OS I guess)
by default.
In contrast, on Windows systems - where no C++ standard library can be
expected to be present - the compiler links the required functionality into
the executable i.e. it becomes a lot bigger. Happens with C code too, but
the C standard library functions are very simple/small so they do not add
much to the executable.
Here we have a perfect example of the C-hacker syndrome.
I bet you are a big fan of Linus Torvalds. (Heck, you may even *be*
Linus Torvalds. Exactly the same type of trolling.)
>> On Sun, 06 Apr 2008 00:24:06 -0700, bc1891 wrote:
>
>> C++ is a virus. There will come a day when it is wiped from the Earth.
>
> Ouch! Sometimes I wonder why people give such titles to C++.
My guess is that they are bad and/or lazy programmers and blame the
language for not being able to learn it. I'm not fond of several languages
but their existence doesn't bother me. I just choose not to use them.
> I bet you are a big fan of Linus Torvalds.
:D
> (Heck, you may even *be*
> Linus Torvalds. Exactly the same type of trolling.)
Linus also gives the same type of arguments for why he did not choose a
microkernel-based design.
>
> "Richard Heathfield" <r...@see.sig.invalid> schrieb im Newsbeitrag
> news:jYednbMQHMkhGWXa...@bt.com...
<snip>
>> Whilst the C++ version is significantly larger, it's not that big a deal
>> really, is it?
>
> 1. Please, do not call it "my C version". I did not write that code.
Fair enough - but I think you know what I mean.
> 2. Given that we use the same compiler, I assume the large difference can
> be explained by the fact that you use a setup which dynamically links the
> executable against the C++ standard library installed on your system i.e.
> you do not see the true complexity of your code, and it would not run on
> any system without a compatible library.
But that's okay because we can just re-compile on the new system.
> I am surprised that the
> principle of dynamic linking is news to you...
Oh, but it isn't. I'm sick an... I mean, I'm quite familiar with the idea,
on several different OSen. And although it has its drawbacks, I think
you'll agree that it does drastically reduce filesize.
<snip>
The only reason I program in C rather than C++ is that reliable C
compilers are far more ubiquitous than reliable C++ compilers. I don't
think this will change any time soon either; I mean who's actually
gonna sit down and write a C++ compiler for the PIC16F684
microcontroller when the C compiler is perfectly sufficient?
Both languages get compiled to machine code, and one language is for
the most part a superset of the other, so in my own eyes it stands to
reason that C++ could be better in every way (except when it comes to
finding a decent compiler for your platform).
> In "Learning Standard C++ as a New Language" Bjarne Stroustrup claims that
> properly written C++ outperforms C code. I will just copy his first example
> here, which is supposed to demonstrate how C++ abstractions do not only make
> code easier to understand but also make it more efficient:
> ===
> The simplest specific example I can think of is a program to find the mean
> and median of a sequence of double precision floating-point numbers read
> from input. A conventional C-style solution would be:
<snip C>
> To compare, here is an idiomatic C++ solution:
<snip C++>
> He goes on to claim that, using a sample set of 5,000,000 elements, the C++
> code is more than 4 times faster than the C code. Further examples follow to
> prove the superiority of C++ even in the realm of efficiency.
I can't get anything like 4 times on my system. With optimisations
off, the C code is a little faster; with optimisations on, the C code
is 20% slower.
All the speed is coming from the sort. Comment that out, and the C++
is slower with or without optimisations. Not really surprising that,
in an IO test, C would have the edge (without the sort, that is all
this is).
It is also not really surprising that C++'s sort can be optimised to
be faster than qsort -- the language is designed to make such
optimisations possible, but I wonder what combination of compiler and
hardware gives a factor of 4. What do other people see?
--
Ben.
<snip>
> It is also not really surprising that C++'s sort can be optimised to
> be faster than qsort -- the language is designed to make such
> optimisations possible, but I wonder what combination of compiler and
> hardware gives a factor of 4. What do other people see?
I see a fine specimen of Anser caerulescens ferali, with an array of
programmers in hot pursuit.
The text doesn't contain any details on that. It only says:
"The implementation I used is widely available and cheap - not a research
toy. Implementations that claim higher performance are also available."
The text was published in the May 1999 issue of "The C/C++ User's Journal".
That's all available info - make your guess! ;)
The factor of 4 seems to be an extreme case; however, C++ does indeed seem
to beat C in synthetic benchmarks most of the time these days:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all
Funny side note: if you give equal importance to "CPU time", "Memory Use",
and "Gzip Bytes" the winner is... Pascal!
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm
>
> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
>> I think there are many more factors than you're allowing for here, and
>> language choice (between C and C++, at least) is a relatively
>> insignificant factor. As for programmers of equal skill, how would you
>> measure that?
>>
> Get the same programmer to write code in C and C++.
Finding a programmer who is precisely as skilled in C as he or she is in
C++ - no more, no less - will not be easy.
--
Ian Collins.
You can also write some silly #define macros whose job it is to write
a type-specialized sort function.
#define GEN_SORT_FUNCTION(TYPE, FUNC_NAME, LESS, EQUAL) ...
GEN_SORT_FUNCTION(int, sort_int, left < right, left == right)
The idea is that would generate a function sort_int which sorts an
array of integers, using ``left < right'' and ``left == right'' to
compare elements. (The definition of GEN_SORT_FUNCTION is a reader
exercise).
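One possible definition of that macro, sketched here with a simple insertion sort so it fits in a post (a production version would generate a quicksort the same way; this sketch also drops the EQUAL parameter, since insertion sort only needs LESS):

```c
#include <stddef.h>

/* Expands to a complete sort function specialized for TYPE. LESS is
 * an expression over the names `left' and `right', pasted directly
 * into the inner loop, so there is no function-pointer indirection. */
#define GEN_SORT_FUNCTION(TYPE, FUNC_NAME, LESS)     \
void FUNC_NAME(TYPE *a, size_t n)                    \
{                                                    \
    for (size_t i = 1; i < n; i++) {                 \
        TYPE left = a[i];                            \
        size_t j = i;                                \
        while (j > 0) {                              \
            TYPE right = a[j - 1];                   \
            if (!(LESS))                             \
                break;                               \
            a[j] = a[j - 1];                         \
            j--;                                     \
        }                                            \
        a[j] = left;                                 \
    }                                                \
}

GEN_SORT_FUNCTION(int, sort_int, left < right)
GEN_SORT_FUNCTION(double, sort_double, left < right)
```

Because LESS is expanded in place, the comparison is ordinary inline code by the time the optimizer sees it, which is the same property the C++ template gets.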
This has some concrete advantages over C++, because it doesn't suffer
from the std::less braindamage.
Our sorting function doesn't have to deduce that the elements are
equal by comparing them for inequality both ways.
We could easily develop a variant of GEN_SORT_FUNCTION in which a
single comparison expression is specified---one whose job it is to
produce a negative integer, 0, or a positive integer for the left <
right, left == right and left > right cases.
This has the advantage that if you are comparing strings, you only
have to make a single scan through them, from which you have all the
information you need.
(Look up the definition of strcmp, and how it fits into qsort).
There is a reason that Stroustrup chose doubles, because if the data
consisted of strings---particularly long ones with lengthy common
prefixes---it's possible that the idiomatic C++ would actually lose to
the idiomatic C due to the impedance mismatch between std::less and
complex comparisons.
The cost of going through a function pointer to get to a comparison
function only matters if that comparison function does next to
nothing.
If the comparison function is nontrivial, you can bungle performance
by having to do that comparison two or more times over again.
If the comparison function is nontrivial, you can also bungle
performance by inappropriately inlining it, so that the combination of
sort function and inlined comparison code exceeds the available
instruction cache.
> So C coders who care about efficiency should switch to C++?
Or maybe only those coders who can't figure out how to sort an array
of doubles without using function indirection, yet are comfortable
with templates. :)
He is timing memory management of the vector class versus a totally naive
realloc for every single row beyond 1000. The sort operation is a complete
red herring.
I can't believe that Mr. Stroustrup wrote that, because it's totally
disingenuous.
** Posted from http://www.teranews.com **
Alas, not true; you must have misread the ``res += res'' expression
which doubles the reserved size prior to each realloc.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
static const double CLOCK_INV = 1.0 / CLOCKS_PER_SEC;
int compare(const void *p, const void *q)
{
double p0 = *(const double *) p;
double q0 = *(const double *) q;
if (p0 > q0) return 1;
if (p0 < q0) return -1;
return 0;
}
void quit(void)
{
fprintf(stderr, "memory exhausted\n");
exit(1);
}
int main(int argc, char *argv[])
{
int res = 100000;
char *file = argv[1];
double median = 0;
double mean = 0;
int n = 0;
double d;
clock_t start, end;
double *buf = (double *) malloc(sizeof(double) * res);
FILE *fin = fopen(file, "r");
if (buf == 0)
quit();
start = clock();
while (fscanf(fin, "%lg", &d) == 1) {
if (n == res) {
res += res;
buf = (double *) realloc(buf, sizeof(double) * res);
if (buf == 0)
quit();
}
buf[n++] = d;
mean = (n == 1) ? d : mean + (d - mean) / n;
}
qsort(buf, n, sizeof(double), compare);
if (n) {
int mid = n / 2;
median = (n % 2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
}
end = clock();
printf("number of elements = %d, median = %g, mean = %g in %f seconds\n",
n, median, mean, (end - start)*CLOCK_INV);
free(buf);
return 0;
}
#include <vector>
#include <fstream>
#include <iostream>
#include <ctime>
#include <algorithm>
using namespace std;
static const double CLOCK_INV = 1.0 / CLOCKS_PER_SEC;
int main(int argc, char *argv[])
{
char *file = argv[1];
vector < double >buf;
clock_t start, end;
double median = 0;
double mean = 0;
fstream fin(file, ios::in);
double d;
start = clock();
while (fin >> d) {
buf.push_back(d);
mean = (buf.size() == 1) ? d : mean + (d - mean) / buf.size();
}
sort(buf.begin(), buf.end());
if (buf.size()) {
int mid = buf.size() / 2;
median = (buf.size() % 2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
}
end = clock();
cout << "number of elements = " << buf.size()
<< " , median = " << median << ", mean = " << mean
<< " in " << (end - start)*CLOCK_INV << " seconds." << endl;
}
/*
C:\tmp>cl /EHsc /W4 /Ox testcpp.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for
80x86
Copyright (C) Microsoft Corporation. All rights reserved.
testcpp.cpp
testcpp.cpp(12) : warning C4100: 'argc' : unreferenced formal parameter
Microsoft (R) Incremental Linker Version 8.00.50727.762
Copyright (C) Microsoft Corporation. All rights reserved.
/out:testcpp.exe
testcpp.obj
C:\tmp>testcpp n.txt
number of elements = 5000000 , median = 1.00008, mean = 5.42094 in 10.344
seconds.
C:\tmp>testcpp n.txt
number of elements = 5000000 , median = 1.00008, mean = 5.42094 in 10.328
seconds.
C:\tmp>testcpp n.txt
number of elements = 5000000 , median = 1.00008, mean = 5.42094 in 10.344
seconds.
C:\tmp>cl /W4 /Ox testc.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for
80x86
Copyright (C) Microsoft Corporation. All rights reserved.
testc.c
testc.c(32) : warning C4996: 'fopen': This function or variable may be
unsafe. Consider using fopen_s instead. To disable deprecation, use
_CRT_SECURE_NO_WARNINGS. See online help for d
etails.
C:\Program Files\Microsoft Visual Studio 8\VC\INCLUDE\stdio.h(234) :
see declaration of 'fopen'
testc.c(36) : warning C4996: 'fscanf': This function or variable may be
unsafe. Consider using fscanf_s instead. To disable deprecation, use
_CRT_SECURE_NO_WARNINGS. See online help for
details.
C:\Program Files\Microsoft Visual Studio 8\VC\INCLUDE\stdio.h(249) :
see declaration of 'fscanf'
testc.c(22) : warning C4100: 'argc' : unreferenced formal parameter
Microsoft (R) Incremental Linker Version 8.00.50727.762
Copyright (C) Microsoft Corporation. All rights reserved.
/out:testc.exe
testc.obj
C:\tmp>testc n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 9.469000
seconds
C:\tmp>testc n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 9.406000
seconds
C:\tmp>testc n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 9.484000
seconds
C:\tmp>
*/
Even so, the realloc() was still the rate limiting step.
5 million doubles will be about 40 MB on most systems, which is about
2^25 bytes of memory.
The initial size of 1000 is about 2^10, so we will have about 13 realloc()
calls which will entail memcpy() of all objects from the previous hunk
(an approximately O(N^2) operation by doing things in this manner).
By changing the starting size to 100,000 doubles {thereby cutting the
number of realloc() calls approximately in half}, the speed is a
little faster than the C++ version.
It's also a tragic mistake to sort the data to find the median.
Median location is O(N) when done properly {e.g. randomized
quickselect()}. But that's neither here nor there.
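For the curious, the O(N) selection can be sketched like so - plain quickselect with a middle-element pivot rather than the randomized pivot mentioned above, to keep the post short. It permutes the array and runs in expected linear time:

```c
#include <stddef.h>

/* Returns the k-th smallest element (0-based) of a[0..n-1],
 * partitioning the array as it narrows in on position k. */
static double select_kth(double *a, size_t n, size_t k)
{
    size_t lo = 0, hi = n - 1;
    while (lo < hi) {
        double pivot = a[lo + (hi - lo) / 2];
        size_t i = lo, j = hi;
        while (i <= j) {                /* Hoare-style partition */
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                double t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
                if (j == 0) break;      /* avoid size_t underflow */
                j--;
            }
        }
        if (k <= j) hi = j;
        else if (k >= i) lo = i;
        else return a[k];
    }
    return a[lo];
}
```

For an even count one would call it for positions n/2 - 1 and n/2 and average the two, matching the median in the posted programs.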
Having said all that, I would much rather write data structures in C++
than in C.
> On Apr 7, 7:14 pm, Kaz Kylheku <kkylh...@gmail.com> wrote:
>> On Apr 7, 6:31 pm, "Dann Corbit" <dcor...@connx.com> wrote:
>>
>> > He is timing memory management of the vector class verses a totally naive
>> > realloc for every single row beyond 1000.
>>
>> Alas, not true; you must have misread the ``res += res'' expression
>> which doubles the reserved size prior to each realloc.
>
> Even so, the realloc() was still the rate limiting step.
That did not seem to be the case in my tests. Commenting out the sort
in both programs gave C the advantage at all optimisation settings.
Just to be sure, I changed the C++ version to use fscanf and the
"naive" realloc beats the vector (but only by a little) for simply
reading and storing the numbers.
It seemed to me that C++ was able to inline the sort comparison. Of
course all this will depend on the libraries involved as much as it
does on the compilers.
--
Ben.
That is not a tight upper bound; the copying is in fact an O(N)
operation. Every time it takes place, half of the elements in the
array being moved are brand new (have never been moved before, only
assigned to their present location). Of the remaining items, half have
been moved only once before, a quarter have been moved twice, and so
on. This works out to a constant average of 2 copies per element.
(Hint: find the limit to which converges the series 1/2 + 2/4 + 3/8 +
4/16 + 5/32 ...).
The copying doesn't necessarily take place, either, if the malloc is
able to just keep expanding the block in place.
I happen to like both C and C++, but there _are_ cases where C++ makes
things a little less clear than they could be. As a trivial example:
int x = 3, y;
y = func(x);
Quick: what's the value of x?
In C the answer's easy: the value is 3. In C++, there is absolutely no
way to know, at the point of calling, what will happen to the variable.
This makes the same code, in C, somewhat easier to read, as the
indicators of whether the value will be modified or not are right there,
in your face - or not. In C++, there's no indication whatsoever, meaning
you have to hunt for the called function, check its interface, realize
that yes, it modifies the value and add this to the growing list of
pointless things to remember which shouldn't need remembering.
Simplified nonsense. You are too kind.
Here's another C++ snippet:
int x=0,y=0;
x=1;
x=x+y;
y=x+2;
Quick: what is y?
Answer: no idea. Because you don't know what "+" does half the time .....
The clc crowd who fix bugs from a printout would not fare too well
methinks ....
If you need to look at the code, you should know what func does
anyway!
/Peter
Maybe with a name better than "func" it would be less ambiguous.
> In C the answer's easy: the value is 3. In C++, there is absolutely no
> way to know, at the point of calling, what will happen to the variable.
And in C, there's absolutely no way to know who's fiddling around with
your struct members because you can't restrict access to them.
<snip>
> And in C, there's absolutely no way to know who's fiddling around with
> your struct members because you can't restrict access to them.
Wrong. Maybe *you* can't restrict access to them, but I can, and so can
quite a few others in comp.lang.c. The technique is not difficult.
3
> Answer no idea. Because you don't know what "+" does half the time .....
Ah, but this is the other half. C++ doesn't allow "+" to be
overloaded for type int.
</OT>
[...]
--
Keith Thompson (The_Other_Keith) <ks...@mib.org>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Yes. An initializer must be an assignment-expression, which means it
cannot have a comma operator at its top level.
The same restriction applies to function arguments. In both cases,
it's because the comma is used as a delimiter in that context, so it's
not allowed as an operator.
I suppose this:
int x = 3;
int y;
would be clearer, but I have no problem with either form.
ITYM "... realize that yes, it /could modify/ the value ....". Unless
you look at func's definition, one cannot know for sure (just from
checking its interface) if it actually modifies 'x' or not.
- Anand
Many C compilers have the ability to arbitrarily inline functions that are
small enough to inline.
On Tue, 08 Apr 2008 10:45:09 -0700, peter koch wrote:
> Right. But this is only true in such trivial cases as the above where
> you allow such an absurd name. Replace the name with some reasonable
> one, such as square or fibonacci, and no one will be in doubt.
Really? Fine: try this:
string a, b;
a = somevalue;
b = trim(a);
If the intent of trim is to act like the similar function in PHP - i.e.
to remove leading and trailing whitespace - what does this tell us about
the effect, if any, on a?
Oh, right, nothing. Hmm. So what's a good name for this? How about, oh,
trim_which_modifies_its_parameter()? I don't want to type that.
> I have
> heard others complain like you, but never found statements such as the
> above to be a real problem.
Yet the problem remains: C++ makes the issue of whether a passed
parameter gets modified or not absolutely indeterminate at point of call
- meaning the poor bastard having to maintain the code has no hope, by
reading the portion of code he's trying to work on, of figuring out what
is actually going on.
This is not a particularly good design.
> A much larger problem has been in absurd declaration such as the above.
> Are you quite sure that it should read int x = 3; int y; (as you imply)
I implied that? Where? I wrote int x = 3, y; There's only one semicolon
in there, only one instance of "int". Why would you think this somehow
"implies" a completely different manner of writing code? Does your way
of writing it imply it should be re-done in Pascal? No? Why not? You
seem to be taking purely arbitrary paths, so why not that one?
> If you need to look at the code, you should know what func does anyway!
Why? Since it cannot modify the passed parameter, what it does is
fundamentally irrelevant to debugging the local problem. That is, unless
someone invented a method of passing parameters which would then be
modified, but making this absolutely invisible to the maintenance
programmer looking at the calling code.
Oh, yeah, they did exactly that, didn't they? Good, now instead of
focusing on *this* piece of code, one has to examine every called
function, not for its effects on output and the like, but its effects on
*local* variables it gives absolutely no indication it will be modifying.
Information hiding can be good. This sort is bad.
Indeed. However, the more important point is that you can be reasonably
certain that it will *not* be modifying values unless there's a pointer
involved somewhere, which is generally *visible* at the point of the call.
eg:
void func1(void)
{
int x = 3;
func2(x);
}
Anyone seeing this sort of thing wouldn't _need_ to examine "func2" to
see if x is modified. It's not. Simple as that. Unless you're using C++,
in which case you have no way to tell either way.
> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
> news:87wsn8m...@bsb.me.uk...
<snip>
>> It seemed to me that C++ was able to inline the sort comparison. Of
>> course all this will depend on the libraries involved as much as it
>> does on the compilers.
>
> Many C compilers have the ability to arbitrarily inline functions that are
> small enough to inline.
Yes, but very few (none?) can do it for the comparison function used
in qsort. I am sure it can be done, but I don't know of it being
done, which suggests it is tricky to get right.
I am not sure that this is what is happening, but the design of C++'s
STL encourages a lot of inlining in the templates used to implement
sort. Something is making the assembler that comes out of g++ -S ten
times bigger for the C version.
--
Ben.
This thread is cross-posted between comp.lang.c and comp.lang.c++,
which is rarely a good idea, especially when what's being discussed is
a controversial difference between the two languages.
I'll just point out that there are plenty of programming languages in
which functions (procedures, subprograms, whatever) can modify their
arguments, and programmers who use those languages generally manage to
cope with the pain.
If I were inventing a new language, I might consider requiring a
special syntax to distinguish between arguments that are passed by
value and arguments whose value might be changed. For example:
sin(x) /* passed by value, can't modify x */
set_sin(@x) /* can modify x */
In C++ you would never write as above. It would be
string a(somevalue);
string b(trim(a));
or
string a = somevalue;
string b = trim(a);
>
> If the intent of trim is to act like the similar function in PHP - i.e.
> to remove leading and trailing whitespace - what does this tell us about
> the effect, if any, on a?
I would assume this would mean that there would be no effect on a. If
trim would trim its parameter, the name of the function would indicate
this.
>
> Oh, right, nothing. Hmm. So what's a good name for this? How about, oh,
> trim_which_modifies_its_parameter()? I don't want to type that.
to_trim would be my choice, as_trim being the non-modifying name in
those rare cases where I would be in doubt.
>
> > I have
> > heard others complain like you, but never found statements such as the
> > above to be a real problem.
>
> Yet the problem remains: C++ makes the issue of whether a passed
> parameter gets modified or not absolutely indeterminate at point of call
> - meaning the poor bastard having to maintain the code has no hope, by
> reading the portion of code he's trying to work on, of figuring out what
> is actually going on.
>
> This is not a particularly good design.
I can't see how else you could design the code (unless you suggest
demanding pointers for the parameters, which is not at all clearer), so
let's not discuss the design of C++ here. It is designed as it is, and
there is nothing we can really do about it.
>
> > A much larger problem has been in absurd declaration such as the above.
> > Are you quite sure that it should read int x = 3; int y; (as you imply)
>
> I implied that? Where? I wrote int x = 3, y; There's only one semicolon
> in there, only one instance of "int". Why would you think this somehow
> "implies" a completely different manner of writing code? Does your way
> of writing it imply it should be re-done in Pascal? No? Why not? You
> seem to be taking purely arbitrary paths, so why not that one?
It was just a remark about clarity - namely that I found your way of
declaring the variables unclear.
Also I do not see the connection to Pascal. Here it was allowed (I
believe!) to declare several variables together - e.g. var i,j:
integer;
>
> > If you need to look at the code, you should know what func does anyway!
>
> Why? Since it cannot modify the passed parameter, what it does is
> fundamentally irrelevant to debugging the local problem. That is, unless
> someone invented a method of passing parameters which would then be
> modified, but making this absolutely invisible to the maintenance
> programmer looking at the calling code.
If you need to have a look at a piece of code, do you seriously
suggest that this can be done without at least a casual knowledge
about what called functions do?
>
> Oh, yeah, they did exactly that, didn't they? Good, now instead of
> focusing on *this* piece of code, one has to examine every called
> function, not for its effects on output and the like, but its effects on
> *local* variables it gives absolutely no indication it will be modifying.
>
> Information hiding can be good. This sort is bad.
Certainly. That is why you choose your function (and variable) names
with care.
/Peter
Yup - it would be much clearer. I probably would not let that one pass
in a C++ code review (also because there is a potentially
uninitialised variable), but of course C is a different beast.
I certainly would not let int* i,j pass in either language.
/Peter
>> Many C compilers have the ability to arbitrarily inline functions that are
>> small enough to inline.
>Yes, but very few (none?) can do it for the comparison function used
>in qsort. I am sure it can be done, but I don't know of it being
>done, which suggests it is tricky to get right.
It could work if qsort() itself was inlined as well as the comparison
function, because then the comparison function would be known at compile
time.
This would use more space, but in principle a compiler could generate
"curried" versions of qsort() for each (known-at-compile-time) comparison
function, to avoid replication of the code where the same comparison
function is used in multiple calls to qsort().
(Currying is "partially evaluating" a function after fixing some of
its arguments.)
-- Richard
--
:wq
--
Ian Collins.
Inlining qsort's comparison function should be straightforward *if*
the compiler is able to inline (a call to) the qsort() function
itself. That would require either the ability to inline across
separately compiled translation units, or an implementation that in
effect delays compilation until link time.
Agreed. But this:
int *i, j;
might be acceptable in C (except that "i" is a terrible name for a
pointer). But yes, I'd prefer
int *i; // or "int* i;" if you insist
int j;
in either language.
Note that this:
int *ptr;
tends to be the preferred spacing in C, and this:
int* ptr;
tends to be the preferred spacing in C++. There are valid reasons for
either convention.
Also, I would have used this comparison function:
#include <float.h>
#include <math.h>
int double_compare(void *pd1, void *pd2)
{
double d1 = *(double *) pd1;
double d2 = *(double *) pd2;
if (d1 > d2)
if ((d1 - d2) < fabs(d1 * DBL_EPSILON))
return 0;
else
return 1;
if (d1 < d2)
if ((d2 - d1) < fabs(d2 * DBL_EPSILON))
return 0;
else
return -1;
return 0;
}
The actual problem proposed should logically have been solved like this:
#include <stdlib.h>
#include <float.h>
#include <math.h>
typedef double Etype;
extern Etype RandomSelect(Etype * A, size_t p, size_t r, size_t i);
extern size_t RandRange(size_t a, size_t b);
extern size_t RandomPartition(Etype * A, size_t p, size_t r);
extern size_t Partition(Etype * A, size_t p, size_t r);
/*
**
** In the following code, every reference to CLR means:
**
** "Introduction to Algorithms"
** By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
** ISBN 0-07-013143-0
*/
/*
** CLR, page 187
*/
Etype RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
size_t q,
k;
if (p == r)
return A[p];
q = RandomPartition(A, p, r);
k = q - p + 1;
if (i <= k)
return RandomSelect(A, p, q, i);
else
return RandomSelect(A, q + 1, r, i - k);
}
size_t RandRange(size_t a, size_t b)
{
size_t c = (size_t) ((double) rand() / ((double) RAND_MAX + 1)
* (b - a));
return c + a;
}
/*
** CLR, page 162
*/
size_t RandomPartition(Etype A[], size_t p, size_t r)
{
size_t i = RandRange(p, r);
Etype Temp;
Temp = A[p];
A[p] = A[i];
A[i] = Temp;
return Partition(A, p, r);
}
/*
** CLR, page 154
*/
size_t Partition(Etype A[], size_t p, size_t r)
{
Etype x,
temp;
size_t i,
j;
x = A[p];
i = p - 1;
j = r + 1;
for (;;) {
do {
j--;
} while (!(A[j] <= x));
do {
i++;
} while (!(A[i] >= x));
if (i < j) {
temp = A[i];
A[i] = A[j];
A[j] = temp;
} else
return j;
}
}
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
static const double CLOCK_INV = 1.0 / CLOCKS_PER_SEC;
int compare(const void *p, const void *q)
{
register double p0 = *(double *) p;
register double q0 = *(double *) q;
if (p0 > q0)
return 1;
if (p0 < q0)
return -1;
return 0;
}
void quit()
{
fprintf(stderr, "memory exhausted\n");
exit(1);
}
int main(int argc, char *argv[])
{
int res = 100000;
char *file = argv[1];
double median = 0;
double mean = 0;
int n = 0;
double d;
clock_t start,
end;
double *buf = (double *) malloc(sizeof(double) * res);
FILE *fin = fopen(file, "r");
if (buf == 0)
quit();
start = clock();
while (fscanf(fin, "%lg", &d) == 1) {
if (n == res) {
res += res;
buf = (double *) realloc(buf, sizeof(double) * res);
if (buf == 0)
quit();
}
buf[n++] = d;
mean = (n == 1) ? d : mean + (d - mean) / n;
}
/*
if n is odd, then we use (n+1)/2
If n is even, we use [(n/2)th term + (n/2+1)th term] / 2
*/
if (n) {
int mid = n / 2;
median = (n % 2) ? RandomSelect(buf, 0, n - 1, mid) :
(RandomSelect(buf, 0, n - 1, mid-1) + RandomSelect(buf, 0, n - 1, mid))/2;
}
/*
qsort(buf, n, sizeof(double), compare);
if (n) {
int mid = n / 2;
median = (n % 2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
}
*/
end = clock();
printf("number of elements = %d, median = %g, mean = %g in %f seconds\n",
n, median, mean, (end - start) * CLOCK_INV);
free(buf);
return 0;
}
/*
It would be a lot faster for odd numbered data sets:
C:\tmp>ksel n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 8.000000
seconds
C:\tmp>ksel n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 7.375000
seconds
C:\tmp>ksel n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 7.437000
seconds
*/
I guessed wrong: the extra cost for a second call to RandomSelect is
negligible:
C:\tmp>ksel n.txt
number of elements = 5000001, median = 1.00008, mean = 5.42094 in 7.328000
seconds
C:\tmp>ksel n.txt
number of elements = 5000001, median = 1.00008, mean = 5.42094 in 7.375000
seconds
C:\tmp>ksel n.txt
number of elements = 5000001, median = 1.00008, mean = 5.42094 in 7.375000
seconds
I did not profile, but I guess that we are now limited by file I/O.
Bad idea. Treating two floating-point values as "equal" if they're
sufficiently close might be useful in some contexts, but not in a
qsort comparison function.
Suppose your array contains three elements X, Y, Z whose values are
very close to each other. Your comparison function could cause qsort
to treat them as if X==Y and Y==Z, but X < Z. Hilarity ensues.
What's wrong with just using the predefined "<" and "=="?
Just "<" will do...
int dbl_cmp(const void *lv, const void *rv)
{
const double *ld = lv;
const double *rd = rv;
return (*rd < *ld) - (*ld < *rd);
}
Care still needs to be taken though with infinities and nans.
--
Peter
No it won't. Try it.
> What's wrong with just using the predefined "<" and "=="?
The principle of least surprise.
Trying it can't demonstrate that the behavior is well defined.
I believe that giving qsort() a comparison function that doesn't
define a complete ordering on the elements invokes undefined behavior.
The standard doesn't explicitly say so, but I believe it's implicit
in the description:
The function shall return an integer less than, equal to, or
greater than zero if the first argument is considered to be
respectively less than, equal to, or greater than the second.
Equality is an equivalence relation; X==Y and Y==Z implies X==Z.
>> What's wrong with just using the predefined "<" and "=="?
>
> The principle of least surprise.
If qsort() gives me array[N] > array[N+1], I'll be very surprised.
You go to significant additional effort to ensure that sorting works
less well than it could. I fail to see the point.
>>> int double_compare(void *pd1, void *pd2)
>>> {
>>> double d1 = *(double *) pd1;
>>> double d2 = *(double *) pd2;
>>> if (d1 > d2)
>>> if ((d1 - d2) < fabs(d1 * DBL_EPSILON))
>>> return 0;
>>> else
>>> return 1;
>>> if (d1 < d2)
>>> if ((d2 - d1) < fabs(d2 * DBL_EPSILON))
>>> return 0;
>>> else
>>> return -1;
>>> return 0;
>>> }
>> Suppose your array contains three elements X, Y, Z whose values are
>> very close to each other. Your comparison function could cause qsort
>> to treat them as if X==Y and Y==Z, but X < Z. Hilarity ensues.
>No it won't.
What won't what? Hilarity won't ensue?
Are you saying you can't have X "equal" to Y, Y equal to Z, but X less
than Z? Or are you saying this doesn't matter?
If the input is Z, Y, X, what stops qsort() leaving it unchanged, even though
X < Z?
>Try it.
I did. The program is below. The output is
comparisons: 0 0 -1
before: 1.25000000000000044409 1.25000000000000022204 1.25000000000000000000
after: 1.25000000000000044409 1.25000000000000022204 1.25000000000000000000
X compares equal to Y. Y compares equal to Z. X compares as less than Z.
When I sort them, Z wrongly comes before X.
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int double_compare(void *pd1, void *pd2)
{
double d1 = *(double *) pd1;
double d2 = *(double *) pd2;
if (d1 > d2)
if ((d1 - d2) < fabs(d1 * DBL_EPSILON))
return 0;
else
return 1;
if (d1 < d2)
if ((d2 - d1) < fabs(d2 * DBL_EPSILON))
return 0;
else
return -1;
return 0;
}
int main(void)
{
double x = 1.25, y = x+DBL_EPSILON, z = y+DBL_EPSILON;
double a[3];
printf("comparisons: %d %d %d\n",
double_compare(&x, &y),
double_compare(&y, &z),
double_compare(&x, &z));
a[0] = z; a[1] = y; a[2] = x;
printf("before: %.20f %.20f %.20f\n", a[0], a[1], a[2]);
qsort(a, 3, sizeof(a[0]), double_compare);
printf("after: %.20f %.20f %.20f\n", a[0], a[1], a[2]);
return 0;
}
-- Richard
--
:wq
I think you may be right, in that some implementation of qsort()
conceivably might not terminate.
> I believe that giving qsort() a comparison function that doesn't
> define a complete ordering on the elements invokes undefined behavior.
> The standard doesn't explicitly say so, but I believe it's implicit
> in the description:
>
> The function shall return an integer less than, equal to, or
> greater than zero if the first argument is considered to be
> respectively less than, equal to, or greater than the second.
>
> Equality is an equivalence relation; X==Y and Y==Z implies X==Z.
If only floating point actually worked like that, things would be a lot
easier.
>>> What's wrong with just using the predefined "<" and "=="?
>>
>> The principle of least surprise.
>
> If qsort() gives me array[N] > array[N+1], I'll be very surprised.
>
> You go to significant additional effort to ensure that sorting works
> less well than it could. I fail to see the point.
You are assuming that the ordering:
0.99999999999999978, 1, 1.0000000000000002
Is somehow better than:
0.99999999999999978, 1.0000000000000002, 1
It isn't. In fact, it is not unlikely that all three of them are really the
same number.
Ah, but those are all the same number.
>Ah, but those are all the same number.
Not according to your comparison function. The third is less than the
first (see the -1 in the first output line). qsort() has produced an
ordering inconsistent with the comparison function.
If instead of three numbers I used millions, qsort() could produce a
list where the last value was smaller than the first by millions of
times DBL_EPSILON.
Your comparison function does not define a total order, as required
by the standard. Z <= Y and Y <= X, but not Z <= X.
-- Richard
--
:wq
If you see three double numbers and they differ in relative error by
DBL_EPSILON, I submit that the probability that they are different numbers
is approximately zero and the probability that they are the same number is
approximately 1.
Depending on the order those three numbers arrive, the result of comparisons
may order them differently. But that is quite alright, since it is what the
function ought to do.
No they're not. They are exactly:
125000000000000044409 / 100000000000000000000
125000000000000022204 / 100000000000000000000
and 5 / 4
However, they may have arisen differently :-)
--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Or the sort may not terminate, or something else. Even if it
terminates, you now have a "sorted" array that (for example) will not
work with bsearch(), because it is not consistent with the comparison
function.
The fact is that your proposed function does not fulfill the
requirements of the standard. Programs that use it may give wrong
answers with no indication of error when presented with certain data.
-- Richard
--
:wq
I submit that it is consistent. If you really think that 1+eps is not the
same number as 1 for double precision floating point, then I think you
should imagine how such numbers might arrive in your data base. I submit
that it will not interfere with correct operation of bsearch(), since
bsearch() should work in exactly the same way that qsort does for
comparisons.
> The fact is that your proposed function does not fulfill the
> requirements of the standard. Programs that use it may give wrong
> answers with no indication of error when presented with certain data.
It returns an ordering. The standard never says that the ordering must
always be identical (and indeed, such an ordering would be impossible on
many implementations due to the nature of floating point).
Your objection is correct, but specious.
There are a myriad of C functions (both that you/someone else wrote)
as well as ones in the stdio/stdlib/etc., that take pointers as
parameters. You cannot tell just by looking at the call to them
whether they modify what's being pointed at either.
Oz
--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
It is better, even by your own standards.
> In fact, it is not unlikely that all three of them are
> really the same number.
Here's my modified version of Richard Tobin's program:
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int double_compare(void *pd1, void *pd2)
{
double d1 = *(double *) pd1;
double d2 = *(double *) pd2;
if (d1 > d2)
if ((d1 - d2) < fabs(d1 * DBL_EPSILON)) {
printf("%.20f == %.20f\n", d1, d2);
return 0;
}
else {
printf("%.20f > %.20f\n", d1, d2);
return 1;
}
if (d1 < d2)
if ((d2 - d1) < fabs(d2 * DBL_EPSILON)) {
printf("%.20f == %.20f\n", d1, d2);
return 0;
}
else {
printf("%.20f < %.20f\n", d1, d2);
return -1;
}
return 0;
}
int main(void)
{
double x = 1.25, y = x+DBL_EPSILON, z = y+DBL_EPSILON;
double a[3];
a[0] = z; a[1] = y; a[2] = x;
printf("before: %.20f %.20f %.20f\n", a[0], a[1], a[2]);
qsort(a, 3, sizeof(a[0]), double_compare);
printf("after: %.20f %.20f %.20f\n", a[0], a[1], a[2]);
printf("a[0] vs. a[1]: ");
(void)double_compare(&a[0], &a[1]);
printf("a[1] vs. a[2]: ");
(void)double_compare(&a[1], &a[2]);
printf("a[0] vs. a[2]: ");
(void)double_compare(&a[0], &a[2]);
return 0;
}
And here's the output I get:
before: 1.25000000000000044409 1.25000000000000022204 1.25000000000000000000
1.25000000000000044409 == 1.25000000000000022204
1.25000000000000022204 == 1.25000000000000000000
after: 1.25000000000000044409 1.25000000000000022204 1.25000000000000000000
a[0] vs. a[1]: 1.25000000000000044409 == 1.25000000000000022204
a[1] vs. a[2]: 1.25000000000000022204 == 1.25000000000000000000
a[0] vs. a[2]: 1.25000000000000044409 > 1.25000000000000000000
Note that *according to your own comparison function* a[0] and a[2]
are ordered incorrectly.
This happens because your comparison function is inconsistent. qsort
isn't misbehaving; it's doing exactly what it's supposed to do with
the information available to it.
Now if you could divide the set of real numbers into intervals such
that:
All numbers within a given interval are equal.
For any two intervals Foo and Bar, either all numbers in Foo are
less than all numbers in Bar, or all numbers in Foo are
greater than all numbers in Bar
then you could have a consistent comparison function. But it would
have problems; two numbers that are arbitrarily close together could
be unequal, while two numbers farther apart (say, at opposite ends of
an interval) could be equal.
It's *so* much simpler *and better* to use the predefined comparison
operators (so that, in the model of the previous paragraph, each
distinct number is its own interval).
I think you're assuming that there's something fundamentally wrong
with built-in floating-point comparisons. There isn't. You just have
to be careful with them.
The comparison function:
return a > b ? 1 : a < b ? -1 : 0;
will also perform just as I have demonstrated on a conforming
implementation, as will yours.
Consider a machine with 80 bit floating point. It's not so far fetched, as
both Microsoft and Intel compilers will allow 80 bits of internal precision
by choosing options accordingly.
Now, imagine the values 1.25000000000000000001 and 1.25000000000000000000
are compared in two registers.
We decide that the first is bigger than the second.
Now, an interrupt fires. These values get shoved into 64 bit registers and
this time they compare equal. Is the comparison broken? Is the floating
point broken? No, and no.
What I am saying is that literally 1.25000000000000000001 and
1.25000000000000000000 are the same number if you encounter them. Consider
how numbers such as these might come about. You might have added part of a
pallet and part of a pallet, or you might have combined the eaches. For
example:
(3/14 pallet) * 2 pallets + (8/14 pallet) * one pallet * 30 eaches per
pallet * 6 items per each
versus adding the individual eaches. Or something of that nature. Floating
point itself is often called an "approximate numeric" type because it does
not represent values exactly.
It is my opinion that your ordering is not better than my ordering and that
assuming 1.25000000000000000001 and 1.25000000000000000000 are different
numbers is a mistake. They are almost *surely* the same number.
I have always detested this kind of argument. "Maybe you can't, but we
who are better than you can", "it's quite easy", and then don't even
show this "easy" solution at all (which usually involves ugly hacks and
jumping through completely unnecessary hoops).
Reminds me of the C-hackers who insist that it's "easy" and "feasible"
to use dynamically bound "member functions" in structs, as if that
somehow made C as good as C++ (yet the big problem with that hack is
that each "member function" of the struct increases the size of the
struct, which in many cases is a completely useless waste and makes that
struct larger and more inefficient).
Is this on topic:
http://groups.google.com/group/comp.lang.c/msg/dbb16567e49c42fc
?
> Richard Heathfield wrote:
>> lbon...@yahoo.com said:
>>
>> <snip>
>>
>>> And in C, there's absolutely no way to know who's fiddling around with
>>> your struct members because you can't restrict access to them.
>>
>> Wrong. Maybe *you* can't restrict access to them, but I can, and so can
>> quite a few others in comp.lang.c. The technique is not difficult.
>
> I have always detested this kind of argument.
Let's see whether I can amend it to your liking.
> "Maybe you can't, but we
> who are better than you can",
Not what I meant. Mr lbonafide said that there's absolutely no way, so I
presume he accepts that he can't do it, and that his inability to do it is
not in dispute. That doesn't mean I'm "better" than him. It does, however,
appear to mean that I know something he doesn't - which is easily fixed,
of course.
> "it's quite easy",
Aye, it is. Sorry.
> and then don't even
> show this "easy" solution at all
See below.
> (which usually involves ugly hacks and
> jumping through completely unnecessary hoops).
Um, it's not /that/ ugly a hack, and I assure you I'll only jump through
*necessary* hoops.
The technique involves a minimum of three files - the interface (.h), the
implementation (.c, which you *don't release*), and the driver, which the
user-programmer writes.
Here is our interface (which we ship):
#ifndef H_POINT
#define H_POINT 1
#include <limits.h>
#define POINT_INVALID INT_MIN
typedef struct point_ point;
point *point_make(int, int);
void point_destroy(point **);
int point_setx(point *, int);
int point_sety(point *, int);
int point_set(point *, int, int);
double point_diff(const point *, const point *);
#endif
The important point (if you'll forgive the word) is that the point type is
declared but *not* defined, so its members are not visible. More about
that shortly.
The totally unimportant point is that this is just an example, and so you
shouldn't expect to see a complete function suite here!
Implementation file (NOT shipped):
#include "point.h"
#include <stdlib.h>
#include <math.h>
struct point_
{
int x;
int y;
};
point *point_make(int x, int y)
{
point *new = malloc(sizeof *new);
if(new != NULL)
{
new->x = x;
new->y = y;
}
return new;
}
void point_destroy(point **old)
{
if(old != NULL)
{
free(*old);
*old = NULL;
}
}
int point_setx(point *p, int x)
{
int old = POINT_INVALID;
if(p != NULL)
{
old = p->x;
p->x = x;
}
return old;
}
int point_sety(point *p, int y)
{
int old = POINT_INVALID;
if(p != NULL)
{
old = p->y;
p->y = y;
}
return old;
}
int point_set(point *p, int x, int y)
{
int rc = POINT_INVALID;
if(p != NULL)
{
rc = 0;
p->x = x;
p->y = y;
}
return rc;
}
double point_diff(const point *pa, const point *pb)
{
double d = 0.0;
if(pa != NULL && pb != NULL)
{
double xd = pa->x - pb->x;
double yd = pa->y - pb->y;
d = sqrt(xd * xd + yd * yd);
}
return d;
}
We don't ship this to the user. Instead, we compile it, put the resulting
object code into a library, and ship the library.
Test driver:
#include "point.h"
#include <stdio.h>
int main(void)
{
point *p1 = point_make(1, 1);
if(p1 != NULL)
{
point *p2 = point_make(4, 5);
if(p2 != NULL)
{
double d = point_diff(p1, p2);
printf("The hypotenuse is %f\n", d);
point_destroy(&p2);
}
point_destroy(&p1);
}
return 0;
}
This works. (Try it.)
The following test driver does *not* work:
#include "point.h"
#include <stdio.h>
int main(void)
{
point *p1 = point_make(1, 1);
size_t size = sizeof *p1; /* can't do this */
point instance; /* can't do this either */
instance.x = 42; /* or this! */
if(p1 != NULL)
{
point *p2 = point_make(4, 5);
if(p2 != NULL)
{
double d = point_diff(p1, p2);
printf("The hypotenuse is %f\n", d);
point_destroy(&p2);
}
point_destroy(&p1);
}
return 0;
}
When I compile this, I get the following errors:
driver.c: In function `main':
driver.c:8: dereferencing pointer to incomplete type
driver.c:9: storage size of `instance' isn't known
Thus, you *can* restrict access to members of a struct, QED.
> Reminds me of the C-hackers who insist that it's "easy" and "feasible"
> to use dynamically bound "member functions" in structs, as if that
> somehow made C as good as C++ (yet the big problem with that hack is
> that each "member function" of the struct increases the size of the
> struct, which in many cases is a completely useless waste and makes that
> struct larger and more inefficient).
<shrug> The "as if that somehow made C as good as C++" bit is mere flame
bait. C and C++ are different languages with different goals and different
user bases. To claim that one is "better" than another, without specifying
in what way it is better and for whom, is to fail to understand that there
are good reasons for having more than one programming language.
Whilst it /is/ not only feasible but even easy to use dynamically bound
member functions in structs, it is certainly true that this incurs an
overhead for the pointer in every instance of the struct, and this cost
should be considered when deciding whether or not to pursue this course.
The way I resolve this myself is simple: either I'm going to be using a
lot of structs, in which case they're probably going to be in some kind of
container, so I'll put the pointers in the container instead (which is
normally okay, because all the structs in any given container are generally
going to want to be processed in the same way); or I only have a few, in
which case the overhead is no big deal.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Yes, I'd say it's broken, if the behavior of the operation changes if
it's interrupted. On the other hand, the C standard's requirements
for floating-point accuracy are sufficiently loose that it's probably
not a conformance violation.
> What I am saying is that literally 1.25000000000000000001 and
> 1.25000000000000000000 are the same number if you encounter them. Consider
> how numbers such as these might come about. You might have added part of a
> pallet and part of a pallet, or you might have combined the eaches. For
> example:
>
> (3/14 pallet) * 2 pallets + (8/14 pallet) * one pallet * 30 eaches per
> pallet * 6 items per each
>
> versus adding the individual eaches. Or something of that nature. Floating
> point itself is often called an "approximate numeric" type because it does
> not represent values exactly.
No, they are *not* literally the same number. They are two different
numbers that might (or might not) happen to be the result of
calculations that logically should have yielded the same value. If
they were literally the same number, they wouldn't be
1.25000000000000000001 and 1.25000000000000000000; they'd be
1.25000000000000000001 and 1.25000000000000000001.
> It is my opinion that your ordering is not better than my ordering and that
> assuming 1.25000000000000000001 and 1.25000000000000000000 are different
> numbers is a mistake. They are almost *surely* the same number.
So what?
We've already demonstrated cases where numbers that are unequal even
by your standard are incorrectly sorted using your comparison
function. (I tried 10000 elements and got some interesting results.)
At the very least, my method is no worse than yours. If
1.25000000000000000001 and 1.25000000000000000000 are "the same
number" then *it doesn't matter* how they compare. You add a
multiplication, a subtraction, a call to fabs(), and a second
comparison to each comparison, all for no benefit.
In the context of sorting, assuming that two very close numbers are
different does no harm while saving considerable time and complexity.
<snip>
> Here's another C++ snippet:
>
> int x=0,y=0;
>
> x=1;
> x=x+y;
>
> y=x+2;
>
> Quick, what is y?
>
> Answer no idea. Because you don't know what "+" does half the time .....
>
> The clc crowd who fix bugs from a printout would not fare too well
> methinks ....
we'd have the printout for operator+()
:-)
--
Nick Keighley
>I submit that it is consistent. If you really think that 1+eps is not the
>same number as eps for double precision floating point, then I think you
>should imagine how such numbers might arrive in your data base.
I quite agree that for practical purposes they may be the same number,
but building this idea into a qsort() comparison function does no good
and may break things.
>I submit
>that it will not interfere with correct operation of bsearch(), since
>bsearch() should work in exactly the same way that qsort does for
>comparisons.
It works in the same way, but bsearch() assumes that the order of the
array is consistent with the comparison function, and if we use your
function it isn't.
Suppose we have x, x+e, and x+2e, x+3e, x+4e. Your comparison
function may sort them as x+4e, x+3e, x+2e, x+e, x. Now you search
for x. bsearch() will compare it with x+2e, and find it's less. It
will then compare it with x+3e and x+4e, and conclude that it is not
present in the array. It will give the wrong answer.
>It returns an ordering. The standard never says that the ordering must
>always be identical (and indeed, such an ordering would be impossible on
>many implementations due to the nature of floating point).
It says it must be a total order. Yours isn't.
-- Richard
--
:wq
Four man-years to write a 450-line C++ code? Jesus, were they high or
something?
Adam
Do something good today. Fight cancer with just a click! - Global
Cancer Research Institute
http://gcri.blogspot.com/
Actually, the standard doesn't explicitly say it must be a total
order. IMHO it should say so.
C99 7.20.5.2:
The qsort function sorts an array of nmemb objects, the initial
element of which is pointed to by base. The size of each object is
specified by size.
The contents of the array are sorted into ascending order
according to a comparison function pointed to by compar, which is
called with two arguments that point to the objects being
compared. The function shall return an integer less than, equal
to, or greater than zero if the first argument is considered to be
respectively less than, equal to, or greater than the second.
If two elements compare as equal, their order in the resulting
sorted array is unspecified.
I believe this *implies* a total ordering (since qsort() can't
reasonably be expected to cope with anything other than a total
ordering). But as we're seeing, the current wording does leave room
for debate.
> On Apr 5, 11:28 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
<snip>
>>
>> > Are there any counter-arguments against Stroustrup's claim?
>>
>> My favourite example is a 450-line C++ program I was shown, about eight
>> years ago, which had taken four man-years to write and took about 10
>> minutes to process a fairly trivial amount of data. I rewrote it from
>> spec (because I saw no point in rewriting it from the code) in a single
>> day, in C. My version took considerably less than a second to process
>> the same amount of data. There is more to efficiency than mere language
>> choice.
>>
> Four man-years to write a 450-line C++ code? Jesus, were they high or
> something?
Eight blokes * six months. (I don't know whether they were high. But in the
end, they were fired.) There was no excuse for it. The task was a very,
very slightly forgiving file-comparison routine, used for comparing
QARun-recorded data without worrying too much about dates and stuff.
Really, the only reason I took a day over it was to be sure that I didn't
hand back a lemon. To take four man-years was unforgivable (and was not in
fact forgiven, once it came to light).
VERY SIMPLE HUH?
verrryyy eaasyyyyyyy and nooooooooo boilerplaateeee
bah!
I always laugh at coders who love to work for the computer.
Of course, most of them don't have children or a wife, or even a
girlfriend ;-)
Computers should work for us; they are designed to do that :)
Sorry for the ranting, but there are some things I can't accept today.
>Actually, the standard doesn't explicitly say it must be a total
>order.
Look at the introductory text of 7.20.5. It specifically says
"total ordering", though admittedly in a slightly strange context.
-- Richard
--
:wq
Yes it does: 7.20.5p4 (always look at the parent clauses in addition to
the specific clause).
-Larry Jones
Bad news, Mom. I promised my soul to the Devil this afternoon. -- Calvin
Or an inline definition of qsort() in <stdlib.h>. A conforming
implementation can't do that directly, but it can define the inline
function with a reserved name and then define a qsort() macro that
invokes the inline function.
-Larry Jones
It works on the same principle as electroshock therapy. -- Calvin
Do you consider Bjarne Stroustrup to be unbiased in his choosing of
how to code up the C solution? Not surprisingly he poisoned the C
solution, and is arguing a case for which *BOTH* solutions are
suboptimal by design.
If you want to do language comparisons, you should use "The Great
Computer Language Shootout". They accept solutions to each problem
from anyone who cares to submit them for each language, and they use
the fastest code for each language. The results clearly show C and C++
to be pretty much dead even. (Mostly because C++ compilers are
being used to compile C, and C++ the language itself has no additional
performance features whatsoever.)
> ===
> The simplest specific example I can think of is a program to find the mean
> and median of a sequence of double precision floating-point numbers read
> from input.
The mean should be limited by input speed (it can be calculated
incrementally), and the median should be implemented using the "select"
algorithm, which offers no advantage to either side (it's a fairly
complicated algorithm, though).
> [...] A conventional C-style solution would be:
>
> #include <stdlib.h>
> #include <stdio.h>
>
> int compare(const void *p, const void *q)
> {
> register double p0 = * (double *)p;
> register double q0 = * (double *)q;
> if (p0 > q0) return 1;
> if (p0 < q0) return -1;
> return 0;
> }
>
> void quit()
> {
> fprintf(stderr, "memory exhausted\n");
> exit(1);
>
> }
>
> int main(int argc, char *argv[])
> {
> int res = 1000;
> char * file = argv[2];
>
> double *buf = (double *) malloc(sizeof (double) *res);
> if (buf==0) quit();
>
> double median = 0;
> double mean = 0;
> int n = 0;
>
> FILE *fin = fopen(file, "r");
> double d;
> while (fscanf(fin, "%lg", &d) == 1) {
> if (n==res) {
> res += res;
Integer overflow is not being tested for.
> buf = (double *)realloc(buf, sizeof(double) *res);
> if (buf==0) quit();
This is some pretty bad error handling.
> }
> buf[n++] = d;
> mean = (n == 1) ? d : mean + (d - mean) / n;
>
> }
>
> qsort(buf, n, sizeof(double), compare);
C programmers who use qsort are programmers who have absolutely no
interest in performance. The overhead of qsort immediately makes it
slower than a well-implemented heapsort almost always (and certainly
in this case of floating-point sorting). The algorithmic complexity
problems of quicksort make it slower than even badly written
heapsorts some of the time.
But as I described above, using a sort is not the right way to solve
this (it's O(N log N)). Use the select algorithm; it's O(N).
> if (n) {
> int mid = n/2;
> median = (n%2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
> }
>
> printf("number of elements = %d, median = %g, mean = %g\n", n, median,
> mean);
> free(buf);
>
> }
>
> To compare, here is an idiomatic C++ solution:
>
> #include<vector>
> #include<fstream>
> #include<algorithm>
>
> using namespace std;
>
> int main(int argc, char * argv[])
> {
> char *file = argv[2];
> vector<double>buf;
>
> double median = 0;
> double mean = 0;
>
> fstream fin(file, ios::in);
> double d;
> while (fin >> d) {
> buf.push_back(d);
> mean = (buf.size() == 1) ? d : mean + (d - mean) / buf.size();
> }
>
> sort(buf.begin(), buf.end());
>
> if (buf.size()) {
> int mid = buf.size() / 2;
> median = (buf.size() %2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
>
> }
>
> cout << "number of elements = " << buf.size()
> << ", median = " << median << ", mean = " << mean << '\n';
> }
The main thing Bjarne Stroustrup is trying to prove with all this is
that template code inlining is faster than eating indirect function
call overhead.
However, a completely macro-based heap sort can be implemented in C in
about 20 lines of code.
So it's apples versus oranges: the orange has been implemented in
C++'s standard template library, while the equivalent has not been
implemented in C's standard library.
> ======================
>
> He goes on to claim that, using a sample set of 5,000,000 elements, the C++
> code is more than 4 times faster than the C code. Further examples follow to
> prove the superiority of C++ even in the realm of efficiency.
The larger he makes the sample set, the more foolish he looks.
If you have a large N and you do *NOT* use the select algorithm, then
you are very *demonstrably* doing a clearly wrong thing.
> So C coders who care about efficiency should switch to C++?
No. Coders who care about efficiency should first learn standard
algorithms, then understand how to recognize unnecessary performance
bottlenecks (which are always introduced when using qsort).
> Are there any counter-arguments against Stroustrup's claim?
In this case, it's not a question of counter-argument, it's a question
of debunking. If this is *truly* an example cited by Bjarne
Stroustrup, he deserves to be tarred and feathered on at least 3
grounds: 1) ignoring the Great Computer Language Shootout, 2) being
unaware of standard algorithms, and 3) intentionally misrepresenting
the kind of performance you can get from good C code.
--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
The reason it does not say so is that it is literally impossible to enforce
this. The example I drew previously using 80 bit registers is a real
example that is operational today using the Microsoft or Intel compilers
with certain options. The shaving off of the trailing dirt at the end of
the numbers really does happen. It was not a hypothetical case.
> C99 7.20.5.2:
>
> The qsort function sorts an array of nmemb objects, the initial
> element of which is pointed to by base. The size of each object is
> specified by size.
>
> The contents of the array are sorted into ascending order
> according to a comparison function pointed to by compar, which is
> called with two arguments that point to the objects being
> compared. The function shall return an integer less than, equal
> to, or greater than zero if the first argument is considered to be
> respectively less than, equal to, or greater than the second.
>
> If two elements compare as equal, their order in the resulting
> sorted array is unspecified.
>
> I believe this *implies* a total ordering (since qsort() can't
> reasonably be expected to cope with anything other than a total
> ordering). But as we're seeing, the current wording does leave room
> for debate.
That's because it has to.
Bad news then, since that renders some compilers non-conforming.
Tell me what this difference in numbers will make in floating point. If
this difference is important then we should never perform an addition,
subtraction, multiplication or division, because that dirt at the end of the
numbers happens when we do one of these operations.
>> What I am saying is that literally 1.25000000000000000001 and
>> 1.25000000000000000000 are the same number if you encounter them.
>> Consider how numbers such as these might come about. You might have
>> added part of a pallet and part of a pallet, or you might have
>> combined the eaches. For example:
>>
>> (3/14 pallet) * 2 pallets + (8/14 pallet) * one pallet * 30 eaches
>> per pallet * 6 items per each
>>
>> versus adding the individual eaches. Or something of that nature.
>> Floating point itself is often called an "approximate numeric" type
>> because it does not represent values exactly.
>
> No, they are *not* literally the same number. They are two different
> numbers that might (or might not) happen to be the result of
> calculations that logically should have yielded the same value.
There is no physical measurement on earth that will produce a number with
that sort of difference. The tiny bit of sand we find populating the tails
of floating point numbers is the result of operations performed against
those numbers and not as the result of our initial or subsequent
measurements.
> If
> they were literally the same number, they wouldn't be
> 1.25000000000000000001 and 1.25000000000000000000; they'd be
> 1.25000000000000000001 and 1.25000000000000000001.
This is a tragic misunderstanding as to the nature of floating point.
>> It is my opinion that your ordering is not better than my ordering and
>> that
>> assuming 1.25000000000000000001 and 1.25000000000000000000 are different
>> numbers is a mistake. They are almost *surely* the same number.
>
> So what?
>
> We've already demonstrated cases where numbers that are unequal even
> by your standard are incorrectly sorted using your comparison
> function. (I tried 10000 elements and got some interesting results.)
>
> At the very least, my method is no worse than yours. If
> 1.25000000000000000001 and 1.25000000000000000000 are "the same
> number" then *it doesn't matter* how they compare. You add a
> multiplication, a subtraction, a call to fabs(), and a second
> comparison to each comparison, all for no benefit.
I will agree that your ordering is not worse, but I disagree that it is
better. I think that literally *every* operation performed against floating
point should be aware of the ball of fuzz at the end of the numbers and it
should recognize that this tiny bit of dirt is not really part of the
number.
> In the context of sorting, assuming that two very close numbers are
> different does no harm while saving considerable time and complexity.
And yet there is harm. For instance, you think that those two numbers are
different and that one is larger than the other. This is real harm. I
guess that the mistake is almost universal though.
I agree that there may be a real problem here and that it is conceivable
that under some circumstances it may be possible to produce a sort that does
not terminate.
>>I submit
>>that it will not interfere with correct operation of bsearch(), since
>>bsearch() should work in exactly the same way that qsort does for
>>comparisons.
>
> It works in the same way, but bsearch() assumes that the order of the
> array is consistent with the comparison function, and if we use your
> function it isn't.
1+epsilon is the same number as 1.
> Suppose we have x, x+e, and x+2e, x+3e, x+4e. Your comparison
> function may sort them as x+4e, x+3e, x+2e, x+e, x. Now you search
> for x. bsearch() will compare it with x+2e, and find it's less. It
> will then compare it with x+3e and x+4e, and conclude that it is not
> present in the array. It will give the wrong answer.
Assuming that this is the wrong answer is a mistake. If you are searching
for:
x+2e
and you find:
x+4e
then you found the number that you were searching for, if it is floating
point. If you are not happy with that answer then you should not use
floating point.
>>It returns an ordering. The standard never says that the ordering must
>>always be identical (and indeed, such an ordering would be impossible on
>>many implementations due to the nature of floating point).
>
> It says it must be a total order. Yours isn't.
Achieving total order is literally not possible, because floating point is
inexact. This statement is absolutely true.
That seems quite impossible. Could you elaborate ?
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Currently it affects only long double. From the Intel documentation:
"Support for Intel® Itanium® Architecture ints and 80-bit Floats
Intel® Itanium® Architecture ints are supported with the __int64 or long
long data types. The Intel C++ Compiler for Windows does support 80-bit long
doubles when the -Qlong_double option is specified. See the Floating-point
Optimizations and Floating-point Options Quick Reference sections of the
User's Guide for more information."
From the Microsoft documentation:
"Flags
precise
The default.
Improves the consistency of floating-point tests for equality and
inequality by disabling optimizations that could change the precision of
floating-point calculations, which is required for strict ANSI conformance.
By default, the compiler uses the coprocessor's 80-bit registers to hold the
intermediate results of floating-point calculations. This increases program
speed and decreases program size. Because the calculation involves
floating-point data types that are represented in memory by less than 80
bits, however, carrying the extra bits of precision (80 bits minus the
number of bits in a smaller floating-point type) through a lengthy
calculation can produce inconsistent results."
If the CPU is performing calculations in 80 bit mode and an interrupt fires,
the floating point values may lose the trailing bits of precision. Hence,
the ordering you got one microsecond ago using the simple comparison may not
agree with the ordering you get right now.
In real life there are floating point systems where a*b != b*a if you
compare all of the bits of the result. You should not be at all surprised
if (7.0*6.0*5.0)/(7.0*6.0) does not compare equal to 5.0. If you would want
them to compare equal, you should not use a simple ordering. If you add a
tall column of non-integral floating point numbers forwards and backwards
you will get two different numbers. Everyone knows this and nobody is upset
by it. (1.0/7.0)*7.0 is not 1.0 exactly except by accident (or compiler
trickery if they can see the whole expression and do some constant folding).
I find it very odd that someone might look for the 27th number in a list and
get back an answer with machine epsilon difference from the number they
wanted and then be disturbed by that. That is the very nature of floating
point. If you are searching for 3 and find 2.9999999999999999 or
3.0000000000000000001 in floating point, then you have found your answer.
If you are disturbed by the answer then you should not be using floating
point.
An aside:
On OpenVMS, the file system is called RMS for Records Management System. It
is not possible to create a key that has a floating point value in it (the
operating system does not allow it) though literally any other data type is
fine. Quite likely, the reason for this decision has to do with the nature
of floating point comparisons.
>> Actually, the standard doesn't explicitly say it must be a total
>> order. IMHO it should say so.
>The reason it does not say so
It does, see other articles.
>is that it is literally impossible to enforce this.
You think the description of the qsort() comparison function was written
with recent floating point hardware in mind?
-- Richard
--
:wq
>> Suppose we have x, x+e, and x+2e, x+3e, x+4e. Your comparison
>> function may sort them as x+4e, x+3e, x+2e, x+e, x. Now you search
>> for x. bsearch() will compare it with x+2e, and find it's less. It
>> will then compare it with x+3e and x+4e, and conclude that it is not
>> present in the array. It will give the wrong answer.
>Assuming that this is the wrong answer is a mistake. If you are searching
>for:
>x+2e
>and you find:
>x+4e
>then you found the number that you were searching for, if it is floating
>point. If you are not happy with that answer then you should not use
>floating point.
Read more carefully. It doesn't find the wrong entry. It wrongly
says there is no such value in the array. According to you, there are
several, but it doesn't find *any* of them.
>> It says it must be a total order. Yours isn't.
>Achieving total order is literally not possible, because floating point is
>inexact. This statement is absolutely true.
Then you'd better not use floating point with qsort().
-- Richard
--
:wq
This has nothing to do with qsort as such.
Suppose for the moment that the standard dictates that qsort() should
only be used with a comparison that returns a total order.
Now suppose that a compiler exists that has a qsort() function which
doesn't require that the order be total. How is that non-conforming ?
I assume you mean that compilers exist for which the 'a < b' comparison
for some datatypes does not give a total ordering. But that, in itself,
does not make them non-conforming, at least not as far as this clause
about qsort() is concerned.
In any case, if you call qsort() with a comparison function where it
doesn't hold that a=b and b=c implies a=c, then bad things might happen.
Who knows, it might run into an endless loop or something.
Well... Richard's code really is not at all difficult - very much like
the C++ pimpl idiom. So information CAN be hidden in C. Just not as
elegantly as in C++, as I'm sure we all agree.
I remember in my old days writing classes with virtual functions in C.
I thought it was so elegant. Not so anymore - but you can do a lot in
C if you want to.
/Peter
What I am saying is that with 80 bit floating point it is possible that
comparisons will not be consistent. That implies that total ordering is not
possible to enforce. If the standard demands total ordering, then qsort()
cannot be called with a long double vector on those systems without creating
undefined behavior.
That is also a perfectly acceptable and expected outcome for a floating
point vector. If you are expecting to find a non-integral value, then you
should not be surprised to be disappointed. In fact, finding it will be a
little surprising.
>>> It says it must be a total order. Yours isn't.
>
>>Achieving total order is literally not possible, because floating point is
>>inexact. This statement is absolutely true.
>
> Then you'd better not use floating point with qsort().
Or with any other operation that involves comparison.
Then sorting a floating point vector may result in undefined behavior.
>>is that it is literally impossible to enforce this.
>
> You think the description of the qsort() comparison function was written
> with recent floating point hardware in mind?
I do not think the data type was considered for a fundamental algorithm. At
least, it should not have been.