#include <stdlib.h>
#include <stdio.h>
int compare(const void *p, const void *q)
{
register double p0 = * (double *)p;
register double q0 = * (double *)q;
if (p0 > q0) return 1;
if (p0 < q0) return -1;
return 0;
}
void quit()
{
fprintf(stder, "memory exhausted\n");
exit(1);
}
int main(int argc, char *argv[])
{
int res = 1000;
char * file = argv[2];
double *buf = (double *) malloc(sizeof (double) *res);
if (buf==0) quit();
double median = 0;
double mean = 0;
int n = 0;
FILE *fin = fopen(file, "r");
double d;
while (fscanf(fin, "%lg", &d) == 1) {
if (n==res) {
res += res;
buf = (double *)realloc(buf, sizeof(double) *res);
if (buf==0) quit();
}
buf[n++] = d;
mean = (n == 1) ? d : mean + (d - mean) / n;
}
qsort(buf, n, sizeof(double), compare);
if (n) {
int mid = n/2;
median = (n%2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
}
printf("number of elements = %d, median = %g, mean = %g\n", n, median,
mean);
free(buf);
}
To compare, here is an idiomatic C++ solution:
#include<vector>
#include<fstream>
#include<algorithm>
using namespace std;
int main(int argc, char * argv[])
{
char *file = argv[2];
vector<double>buf;
double median = 0;
double mean = 0;
fstream fin(file, ios::in);
double d;
while (fin >> d) {
buf.push_back(d);
mean = (buf.size() == 1) ? d : mean + (d - mean) / buf.size();
}
sort(buf.begin(), buf.end());
if (buf.size()) {
int mid = buf.size() / 2;
median = (buf.size() %2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
}
cout << "number of elements = " << buf.size()
<< " , median = " << median << ", mean = " << mean << '\n';
}
======================
He goes on to claim that, using a sample set of 5,000,000 elements, the C++
code is more than 4 times faster than the C code. Further examples follow to
prove the superiority of C++ even in the realm of efficiency.
So C coders who care about efficiency should switch to C++?
Are there any counter-arguments against Stroustrup's claim?
In some instances (typically where inlined function templates can
replace functions with void* parameters) C++ will be faster. Because
the C++ standard library incorporates the C standard library, C++ code
should never have to be slower.
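For those following along, the pattern being contrasted is worth spelling out. With qsort the comparison reaches the sort only as a pointer to a function taking void * arguments, so the compiler generally cannot inline it into the sorting loop. A minimal sketch (names are illustrative):

```c
#include <stdlib.h>

/* qsort's interface: the element type is erased to void *, and every
 * comparison during the sort is an indirect call through this pointer. */
static int cmp_double(const void *p, const void *q)
{
    double a = *(const double *) p;
    double b = *(const double *) q;
    return (a > b) - (a < b);   /* -1, 0, or 1 */
}

static void sort_doubles(double *buf, size_t n)
{
    qsort(buf, n, sizeof *buf, cmp_double);
}
```

A C++ std::sort over a vector<double> sees the comparison (the built-in < on double) directly and can inline it, which is the advantage being described.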
--
Ian Collins.
I copied the examples by hand, and two typos sneaked in: in the C example it
should be "stderr" instead of "stder", and the C++ example lacks a final
closing bracket.
No, if that were my intention, I would have crossposted this to
comp.lang.c++ for maximum effect! ;)
> In some instances (typically where inlined function templates can
> replace functions with void* parameters) C++ will be faster. Because
> the C++ standard library incorporates the C standard library, C++ code
> should never have to be slower.
Of course, but the interesting claim here is that you should NOT use C style
code but modern C++ abstractions, because they are not only more readable
and expressive, but also more efficient. The second part (they actually
being more efficient) was news to me.
Well, the inlined function templates superior to void * functions thing
certainly makes sense. Maybe I should finally "move on" after all.
However, I compiled both examples with the current version of GCC (on
Win32-x86), and there is at least one area where C won: code size. The size
difference between the resulting executables was grotesque: C code => 5KB,
C++ code => over 400KB! (both binaries optimized for size and stripped!)
> In "Learning Standard C++ as a New Language" Bjarne Stroustrup claims
> that properly written C++ outperforms C code. I will just copy his first
> example here, which is supposed to demonstrate how C++ abstractions do
> not only make code easier to understand but also make it more efficient:
> ===
> The simplest specific example I can think of is a program to find the
> mean and median of a sequence of double precision floating-point numbers
> read from input. A conventional C-style solution would be:
>
<snip>
The C-style solution did not compile. When I fixed it minimally, it
segfaulted. So I wrote my own.
>
> To compare, here is an idiomatic C++ solution:
>
<snip>
Results on my system:
C:
number of elements = 5000000, median = -9.10326e+08, mean = -1.40759e+10
real 0m29.955s
user 0m25.840s
sys 0m3.500s
C++:
number of elements = 5000000 , median = -9.10326e+08, mean = -1.40759e+10
real 0m22.701s
user 0m12.260s
sys 0m3.970s
> He goes on to claim that, using a sample set of 5,000,000 elements, the
> C++ code is more than 4 times faster than the C code.
<shrug> My very, very simplistic C code doesn't quite manage to keep up
with the C++ code, but it certainly isn't four times slower. The
difference between my code and the C++ code is likely to be caused by
better memory management on the part of gcc's STL writer.
> follow to
> prove the superiority of C++ even in the realm of efficiency.
<shrug> You picked a rather artificial example. There is more to efficiency
than being able to store and sort five million numbers.
> So C coders who care about efficiency should switch to C++?
C coders who care about efficiency enough to consider switching languages
should first ensure that the language switch really does give sufficient
efficiency gains to justify the switch. They should also bear in mind the
costs in terms of simplicity, readability, maintainability, and
portability.
> Are there any counter-arguments against Stroustrup's claim?
My favourite example is a 450-line C++ program I was shown, about eight
years ago, which had taken four man-years to write and took about 10
minutes to process a fairly trivial amount of data. I rewrote it from spec
(because I saw no point in rewriting it from the code) in a single day, in
C. My version took considerably less than a second to process the same
amount of data. There is more to efficiency than mere language choice.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Maybe because you passed the test data file as the first argument? For some
strange reason the program expects the test file to be the second argument.
So you have to invoke it like: prog ignored test.dat
>>
>> To compare, here is an idiomatic C++ solution:
>>
> <snip>
>
> Results on my system:
> C:
> number of elements = 5000000, median = -9.10326e+08, mean = -1.40759e+10
>
> real 0m29.955s
> user 0m25.840s
> sys 0m3.500s
>
> C++:
> number of elements = 5000000 , median = -9.10326e+08, mean = -1.40759e+10
>
> real 0m22.701s
> user 0m12.260s
> sys 0m3.970s
>
>> He goes on to claim that, using a sample set of 5,000,000 elements, the
>> C++ code is more than 4 times faster than the C code.
>
> <shrug> My very, very simplistic C code doesn't quite manage to keep up
> with the C++ code, but it certainly isn't four times slower.
I am not that surprised, Stroustrup's C code sucked, even I could do better!
;)
However, the basic claim - that the higher abstraction level of C++ does not
hurt code efficiency but actually improves it - seems to be supported by
your results.
> The difference between my code and the C++ code is likely to be caused by
> better memory management on the part of gcc's STL writer.
>
>> follow to
>> prove the superiority of C++ even in the realm of efficiency.
>
> <shrug> You picked a rather artificial example. There is more to
> efficiency
> than being able to store and sort five million numbers.
I didn't pick that example, Stroustrup did! :P
However, I think the example is quite good, because in the end a complex
application is nothing more than a collection of simple functions/data
structures (or "objects" for the OO crowd) which do things like reading
files, sorting numbers etc.
>> So C coders who care about efficiency should switch to C++?
>
> C coders who care about efficiency enough to consider switching languages
> should first ensure that the language switch really does give sufficient
> efficiency gains to justify the switch. They should also bear in mind the
> costs in terms of simplicity, readability, maintainability, and
> portability.
>
>> Are there any counter-arguments against Stroustrup's claim?
>
> My favourite example is a 450-line C++ program I was shown, about eight
> years ago, which had taken four man-years to write and took about 10
> minutes to process a fairly trivial amount of data. I rewrote it from spec
> (because I saw no point in rewriting it from the code) in a single day, in
> C. My version took considerably less than a second to process the same
> amount of data. There is more to efficiency than mere language choice.
Of course, a bad programmer will write worse code in any language compared
to a good programmer. However, the question here is - assuming programmers
of equal skill - who would produce more efficient code if one used C and the
other used C++. So far the answer seems to be the C++ guy would win.
Got to be argv[1] to take the 1st argument
Bye, Jojo
Assuming two programmers of equal skill, on the whole the one writing
in C will write far nicer code than the one in C++ (nicer in terms of
ease of readability, maintainability, and lack of bugs). And he will
write it quicker.
Assuming the resulting C++ program actually works as intended, it will
run at pretty much the same speed as the C one.
Assuming a team of C++ programmers, the useful subset of the bloated
nightmare of C++ that they know in common is called C.
C++ is a virus. There will come a day when it is wiped from the Earth.
>> In some instances (typically where inlined function templates can
>> replace functions with void* parameters) C++ will be faster. Because
>> the C++ standard library incorporates the C standard library, C++ code
>> should never have to be slower.
>
> Of course, but the interesting claim here is that you should NOT use C style
> code but modern C++ abstractions, because they are not only more readable
> and expressive, but also more efficient. The second part (they actually
> being more efficient) was news to me.
The only place C++ will win the performance race is where the *language*
offers an advantage, which is why sorting is often cited as an example.
A lot of C++ code (including libraries) is written either in the C
subset of C++ or using constructs that can be implemented equally well in C.
> Well, the inlined function templates superior to void * functions thing
> certainly makes sense. Maybe I should finally "move on" after all.
> However, I compiled both examples with the current version of GCC (on
> Win32-x86), and there is at least one area where C won: code size. The size
> difference between the resulting executables was grotesque: C code => 5KB,
> C++ code => over 400KB! (both binaries optimized for size and stripped!)
>
That'll be all those inlined templates! Seriously, there is nearly
always a performance/size trade-off. Not always this big though.
--
Ian Collins.
>
> "Richard Heathfield" <r...@see.sig.invalid> schrieb im Newsbeitrag
> news:NKOdnfOFf_8s8WXa...@bt.com...
>> copx said:
>>
>>> In "Learning Standard C++ as a New Language" Bjarne Stroustrup claims
>>> that properly written C++ outperforms C code. I will just copy his
>>> first example here, which is supposed to demonstrate how C++
>>> abstractions do not only make code easier to understand but also make
>>> it more efficient: ===
>>> The simplest specific example I can think of is a program to find the
>>> mean and median of a sequence of double precision floating-point
>>> numbers read from input. A conventional C-style solution would be:
>>>
>> <snip>
>>
>> The C-style solution did not compile. When I fixed it minimally, it
>> segfaulted.
>
> Maybe because you passed the test data file as the first argument?
I had spotted that bug, but I thought I'd allowed for it. (I did so
successfully for the C++ version, which had the same bug.) On re-testing,
it appears that I had not allowed for it, so I withdraw that complaint, at
least. Having said that, the C version ran at about half the speed of my
own C version.
<snip>
>>> He goes on to claim that, using a sample set of 5,000,000 elements, the
>>> C++ code is more than 4 times faster than the C code.
>>
>> <shrug> My very, very simplistic C code doesn't quite manage to keep up
>> with the C++ code, but it certainly isn't four times slower.
>
> I am not that surprised, Stroustrup's C code sucked, even I could do
> better! ;)
> However, the basic claim - that the higher abstraction level of C++ does
> not hurt code efficiency but actually improves it - seems to be supported
> by your results.
Rather, it suggests that, *in this one particular example*, C++'s sort
works faster than qsort. Before getting too excited about that, I'd like
to see comparative performance stats for sorts where comparisons are
considerably more expensive than for mere doubles...
>> The difference between my code and the C++ code is likely to be caused
>> by better memory management on the part of gcc's STL writer.
...and don't forget this bit.
>>
>>> follow to
>>> prove the superiority of C++ even in the realm of efficiency.
>>
>> <shrug> You picked a rather artificial example. There is more to
>> efficiency
>> than being able to store and sort five million numbers.
>
> I didn't pick that example, Stroustrup did! :P
By posting it here, you picked it.
> However, I think the example is quite good, because in the end a complex
> application is nothing more than a collection of simple functions/data
> structures (or "objects" for the OO crowd) which do things like reading
> files, sorting numbers etc.
That's perhaps a rather naive view. It ain't so much what building blocks
you has, as how you puts them together.
<snip>
>>> Are there any counter-arguments against Stroustrup's claim?
>>
>> My favourite example is a 450-line C++ program I was shown, about eight
>> years ago, which had taken four man-years to write and took about 10
>> minutes to process a fairly trivial amount of data. I rewrote it from
>> spec (because I saw no point in rewriting it from the code) in a single
>> day, in C. My version took considerably less than a second to process
>> the same amount of data. There is more to efficiency than mere language
>> choice.
>
> Of course, a bad programmer will write worse code in any language
> compared to a good programmer.
Right - and take a look at some of the C++ applications out there if you
want to see some fine examples of bad programming. :-)
> However, the question here is - assuming
> programmers of equal skill - who would produce more efficient code if one
> used C and the other used C++. So far the answer seems to be the C++ guy
> would win.
I think there are many more factors than you're allowing for here, and
language choice (between C and C++, at least) is a relatively
insignificant factor. As for programmers of equal skill, how would you
measure that?
Measure, measure, measure. If you want a real comparison, get Bjarne
Stroustrup to write a real-time transaction processing system in C++ for a
major bank, *to spec*. And get Dennis Ritchie to implement the same spec
in C. *Then* you have something worth testing and measuring. Don't forget
to measure bugs as well as performance - if the answer doesn't have to be
right, I can make both the C and the C++ versions run very very fast
indeed.
<snip>
> However, I compiled both examples with the current version of GCC (on
> Win32-x86), and there is at least one area where C won: code size. The
> size difference between the resulting executables was grotesque: C code
> => 5KB, C++ code => over 400KB! (both binaries optimized for size and
> stripped!)
I didn't find the difference to be quite so remarkable. I didn't bother
with size optimisation, and I got these results:
Your C version: 37519
My C version: 60818
Your C++ version: 101287
Stripping yielded these figures:
Your C version: 4752
My C version: 9984
Your C++ version: 12172
Whilst the C++ version is significantly larger, it's not that big a deal
really, is it?
> C++ is a virus. There will come a day when it is wiped from the Earth.
Ouch! Sometimes I wonder why people give such titles to C++.
:: Is C++ powerful ?
yes, a lot.
:: Is C++ complex ?
of course.
:: Is C++ monstrous and efficient ?
yes, of course it is like this heck a lot :O)
:: Is programming in C++ hard ?
Yes, it is a *lot* harder than what you do with other languages.
I am still looking for the day when more and more people will give titles
to VB, .NET or Java rather than yanking their brains around C vs C++
flame-wars.
You may be young, and this young time of yours is too precious to waste on
C vs C++. If you are really interested, please use it on "VB vs any Good
Language" (Python, Common Lisp, Ruby, Ada, etc.)
I give the same advice to "copx" (the OP)
--
http://lispmachine.wordpress.com/
Please remove capital 'V's when you reply
to me via e-mail.
> :: Is C++ powerful ?
>
> yes, a lot.
> ....[SNIP]....
> I give the same advice to "copx" (the OP)
Oh.. no. I unintentionally f*cked-up the thread from another NG. The
negligence is on my part.
I apologize for the mistake & inconvenience I have caused.
> Oh.. no. I unintentionally f*cked-up the thread from other NG. Negligence
> is on my part.
>
> I apologize for the mistake & inconvenience I have caused.
That apology was for comp.lang.c++. My PAN settings were wrong. I just
corrected them.
1. Please, do not call it "my C version". I did not write that code.
2. Given that we use the same compiler, I assume the large difference can be
explained by the fact that you use a setup which dynamically links the
executable against the C++ standard library installed on your system i.e.
you do not see the true complexity of your code, and it would not run on any
system without a compatible library. I am surprised that the principle of
dynamic linking is news to you... GCC does that on Linux (your OS I guess)
by default.
In contrast, on Windows systems - where no C++ standard library can be
expected to be present - the compiler links the required functionality into
the executable i.e. it becomes a lot bigger. Happens with C code too, but
the C standard library functions are very simple/small so they do not add
much to the executable.
Here we have a perfect example of the C-hacker syndrome.
I bet you are a big fan of Linus Torvalds. (Heck, you may even *be*
Linus Torvalds. Exactly the same type of trolling.)
>> On Sun, 06 Apr 2008 00:24:06 -0700, bc1891 wrote:
>
>> C++ is a virus. There will come a day when it is wiped from the Earth.
>
> Ouch! Sometimes I wonder why people give such titles to C++.
My guess is that they are bad and/or lazy programmers and blame the
language for not being able to learn it. I'm not fond of several languages
but their existence doesn't bother me. I just choose not to use them.
> I bet you are a big fan of Linus Torvalds.
:D
> (Heck, you may even *be*
> Linus Torvalds. Exactly the same type of trolling.)
Linus also gives the same type of arguments for why he did not choose a
microkernel-based design.
>
> "Richard Heathfield" <r...@see.sig.invalid> schrieb im Newsbeitrag
> news:jYednbMQHMkhGWXa...@bt.com...
<snip>
>> Whilst the C++ version is significantly larger, it's not that big a deal
>> really, is it?
>
> 1. Please, do not call it "my C version". I did not write that code.
Fair enough - but I think you know what I mean.
> 2. Given that we use the same compiler, I assume the large difference can
> be explained by the fact that you use a setup which dynamically links the
> executable against the C++ standard library installed on your system i.e.
> you do not see the true complexity of your code, and it would not run on
> any system without a compatible library.
But that's okay because we can just re-compile on the new system.
> I am surprised that the
> principle of dynamic linking is news to you...
Oh, but it isn't. I'm sick an... I mean, I'm quite familiar with the idea,
on several different OSen. And although it has its drawbacks, I think
you'll agree that it does drastically reduce filesize.
<snip>
The only reason I program in C rather than C++ is that reliable C
compilers are far more ubiquitous than reliable C++ compilers. I don't
think this will change any time soon either; I mean who's actually
gonna sit down and write a C++ compiler for the PIC16F684
microcontroller when the C compiler is perfectly sufficient?
Both languages get compiled to machine code, and one language is for
the most part a superset of the other, so in my own eyes it stands to
reason that C++ could be better in every way (except when it comes to
finding a decent compiler for your platform).
> In "Learning Standard C++ as a New Language" Bjarne Stroustrup claims that
> properly written C++ outperforms C code. I will just copy his first example
> here, which is supposed to demonstrate how C++ abstractions do not only make
> code easier to understand but also make it more efficient:
> ===
> The simplest specific example I can think of is a program to find the mean
> and median of a sequence of double precision floating-point numbers read
> from input. A conventional C-style solution would be:
<snip C>
> To compare, here is an idiomatic C++ solution:
<snip C++>
> He goes on to claim that, using a sample set of 5,000,000 elements, the C++
> code is more than 4 times faster than the C code. Further examples follow to
> prove the superiority of C++ even in the realm of efficiency.
I can't get anything like 4 times on my system. With optimisations
off, the C code is a little faster; with optimisations on, the C code
is 20% slower.
All the speed is coming from the sort. Comment that out, and the C++
is slower with or without optimisations. Not really surprising that,
in an IO test, C would have the edge (without the sort, that is all
this is).
It is also not really surprising that C++'s sort can be optimised to
be faster than qsort -- the language is designed to make such
optimisations possible, but I wonder what combination of compiler and
hardware gives a factor of 4. What do other people see?
--
Ben.
<snip>
> It is also not really surprising that C++'s sort can be optimised to
> be faster than qsort -- the language is designed to make such
> optimisations possible, but I wonder what combination of compiler and
> hardware gives a factor of 4. What do other people see?
I see a fine specimen of Anser caerulescens ferali, with an array of
programmers in hot pursuit.
The text doesn't contain any details on that. It only says:
"The implementation I used is widely available and cheap - not a research
toy. Implementations that claim higher performance are also available."
The text was published in the May 1999 issue of "The C/C++ User's Journal".
That's all available info - make your guess! ;)
The factor of 4 seems to be an extreme case; however, C++ does indeed seem
to beat C in synthetic benchmarks most of the time these days:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all
Funny side note: if you give equal importance to "CPU time", "Memory Use",
and "Gzip Bytes" the winner is... Pascal!
--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm
>
> "Richard Heathfield" <r...@see.sig.invalid> wrote in message
>> I think there are many more factors than you're allowing for here, and
>> language choice (between C and C++, at least) is a relatively
>> insignificant factor. As for programmers of equal skill, how would you
>> measure that?
>>
> Get the same programmer to write code in C and C++.
Finding a programmer who is precisely as skilled in C as he or she is in
C++ - no more, no less - will not be easy.
--
Ian Collins.
You can also write some silly #define macros whose job it is to write
a type-specialized sort function.
#define GEN_SORT_FUNCTION(TYPE, FUNC_NAME, LESS, EQUAL) ...
GEN_SORT_FUNCTION(int, sort_int, left < right, left == right)
The idea is that would generate a function sort_int which sorts an
array of integers, using ``left < right'' and ``left == right'' to
compare elements. (The definition of GEN_SORT_FUNCTION is a reader
exercise).
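One possible definition of that macro, sketched here with a simple insertion sort so it fits in a post (a production version would generate a quicksort the same way; this sketch also drops the EQUAL parameter, since insertion sort only needs LESS):

```c
#include <stddef.h>

/* Expands to a complete sort function specialized for TYPE. LESS is
 * an expression over the names `left' and `right', pasted directly
 * into the inner loop, so there is no function-pointer indirection. */
#define GEN_SORT_FUNCTION(TYPE, FUNC_NAME, LESS)     \
void FUNC_NAME(TYPE *a, size_t n)                    \
{                                                    \
    for (size_t i = 1; i < n; i++) {                 \
        TYPE left = a[i];                            \
        size_t j = i;                                \
        while (j > 0) {                              \
            TYPE right = a[j - 1];                   \
            if (!(LESS))                             \
                break;                               \
            a[j] = a[j - 1];                         \
            j--;                                     \
        }                                            \
        a[j] = left;                                 \
    }                                                \
}

GEN_SORT_FUNCTION(int, sort_int, left < right)
GEN_SORT_FUNCTION(double, sort_double, left < right)
```

Because LESS is expanded in place, the comparison is ordinary inline code by the time the optimizer sees it, which is the same property the C++ template gets.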
This has some concrete advantages over C++, because it doesn't suffer
from the std::less braindamage.
Our sorting function doesn't have to deduce that the elements are
equal by comparing them for inequality both ways.
We could easily develop a variant of GEN_SORT_FUNCTION in which a
single comparison expression is specified---one whose job it is to
produce a negative integer, 0, or a positive integer for the left <
right, left == right and left > right cases.
This has the advantage that if you are comparing strings, you only
have to make a single scan through them, from which you have all the
information you need.
(Look up the definition of strcmp, and how it fits into qsort).
There is a reason that Stroustrup chose doubles, because if the data
consisted of strings---particularly long ones with lengthy common
prefixes---it's possible that the idiomatic C++ would actually lose to
the idiomatic C due to the impedance mismatch between std::less and
complex comparisons.
The cost of going through a function pointer to get to a comparison
function only matters if that comparison function does next to
nothing.
If the comparison function is nontrivial, you can bungle performance
by having to do that comparison two or more times over again.
If the comparison function is nontrivial, you can also bungle
performance by inappropriately inlining it, so that the combination of
sort function and inlined comparison code exceeds the available
instruction cache.
> So C coders who care about efficiency should switch to C++?
Or maybe only those coders who can't figure out how to sort an array
of doubles without using function indirection, yet are comfortable
with templates. :)
He is timing memory management of the vector class versus a totally naive
realloc for every single row beyond 1000. The sort operation is a complete
red herring.
I can't believe that Mr. Stroustrup wrote that, because it's totally
disingenuous.
** Posted from http://www.teranews.com **
Alas, not true; you must have misread the ``res += res'' expression
which doubles the reserved size prior to each realloc.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
static const double CLOCK_INV = 1.0 / CLOCKS_PER_SEC;
int compare(const void *p, const void *q)
{
double p0 = *(const double *) p;
double q0 = *(const double *) q;
if (p0 > q0) return 1;
if (p0 < q0) return -1;
return 0;
}
void quit(void)
{
fprintf(stderr, "memory exhausted\n");
exit(1);
}
int main(int argc, char *argv[])
{
int res = 100000;
char *file = argv[1];
double median = 0;
double mean = 0;
int n = 0;
double d;
clock_t start, end;
double *buf = (double *) malloc(sizeof(double) * res);
FILE *fin = fopen(file, "r");
if (buf == 0)
quit();
start = clock();
while (fscanf(fin, "%lg", &d) == 1) {
if (n == res) {
res += res;
buf = (double *) realloc(buf, sizeof(double) * res);
if (buf == 0)
quit();
}
buf[n++] = d;
mean = (n == 1) ? d : mean + (d - mean) / n;
}
qsort(buf, n, sizeof(double), compare);
if (n) {
int mid = n / 2;
median = (n % 2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
}
end = clock();
printf("number of elements = %d, median = %g, mean = %g in %f seconds\n",
n, median, mean, (end - start)*CLOCK_INV);
free(buf);
return 0;
}
#include <vector>
#include <fstream>
#include <iostream>
#include <ctime>
#include <algorithm>
using namespace std;
static const double CLOCK_INV = 1.0 / CLOCKS_PER_SEC;
int main(int argc, char *argv[])
{
char *file = argv[1];
vector < double >buf;
clock_t start, end;
double median = 0;
double mean = 0;
fstream fin(file, ios::in);
double d;
start = clock();
while (fin >> d) {
buf.push_back(d);
mean = (buf.size() == 1) ? d : mean + (d - mean) / buf.size();
}
sort(buf.begin(), buf.end());
if (buf.size()) {
int mid = buf.size() / 2;
median = (buf.size() % 2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
}
end = clock();
cout << "number of elements = " << buf.size()
<< " , median = " << median << ", mean = " << mean
<< " in " << (end - start)*CLOCK_INV << " seconds." << endl;
}
/*
C:\tmp>cl /EHsc /W4 /Ox testcpp.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for
80x86
Copyright (C) Microsoft Corporation. All rights reserved.
testcpp.cpp
testcpp.cpp(12) : warning C4100: 'argc' : unreferenced formal parameter
Microsoft (R) Incremental Linker Version 8.00.50727.762
Copyright (C) Microsoft Corporation. All rights reserved.
/out:testcpp.exe
testcpp.obj
C:\tmp>testcpp n.txt
number of elements = 5000000 , median = 1.00008, mean = 5.42094 in 10.344
seconds.
C:\tmp>testcpp n.txt
number of elements = 5000000 , median = 1.00008, mean = 5.42094 in 10.328
seconds.
C:\tmp>testcpp n.txt
number of elements = 5000000 , median = 1.00008, mean = 5.42094 in 10.344
seconds.
C:\tmp>cl /W4 /Ox testc.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for
80x86
Copyright (C) Microsoft Corporation. All rights reserved.
testc.c
testc.c(32) : warning C4996: 'fopen': This function or variable may be
unsafe. Consider using fopen_s instead. To disable deprecation, use
_CRT_SECURE_NO_WARNINGS. See online help for d
etails.
C:\Program Files\Microsoft Visual Studio 8\VC\INCLUDE\stdio.h(234) :
see declaration of 'fopen'
testc.c(36) : warning C4996: 'fscanf': This function or variable may be
unsafe. Consider using fscanf_s instead. To disable deprecation, use
_CRT_SECURE_NO_WARNINGS. See online help for
details.
C:\Program Files\Microsoft Visual Studio 8\VC\INCLUDE\stdio.h(249) :
see declaration of 'fscanf'
testc.c(22) : warning C4100: 'argc' : unreferenced formal parameter
Microsoft (R) Incremental Linker Version 8.00.50727.762
Copyright (C) Microsoft Corporation. All rights reserved.
/out:testc.exe
testc.obj
C:\tmp>testc n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 9.469000
seconds
C:\tmp>testc n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 9.406000
seconds
C:\tmp>testc n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 9.484000
seconds
C:\tmp>
*/
Even so, the realloc() was still the rate limiting step.
5 million doubles will be about 40 MB on most systems, which is about
2^25 bytes of memory.
The initial size of 1000 is about 2^10, so we will have about 13 realloc()
calls which will entail memcpy() of all objects from the previous hunk
(an approximately O(N^2) operation by doing things in this manner).
By changing the starting size to 100,000 doubles {thereby cutting the
number of realloc() calls approximately in half}, the speed is a
little faster than the C++ version.
It's also a tragic mistake to sort the data to find the median.
Median location is O(N) when done properly {e.g. randomized
quickselect()}. But that's neither here nor there.
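For the curious, the O(N) selection can be sketched like so - plain quickselect with a middle-element pivot rather than the randomized pivot mentioned above, to keep the post short. It permutes the array and runs in expected linear time:

```c
#include <stddef.h>

/* Returns the k-th smallest element (0-based) of a[0..n-1],
 * partitioning the array as it narrows in on position k. */
static double select_kth(double *a, size_t n, size_t k)
{
    size_t lo = 0, hi = n - 1;
    while (lo < hi) {
        double pivot = a[lo + (hi - lo) / 2];
        size_t i = lo, j = hi;
        while (i <= j) {                /* Hoare-style partition */
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                double t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
                if (j == 0) break;      /* avoid size_t underflow */
                j--;
            }
        }
        if (k <= j) hi = j;
        else if (k >= i) lo = i;
        else return a[k];
    }
    return a[lo];
}
```

For an even count one would call it for positions n/2 - 1 and n/2 and average the two, matching the median in the posted programs.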
Having said all that, I would much rather write data structures in C++
than in C.
> On Apr 7, 7:14 pm, Kaz Kylheku <kkylh...@gmail.com> wrote:
>> On Apr 7, 6:31 pm, "Dann Corbit" <dcor...@connx.com> wrote:
>>
>> > He is timing memory management of the vector class verses a totally naive
>> > realloc for every single row beyond 1000.
>>
>> Alas, not true; you must have misread the ``res += res'' expression
>> which doubles the reserved size prior to each realloc.
>
> Even so, the realloc() was still the rate limiting step.
That did not seem to be the case in my tests. Commenting out the sort
in both programs gave C the advantage at all optimisation settings.
Just to be sure, I changed the C++ version to use fscanf and the
"naive" realloc beats the vector (but only by a little) for simply
reading and storing the numbers.
It seemed to me that C++ was able to inline the sort comparison. Of
course all this will depend on the libraries involved as much as it
does on the compilers.
--
Ben.
That is not a tight upper bound; the copying is in fact an O(N)
operation. Every time it takes place, half of the elements in the
array being moved are brand new (have never been moved before, only
assigned to their present location). Of the remaining items, half have
been moved only once before, a quarter have been moved twice, and so
on. This works out to a constant average of 2 copies per element.
(Hint: find the limit to which converges the series 1/2 + 2/4 + 3/8 +
4/16 + 5/32 ...).
The copying doesn't necessarily take place, either, if the malloc is
able to just keep expanding the block in place.
I happen to like both C and C++, but there _are_ cases where C++ makes
things a little less clear than they could be. As a trivial example:
int x = 3, y;
y = func(x);
Quick: what's the value of x?
In C the answer's easy: the value is 3. In C++, there is absolutely no
way to know, at the point of calling, what will happen to the variable.
This makes the same code, in C, somewhat easier to read, as the
indicators of whether the value will be modified or not are right there,
in your face - or not. In C++, there's no indication whatsoever, meaning
you have to hunt for the called function, check its interface, realize
that yes, it modifies the value and add this to the growing list of
pointless things to remember which shouldn't need remembering.
Simplified nonsense. You are too kind.
Here's another C++ snippet:
int x=0,y=0;
x=1;
x=x+y;
y=x+2;
Quick: what is y?
Answer: no idea. Because you don't know what "+" does half the time .....
The clc crowd who fix bugs from a printout would not fare too well
methinks ....
If you need to look at the code, you should know what func does
anyway!
/Peter
Maybe with a name better than "func" it would be less ambiguous.
> In C the answer's easy: the value is 3. In C++, there is absolutely no
> way to know, at the point of calling, what will happen to the variable.
And in C, there's absolutely no way to know who's fiddling around with
your struct members because you can't restrict access to them.
<snip>
> And in C, there's absolutely no way to know who's fiddling around with
> your struct members because you can't restrict access to them.
Wrong. Maybe *you* can't restrict access to them, but I can, and so can
quite a few others in comp.lang.c. The technique is not difficult.
3
> Answer no idea. Because you don't know what "+" does half the time .....
Ah, but this is the other half. C++ doesn't allow "+" to be
overloaded for type int.
</OT>
[...]
--
Keith Thompson (The_Other_Keith) <ks...@mib.org>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Yes. An initializer must be an assignment-expression, which means it
cannot have a comma operator at its top level.
The same restriction applies to function arguments. In both cases,
it's because the comma is used as a delimiter in that context, so it's
not allowed as an operator.
I suppose this:
int x = 3;
int y;
would be clearer, but I have no problem with either form.
ITYM "... realize that yes, it /could modify/ the value ....". Unless
you look at func's definition, one cannot know for sure (just from
checking its interface) if it actually modifies 'x' or not.
- Anand
Many C compilers have the ability to arbitrarily inline functions that are
small enough to inline.
On Tue, 08 Apr 2008 10:45:09 -0700, peter koch wrote:
> Right. But this is only true in such trivial cases as the above where
> you allow such an absurd name. Replace the name with some reasonable
> one, such as square or fibonacci, and no one will be in doubt.
Really? Fine: try this:
string a, b;
a = somevalue;
b = trim(a);
If the intent of trim is to act like the similar function in PHP - i.e.
to remove leading and trailing whitespace - what does this tell us about
the effect, if any, on a?
Oh, right, nothing. Hmm. So what's a good name for this? How about, oh,
trim_which_modifies_its_parameter()? I don't want to type that.
> I have
> heard others complain like you, but never found statements such as the
> above to be a real problem.
Yet the problem remains: C++ makes the issue of whether a passed
parameter gets modified or not absolutely indeterminate at point of call
- meaning the poor bastard having to maintain the code has no hope, by
reading the portion of code he's trying to work on, of figuring out what
is actually going on.
This is not a particularly good design.
> A much larger problem has been in absurd declaration such as the above.
> Are you quite sure that it should read int x = 3; int y; (as you imply)
I implied that? Where? I wrote int x = 3, y; There's only one semicolon
in there, only one instance of "int". Why would you think this somehow
"implies" a completely different manner of writing code? Does your way
of writing it imply it should be re-done in Pascal? No? Why not? You
seem to be taking purely arbitrary paths, so why not that one?
> If you need to look at the code, you should know what func does anyway!
Why? Since it cannot modify the passed parameter, what it does is
fundamentally irrelevant to debugging the local problem. That is, unless
someone invented a method of passing parameters which would then be
modified, but making this absolutely invisible to the maintenance
programmer looking at the calling code.
Oh, yeah, they did exactly that, didn't they? Good, now instead of
focusing on *this* piece of code, one has to examine every called
function, not for its effects on output and the like, but its effects on
*local* variables it gives absolutely no indication it will be modifying.
Information hiding can be good. This sort is bad.
Indeed. However, the more important point is that you can be reasonably
certain that it will *not* be modifying values unless there's a pointer
involved somewhere, which is generally *visible* at the point of the call.
eg:
void func1(void)
{
int x = 3;
func2(x);
}
Anyone seeing this sort of thing wouldn't _need_ to examine "func2" to
see if x is modified. It's not. Simple as that. Unless you're using C++,
in which case you have no way to tell either way.
> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
> news:87wsn8m...@bsb.me.uk...
<snip>
>> It seemed to me that C++ was able to inline the sort comparison. Of
>> course all this will depend on the libraries involved as much as it
>> does on the compilers.
>
> Many C compilers have the ability to arbitrarily inline functions that are
> small enough to inline.
Yes, but very few (none?) can do it for the comparison function used
in qsort. I am sure it can be done, but I don't know of it being
done, which suggests it is tricky to get right.
I am not sure that this is what is happening, but the design of C++'s
STL encourages a lot of inlining in the templates used to implement
sort. Something is making the assembler that comes out of g++ -S ten
times bigger for the C version.
--
Ben.
This thread is cross-posted between comp.lang.c and comp.lang.c++,
which is rarely a good idea, especially when what's being discussed is
a controversial difference between the two languages.
I'll just point out that there are plenty of programming languages in
which functions (procedures, subprograms, whatever) can modify their
arguments, and programmers who use those languages generally manage to
cope with the pain.
If I were inventing a new language, I might consider requiring a
special syntax to distinguish between arguments that are passed by
value and arguments whose value might be changed. For example:
sin(x) /* passed by value, can't modify x */
set_sin(@x) /* can modify x */
In C++ you would never write as above. It would be
string a(somevalue);
string b(trim(a));
or
string a = somevalue;
string b = trim(a);
>
> If the intent of trim is to act like the similar function in PHP - i.e.
> to remove leading and trailing whitespace - what does this tell us about
> the effect, if any, on a?
I would assume this would mean that there would be no effect on a. If
trim would trim its parameter, the name of the function would indicate
this.
>
> Oh, right, nothing. Hmm. So what's a good name for this? How about, oh,
> trim_which_modifies_its_parameter()? I don't want to type that.
to_trim would be my choice, as_trim being the non-modifying name in
those rare cases where I would be in doubt.
>
> > I have
> > heard others complain like you, but never found statements such as the
> > above to be a real problem.
>
> Yet the problem remains: C++ makes the issue of whether a passed
> parameter gets modified or not absolutely indeterminate at point of call
> - meaning the poor bastard having to maintain the code has no hope, by
> reading the portion of code he's trying to work on, of figuring out what
> is actually going on.
>
> This is not a particularly good design.
I can't see how else you could design the code (unless you suggest
demanding pointers for the parameters, which is not at all clearer), so
let's not discuss the design of C++ here. It is designed as it is, and
there is nothing we can really do about it.
>
> > A much larger problem has been in absurd declaration such as the above.
> > Are you quite sure that it should read int x = 3; int y; (as you imply)
>
> I implied that? Where? I wrote int x = 3, y; There's only one semicolon
> in there, only one instance of "int". Why would you think this somehow
> "implies" a completely different manner of writing code? Does your way
> of writing it imply it should be re-done in Pascal? No? Why not? You
> seem to be taking purely arbitrary paths, so why not that one?
It was just a remark about clarity - namely that I found your way of
declaring the variables unclear.
Also I do not see the connection to Pascal. Here it was allowed (I
believe!) to declare several variables together - e.g. var i,j:
integer;
>
> > If you need to look at the code, you should know what func does anyway!
>
> Why? Since it cannot modify the passed parameter, what it does is
> fundamentally irrelevant to debugging the local problem. That is, unless
> someone invented a method of passing parameters which would then be
> modified, but making this absolutely invisible to the maintenance
> programmer looking at the calling code.
If you need to have a look at a piece of code, do you seriously
suggest that this can be done without at least a casual knowledge
about what called functions do?
>
> Oh, yeah, they did exactly that, didn't they? Good, now instead of
> focusing on *this* piece of code, one has to examine every called
> function, not for its effects on output and the like, but its effects on
> *local* variables it gives absolutely no indication it will be modifying.
>
> Information hiding can be good. This sort is bad.
Certainly. That is why you choose your function (and variable) names
with care.
/Peter
Yup - it would be much clearer. I probably would not let that one pass
in a C++ code review (also because there is a potentially
uninitialised variable), but of course C is a different beast.
I certainly would not let int* i,j pass in either language.
/Peter
>> Many C compilers have the ability to arbitrarily inline functions that are
>> small enough to inline.
>Yes, but very few (none?) can do it for the comparison function used
>in qsort. I am sure it can be done, but I don't know of it being
>done, which suggests it is tricky to get right.
It could work if qsort() itself was inlined as well as the comparison
function, because then the comparison function would be known at compile
time.
This would use more space, but in principle a compiler could generate
"curried" versions of qsort() for each (known-at-compile-time) comparison
function, to avoid replication of the code where the same comparison
function is used in multiple calls to qsort().
(Currying is "partially evaluating" a function after fixing some of
its arguments.)
-- Richard
--
:wq
--
Ian Collins.
Inlining qsort's comparison function should be straightforward *if*
the compiler is able to inline (a call to) the qsort() function
itself. That would require either the ability to inline across
separately compiled translation units, or an implementation that in
effect delays compilation until link time.
Agreed. But this:
int *i, j;
might be acceptable in C (except that "i" is a terrible name for a
pointer). But yes, I'd prefer
int *i; // or "int* i;" if you insist
int j;
in either language.
Note that this:
int *ptr;
tends to be the preferred spacing in C, and this:
int* ptr;
tends to be the preferred spacing in C++. There are valid reasons for
either convention.
Also, I would have used this comparison function:
#include <float.h>
#include <math.h>
int double_compare(void *pd1, void *pd2)
{
double d1 = *(double *) pd1;
double d2 = *(double *) pd2;
if (d1 > d2)
if ((d1 - d2) < fabs(d1 * DBL_EPSILON))
return 0;
else
return 1;
if (d1 < d2)
if ((d2 - d1) < fabs(d2 * DBL_EPSILON))
return 0;
else
return -1;
return 0;
}
The actual problem proposed should logically have been solved like this:
#include <stdlib.h>
#include <float.h>
#include <math.h>
typedef double Etype;
extern Etype RandomSelect(Etype * A, size_t p, size_t r, size_t i);
extern size_t RandRange(size_t a, size_t b);
extern size_t RandomPartition(Etype * A, size_t p, size_t r);
extern size_t Partition(Etype * A, size_t p, size_t r);
/*
**
** In the following code, every reference to CLR means:
**
** "Introduction to Algorithms"
** By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
** ISBN 0-07-013143-0
*/
/*
** CLR, page 187
*/
Etype RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
size_t q,
k;
if (p == r)
return A[p];
q = RandomPartition(A, p, r);
k = q - p + 1;
if (i <= k)
return RandomSelect(A, p, q, i);
else
return RandomSelect(A, q + 1, r, i - k);
}
size_t RandRange(size_t a, size_t b)
{
size_t c = (size_t) ((double) rand() / ((double) RAND_MAX + 1)
* (b - a));
return c + a;
}
/*
** CLR, page 162
*/
size_t RandomPartition(Etype A[], size_t p, size_t r)
{
size_t i = RandRange(p, r);
Etype Temp;
Temp = A[p];
A[p] = A[i];
A[i] = Temp;
return Partition(A, p, r);
}
/*
** CLR, page 154
*/
size_t Partition(Etype A[], size_t p, size_t r)
{
Etype x,
temp;
size_t i,
j;
x = A[p];
i = p - 1;
j = r + 1;
for (;;) {
do {
j--;
} while (!(A[j] <= x));
do {
i++;
} while (!(A[i] >= x));
if (i < j) {
temp = A[i];
A[i] = A[j];
A[j] = temp;
} else
return j;
}
}
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
static const double CLOCK_INV = 1.0 / CLOCKS_PER_SEC;
int compare(const void *p, const void *q)
{
register double p0 = *(double *) p;
register double q0 = *(double *) q;
if (p0 > q0)
return 1;
if (p0 < q0)
return -1;
return 0;
}
void quit()
{
fprintf(stderr, "memory exhausted\n");
exit(1);
}
int main(int argc, char *argv[])
{
int res = 100000;
char *file = argv[1];
double median = 0;
double mean = 0;
int n = 0;
double d;
clock_t start,
end;
double *buf = (double *) malloc(sizeof(double) * res);
FILE *fin = fopen(file, "r");
if (buf == 0)
quit();
start = clock();
while (fscanf(fin, "%lg", &d) == 1) {
if (n == res) {
res += res;
buf = (double *) realloc(buf, sizeof(double) * res);
if (buf == 0)
quit();
}
buf[n++] = d;
mean = (n == 1) ? d : mean + (d - mean) / n;
}
/*
if n is odd, then we use (n+1)/2
If n is even, we use [(n/2)th term + (n/2+1)th term] / 2
*/
if (n) {
int mid = n / 2;
median = (n % 2) ? RandomSelect(buf, 0, n - 1, mid) :
(RandomSelect(buf, 0, n - 1, mid-1) + RandomSelect(buf, 0, n - 1, mid))/2;
}
/*
qsort(buf, n, sizeof(double), compare);
if (n) {
int mid = n / 2;
median = (n % 2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
}
*/
end = clock();
printf("number of elements = %d, median = %g, mean = %g in %f seconds\n",
n, median, mean, (end - start) * CLOCK_INV);
free(buf);
return 0;
}
/*
It would be a lot faster for odd numbered data sets:
C:\tmp>ksel n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 8.000000
seconds
C:\tmp>ksel n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 7.375000
seconds
C:\tmp>ksel n.txt
number of elements = 5000000, median = 1.00008, mean = 5.42094 in 7.437000
seconds
*/
I guessed wrong: the extra cost for a second call to RandomSelect is
negligible:
C:\tmp>ksel n.txt
number of elements = 5000001, median = 1.00008, mean = 5.42094 in 7.328000
seconds
C:\tmp>ksel n.txt
number of elements = 5000001, median = 1.00008, mean = 5.42094 in 7.375000
seconds
C:\tmp>ksel n.txt
number of elements = 5000001, median = 1.00008, mean = 5.42094 in 7.375000
seconds
I did not profile, but I guess that we are now limited by file I/O.
Bad idea. Treating two floating-point values as "equal" if they're
sufficiently close might be useful in some contexts, but not in a
qsort comparison function.
Suppose your array contains three elements X, Y, Z whose values are
very close to each other. Your comparison function could cause qsort
to treat them as if X==Y and Y==Z, but X < Z. Hilarity ensues.
What's wrong with just using the predefined "<" and "=="?
Just "<" will do...
int dbl_cmp(const void *lv, const void *rv)
{
const double *ld = lv;
const double *rd = rv;
return (*rd < *ld) - (*ld < *rd);
}
Care still needs to be taken though with infinities and nans.
--
Peter
No it won't. Try it.
> What's wrong with just using the predefined "<" and "=="?
The principle of least surprise.
Trying it can't demonstrate that the behavior is well defined.
I believe that giving qsort() a comparison function that doesn't
define a complete ordering on the elements invokes undefined behavior.
The standard doesn't explicitly say so, but I believe it's implicit
in the description:
The function shall return an integer less than, equal to, or
greater than zero if the first argument is considered to be
respectively less than, equal to, or greater than the second.
Equality is an equivalence relation; X==Y and Y==Z implies X==Z.
>> What's wrong with just using the predefined "<" and "=="?
>
> The principle of least surprise.
If qsort() gives me array[N] > array[N+1], I'll be very surprised.
You go to significant additional effort to ensure that sorting works
less well than it could. I fail to see the point.
>>> int double_compare(void *pd1, void *pd2)
>>> {
>>> double d1 = *(double *) pd1;
>>> double d2 = *(double *) pd2;
>>> if (d1 > d2)
>>> if ((d1 - d2) < fabs(d1 * DBL_EPSILON))
>>> return 0;
>>> else
>>> return 1;
>>> if (d1 < d2)
>>> if ((d2 - d1) < fabs(d2 * DBL_EPSILON))
>>> return 0;
>>> else
>>> return -1;
>>> return 0;
>>> }
>> Suppose your array contains three elements X, Y, Z whose values are
>> very close to each other. Your comparison function could cause qsort
>> to treat them as if X==Y and Y==Z, but X < Z. Hilarity ensues.
>No it won't.
What won't what? Hilarity won't ensue?
Are you saying you can't have X "equal" to Y, Y equal to Z, but X less
than Z? Or are you saying this doesn't matter?
If the input is Z, Y, X, what stops qsort() leaving it unchanged, even though
X < Z?
>Try it.
I did. The program is below. The output is
comparisons: 0 0 -1
before: 1.25000000000000044409 1.25000000000000022204 1.25000000000000000000
after: 1.25000000000000044409 1.25000000000000022204 1.25000000000000000000
X compares equal to Y. Y compares equal to Z. X compares as less than Z.
When I sort them, Z wrongly comes before X.
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int double_compare(void *pd1, void *pd2)
{
double d1 = *(double *) pd1;
double d2 = *(double *) pd2;
if (d1 > d2)
if ((d1 - d2) < fabs(d1 * DBL_EPSILON))
return 0;
else
return 1;
if (d1 < d2)
if ((d2 - d1) < fabs(d2 * DBL_EPSILON))
return 0;
else
return -1;
return 0;
}
int main(void)
{
double x = 1.25, y = x+DBL_EPSILON, z = y+DBL_EPSILON;
double a[3];
printf("comparisons: %d %d %d\n",
double_compare(&x, &y),
double_compare(&y, &z),
double_compare(&x, &z));
a[0] = z; a[1] = y; a[2] = x;
printf("before: %.20f %.20f %.20f\n", a[0], a[1], a[2]);
qsort(a, 3, sizeof(a[0]), double_compare);
printf("after: %.20f %.20f %.20f\n", a[0], a[1], a[2]);
return 0;
}
-- Richard
--
:wq
I think you may be right, in that some implementation of qsort()
conceivably might not terminate.
> I believe that giving qsort() a comparison function that doesn't
> define a complete ordering on the elements invokes undefined behavior.
> The standard doesn't explicitly say so, but I believe it's implicit
> in the description:
>
> The function shall return an integer less than, equal to, or
> greater than zero if the first argument is considered to be
> respectively less than, equal to, or greater than the second.
>
> Equality is an equivalence relation; X==Y and Y==Z implies X==Z.
If only floating point actually worked like that, things would be a lot
easier.
>>> What's wrong with just using the predefined "<" and "=="?
>>
>> The principle of least surprise.
>
> If qsort() gives me array[N] > array[N+1], I'll be very surprised.
>
> You go to significant additional effort to ensure that sorting works
> less well than it could. I fail to see the point.
You are assuming that the ordering:
0.99999999999999978, 1, 1.0000000000000002
Is somehow better than:
0.99999999999999978, 1.0000000000000002, 1
It isn't. In fact, it is not unlikely that all three of them are really the
same number.
Ah, but those are all the same number.
>Ah, but those are all the same number.
Not according to your comparison function. The third is less than the
first (see the -1 in the first output line). qsort() has produced an
ordering inconsistent with the comparison function.
If instead of three numbers I used millions, qsort() could produce a
list where the last value was smaller than the first by millions of
times DBL_EPSILON.
Your comparison function does not define a total order, as required
by the standard. Z <= Y and Y <= X, but not Z <= X.
-- Richard
--
:wq
If you see three double numbers and they differ in relative error by
DBL_EPSILON, I submit that the probability that they are different numbers
is approximately zero and the probability that they are the same number is
approximately 1.
Depending on the order those three numbers arrive, the result of comparisons
may order them differently. But that is quite alright, since it is what the
function ought to do.
No they're not. They are exactly:
125000000000000044409 / 100000000000000000000
125000000000000022204 / 100000000000000000000
and 5 / 4
However, they may have arisen differently :-)
--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
Or the sort may not terminate, or something else. Even if it
terminates, you now have a "sorted" array that (for example) will not
work with bsearch(), because it is not consistent with the comparison
function.
The fact is that your proposed function does not fulfill the
requirements of the standard. Programs that use it may give wrong
answers with no indication of error when presented with certain data.
-- Richard
--
:wq
I submit that it is consistent. If you really think that 1+eps is not the
same number as 1 for double precision floating point, then I think you
should imagine how such numbers might arrive in your data base. I submit
that it will not interfere with correct operation of bsearch(), since
bsearch() should work in exactly the same way that qsort does for
comparisons.
> The fact is that your proposed function does not fulfill the
> requirements of the standard. Programs that use it may give wrong
> answers with no indication of error when presented with certain data.
It returns an ordering. The standard never says that the ordering must
always be identical (and indeed, such an ordering would be impossible on
many implementations due to the nature of floating point).
Your objection is correct, but specious.
There are a myriad of C functions (both that you/someone else wrote)
as well as ones in the stdio/stdlib/etc., that take pointers as
parameters. You cannot tell just by looking at the call to them
whether they modify what's being pointed at either.
Oz
--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
It is better, even by your own standards.
> In fact, it is not unlikely that all three of them are
> really the same number.
Here's my modified version of Richard Tobin's program:
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int double_compare(void *pd1, void *pd2)
{
double d1 = *(double *) pd1;
double d2 = *(double *) pd2;
if (d1 > d2)
if ((d1 - d2) < fabs(d1 * DBL_EPSILON)) {
printf("%.20f == %.20f\n", d1, d2);
return 0;
}
else {
printf("%.20f > %.20f\n", d1, d2);
return 1;
}
if (d1 < d2)
if ((d2 - d1) < fabs(d2 * DBL_EPSILON)) {
printf("%.20f == %.20f\n", d1, d2);
return 0;
}
else {
printf("%.20f < %.20f\n", d1, d2);
return -1;
}
return 0;
}
int main(void)
{
double x = 1.25, y = x+DBL_EPSILON, z = y+DBL_EPSILON;
double a[3];
a[0] = z; a[1] = y; a[2] = x;
printf("before: %.20f %.20f %.20f\n", a[0], a[1], a[2]);
qsort(a, 3, sizeof(a[0]), double_compare);
printf("after: %.20f %.20f %.20f\n", a[0], a[1], a[2]);
printf("a[0] vs. a[1]: ");
(void)double_compare(&a[0], &a[1]);
printf("a[1] vs. a[2]: ");
(void)double_compare(&a[1], &a[2]);
printf("a[0] vs. a[2]: ");
(void)double_compare(&a[0], &a[2]);
return 0;
}
And here's the output I get:
before: 1.25000000000000044409 1.25000000000000022204 1.25000000000000000000
1.25000000000000044409 == 1.25000000000000022204
1.25000000000000022204 == 1.25000000000000000000
after: 1.25000000000000044409 1.25000000000000022204 1.25000000000000000000
a[0] vs. a[1]: 1.25000000000000044409 == 1.25000000000000022204
a[1] vs. a[2]: 1.25000000000000022204 == 1.25000000000000000000
a[0] vs. a[2]: 1.25000000000000044409 > 1.25000000000000000000
Note that *according to your own comparison function* a[0] and a[2]
are ordered incorrectly.
This happens because your comparison function is inconsistent. qsort
isn't misbehaving; it's doing exactly what it's supposed to do with
the information available to it.
Now if you could divide the set of real numbers into intervals such
that:
All numbers within a given interval are equal.
For any two intervals Foo and Bar, either all numbers in Foo are
less than all numbers in Bar, or all numbers in Foo are
greater than all numbers in Bar
then you could have a consistent comparison function. But it would
have problems; two numbers that are arbitrarily close together could
be unequal, while two numbers farther apart (say, at opposite ends of
an interval) could be equal.
It's *so* much simpler *and better* to use the predefined comparison
operators (so that, in the model of the previous paragraph, each
distinct number is its own interval).
I think you're assuming that there's something fundamentally wrong
with built-in floating-point comparisons. There isn't. You just have
to be careful with them.
The comparison function:
return a > b ? 1 : a < b ? -1 : 0;
will also perform just as I have demonstrated on a conforming
implementation, as will yours.
Consider a machine with 80 bit floating point. It's not so far fetched, as
both Microsoft and Intel compilers will allow 80 bits of internal precision
by choosing options accordingly.
Now, imagine the values 1.25000000000000000001 and 1.25000000000000000000
are compared in two registers.
We decide that the first is bigger than the second.
Now, an interrupt fires. These values get shoved into 64 bit registers and
this time they compare equal. Is the comparison broken? Is the floating
point broken? No, and no.
What I am saying is that literally 1.25000000000000000001 and
1.25000000000000000000 are the same number if you encounter them. Consider
how numbers such as these might come about. You might have added part of a
pallet and part of a pallet, or you might have combined the eaches. For
example:
(3/14 pallet) * 2 pallets + (8/14 pallet) * one pallet * 30 eaches per
pallet * 6 items per each
versus adding the individual eaches. Or something of that nature. Floating
point itself is often called an "approximate numeric" type because it does
not represent values exactly.
It is my opinion that your ordering is not better than my ordering and that
assuming 1.25000000000000000001 and 1.25000000000000000000 are different
numbers is a mistake. They are almost *surely* the same number.
I have always detested this kind of argument. "Maybe you can't, but we
who are better than you can", "it's quite easy", and then don't even
show this "easy" solution at all (which usually involves ugly hacks and
jumping through completely unnecessary hoops).
Reminds me of the C-hackers who insist that it's "easy" and "feasible"
to use dynamically bound "member functions" in structs, as if that
somehow made C as good as C++ (yet the big problem with that hack is
that each "member function" of the struct increases the size of the
struct, which in many cases is a completely useless waste and makes that
struct larger and more inefficient).
Is this on topic:
http://groups.google.com/group/comp.lang.c/msg/dbb16567e49c42fc
?
> Richard Heathfield wrote:
>> lbon...@yahoo.com said:
>>
>> <snip>
>>
>>> And in C, there's absolutely no way to know who's fiddling around with
>>> your struct members because you can't restrict access to them.
>>
>> Wrong. Maybe *you* can't restrict access to them, but I can, and so can
>> quite a few others in comp.lang.c. The technique is not difficult.
>
> I have always detested this kind of argument.
Let's see whether I can amend it to your liking.
> "Maybe you can't, but we
> who are better than you can",
Not what I meant. Mr lbonafide said that there's absolutely no way, so I
presume he accepts that he can't do it, and that his inability to do it is
not in dispute. That doesn't mean I'm "better" than him. It does, however,
appear to mean that I know something he doesn't - which is easily fixed,
of course.
> "it's quite easy",
Aye, it is. Sorry.
> and then don't even
> show this "easy" solution at all
See below.
> (which usually involves ugly hacks and
> jumping through completely unnecessary hoops).
Um, it's not /that/ ugly a hack, and I assure you I'll only jump through
*necessary* hoops.
The technique involves a minimum of three files - the interface (.h), the
implementation (.c, which you *don't release*), and the driver, which the
user-programmer writes.
Here is our interface (which we ship):
#ifndef H_POINT
#define H_POINT 1
#include <limits.h>
#define POINT_INVALID INT_MIN
typedef struct point_ point;
point *point_make(int, int);
void point_destroy(point **);
int point_setx(point *, int);
int point_sety(point *, int);
int point_set(point *, int, int);
double point_diff(const point *, const point *);
#endif
The important point (if you'll forgive the word) is that the point type is
declared but *not* defined, so its members are not visible. More about
that shortly.
The totally unimportant point is that this is just an example, and so you
shouldn't expect to see a complete function suite here!
Implementation file (NOT shipped):
#include "point.h"
#include <stdlib.h>
#include <math.h>
struct point_
{
int x;
int y;
};
point *point_make(int x, int y)
{
point *new = malloc(sizeof *new);
if(new != NULL)
{
new->x = x;
new->y = y;
}
return new;
}
void point_destroy(point **old)
{
if(old != NULL)
{
free(*old);
*old = NULL;
}
}
int point_setx(point *p, int x)
{
int old = POINT_INVALID;
if(p != NULL)
{
old = p->x;
p->x = x;
}
return old;
}
int point_sety(point *p, int y)
{
int old = POINT_INVALID;
if(p != NULL)
{
old = p->y;
p->y = y;
}
return old;
}
int point_set(point *p, int x, int y)
{
int rc = POINT_INVALID;
if(p != NULL)
{
rc = 0;
p->x = x;
p->y = y;
}
return rc;
}
double point_diff(const point *pa, const point *pb)
{
double d = 0.0;
if(pa != NULL && pb != NULL)
{
double xd = pa->x - pb->x;
double yd = pa->y - pb->y;
d = sqrt(xd * xd + yd * yd);
}
return d;
}
We don't ship this to the user. Instead, we compile it, put the resulting
object code into a library, and ship the library.
Test driver:
#include "point.h"
#include <stdio.h>
int main(void)
{
point *p1 = point_make(1, 1);
if(p1 != NULL)
{
point *p2 = point_make(4, 5);
if(p2 != NULL)
{
double d = point_diff(p1, p2);
printf("The hypotenuse is %f\n", d);
point_destroy(&p2);
}
point_destroy(&p1);
}
return 0;
}
This works. (Try it.)
The following test driver does *not* work:
#include "point.h"
#include <stdio.h>
int main(void)
{
point *p1 = point_make(1, 1);
size_t size = sizeof *p1; /* can't do this */
point instance; /* can't do this either */
instance.x = 42; /* or this! */
if(p1 != NULL)
{
point *p2 = point_make(4, 5);
if(p2 != NULL)
{
double d = point_diff(p1, p2);
printf("The hypotenuse is %f\n", d);
point_destroy(&p2);
}
point_destroy(&p1);
}
return 0;
}
When I compile this, I get the following errors:
driver.c: In function `main':
driver.c:8: dereferencing pointer to incomplete type
driver.c:9: storage size of `instance' isn't known
Thus, you *can* restrict access to members of a struct, QED.
> Reminds me of the C-hackers who insist that it's "easy" and "feasible"
> to use dynamically bound "member functions" in structs, as if that
> somehow made C as good as C++ (yet the big problem with that hack is
> that each "member function" of the struct increases the size of the
> struct, which in many cases is a completely useless waste and makes that
> struct larger and more inefficient).
<shrug> The "as if that somehow made C as good as C++" bit is mere flame
bait. C and C++ are different languages with different goals and different
user bases. To claim that one is "better" than another, without specifying
in what way it is better and for whom, is to fail to understand that there
are good reasons for having more than one programming language.
Whilst it /is/ not only feasible but even easy to use dynamically bound
member functions in structs, it is certainly true that this incurs an
overhead for the pointer in every instance of the struct, and this cost
should be considered when deciding whether or not to pursue this course.
The way I resolve this myself is simple: either I'm going to be using a
lot of structs, in which case they're probably going to be in some kind of
container, so I'll put the pointers in the container instead (which is
normally okay, because all the structs in any given container are generally
going to want to be processed in the same way); or I only have a few, in
which case the overhead is no big deal.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Yes, I'd say it's broken, if the behavior of the operation changes if
it's interrupted. On the other hand, the C standard's requirements
for floating-point accuracy are sufficiently loose that it's probably
not a conformance violation.
> What I am saying is that literally 1.25000000000000000001 and
> 1.25000000000000000000 are the same number if you encounter them. Consider
> how numbers such as these might come about. You might have added part of a
> pallet and part of a pallet, or you might have combined the eaches. For
> example:
>
> (3/14 pallet) * 2 pallets + (8/14 pallet) * one pallet * 30 eaches per
> pallet * 6 items per each
>
> versus adding the individual eaches. Or something of that nature. Floating
> point itself is often called an "approximate numeric" type because it does
> not represent values exactly.
No, they are *not* literally the same number. They are two different
numbers that might (or might not) happen to be the result of
calculations that logically should have yielded the same value. If
they were literally the same number, they wouldn't be
1.25000000000000000001 and 1.25000000000000000000; they'd be
1.25000000000000000001 and 1.25000000000000000001.
> It is my opinion that your ordering is not better than my ordering and that
> assuming 1.25000000000000000001 and 1.25000000000000000000 are different
> numbers is a mistake. They are almost *surely* the same number.
So what?
We've already demonstrated cases where numbers that are unequal even
by your standard are incorrectly sorted using your comparison
function. (I tried 10000 elements and got some interesting results.)
At the very least, my method is no worse than yours. If
1.25000000000000000001 and 1.25000000000000000000 are "the same
number" then *it doesn't matter* how they compare. You add a
multiplication, a subtraction, a call to fabs(), and a second
comparison to each comparison, all for no benefit.
In the context of sorting, assuming that two very close numbers are
different does no harm while saving considerable time and complexity.
<snip>
> Here's another C++ snippet:
>
> int x=0,y=0;
>
> x=1;
> x=x+y;
>
> y=x+2;
>
> Quick, what is y?
>
> Answer no idea. Because you don't know what "+" does half the time .....
>
> The clc crowd who fix bugs from a printout would not fare too well
> methinks ....
we'd have the printout for operator+()
:-)
--
Nick Keighley
>I submit that it is consistent. If you really think that 1+eps is not the
>same number as eps for double precision floating point, then I think you
>should imagine how such numbers might arrive in your data base.
I quite agree that for practical purposes they may be the same number,
but building this idea into a qsort() comparison function does no good
and may break things.
>I submit
>that it will not interfere with correct operation of bsearch(), since
>bsearch() should work in exactly the same way that qsort does for
>comparisons.
It works in the same way, but bsearch() assumes that the order of the
array is consistent with the comparison function, and if we use your
function it isn't.
Suppose we have x, x+e, and x+2e, x+3e, x+4e. Your comparison
function may sort them as x+4e, x+3e, x+2e, x+e, x. Now you search
for x. bsearch() will compare it with x+2e, and find it's less. It
will then compare it with x+3e and x+4e, and conclude that it is not
present in the array. It will give the wrong answer.
>It returns an ordering. The standard never says that the ordering must
>always be identical (and indeed, such an ordering would be impossible on
>many implementations due to the nature of floating point).
It says it must be a total order. Yours isn't.
-- Richard
--
:wq
Four man-years to write a 450-line C++ code? Jesus, were they high or
something?
Adam
Do something good today. Fight cancer with just a click! - Global
Cancer Research Institute
http://gcri.blogspot.com/
Actually, the standard doesn't explicitly say it must be a total
order. IMHO it should say so.
C99 7.20.5.2:
The qsort function sorts an array of nmemb objects, the initial
element of which is pointed to by base. The size of each object is
specified by size.
The contents of the array are sorted into ascending order
according to a comparison function pointed to by compar, which is
called with two arguments that point to the objects being
compared. The function shall return an integer less than, equal
to, or greater than zero if the first argument is considered to be
respectively less than, equal to, or greater than the second.
If two elements compare as equal, their order in the resulting
sorted array is unspecified.
I believe this *implies* a total ordering (since qsort() can't
reasonably be expected to cope with anything other than a total
ordering). But as we're seeing, the current wording does leave room
for debate.
> On Apr 5, 11:28 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
<snip>
>>
>> > Are there any counter-arguments against Stroustrup's claim?
>>
>> My favourite example is a 450-line C++ program I was shown, about eight
>> years ago, which had taken four man-years to write and took about 10
>> minutes to process a fairly trivial amount of data. I rewrote it from
>> spec (because I saw no point in rewriting it from the code) in a single
>> day, in C. My version took considerably less than a second to process
>> the same amount of data. There is more to efficiency than mere language
>> choice.
>>
> Four man-years to write a 450-line C++ code? Jesus, were they high or
> something?
Eight blokes * six months. (I don't know whether they were high. But in the
end, they were fired.) There was no excuse for it. The task was a very,
very slightly forgiving file-comparison routine, used for comparing
QARun-recorded data without worrying too much about dates and stuff.
Really, the only reason I took a day over it was to be sure that I didn't
hand back a lemon. To take four man-years was unforgivable (and was not in
fact forgiven, once it came to light).
VERY SIMPLE HUH?
verrryyy eaasyyyyyyy and nooooooooo boilerplaateeee
bah!
I always laugh at coders who love to work for the computer.
Of course, most of them don't have children or a wife, or even a
girlfriend ;-)
Computers should work for us; they are designed to do that :)
Sorry for the ranting, but there are some things I can't accept today.
>Actually, the standard doesn't explicitly say it must be a total
>order.
Look at the introductory text of 7.20.5. It specifically says
"total ordering", though admittedly in a slightly strange context.
-- Richard
--
:wq
Yes it does: 7.20.5p4 (always look at the parent clauses in addition to
the specific clause).
-Larry Jones
Bad news, Mom. I promised my soul to the Devil this afternoon. -- Calvin
Or an inline definition of qsort() in <stdlib.h>. A conforming
implementation can't do that directly, but it can define the inline
function with a reserved name and then define a qsort() macro that
invokes the inline function.
-Larry Jones
It works on the same principle as electroshock therapy. -- Calvin
Do you consider Bjarne Stroustrup to be unbiased in his choosing of
how to code up the C solution? Not surprisingly he poisoned the C
solution, and is arguing a case for which *BOTH* solutions are
suboptimal by design.
If you want to do language comparisons, you should use "The Great
Computer Language Shootout". They accept solutions to each problem
from anyone who cares to submit them for each language, and they use
the fastest code for each language. The results clearly show C and C++
to be pretty much dead even. (Mostly because C++ compilers are
being used to compile C, and C++ the language itself has no additional
performance features whatsoever.)
> ===
> The simplest specific example I can think of is a program to find the mean
> and median of a sequence of double precision floating-point numbers read
> from input.
The mean should be limited by input speed (it can be calculated
incrementally), and the median should be implemented using the "select"
algorithm, which offers no advantage to either side (it's a fairly
complicated algorithm, though).
> [...] A conventional C-style solution would be:
>
> #include <stdlib.h>
> #include <stdio.h>
>
> int compare(const void *p, const void *q)
> {
> register double p0 = * (double *)p;
> register double q0 = * (double *)q;
> if (p0 > q0) return 1;
> if (p0 < q0) return -1;
> return 0;
> }
>
> void quit()
> {
> fprintf(stderr, "memory exhausted\n");
> exit(1);
>
> }
>
> int main(int argc, char *argv[])
> {
> int res = 1000;
> char * file = argv[2];
>
> double *buf = (double *) malloc(sizeof (double) *res);
> if (buf==0) quit();
>
> double median = 0;
> double mean = 0;
> int n = 0;
>
> FILE *fin = fopen(file, "r");
> double d;
> while (fscanf(fin, "%lg", &d) == 1) {
> if (n==res) {
> res += res;
Integer overflow is not being tested for.
> buf = (double *)realloc(buf, sizeof(double) *res);
> if (buf==0) quit();
This is some pretty bad error handling.
> }
> buf[n++] = d;
> mean = (n == 1) ? d : mean + (d - mean) / n;
>
> }
>
> qsort(buf, n, sizeof(double), compare);
C programmers who use qsort are programmers who have absolutely no
interest in performance. The overhead of qsort immediately makes it
slower than a well-implemented heapsort almost always (and certainly
in this case of floating-point sorting). The algorithmic complexity
problems of quicksort make it slower than even badly written
heapsorts some of the time.
But as I described above, using a sort is not the right way to solve
this (it's O(N log N)). Use the select algorithm; it's O(N).
> if (n) {
> int mid = n/2;
> median = (n%2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
> }
>
> printf("number of elements = %d, median = %g, mean = %g\n", n, median,
> mean);
> free(buf);
>
> }
>
> To compare, here is an idiomatic C++ solution:
>
> #include<vector>
> #include<fstream>
> #include<algorithm>
>
> using namespace std;
>
> int main(int argc, char * argv[])
> {
> char *file = argv[2];
> vector<double>buf;
>
> double median = 0;
> double mean = 0;
>
> fstream fin(file, ios::in);
> double d;
> while (fin >> d) {
> buf.push_back(d);
> mean = (buf.size() == 1) ? d : mean + (d - mean) / buf.size();
> }
>
> sort(buf.begin(), buf.end());
>
> if (buf.size()) {
> int mid = buf.size() / 2;
> median = (buf.size() %2) ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2;
>
> }
>
> cout << "number of elements = " << buf.size()
> << ", median = " << median << ", mean = " << mean << '\n';
> }
The main thing Bjarne Stroustrup is trying to prove with all this is
that template code inlining is faster than eating indirect function
call overhead.
However, a completely macro-based heap sort can be implemented in C in
about 20 lines of code.
So it's apples versus oranges: the orange has been implemented in
C++'s standard template library, while the equivalent has not been
implemented in C's standard library.
> ======================
>
> He goes on to claim that, using a sample set of 5,000,000 elements, the C++
> code is more than 4 times faster than the C code. Further examples follow to
> prove the superiority of C++ even in the realm of efficiency.
The larger he makes the sample set, the more foolish he looks.
If you have a large N and you do *NOT* use the select algorithm, then
you are very *demonstrably* doing a clearly wrong thing.
> So C coders who care about efficiency should switch to C++?
No. Coders who care about efficiency should first learn standard
algorithms, then understand how to recognize unnecessary performance
bottlenecks (which are always introduced when using qsort).
> Are there any counter-arguments against Stroustrup's claim?
In this case, it's not a question of counter-argument, it's a question
of debunking. If this is *truly* an example cited by Bjarne
Stroustrup, he deserves to be tarred and feathered on at least 3
grounds: 1) ignoring the Great Computer Language Shootout, 2) being
unaware of standard algorithms, and 3) intentionally misrepresenting
the kind of performance you can get from good C code.
--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
The reason it does not say so is that it is literally impossible to enforce
this. The example I drew previously using 80 bit registers is a real
example that is operational today using the Microsoft or Intel compilers
with certain options. The shaving off of the trailing dirt at the end of
the numbers really does happen. It was not a hypothetical case.
> C99 7.20.5.2:
>
> The qsort function sorts an array of nmemb objects, the initial
> element of which is pointed to by base. The size of each object is
> specified by size.
>
> The contents of the array are sorted into ascending order
> according to a comparison function pointed to by compar, which is
> called with two arguments that point to the objects being
> compared. The function shall return an integer less than, equal
> to, or greater than zero if the first argument is considered to be
> respectively less than, equal to, or greater than the second.
>
> If two elements compare as equal, their order in the resulting
> sorted array is unspecified.
>
> I believe this *implies* a total ordering (since qsort() can't
> reasonably be expected to cope with anything other than a total
> ordering). But as we're seeing, the current wording does leave room
> for debate.
That's because it has to.
Bad news then, since that renders some compilers non-conforming.
Tell me what this difference in numbers will make in floating point. If
this difference is important then we should never perform an addition,
subtraction, multiplication or division, because that dirt at the end of the
numbers happens when we do one of these operations.
>> What I am saying is that literally 1.25000000000000000001 and
>> 1.25000000000000000000 are the same number if you encounter them.
>> Consider how numbers such as these might come about. You might have
>> added part of a pallet and part of a pallet, or you might have
>> combined the eaches. For example:
>>
>> (3/14 pallet) * 2 pallets + (8/14 pallet) * one pallet * 30 eaches
>> per pallet * 6 items per each
>>
>> versus adding the individual eaches. Or something of that nature.
>> Floating point itself is often called an "approximate numeric" type
>> because it does not represent values exactly.
>
> No, they are *not* literally the same number. They are two different
> numbers that might (or might not) happen to be the result of
> calculations that logically should have yielded the same value.
There is no physical measurement on earth that will produce a number with
that sort of difference. The tiny bit of sand we find populating the tails
of floating point numbers is the result of operations performed against
those numbers and not as the result of our initial or subsequent
measurements.
> If
> they were literally the same number, they wouldn't be
> 1.25000000000000000001 and 1.25000000000000000000; they'd be
> 1.25000000000000000001 and 1.25000000000000000001.
This is a tragic misunderstanding as to the nature of floating point.
>> It is my opinion that your ordering is not better than my ordering and
>> that
>> assuming 1.25000000000000000001 and 1.25000000000000000000 are different
>> numbers is a mistake. They are almost *surely* the same number.
>
> So what?
>
> We've already demonstrated cases where numbers that are unequal even
> by your standard are incorrectly sorted using your comparison
> function. (I tried 10000 elements and got some interesting results.)
>
> At the very least, my method is no worse than yours. If
> 1.25000000000000000001 and 1.25000000000000000000 are "the same
> number" then *it doesn't matter* how they compare. You add a
> multiplication, a subtraction, a call to fabs(), and a second
> comparison to each comparison, all for no benefit.
I will agree that your ordering is not worse, but I disagree that it is
better. I think that literally *every* operation performed against floating
point should be aware of the ball of fuzz at the end of the numbers and it
should recognize that this tiny bit of dirt is not really part of the
number.
> In the context of sorting, assuming that two very close numbers are
> different does no harm while saving considerable time and complexity.
And yet there is harm. For instance, you think that those two numbers are
different and that one is larger than the other. This is real harm. I
guess that the mistake is almost universal though.
I agree that there may be a real problem here and that it is conceivable
that under some circumstances it may be possible to produce a sort that does
not terminate.
>>I submit
>>that it will not interfere with correct operation of bsearch(), since
>>bsearch() should work in exactly the same way that qsort does for
>>comparisons.
>
> It works in the same way, but bsearch() assumes that the order of the
> array is consistent with the comparison function, and if we use your
> function it isn't.
1+epsilon is the same number as 1.
> Suppose we have x, x+e, and x+2e, x+3e, x+4e. Your comparison
> function may sort them as x+4e, x+3e, x+2e, x+e, x. Now you search
> for x. bsearch() will compare it with x+2e, and find it's less. It
> will then compare it with x+3e and x+4e, and conclude that it is not
> present in the array. It will give the wrong answer.
Assuming that this is the wrong answer is a mistake. If you are searching
for:
x+2e
and you find:
x+4e
then you found the number that you were searching for, if it is floating
point. If you are not happy with that answer then you should not use
floating point.
>>It returns an ordering. The standard never says that the ordering must
>>always be identical (and indeed, such an ordering would be impossible on
>>many implementations due to the nature of floating point).
>
> It says it must be a total order. Yours isn't.
Achieving total order is literally not possible, because floating point is
inexact. This statement is absolutely true.
That seems quite impossible. Could you elaborate ?
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Currently it affects only long double. From the Intel documentation:
"Support for Intel® Itanium® Architecture ints and 80-bit Floats
Intel® Itanium® Architecture ints are supported with the __int64 or long
long data types. The Intel C++ Compiler for Windows does support 80-bit long
doubles when the -Qlong_double option is specified. See the Floating-point
Optimizations and Floating-point Options Quick Reference sections of the
User's Guide for more information."
From the Microsoft documentation:
"Flags
precise
The default.
Improves the consistency of floating-point tests for equality and
inequality by disabling optimizations that could change the precision of
floating-point calculations, which is required for strict ANSI conformance.
By default, the compiler uses the coprocessor's 80-bit registers to hold the
intermediate results of floating-point calculations. This increases program
speed and decreases program size. Because the calculation involves
floating-point data types that are represented in memory by less than 80
bits, however, carrying the extra bits of precision (80 bits minus the
number of bits in a smaller floating-point type) through a lengthy
calculation can produce inconsistent results."
If the CPU is performing calculations in 80 bit mode and an interrupt fires,
the floating point values may lose the trailing bits of precision. Hence,
the ordering you got one microsecond ago using the simple comparison may not
agree with the ordering you get right now.
In real life there are floating point systems where a*b != b*a if you
compare all of the bits of the result. You should not be at all surprised
if (7.0*6.0*5.0)/(7.0*6.0) does not compare equal to 5.0. If you would want
them to compare equal, you should not use a simple ordering. If you add a
tall column of non-integral floating point numbers forwards and backwards
you will get two different numbers. Everyone knows this and nobody is upset
by it. (1.0/7.0)*7.0 is not 1.0 exactly except by accident (or compiler
trickery if they can see the whole expression and do some constant folding).
I find it very odd that someone might look for the 27th number in a list and
get back an answer with machine epsilon difference from the number they
wanted and then be disturbed by that. That is the very nature of floating
point. If you are searching for 3 and find 2.9999999999999999 or
3.0000000000000000001 in floating point, then you have found your answer.
If you are disturbed by the answer then you should not be using floating
point.
An aside:
On OpenVMS, the file system is called RMS for Records Management System. It
is not possible to create a key that has a floating point value in it (the
operating system does not allow it) though literally any other data type is
fine. Quite likely, the reason for this decision has to do with the nature
of floating point comparisons.
>> Actually, the standard doesn't explicitly say it must be a total
>> order. IMHO it should say so.
>The reason it does not say so
It does, see other articles.
>is that it is literally impossible to enforce this.
You think the description of the qsort() comparison function was written
with recent floating point hardware in mind?
-- Richard
--
:wq
>> Suppose we have x, x+e, and x+2e, x+3e, x+4e. Your comparison
>> function may sort them as x+4e, x+3e, x+2e, x+e, x. Now you search
>> for x. bsearch() will compare it with x+2e, and find it's less. It
>> will then compare it with x+3e and x+4e, and conclude that it is not
>> present in the array. It will give the wrong answer.
>Assuming that this is the wrong answer is a mistake. If you are searching
>for:
>x+2e
>and you find:
>x+4e
>then you found the number that you were searching for, if it is floating
>point. If you are not happy with that answer then you should not use
>floating point.
Read more carefully. It doesn't find the wrong entry. It wrongly
says there is no such value in the array. According to you, there are
several, but it doesn't find *any* of them.
>> It says it must be a total order. Yours isn't.
>Achieving total order is literally not possible, because floating point is
>inexact. This statement is absolutely true.
Then you'd better not use floating point with qsort().
-- Richard
--
:wq
This has nothing to do with qsort as such.
Suppose for the moment that the standard dictates that qsort() should
only be used with a comparison that returns a total order.
Now suppose that a compiler exists that has a qsort() function which
doesn't require that the order be total. How is that non-conforming ?
I assume you mean that compilers exist for which the 'a < b' comparison
for some datatypes does not give a total ordering. But that, in itself,
does not make them non-conforming, at least not as far as this clause
about qsort() is concerned.
In any case, if you call qsort() with a comparison function where it
doesn't hold that a=b and b=c implies a=c, then bad things might happen.
Who knows, it might run into an endless loop or something.
Well... Richard's code really is not at all difficult - very much like
the C++ pimpl idiom. So information CAN be hidden in C. Just not as
elegantly as in C++, as I'm sure we all agree.
I remember in my old days writing classes with virtual functions in C.
I thought it was so elegant. Not so anymore - but you can do a lot in
C if you want to.
/Peter
What I am saying is that with 80 bit floating point it is possible that
comparisons will not be consistent. That implies that total ordering is not
possible to enforce. If the standard demands total ordering, then qsort()
cannot be called with a long double vector on those systems without creating
undefined behavior.
That is also a perfectly acceptable and expected outcome for a floating
point vector. If you are expecting to find a non-integral value, then you
should not be surprised to be disappointed. In fact, finding it will be a
little surprising.
>>> It says it must be a total order. Yours isn't.
>
>>Achieving total order is literally not possible, because floating point is
>>inexact. This statement is absolutely true.
>
> Then you'd better not use floating point with qsort().
Or with any other operation that involves comparison.
Then sorting a floating point vector may result in undefined behavior.
>>is that it is literally impossible to enforce this.
>
> You think the description of the qsort() comparison function was written
> with recent floating point hardware in mind?
I do not think the data type was considered for a fundamental algorithm. At
least, it should not have been.