
std::sort causes segfault when sorting class arrays


Philip Pemberton

Feb 28, 2009, 5:07:11 PM
Hi,
I'm trying to write some image processing software, and as part of that I
need to sort an array of pixel values by their luminance...

I'm using std::sort to do this, which works OK... as long as no two pixels
in the array have the same luminance value. In that case... it
segfaults. Here's my code:

////// CODE BEGINS

// g++ -o test test.cpp
#include <algorithm>

using namespace std;

class Pixel {
public:
    double r, g, b;

    Pixel(double _r = 0, double _g = 0, double _b = 0) {
        r = _r;
        g = _g;
        b = _b;
    }

    // Return the CIE luminance of this pixel
    double get_luminance() const {
        return (r * 255.0 * 0.212671)
             + (g * 255.0 * 0.715160)
             + (b * 255.0 * 0.072169);
    }

    bool operator<(const Pixel &rhs) const {
        return (get_luminance() < rhs.get_luminance());
    }
};

int main(int argc, char **argv)
{
    for (int gx = 0; gx < 1000; gx++) {
        Pixel pixels[15 * 15];
        unsigned int pixel_count = 0;

        for (int ym = 0; ym < 15; ym++) {
            for (int xm = 0; xm < 15; xm++) {
#ifndef MAKE_PIXELS_THE_SAME
                pixels[pixel_count].r = (xm * 15) + ym;
                pixels[pixel_count].g = (xm * 15) + ym;
                pixels[pixel_count].b = (xm * 15) + ym;
#else
                pixels[pixel_count].r = 0.12;
                pixels[pixel_count].g = 0.93;
                pixels[pixel_count].b = 0.32;
#endif
                pixel_count++;
            }
        }

        sort(pixels, pixels + pixel_count);
    }

    return 0;
}

////// CODE ENDS

The segfault always happens on the call to sort().

I know I'm probably missing something blindingly obvious here, but I've
spent an hour hacking away at this and still can't see what I'm doing
wrong...

Can anyone help me figure out why my code isn't working?
(this is only the second time I've used sort(), and the first time I've
used it on an array, but my C++ book says it is possible to do the
latter... is my book wrong?)

Thanks,
Phil.

fedor.c...@gmail.com

Feb 28, 2009, 5:20:37 PM

sort(pixels, pixels+pixel_count*sizeof(Pixel));

Kai-Uwe Bux

Feb 28, 2009, 5:48:04 PM
fedor.c...@gmail.com wrote:

No, that would try to access elements beyond the bounds of the array.


To the OP: I cannot reproduce the segfault; and I cannot find an obvious
flaw in the code. Are you sure the flaw is within the snippet you posted?
Maybe, you trimmed it down too much?


Best

Kai-Uwe Bux

Philip Pemberton

Feb 28, 2009, 6:06:44 PM
On Sat, 28 Feb 2009 17:48:04 -0500, Kai-Uwe Bux wrote:
> To the OP: I cannot reproduce the segfault; and I cannot find an obvious
> flaw in the code. Are you sure the flaw is within the snippet you
> posted? Maybe, you trimmed it down too much?

Nope -- I've checked on my machine and it segfaults repeatably... :(

I rewrote it to use a vector<Pixel *> instead, which works fine, but is
slower than I'd have hoped (probably because of all the Pixel objects
being created).

This is on an x86 laptop, running Ubuntu 8.10 with the latest security
patches. gcc version is 4.3.2 (Ubuntu 4.3.2-1ubuntu12).

Thanks,
Phil.

Kai-Uwe Bux

Feb 28, 2009, 6:14:05 PM
Philip Pemberton wrote:

> On Sat, 28 Feb 2009 17:48:04 -0500, Kai-Uwe Bux wrote:
>> To the OP: I cannot reproduce the segfault; and I cannot find an obvious
>> flaw in the code. Are you sure the flaw is within the snippet you
>> posted? Maybe, you trimmed it down too much?
>
> Nope -- I've checked on my machine and it segfaults repeatably... :(
>
> I rewrote it to use a vector<Pixel *> instead, which works fine, but is
> slower than I'd have hoped (probably because of all the Pixel objects
> being created).

And what happens if you use vector<Pixel> instead? What happens with other
containers?

Also, what does your comparison functor look like? Maybe, there are some
subtle differences there.

> This is on an x86 laptop, running Ubuntu 8.10 with the latest security
> patches. gcc version is 4.3.2 (Ubuntu 4.3.2-1ubuntu12).

Hm, I'm running gcc 4.3.1 on an x86 desktop.


Best

Kai-Uwe Bux

Thomas J. Gritzan

Feb 28, 2009, 8:08:17 PM
Philip Pemberton wrote:

> Hi,
> I'm trying to write some image processing software, and as part of that I
> need to sort an array of pixel values by their luminance...
>
> I'm using std::sort to do this, which works OK.. as long as no two pixels
> in the array have the same luminance value. In that case.... it
> segfaults. Here's my code:
[...]

> class Pixel {
> public:
> double r, g, b;
[...]

> // Return the CIE luminance of this pixel
> double get_luminance() const {
>     return (r * 255.0 * 0.212671)
>          + (g * 255.0 * 0.715160)
>          + (b * 255.0 * 0.072169);
> }
>
> bool operator<(const Pixel &rhs) const {
>     return (get_luminance() < rhs.get_luminance());
> }
> };
[...]
> sort(pixels, pixels+pixel_count);
> }

std::sort requires a strict weak ordering as its comparison function. AFAIK
your operator< is a strict weak ordering, but only as long as
get_luminance() calculates the same value for the same pixel every time.

The problem with floating point is that you might get two different
results if you calculate the same equation twice. The reason is that
some implementations use a higher precision when calculating than fits
in a double.

To avoid this problem, you could use a compiler-specific flag (like
-ffloat-store for GCC; please refer to the GCC manual).

A (more robust) C++ language solution would be to calculate the
luminance value for every pixel once, store it somewhere, and use this
for sorting.
You could sort an index like std::pair< luminance, Pixel* > and use it
to bring your pixels into the right order, or cache the luminance inside
the Pixel class.
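
A minimal sketch of that index approach (not from the original posts; it
assumes the Pixel class and get_luminance() from the code above, and the
helper name sort_by_luminance is made up for illustration):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Compute each luminance exactly once and store it in a 64-bit double
// object; every comparison then re-reads the same truncated value, so
// the ordering stays consistent even on x87 hardware.
void sort_by_luminance(Pixel *pixels, std::size_t n)
{
    std::vector< std::pair<double, Pixel*> > index;
    index.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        index.push_back(std::make_pair(pixels[i].get_luminance(), &pixels[i]));

    // std::pair compares by .first (the cached key), so no extra functor
    // is needed; ties fall back to the pointer, which is harmless here.
    std::sort(index.begin(), index.end());

    // index[k].second now points at the pixel with the k-th smallest
    // luminance; copy or permute the pixels as required.
}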

--
Thomas

Juha Nieminen

Mar 1, 2009, 4:32:26 AM
Thomas J. Gritzan wrote:
> The problem with floating point is that you might get two different
> results if you calculate the same equation twice.

I'm sorry, but this sounds like BS to me.

Care to give an example where the exact *same* equation (using the
exact same, unmodified variables) can give *different* results from
evaluation to evaluation?

Also, care to explain the process by which this happens?

The only way this would happen is if the FPU hardware (or whichever
chip is doing the FP calculations) deliberately gives random results
each time it performs the exact same calculations. Not that this
wouldn't be conceivable (eg. random rounding of the last bit), but I
have never heard of any hardware doing that. Would it even conform to
the related IEEE standard?

A third thing I would like to hear is how a broken comparison can
cause std::sort to segfault, rather than simply causing the array not
to be properly sorted. I can't think of any sorting algorithm which
would start indexing out of bounds even if the comparisons return
invalid results.

Kai-Uwe Bux

Mar 1, 2009, 5:16:05 AM
Juha Nieminen wrote:

> Thomas J. Gritzan wrote:
>> The problem with floating point is that you might get two different
>> results if you calculate the same equation twice.
>
> I'm sorry, but this sounds like BS to me.

That might be because you snipped the context.

To restore the context, let me reproduce the operator< in question:

double get_luminance() const {
    return (r * 255.0 * 0.212671)
         + (g * 255.0 * 0.715160)
         + (b * 255.0 * 0.072169);
}

bool operator<(const Pixel &rhs) const {
    return (get_luminance() < rhs.get_luminance());
}

This is implicitly used in a call to sort().

What is in question is whether this comparison predicate will define a
strict weak order.

> Care to give an example where the exact *same* equation (using the
> exact same, unmodified variables) can give *different* results from
> evaluation to evaluation?

Well, I just had this program:

#include <iostream>
#include <iomanip>
#include <cmath>

using namespace std;

int main ( void ) {
    double x = 1;
    double y = 1;
    cout << boolalpha
         << ( sin(x)+cos(y) == sin(x)+cos(y) ) << '\n';
}

and the output was "false".

> Also, care to explain the process by which this happens?

The program above prints "false" on my machine because of clause [5/10].
Whether the two subexpressions "sin(x)+cos(y)" qualify as the "exact *same*
equation" I don't know. But something like this _could_ cause the operator<
to yield inconsistent answers.

> The only way this would happen is if the FPU hardware (or whichever
> chip is doing the FP calculations) deliberately gives random results
> each time it performs the exact same calculations. Not that this
> wouldn't be conceivable (eg. random rounding of the last bit), but I
> have never heard of any hardware doing that. Would it even conform to
> the related IEEE standard?

Rounding of last bits can occur between registers in the CPU and floating
point objects in memory. A write-reread can therefore change the value.
Clause [5/10] gives a lot of lenience in this way.

> A third thing I would like to hear is how a broken comparison can
> cause std::sort to segfault, rather than simply causing the array to not
> to be properly sorted. I can't think of any sorting algorithm which
> would start indexing out of boundaries even if the comparisons return
> invalid results.

Hm, I can think of partition exchange implementations that would use the
pivot as a sentinel value to find the left-most index for which

*left < pivot

is false and the right-most index for which

pivot < *right

is false. One would expect that the two expressions _must_ become false
eventually since the pivot is part of the sequence. But if operator< is
broken, then left could be incremented beyond the bound of the sequence.

Let's see what the implementation on my machine does:

#include <vector>
#include <algorithm>

struct A {
    bool operator< ( A const & rhs ) const {
        return true;
    }
};

int main ( void ) {
    std::vector< A > av;
    for ( int n = 0; n < 32; ++n ) {
        av.push_back( A() );
    }
    std::sort( av.begin(), av.end() );
}

This actually seems to run forever, but it does not segfault.


Best

Kai-Uwe Bux

Juha Nieminen

Mar 1, 2009, 2:26:07 PM
Kai-Uwe Bux wrote:
> Well, I just had this program:
>
> #include <iostream>
> #include <iomanip>
> #include <cmath>
>
> using namespace std;
>
> int main ( void ) {
> double x = 1;
> double y = 1;
> cout
> << boolalpha
> << ( sin(x)+cos(y) == sin(x)+cos(y) ) << '\n';
> }
>
> and the output was "false".

I honestly can't understand what's happening there. Even if it's
changed to this:

#include <iostream>
#include <iomanip>
#include <cmath>

double f(double x, double y)
{
    return std::sin(x) + std::cos(y);
}

int main ( void ) {
    double x = 1;
    double y = 1;

    std::cout << std::boolalpha
              << ( f(x, y) == f(x, y) ) << std::endl;
}

it still prints false. OTOH it does so only when compiling without
optimizations. With optimizations it prints true.

Compiling to asm (without optimizations) and examining the result is
only more puzzling. The f() function appears, quite naturally, only once
in the result, and it's called twice from main() with the *exact same
code* in both cases. I can't see any difference whatsoever between the
two calls. The exact same asm instructions are used to put the two
variables into the FPU and calling the one and same f() function. Then
an FPU comparison opcode is executed.

In other words, the exact same, bit-for-bit identical machine code is
executed twice, then the results are compared, and they are unequal? I
can't understand how that is even physically possible, unless the FPU
is rounding differently at different times, based on some randomness or
side-effects of previous calls. I have never heard of the Intel FPU
doing that.

So I suppose I must admit that I was wrong. The exact same code can
give different results from execution to execution, although it's still
a complete mystery to me how that is even physically possible (at least
in a machine that is supposed to be deterministic). One would assume
this is something you definitely don't want to happen, i.e. a function
which should not have side-effects having side-effects nevertheless, and
returning different values for identical parameters at different times.
One could assume this would break a lot of programs (and programming
languages) out there which assume that functions without side-effects
always return the same value for the same parameters.

Victor Bazarov

Mar 1, 2009, 3:00:25 PM
> it still prints false. [..]

I don't know what you're talking about, guys. I just took your
code, plugged it into my test project, compiled, ran, and got
'true'. Changed FP settings, changed to optimized, same thing.
Maybe Visual C++ 2008 is wrong somehow, and I am supposed to get
'false'?

What compiler, what hardware? Can you post the assembly?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask


Juha Nieminen

Mar 1, 2009, 3:36:49 PM
Victor Bazarov wrote:
> I don't know what you're talking about, guys. I just took your
> code, plugged it into my test project, compiled, ran, and got
> 'true'. Changed FP settings, changed to optimized, same thing.
> Maybe Visual C++ 2008 is wrong somehow, and I am supposed to get
> 'false'?

Using gcc 4.3.1 on a Pentium4 (running linux) here. Without
optimizations it prints false, with optimizations it prints true. I
wouldn't have believed it if I didn't test it myself.

> What compiler, what hardware? Can you post the assembly?

The unoptimized asm is rather verbose, but here it is:

.file "test.cc"
.section
.text._ZStorSt13_Ios_FmtflagsS_,"axG",@progbits,_ZStorSt13_Ios_FmtflagsS_,comdat
.weak _ZStorSt13_Ios_FmtflagsS_
.type _ZStorSt13_Ios_FmtflagsS_, @function
_ZStorSt13_Ios_FmtflagsS_:
.LFB567:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
movl 8(%ebp), %edx
movl 12(%ebp), %eax
orl %edx, %eax
popl %ebp
ret
.LFE567:
.size _ZStorSt13_Ios_FmtflagsS_, .-_ZStorSt13_Ios_FmtflagsS_
.section
.text._ZStoRRSt13_Ios_FmtflagsS_,"axG",@progbits,_ZStoRRSt13_Ios_FmtflagsS_,comdat
.weak _ZStoRRSt13_Ios_FmtflagsS_
.type _ZStoRRSt13_Ios_FmtflagsS_, @function
_ZStoRRSt13_Ios_FmtflagsS_:
.LFB569:
pushl %ebp
.LCFI2:
movl %esp, %ebp
.LCFI3:
subl $8, %esp
.LCFI4:
movl 8(%ebp), %eax
movl (%eax), %edx
movl 12(%ebp), %eax
movl %eax, 4(%esp)
movl %edx, (%esp)
call _ZStorSt13_Ios_FmtflagsS_
movl %eax, %edx
movl 8(%ebp), %eax
movl %edx, (%eax)
movl 8(%ebp), %eax
leave
ret
.LFE569:
.size _ZStoRRSt13_Ios_FmtflagsS_, .-_ZStoRRSt13_Ios_FmtflagsS_
.section
.text._ZNSt8ios_base4setfESt13_Ios_Fmtflags,"axG",@progbits,_ZNSt8ios_base4setfESt13_Ios_Fmtflags,comdat
.align 2
.weak _ZNSt8ios_base4setfESt13_Ios_Fmtflags
.type _ZNSt8ios_base4setfESt13_Ios_Fmtflags, @function
_ZNSt8ios_base4setfESt13_Ios_Fmtflags:
.LFB597:
pushl %ebp
.LCFI5:
movl %esp, %ebp
.LCFI6:
subl $24, %esp
.LCFI7:
movl 8(%ebp), %eax
movl 12(%eax), %eax
movl %eax, -4(%ebp)
movl 8(%ebp), %eax
leal 12(%eax), %edx
movl 12(%ebp), %eax
movl %eax, 4(%esp)
movl %edx, (%esp)
call _ZStoRRSt13_Ios_FmtflagsS_
movl -4(%ebp), %eax
leave
ret
.LFE597:
.size _ZNSt8ios_base4setfESt13_Ios_Fmtflags,
.-_ZNSt8ios_base4setfESt13_Ios_Fmtflags
.section
.text._ZSt9boolalphaRSt8ios_base,"axG",@progbits,_ZSt9boolalphaRSt8ios_base,comdat
.weak _ZSt9boolalphaRSt8ios_base
.type _ZSt9boolalphaRSt8ios_base, @function
_ZSt9boolalphaRSt8ios_base:
.LFB608:
pushl %ebp
.LCFI8:
movl %esp, %ebp
.LCFI9:
subl $8, %esp
.LCFI10:
movl $1, 4(%esp)
movl 8(%ebp), %eax
movl %eax, (%esp)
call _ZNSt8ios_base4setfESt13_Ios_Fmtflags
movl 8(%ebp), %eax
leave
ret
.LFE608:
.size _ZSt9boolalphaRSt8ios_base, .-_ZSt9boolalphaRSt8ios_base
.text
.type _Z41__static_initialization_and_destruction_0ii, @function
_Z41__static_initialization_and_destruction_0ii:
.LFB1062:
pushl %ebp
.LCFI11:
movl %esp, %ebp
.LCFI12:
subl $24, %esp
.LCFI13:
cmpl $1, 8(%ebp)
jne .L11
cmpl $65535, 12(%ebp)
jne .L11
movl $_ZStL8__ioinit, (%esp)
call _ZNSt8ios_base4InitC1Ev
movl $_ZNSt8ios_base4InitD1Ev, %eax
movl $__dso_handle, 8(%esp)
movl $_ZStL8__ioinit, 4(%esp)
movl %eax, (%esp)
call __cxa_atexit
.L11:
leave
ret
.LFE1062:
.size _Z41__static_initialization_and_destruction_0ii,
.-_Z41__static_initialization_and_destruction_0ii
.type _GLOBAL__I__Z1fdd, @function
_GLOBAL__I__Z1fdd:
.LFB1063:
pushl %ebp
.LCFI14:
movl %esp, %ebp
.LCFI15:
subl $8, %esp
.LCFI16:
movl $65535, 4(%esp)
movl $1, (%esp)
call _Z41__static_initialization_and_destruction_0ii
leave
ret
.LFE1063:
.size _GLOBAL__I__Z1fdd, .-_GLOBAL__I__Z1fdd
.section .ctors,"aw",@progbits
.align 4
.long _GLOBAL__I__Z1fdd
.text
.globl _Z1fdd
.type _Z1fdd, @function
_Z1fdd:
.LFB1053:
pushl %ebp
.LCFI17:
movl %esp, %ebp
.LCFI18:
subl $40, %esp
.LCFI19:
movl 8(%ebp), %eax
movl %eax, -8(%ebp)
movl 12(%ebp), %eax
movl %eax, -4(%ebp)
movl 16(%ebp), %eax
movl %eax, -16(%ebp)
movl 20(%ebp), %eax
movl %eax, -12(%ebp)
fldl -8(%ebp)
fstpl (%esp)
call sin
fstpl -24(%ebp)
fldl -16(%ebp)
fstpl (%esp)
call cos
faddl -24(%ebp)
leave
ret
.LFE1053:
.size _Z1fdd, .-_Z1fdd
.globl main
.type main, @function
main:
.LFB1054:
leal 4(%esp), %ecx
.LCFI20:
andl $-16, %esp
pushl -4(%ecx)
.LCFI21:
pushl %ebp
.LCFI22:
movl %esp, %ebp
.LCFI23:
pushl %ebx
.LCFI24:
pushl %ecx
.LCFI25:
subl $48, %esp
.LCFI26:
fld1
fstpl -24(%ebp)
fld1
fstpl -16(%ebp)
fldl -16(%ebp)
fstpl 8(%esp)
fldl -24(%ebp)
fstpl (%esp)
call _Z1fdd
fstpl -32(%ebp)
fldl -16(%ebp)
fstpl 8(%esp)
fldl -24(%ebp)
fstpl (%esp)
call _Z1fdd
fldl -32(%ebp)
fucompp
fnstsw %ax
sahf
sete %al
setnp %dl
andl %edx, %eax
movzbl %al, %ebx
movl $_ZSt9boolalphaRSt8ios_base, 4(%esp)
movl $_ZSt4cout, (%esp)
call _ZNSolsEPFRSt8ios_baseS0_E
movl %ebx, 4(%esp)
movl %eax, (%esp)
call _ZNSolsEb
movl $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, 4(%esp)
movl %eax, (%esp)
call _ZNSolsEPFRSoS_E
movl $0, %eax
addl $48, %esp
popl %ecx
popl %ebx
popl %ebp
leal -4(%ecx), %esp
ret
.LFE1054:
.size main, .-main
.local _ZStL8__ioinit
.comm _ZStL8__ioinit,1,1
.weakref _ZL20__gthrw_pthread_oncePiPFvvE,pthread_once
.weakref _ZL27__gthrw_pthread_getspecificj,pthread_getspecific
.weakref _ZL27__gthrw_pthread_setspecificjPKv,pthread_setspecific
.weakref
_ZL22__gthrw_pthread_createPmPK14pthread_attr_tPFPvS3_ES3_,pthread_create
.weakref _ZL22__gthrw_pthread_cancelm,pthread_cancel
.weakref
_ZL26__gthrw_pthread_mutex_lockP15pthread_mutex_t,pthread_mutex_lock
.weakref
_ZL29__gthrw_pthread_mutex_trylockP15pthread_mutex_t,pthread_mutex_trylock
.weakref
_ZL28__gthrw_pthread_mutex_unlockP15pthread_mutex_t,pthread_mutex_unlock
.weakref
_ZL26__gthrw_pthread_mutex_initP15pthread_mutex_tPK19pthread_mutexattr_t,pthread_mutex_init
.weakref
_ZL30__gthrw_pthread_cond_broadcastP14pthread_cond_t,pthread_cond_broadcast
.weakref
_ZL25__gthrw_pthread_cond_waitP14pthread_cond_tP15pthread_mutex_t,pthread_cond_wait
.weakref _ZL26__gthrw_pthread_key_createPjPFvPvE,pthread_key_create
.weakref _ZL26__gthrw_pthread_key_deletej,pthread_key_delete
.weakref
_ZL30__gthrw_pthread_mutexattr_initP19pthread_mutexattr_t,pthread_mutexattr_init
.weakref
_ZL33__gthrw_pthread_mutexattr_settypeP19pthread_mutexattr_ti,pthread_mutexattr_settype
.weakref
_ZL33__gthrw_pthread_mutexattr_destroyP19pthread_mutexattr_t,pthread_mutexattr_destroy
.section .eh_frame,"a",@progbits
.Lframe1:
.long .LECIE1-.LSCIE1
.LSCIE1:
.long 0x0
.byte 0x1
.globl __gxx_personality_v0
.string "zP"
.uleb128 0x1
.sleb128 -4
.byte 0x8
.uleb128 0x5
.byte 0x0
.long __gxx_personality_v0
.byte 0xc
.uleb128 0x4
.uleb128 0x4
.byte 0x88
.uleb128 0x1
.align 4
.LECIE1:
.LSFDE9:
.long .LEFDE9-.LASFDE9
.LASFDE9:
.long .LASFDE9-.Lframe1
.long .LFB1062
.long .LFE1062-.LFB1062
.uleb128 0x0
.byte 0x4
.long .LCFI11-.LFB1062
.byte 0xe
.uleb128 0x8
.byte 0x85
.uleb128 0x2
.byte 0x4
.long .LCFI12-.LCFI11
.byte 0xd
.uleb128 0x5
.align 4
.LEFDE9:
.LSFDE15:
.long .LEFDE15-.LASFDE15
.LASFDE15:
.long .LASFDE15-.Lframe1
.long .LFB1054
.long .LFE1054-.LFB1054
.uleb128 0x0
.byte 0x4
.long .LCFI20-.LFB1054
.byte 0xc
.uleb128 0x1
.uleb128 0x0
.byte 0x9
.uleb128 0x4
.uleb128 0x1
.byte 0x4
.long .LCFI21-.LCFI20
.byte 0xc
.uleb128 0x4
.uleb128 0x4
.byte 0x4
.long .LCFI22-.LCFI21
.byte 0xe
.uleb128 0x8
.byte 0x85
.uleb128 0x2
.byte 0x4
.long .LCFI23-.LCFI22
.byte 0xd
.uleb128 0x5
.byte 0x4
.long .LCFI25-.LCFI23
.byte 0x84
.uleb128 0x4
.byte 0x83
.uleb128 0x3
.align 4
.LEFDE15:
.ident "GCC: (SUSE Linux) 4.3.1 20080507 (prerelease) [gcc-4_3-branch
revision 135036]"
.section .note.GNU-stack,"",@progbits

Thomas J. Gritzan

Mar 1, 2009, 4:06:24 PM
Sorry for being off-topic, but this is directly related to the C++
problem of this thread.

Juha Nieminen wrote:
> Victor Bazarov wrote:
>> I don't know what you're talking about, guys. I just took your
>> code, plugged it into my test project, compiled, ran, and got
>> 'true'. Changed FP settings, changed to optimized, same thing.
>> Maybe Visual C++ 2008 is wrong somehow, and I am supposed to get
>> 'false'?
>
> Using gcc 4.3.1 on a Pentium4 (running linux) here. Without
> optimizations it prints false, with optimizations it prints true. I
> wouldn't have believed it if I didn't test it myself.
>
>> What compiler, what hardware? Can you post the assembly?
>
> The unoptimized asm is rather verbose, but here it is:

You don't need to post all the iostream stuff. Cutting down to essential
parts:

in main():


> fld1
> fstpl -24(%ebp)
> fld1
> fstpl -16(%ebp)
> fldl -16(%ebp)
> fstpl 8(%esp)
> fldl -24(%ebp)
> fstpl (%esp)

The above puts x and y onto the stack.

> call _Z1fdd
> fstpl -32(%ebp)

This calls f(x,y) and stores the result into a temporary.

> fldl -16(%ebp)
> fstpl 8(%esp)
> fldl -24(%ebp)
> fstpl (%esp)
> call _Z1fdd

Again loading x and y and calling f(x,y).

> fldl -32(%ebp)

This loads the first result back from the temporary, while the result of
the second call to f is still loaded.

> fucompp

Compares the result of the first call to the result of the second call.

So what's the difference between the two results? The first result was
stored in a temporary, the second result remained in the floating point
unit.

The x86 floating point co-processor operates on 80 bit floating point
values. Temporaries stored in memory are in 64 bit double precision
format. So by storing the first result into a temporary, it was
truncated and rounded to 64 bit, and by loading it back, it was extended
to 80 bit again. The second result still is 80 bit when the two are
compared.

Sounds like BS to you, Juha? ;-)

That's why your teacher told you never to compare two floating point
values using equality.

--
Thomas
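
A minimal sketch of the effect described above (not from the original
posts; behavior is compiler- and platform-dependent, and on an x87 build
without -ffloat-store the first comparison may print false while the
second prints true):

#include <cmath>
#include <iostream>

int main() {
    double x = 1, y = 1;

    // May be false on x87: one operand can keep 80-bit excess precision.
    bool direct = ( std::sin(x) + std::cos(y) == std::sin(x) + std::cos(y) );

    volatile double a = std::sin(x) + std::cos(y);  // forced store: 64 bits
    volatile double b = std::sin(x) + std::cos(y);  // forced store: 64 bits
    bool via_memory = ( a == b );  // compares the two reloaded values

    std::cout << std::boolalpha << direct << ' ' << via_memory << '\n';
}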

Juha Nieminen

Mar 1, 2009, 4:47:09 PM
Thomas J. Gritzan wrote:
> Sounds like BS to you, Juha? ;-)

Sounds like a compiler bug to me. Especially the fact that compiling
with optimizations results in different behavior than without
optimizations is rather telling.

On the other hand, I suppose this is, once again, one of those things
where compilers are free to do whatever they want, so even if one call
to f() results in -2.1 and a second call results in pi, that would still
be "allowed". After all, does the standard impose any requirements on
the correctness of floating point calculations?

I actually find it rather ironic that it gives the incorrect answer
without optimizations and the correct answer with optimizations. One
would think that, if it makes any difference, it would be the optimized
version which cuts corners at the cost of accuracy, and the unoptimized
version would do all steps in the same way (because speed is not, after
all, a concern).

Philip Pemberton

Mar 1, 2009, 4:51:30 PM
On Sat, 28 Feb 2009 18:14:05 -0500, Kai-Uwe Bux wrote:

> And what happens if you use vector<Pixel> instead? what happens with
> other containers?

Segfault. :(

> Also, what does your comparison functor look like? Maybe, there are some
> subtle differences there.

There isn't a comparison functor. std::sort is using the overloaded
operator< in the Pixel class.

I did try refactoring the code to use a functor instead, and got the same
result (segfault).

Thanks,
Phil.

Kai-Uwe Bux

Mar 1, 2009, 4:56:45 PM
Thomas J. Gritzan wrote:

> Sorry for being off-topic, but this is directly related to the C++
> problem of this thread.
>
> Juha Nieminen wrote:
>> Victor Bazarov wrote:
>>> I don't know what you're talking about, guys. I just took your
>>> code, plugged it into my test project, compiled, ran, and got
>>> 'true'. Changed FP settings, changed to optimized, same thing.
>>> Maybe Visual C++ 2008 is wrong somehow, and I am supposed to get
>>> 'false'?

No, the outcome is implementation defined. Arguably, your implementation has
the _better_ result.

Great explanation! Thanks.

> So what's the difference between the two results? The first result was
> stored in a temporary, the second result remained in the floating point
> unit.
>
> The x86 floating point co-processor operates on 80 bit floating point
> values. Temporaries stored in memory are in 64 bit double precision
> format. So by storing the first result into a temporary, it was
> truncated and rounded to 64 bit, and by loading it back, it was extended
> to 80 bit again. The second result still is 80 bit when the two are
> compared.

Yup, and that's ok with the standard by [5/10].

> Sounds like BS to you, Juha? ;-)
>
> That's why your teacher told you never to compare two floating point
> values using equality.

Actually, _that_ is not the reason I heard. I thought equality is very
likely not what you want anyway because of what the floating point numbers
in your program represent.

However, it is true that the excess precision in registers can screw up
comparisons. But it can be prevented by forcing a write-reread cycle. The
program

#include <iostream>
#include <iomanip>
#include <cmath>

using namespace std;

bool equal ( double const & lhs, double const & rhs ) {
    return ( lhs == rhs );
}

int main ( void ) {
    double x = 1;
    double y = 1;
    cout << boolalpha
         << equal( sin(x)+cos(y), sin(x)+cos(y) ) << '\n';
}

prints true. I am not certain whether the standard requires that or not, but
since the OP also uses gcc on an intel platform, the example pertains to
the behavior he observes.

BTW:

#include <iostream>
#include <iomanip>
#include <cmath>

#include <functional>

using namespace std;

int main ( void ) {
    double x = 1;
    double y = 1;
    cout << boolalpha
         << std::equal_to< double >()( sin(x)+cos(y), sin(x)+cos(y) ) << '\n';
}

prints "true", as well: the std::equal_to() also forces a write-reread
cycle. The OP therefore could test whether this is the problem by changing
the operator< to:

bool operator<(const Pixel &rhs) const {
    return std::less< double >()( get_luminance(), rhs.get_luminance() );
}


Best

Kai-Uwe Bux

Kai-Uwe Bux

Mar 1, 2009, 5:04:45 PM
Philip Pemberton wrote:

> On Sat, 28 Feb 2009 18:14:05 -0500, Kai-Uwe Bux wrote:
>
>> And what happens if you use vector<Pixel> instead? what happens with
>> other containers?
>
> Segfault. :(
>
>> Also, what does your comparison functor look like? Maybe, there are some
>> subtle differences there.
>
> There isn't a comparison functor. std::sort is using the "operator<"
> overloaded operator in the Pixel class.

Huh? If you are using std::vector< Pixel* >, then there _must_ be a
comparison functor, since otherwise you are using

std::less< Pixel* >

which will compare the addresses of the Pixels and not their luminances.
You would need the functor to forward the comparison from the pointer to
the pointee.


> I did try refactoring the code to use a functor instead, and got the same
> result (segfault).

Out of curiosity, I suggested elsethread the following implementation for
Pixel::operator<:

bool operator<(const Pixel &rhs) const {
    return std::less< double >()( get_luminance(), rhs.get_luminance() );
}

Could you try that?


Best

Kai-Uwe Bux

James Kanze

Mar 2, 2009, 6:10:35 AM
On Mar 1, 10:32 am, Juha Nieminen <nos...@thanks.invalid> wrote:
> Thomas J. Gritzan wrote:
> > The problem with floating point is that you might get two
> > different results if you calculate the same equation twice.

> I'm sorry, but this sounds like BS to me.

Regretfully, it's not.

> Care to give an example where the exact *same* equation (using
> the exact same, unmodified variables) can give *different*
> results from evaluation to evaluation?

It depends on the machine. The standard explicitly allows
expressions to be evaluated with greater precision than
requested, and many compilers for Intel processors will
systematically use long double for all intermediate results.

The standard does require that if the value is assigned to a
double, or used to initialize a double, it should be truncated
to double. G++ (and probably other compilers as well) ignores
this, at least by default---if I'm not mistaken, it returns
doubles in a floating point register, *without* truncating the
precision to double. So if you write something like:

f( a, b, c ) == f( a, b, c )

(where a, b and c are doubles, and f simply returns some
arbitrary expression), it's entirely possible that one of the
return values will be spilled to memory (and thus truncated to
double), and the other not. The results of the function are the
same in both cases, but one has been truncated to 53 bits of
precision, and the other maintains 64 bits of precision. Unless
the results are exactly representable in 53 bits, the above won't
compare equal.

> Also, care to explain the process by which this happens?

> The only way this would happen is if the FPU hardware (or
> whichever chip is doing the FP calculations) deliberately
> gives random results each time it performs the exact same
> calculations. Not that this wouldn't be conceivable (eg.
> random rounding of the last bit), but I have never heard of
> any hardware doing that. Would it even conform to the related
> IEEE standard?

> A third thing I would like to hear is how a broken comparison
> can cause std::sort to segfault, rather than simply causing
> the array to not to be properly sorted. I can't think of any
> sorting algorithm which would start indexing out of boundaries
> even if the comparisons return invalid results.

Quick sort will.

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Kanze

Mar 2, 2009, 6:20:32 AM


Paragraph [5/10] only applies within expressions. When the
value is used to initialize an object (e.g. the return value),
the standard requires it to be truncated to the correct
precision. Some compilers don't do this, at least by default.

> > The only way this would happen is if the FPU hardware (or
> > whichever chip is doing the FP calculations) deliberately
> > gives random results each time it performs the exact same
> > calculations. Not that this wouldn't be conceivable (eg.
> > random rounding of the last bit), but I have never heard of
> > any hardware doing that. Would it even conform to the
> > related IEEE standard?

> Rounding of last bits can occur between registers in the CPU
> and floating point objects in memory. A write-reread can
> therefore change the value. Clause [5/10] gives a lot of
> lenience in this way.

But only within an expression. Technically, the return
statement initializes a variable of the given type, so the
excess precision must be lost at this point. (Formally, Juha's
suggestion concerning some random element in rounding is
conformant. Like Juha, I don't know of a machine where this
actually happens, however, and it wouldn't be IEEE conformant. I'm
pretty sure that we're dealing with a lack of conformity here,
however, and that extended precision is being carried to places
where it isn't allowed.)

> *left < pivot

> pivot < *right


Yes. But as I'm sure you know, this doesn't prove anything. It
all depends on the values, where they're situated in the array,
and some of the finer details of how the partition is actually
implemented. You might access outside array bounds; you might
run forever; you might return with the array incorrectly sorted,
or it might even work.

We've certainly seen examples here before where an incorrect
comparison algorithm resulted in core dumps.

James Kanze

Mar 2, 2009, 6:28:35 AM


You're supposed to get true, at least if the floating point
hardware doesn't do any random rounding. (I think we all agree
that if you change the rounding mode of the FPU between the
calls, all bets are off.) This is a well known issue with g++
(I'd call it a bug, but it is intentional), which optimizes more
than is allowed by the standard.

Note that if you change the code to manually inline the
functions (and thus eliminate the initialization of the double
return value), the standard does allow the results not to
compare equal. Curiously enough, with g++, it's more likely to
work in the case where it's not guaranteed; unless the
expression requires enough temporaries to spill to memory,
everything will be done with 64 bits precision, and you'll get
the wrong results. The problem here is that the compiler cannot
see the register use in the function, so after the first call,
it must assume that all floating point registers are used, and
spill those results to memory. It then (incorrectly) compares
the 64 bit results with the truncated results.

On an Intel, and probably on other processors which do all
floating point arithmetic with extended precision by default.
On a Sparc (which has no extended precision format), there
is no problem.

James Kanze

Mar 2, 2009, 6:35:51 AM
On Mar 1, 10:47 pm, Juha Nieminen <nos...@thanks.invalid> wrote:
> Thomas J. Gritzan wrote:
> > Sounds like BS to you, Juha? ;-)

> Sounds like a compiler bug to me. Especially the fact that
> compiling with optimizations results in different behavior
> than without optimizations is rather telling.

> On the other hand, I suppose this is, once again, one of those
> things where compilers are free to do whatever they want, so
> even if one call to f() results in -2.1 and a second call
> results in pi, that would still be "allowed". After all, does
> the standard impose any requirements on the correctness of
> floating point calculations?

Actually, in this particular case, it is a case of g++ not being
conformant. The problem is still present in general; Kai-Uwe cited
the passage which creates the problem.

> I actually find it rather ironic that it gives the incorrect
> answer without optimizations and the correct answer with
> optimizations. One would think that, if it makes any
> difference, it would be the optimized version which cuts
> corners at the cost of accuracy, and the unoptimized version
> would do all steps in the same way (because speed is not,
> after all, a concern).

The difference is probably due to the fact the with optimizing,
if the function is defined in the same translation unit, g++
will inline it, thus seeing that it has registers to spare, and
doesn't need to store the results of the first call in memory.
In other words, it cuts the same corner both times, resulting in
the same formally wrong value being used on both sides of the
equation. A case of two wrongs making a right.

Kai-Uwe Bux

Mar 2, 2009, 6:47:42 AM
James Kanze wrote:

I don't see how to prove from the standard that the code should
yield "true". For

... << ( sin(x)+cos(y) == sin(x)+cos(y) ) ...

it is clear that [5/10] gives the permission to print "false". For

... << ( f(x, y) == f(x, y) ) ...

I am not sure. What I don't see is whether return value optimization may
allow the implementation to _not_ truncate precision.

Another observation:

... << std::equal_to< double >()( sin(x)+cos(y), sin(x)+cos(y) ) ...

prints "true". This would be in line with RVO being an issue since passing
values into functions is not subject to optimizations allowed by the
standard.

[snip]


Best

Kai-Uwe

James Kanze

Mar 2, 2009, 6:53:44 AM
On Mar 1, 10:56 pm, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
> Thomas J. Gritzan wrote:
> > Sorry for being off-topic, but this is directly related to
> > the C++ problem of this thread.

> > Juha Nieminen wrote:
> >> Victor Bazarov wrote:
> >>> I don't know what you're talking about, guys. I just took
> >>> your code, plugged it into my test project, compiled, ran,
> >>> and got 'true'. Changed FP settings, changed to
> >>> optimized, same thing. Maybe Visual C++ 2008 is wrong
> >>> somehow, and I am supposed to get 'false'?

> No, the outcome is implementation defined. Arguably, your
> implementation has the _better_ result.

The outcome is definitely not implementation defined. The
accuracy (and repeatability?) of expression evaluation is
unspecified.

> > The x86 floating point co-processor operates on 80 bit
> > floating point values. Temporaries stored in memory are in
> > 64 bit double precision format. So by storing the first
> > result into a temporary, it was truncated and rounded to 64
> > bit, and by loading it back, it was extended to 80 bit
> > again. The second result still is 80 bit when the two are
> > compared.

> Yup, and that's ok with the standard by [5/10].

Within an expression. As far as I know, the value with extended
precision *must* be rounded to double when it is used to
initialize a double, e.g. the return value.

In g++, this is under control of the option -ffloat-store.
Regretfully, the default results in the non-conformant and
unexpected behavior, and the documentation of the option
suggests that it usually doesn't matter, whereas it often does.

> > Sounds like BS to you, Juha? ;-)

> > That's why your teacher told you never to compare two
> > floating point values using equality.

> Actually, _that_ is not the reason I heard. I thought equality
> is very likely not what you want anyway because of what the
> floating point numbers in your program represent.

Actually, I've yet to hear anyone who really knows floating
point make such a recommendation. (It certainly doesn't apply
here, because there is no comparison for equality in the
original code.) What I have heard is that to use machine
floating point, you need to understand it first. And that it
can have some rather unexpected properties; the fact that you
can't actually know the precision of intermediate values---that
a double isn't necessarily a double---is perhaps the most
surprising one (and is, of course, not inherent in floating
point, but rather a particularity of one particular
implementation).

> However, it is true that the excess precision in registers can
> screw up comparisons. But it can be prevented forcing a
> write-reread cycle.

Not with g++, unless you specify -ffloat-store.

> The program

> #include <iostream>
> #include <iomanip>
> #include <cmath>

> using namespace std;

> bool equal ( double const & lhs, double const & rhs ) {
> return ( lhs == rhs );
> }

> int main ( void ) {
> double x = 1;
> double y = 1;
> cout
> << boolalpha
> << equal( sin(x)+cos(y), sin(x)+cos(y) ) << '\n';
> }

> prints true. I am not certain whether the standard requires
> that or not, but since the OP also uses gcc on an intel
> platform, the example pertains to the behavior he observes.

Note, however, that the above doesn't force a write-reread cycle
any more than the original, at least according to the standard.
(None of the writes or reads are "observable behavior".) It
would be interesting to look at the generated code under
optimization; I wouldn't be surprised if g++ drops both writes,
and that it works because neither value has been truncated to
double, not because both have been.

Kai-Uwe Bux

Mar 2, 2009, 7:13:37 AM
James Kanze wrote:

> On Mar 1, 10:56 pm, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
>> Thomas J. Gritzan wrote:

[snip]


>> > The x86 floating point co-processor operates on 80 bit
>> > floating point values. Temporaries stored in memory are in
>> > 64 bit double precision format. So by storing the first
>> > result into a temporary, it was truncated and rounded to 64
>> > bit, and by loading it back, it was extended to 80 bit
>> > again. The second result still is 80 bit when the two are
>> > compared.
>
>> Yup, and that's ok with the standard by [5/10].
>
> Within an expression. As far as I know, the value with extended
> precision *must* be rounded to double when it is used to
> initialize a double, e.g. the return value.

Which clause says that? As said elsethread, I am worried that RVO might
_allow_ the compiler to cut corners in some unexpected cases.


[snip]


>> However, it is true that the excess precision in registers can
>> screw up comparisons. But it can be prevented forcing a
>> write-reread cycle.
>
> Not with g++, unless you specify -ffloat-store.

I don't understand: are you implying that a write-reread cycle would not be
sufficient to truncate the values _or_ that with g++ one needs the
option -ffloat-store to force a write-reread cycle? You make it sound like
you contradict what I am saying whereas, in fact, you might not.

>> The program
>
>> #include <iostream>
>> #include <iomanip>
>> #include <cmath>
>
>> using namespace std;
>
>> bool equal ( double const & lhs, double const & rhs ) {
>> return ( lhs == rhs );
>> }
>
>> int main ( void ) {
>> double x = 1;
>> double y = 1;
>> cout
>> << boolalpha
>> << equal( sin(x)+cos(y), sin(x)+cos(y) ) << '\n';
>> }
>
>> prints true. I am not certain whether the standard requires
>> that or not, but since the OP also uses gcc on an intel
>> platform, the example pertains to the behavior he observes.
>
> Note, however, that the above doesn't force a write-reread cycle
> any more than the original, at least according to the standard.

The original has

sin(x)+cos(y) == sin(x)+cos(y)

where the amended version has

equal( sin(x)+cos(y), sin(x)+cos(y) )

The difference is that equal() is a function call that requires
initialization of the parameters.

> (None of the writes or reads are "observable behavior".)

I don't think that that matters, and it is not what I meant to imply by
using the term write-reread cycle. What _does_ matter is that an object is
initialized under certain circumstances and that that initialization
requires truncation.

> It
> would be interesting to look at the generated code under
> optimization; I wouldn't be surprised if g++ drops both writes,
> and that it works because neither value has been truncated to
> double, not because both have been.

I had a look (before I posted), but unfortunately I do not understand
assembler all that well :-( So, it didn't really help.


Best

Kai-Uwe Bux

thari...@gmail.com

Mar 2, 2009, 7:30:31 AM

Well, this segfault always happens to me (in g++) when I incorrectly
implement the comparison method (i.e. forgetting the strict weak
ordering and just returning the result of the equality (==) operator).

Why don't you compare the difference against a predefined tolerance when
dealing with floating point values?

ex:-

const double EPSILON = 0.00001; // renamed: a name like __ZERO__ is reserved

struct ApproxLess {
    bool operator ()(const double left, const double right) const
    {
        // "definitely less than": true only when right exceeds left by
        // more than the tolerance, so equal values never compare as less.
        // (Tolerance-based equivalence is still not transitive, so this
        // is not a true strict weak ordering either.)
        return left < right - EPSILON;
    }
};

Juha Nieminen

Mar 2, 2009, 10:42:37 AM
James Kanze wrote:
>> A third thing I would like to hear is how a broken comparison
>> can cause std::sort to segfault, rather than simply causing
>> the array to not to be properly sorted. I can't think of any
>> sorting algorithm which would start indexing out of boundaries
>> even if the comparisons return invalid results.
>
> Quick sort will.

I don't understand why.

The partitioning process first selects one of the elements as the
pivot value, and then goes from the beginning towards the end and from
the end towards the beginning, until these two index values are the
same. Along the way it may swap pairs of values depending on how they
compare to the pivot value. However, I don't see how this can make the
index values go out of range. They are only compared to each other, not
to the array elements or the pivot element.

James Kanze

Mar 3, 2009, 5:49:58 AM

> > Quick sort will.

The basic partitioning loop contains two smaller loops, one to
advance the bottom, and the other to descend the top. These
loops don't (necessarily) contain any check on the indexes,
since you have an established pre-condition that the pivot value
is in the sequence, and you'll stop when you hit it, if not
before.

The code for the partitioning in g++ is:

template<typename _RandomAccessIterator, typename _Tp, typename _Compare>
    _RandomAccessIterator
    __unguarded_partition(_RandomAccessIterator __first,
                          _RandomAccessIterator __last,
                          _Tp __pivot, _Compare __comp)
    {
      while (true)
        {
          while (__comp(*__first, __pivot))
            ++__first;
          --__last;
          while (__comp(__pivot, *__last))
            --__last;
          if (!(__first < __last))
            return __first;
          std::iter_swap(__first, __last);
          ++__first;
        }
    }

Note the two inner loops. If __comp always returns true (or
even if it returns true from time to time when it shouldn't),
you'll get an array bounds violation.

If you're still not convinced, try:

#include <vector>
#include <algorithm>

struct BadCompare
{
    bool operator()( int, int ) const
    {
        return true ;
    }
} ;

int main()
{
    std::vector< int > v( 10000, 43 ) ;
    std::sort( v.begin(), v.end(), BadCompare() ) ;
    return 0 ;
}

compiled with bounds checking turned on (-D_GLIBCXX_DEBUG with
g++). (Interestingly enough, compiled without bounds checking,
and with optimization, it seems to run forever.)

James Kanze

Mar 3, 2009, 6:09:14 AM
On Mar 2, 1:13 pm, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
> James Kanze wrote:
> > On Mar 1, 10:56 pm, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
> >> Thomas J. Gritzan wrote:
> [snip]
> >> > The x86 floating point co-processor operates on 80 bit
> >> > floating point values. Temporaries stored in memory are
> >> > in 64 bit double precision format. So by storing the
> >> > first result into a temporary, it was truncated and
> >> > rounded to 64 bit, and by loading it back, it was
> >> > extended to 80 bit again. The second result still is 80
> >> > bit when the two are compared.

> >> Yup, and that's ok with the standard by [5/10].

> > Within an expression. As far as I know, the value with
> > extended precision *must* be rounded to double when it is
> > used to initialize a double, e.g. the return value.

> Which clause says that? As said elsethread, I am worried that
> RVO might _allow_ the compiler to cut corners in some
> unexpected cases.

RVO doesn't change floating point semantics. A double is a
double. All RVO does is allow two objects with basically
non-overlapping lifetimes to be merged, but the double object is
still there. And the question isn't which clause says that the
compiler can't do this; the question is which clause allows the
compiler to use additional bits in an actual object. A return
value is an object, with a specific type.

(Footnote 55 makes it clear that an explicit type conversion or
an assignment must revert to standard precision. Are you saying
that an initialization needn't, although an assignment must?)

> [snip]
> >> However, it is true that the excess precision in registers can
> >> screw up comparisons. But it can be prevented forcing a
> >> write-reread cycle.

> > Not with g++, unless you specify -ffloat-store.

> I don't understand: are you implying that a write-reread cycle
> would not be sufficient to truncate the values _or_ that with
> g++ one needs the option -ffloat-store to force a write-reread
> cycle?

The latter, but at the hardware level. In C++, returning a
double is a write (the return value is an object with a specific
type which must be initialized), and using it is a read. G++
does not respect this unless you specify -ffloat-store.

And of course, in practice, I think that the Intel FPU has an
instruction to round to double precision, without actually
doing the write-reread. Which is all that is really required.

> You make it sound like you contradict what I am saying
> whereas, in fact, you might not.

The question is what you mean by write-reread. The C++
operations of initializing or modifying an object, then
accessing that object, or hardware operations.


> The original has

> sin(x)+cos(y) == sin(x)+cos(y)

> equal( sin(x)+cos(y), sin(x)+cos(y) )

Yes. And the original compared the results of the built-in
operator+. My mistake---there is a definite difference.

> > (None of the writes or reads are "observable behavior".)

> I don't think that that matters, and it is not what I meant to
> imply by using the term write-reread cycle. What _does_ matter
> is that an object is initialized under certain circumstances
> and that that initialization requires truncation.

I'm beginning to get confused about what I wrote myself.
Basically, what I was trying to say is that g++ doesn't always
respect the write-rereads required of the abstract machine by
the standard. Whether you use pass by value or pass by
reference, for example, shouldn't change anything in your
example, since in both cases, the compiler must initialize an
object. In the original, though, there weren't any intermediate
objects declared, so that the extended precision could be
maintained.

Kai-Uwe Bux

Mar 3, 2009, 6:44:14 AM
James Kanze wrote:

Yes :-) I guess that's what I'm saying.

The problem is whether a return statement always implies object
initialization. If I look at a composite expression such as

sin(x) + cos(y)

then sin() and cos() are function calls. I am not certain that the standard
requires truncation of the returned values because, in the expression in
which they occur, those return values are not treated as objects but just
as values.
The standard allows RVO and I understand that as a way to get rid of
temporaries. If the object is gone, there is no need for it to be
initialized. I also think that RVO may change the semantics of a program.
I.e., eliding the copy-constructor is allowed to change the semantics, why
couldn't that mean that truncation is elided? In fact, I would think that
[5/10] loses much of its appeal should it _not_ apply in a case like the
above. However, in

equal( sin(x), cos(y) )

the return values are used to initialize parameters, and in this case, I can
see that the standard would require the abstract machine to perform an
object initialization.

None of this is firm belief in any way. I find the standard very confusing
with regard to floating point arithmetic.


Best

Kai-Uwe Bux

Philip Pemberton

Mar 3, 2009, 6:00:51 PM
On Sun, 01 Mar 2009 17:04:45 -0500, Kai-Uwe Bux wrote:

> Out of curiosity, I suggested elsethread the following implementation
> for Pixel::operator<:
>
> bool operator<(const Pixel &rhs) const {
>     return std::less< double >()( get_luminance(), rhs.get_luminance() );
> }
>
> Could you try that?

That fixed it -- the segfaults have stopped, and the reference image
actually looks reasonably sane now (which is hardly surprising). I can't
say I've managed to figure out exactly why the implementation of
std::sort in libstdc++ kept dying like that, but at least it works now...

Looks like it was a rounding issue of some description, though -- and
there is a mention of it being the "most frequently reported non-bug"
on the gcc website... nice how I found that /now/...

What I'm not 100% sure on is why feeding this through a comparison
functor stops the segfaults. Is it just down to the fact that gcc is
forced to typecast both sides of the comparison into doubles (applying
rounding in the process) to pass to std::less<double>::operator(), or am I
missing something?

(As you can probably tell, I'm quite new to STL -- I did a few modules on
object oriented programming in C++ in college and university, but none of
them covered much of STL beyond containers and algorithms...)

Thanks,
Phil.

Juha Nieminen

Mar 3, 2009, 8:04:08 PM
James Kanze wrote:
> Note the two inner loops. If __comp always returns true (or
> even if it returns true from time to time when it shouldn't),
> you'll get an array bounds violation.

I suppose you are right.

Given that if the comparator does not work as required the compiler is
free to do whatever it pleases, skipping the extra conditionals in the
name of efficiency is a valid strategy.

Kai-Uwe Bux

Mar 3, 2009, 8:57:24 PM
Philip Pemberton wrote:

> On Sun, 01 Mar 2009 17:04:45 -0500, Kai-Uwe Bux wrote:
>
>> Out of curiosity, I suggested elsethread the following implementation
>> for Pixel::operator<:
>>
>> bool operator<(const Pixel &rhs) const {
>>     return std::less< double >()( get_luminance(), rhs.get_luminance() );
>> }
>>
>> Could you try that?
>
> That fixed it -- the segfaults have stopped, and the reference image
> actually looks reasonably sane now (which is hardly surprising). I can't
> say I've managed to figure out exactly why the implementation of
> std::sort in libstdc++ kept dying like that, but at least it works now...
>
> Looks like it was a rounding issue of some description, though -- and
> there is a mention about it being the "most frequently reported non-bug"
> on the gcc website... nice how I found that /now/...
>
> What I'm not 100% sure on is why feeding this through a comparison
> functor stops the segfaults. Is it just down to the fact that gcc is
> forced to typecast both sides of the comparison into doubles (applying
> rounding in the process) to pass to std::less<double>::operator(), or am I
> missing something?

Elsethread, there is an ongoing, extensive discussion of what might cause
the segfault. One hypothesis was a comparison predicate that yields
inconsistent answers due to internal excess precision. As you point out,
the ideal behind using std::less instead of operator< is to force
truncation. What is still under discussion is whether the original code is
required to work by the standard.


> (As you can probably tell, I'm quite new to STL -- I did a few modules on
> object oriented programming in C++ in college and university, but none of
> them covered much of STL beyond containers and algorithms...)

Well, the issue is not so much with the STL. The issue is with floating
point arithmetic. That is just hard.


Best

Kai-Uwe Bux

James Kanze

Mar 4, 2009, 5:30:18 AM

> [snip]

I don't think there's any question about that. §8.5/12 says
that the "initialization that occurs in [...], function return,
[...] is called copy initialization [...]". On the other hand,
in §6.6.3, all it says is "[return with value...], the value of
the expression is returned to the caller", which is rather vague
about how, and in other contexts, the standard does distinguish
between class type and non-class type return values; the return
value is a rvalue, which only behaves like an object if it has
class type.

IMHO, the fact that there is "copy initialization" (and that the
standard doesn't seem to address the question of what it means
to return a value otherwise) means that the semantics of
initializing an object must apply (and an object of type double
may not have extended precision). Whether this is intentional,
or what was intended, however, is another question---I think
it's probably worth taking up in comp.std.c++. And regardless
of what was intended, the standard should be clearer about it.

> If I look at a composite expression such as

> sin(x) + cos(y)

> then sin() and cos() are function calls. I am not certain that
> the standard requires truncation of the returned values
> because in the expression they occur, those return values are
> not treated as objects but just as values.

That's a good example, especially as I expect that on some
architectures, sin and cos (or if not, maybe abs(double) or some
such function) in fact resolve to compiler built-ins, generating
only a short sequence of machine instructions, leaving the value
in a floating point register.

> The standard allows RVO and I understand that as a way to get
> rid of temporaries. If the object is gone, there is no need
> for it to be initialized.

I'm not sure, but as I understand it, RVO only allows merging a
temporary with another object. There's always an object
involved. (But admittedly, I've only thought about it in terms
of class types.)

> I also think that RVO may change the semantics of a program.
> I.e., eliding the copy-constructor is allowed to change the
> semantics, why couldn't that mean that truncation is elided?
> In fact, I would think that [5/10] looses much of its appeal
> should it _not_ apply in a case like above. However, in

> equal( sin(x), cos(y) )

> the return values are used to initialize parameters, and in
> this case, I can see that the standard would require the
> abstract machine to perform an object initialization.

I think the same logic should apply to value arguments as to
return values. For example, if sin(x) really does use extended
precision, which is allowed to be kept as a return value, why
should the extended precision of multiplication be lost in
something like "sin(x*x)"?

Your example used double const&, of course, which does change
things. A reference always refers to an object, by definition,
so there can be no doubt that the standard requires the "value"
to be converted to an object. (The value has been lvaluized, to
coin a word.)

> None of this is firm belief in any way. I find the standard
> very confusing with regard to floating point arithmetic.

Agreed. I think, partially, this is intentional. The authors
of the C standard wanted to give implementations as much leeway
as possible, leaving most of the issues as "quality of
implementation". (The authors of the C standard did introduce
some restrictions with regards to K&R C, which allowed the
compiler to implement a+(b+c) exactly as if it were (a+b)+c, or
even (a+c)+b, even when the results might be different---usually
the case in floating point.)
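
A minimal illustration of that reassociation point (not from the original
posts): with IEEE-754 doubles the grouping changes the result, because the
intermediate sum rounds differently.

#include <cstdio>

int main() {
    double a = 1e16, b = -1e16, c = 1.0;
    std::printf("%g\n", (a + b) + c);  // 0 + 1, prints 1
    std::printf("%g\n", a + (b + c));  // b + c rounds back to -1e16, prints 0
    return 0;
}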

Greg Herlihy

Mar 5, 2009, 2:07:38 PM
On Mar 1, 1:47 pm, Juha Nieminen <nos...@thanks.invalid> wrote:
> Thomas J. Gritzan wrote:
> > Sounds like BS to you, Juha? ;-)
>
>   Sounds like a compiler bug to me. Especially the fact that compiling
> with optimizations results in different behavior than without
> optimizations is rather telling.

The particular floating point values in this case are influenced by
register "pressure", that is, how many floating point registers happen
to be available at a particular point in the program's execution flow.
The gcc manual refers to this behavior as the numerical instability of
'387 code. Nonetheless, the problem is gcc's:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

The most straightforward solution is to use SSE instructions for
floating point operations. The SSE registers (available since the
Pentium III and Athlon) are IEEE conformant - and do not have the same
instability issues as the old 387 registers.
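
For example (not from the original post; both flags are documented GCC
options for x86):

// g++ -mfpmath=sse -msse2 test.cpp   // do double arithmetic in SSE registers
// g++ -ffloat-store test.cpp         // keep x87, but store FP variables to memory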

>   On the other hand, I suppose this is, once again, one of those things
> where compilers are free to do whatever they want, so even if one call
> to f() results in -2.1 and a second call results in pi, that would still
> be "allowed". After all, does the standard impose any requirements on
> the correctness of floating point calculations?

C99 does not allow any floating point values in registers to "spill".
Moreover any assignment or cast to a floating point type must truncate
any additional precision in the value being assigned or converted.

>   I actually find it rather ironic that it gives the incorrect answer
> without optimizations and the correct answer with optimizations. One
> would think that, if it makes any difference, it would be the optimized
> version which cuts corners at the cost of accuracy, and the unoptimized
> version would do all steps in the same way (because speed is not, after
> all, a concern).

In this case, any change in how floating point registers are allocated
(whether for optimizations or other reasons) can affect the result. So
any code that depends on stable floating point computations (as
should be the case) is clearly in trouble in this situation.

Greg
