C/C++ pitfalls related to 64-bits (unsigned long & double)

Alex Vinokur

Feb 13, 2012, 7:41:44 AM

Hi,

unsigned long a = -1;
double b = a;
unsigned long c = b;

Model 32-bits: a == c
Model 64-bits: a != c

It is not a compiler bug for 64-bits.

Is it a design bug of the C/C++ language for 64-bits?

Alex

Leigh Johnston

Feb 13, 2012, 7:49:21 AM

No.

a == c for both "32-bits" and "64-bits" on my compiler (VC9).

/Leigh

Goran

Feb 13, 2012, 7:54:36 AM

No. Your code snippet will produce "c" that equals "a" on both 32- and
64-bit systems.

If you think there is, post an example that shows the problem, e.g.:

#include <iostream>
int main()
{

unsigned long a = -1;
double b = a;
unsigned long c = b;

if (a != c)
{
std::cout << "error!";
}
}

Goran.

Alex Vinokur

Feb 13, 2012, 7:59:01 AM

aCC: HP C/aC++ B3910B A.06.25.01 [May 16 2010]
For 64-bits:
a = 0xffffffffffffffff
c = 0x8000000000000000

Intel(R) C++ Intel(R) 64 Compiler XE for applications running on
Intel(R) 64, Version 12.0.4.191 Build 20110427
For 64-bits:
a = 0xffffffffffffffff
c = 0

Alex Vinokur

Feb 13, 2012, 8:00:26 AM

#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <iostream>
#include <limits>    // std::numeric_limits

typedef unsigned char uchar;

#define SHOW_HEX(x) std::cerr << #x << " = " << std::hex << std::showbase << x << std::dec << std::endl
#define SHOW_DOUBLE(x) std::cerr << #x << " = " << x << std::endl
#define SHOW_CHAR(x) std::cerr << #x << " = " << std::hex << std::showbase << std::size_t(uchar(x)) << std::dec << std::endl

int main()
{
    // -------------------------------------
    std::cout << "Model: " << sizeof(void*) * CHAR_BIT << "-bits" << std::endl;
    // -------------------------------------

    // std::size_t a = std::size_t(-1);
    // double b = a;
    std::size_t a = std::numeric_limits<std::size_t>::max();
    double b = a;       // may round to a neighbouring representable value
    std::size_t c = b;  // out of range if b was rounded up past the maximum
    char* pa = reinterpret_cast<char*>(&a);
    char* pb = reinterpret_cast<char*>(&b);
    char* pc = reinterpret_cast<char*>(&c);

    SHOW_HEX(a);
    SHOW_DOUBLE(b);
    SHOW_HEX(c);

    // Dump the bytes of a, b and c in turn.
    std::cerr << std::endl;
    for (std::size_t i = 0; i < (sizeof(std::size_t)/sizeof(char)); i++)
    {
        SHOW_CHAR(pa[i]);
    }
    std::cerr << std::endl;
    for (std::size_t i = 0; i < (sizeof(double)/sizeof(char)); i++)
    {
        SHOW_CHAR(pb[i]);
    }
    std::cerr << std::endl;
    for (std::size_t i = 0; i < (sizeof(std::size_t)/sizeof(char)); i++)
    {
        SHOW_CHAR(pc[i]);
    }

    assert (a == c);

    return 0;
}

Leigh Johnston

Feb 13, 2012, 8:07:54 AM

Well it seems obvious that that won't work if 'sizeof(unsigned long) ==
sizeof(double)' for 64-bit builds on the compilers you are using. This
is not a problem with the C++ language but how you are using your
compilers. Try using 'unsigned int' instead.

/Leigh

Eric Sosman

Feb 13, 2012, 8:14:16 AM

Whether the language design is faulty seems a matter of opinion.
However, part of the "spirit of C" is to stay fairly close to the
hardware. Since hardware that offers 64 bits of precision in the
floating-point format used for `double', some loss of precision in
`b = a' must be expected.

The language *could* have been defined to raise an exception
whenever a floating-point operation delivers an inexact result, but
that would have meant raising such exceptions for a large fraction
of all F-P calculations, perhaps many times in a single expression.
Or the language could have left the inexact conversion entirely
undefined, in which case there'd be no reason to expect `a == c'
(or even that the execution would get that far). The behavior
actually chosen (conversion yields one of the two representable
neighbors) seems fairly benign, not something I'd call a bug. But,
as I say, that's a matter of opinion.

(The language *could* have been defined to deliver exact F-P
results for all calculations, widening the representation at need.
That's the approach used on the Starship Enterprise, where Kirk
crippled the computer by asking it to calculate pi ...)

--
Eric Sosman
eso...@ieee-dot-org.invalid

Ben Bacarisse

Feb 13, 2012, 8:16:04 AM

Alex Vinokur <alex.v...@gmail.com> writes:

Talking from the C perspective here...

> unsigned long a = -1;
> double b = a;
> unsigned long c = b;
>
> Model 32-bits: a == c
> Model 64-bits: a != c
>
> It is not a compiler bug for 64-bits.

Quite. Both outcomes are permitted.

> Is it a design bug of the C/C++ language for 64-bits?

No.

Your use of "64-bits" is a little confusing. Not all 64-bit systems
have 64 bit unsigned longs which is, I think, what you are talking
about.

On systems with 64-bit longs and standard 53-bit mantissa doubles, you
cannot represent ULONG_MAX (the value of 'a' in the above code) exactly
in a double. C mandates that you get one of the two nearest
representable values, but it won't be exact. When the conversion goes
the other way the result can be undefined (if the floating-point value
was rounded up to a value larger than ULONG_MAX), but, even if the
double has a value in the range of unsigned long, it will no longer
equal ULONG_MAX.

I said "no" to it being an error in the design of the language because
solving it would impose the kind of burden on implementations that C
rejects. C is designed to use native machine types wherever possible.
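
A minimal sketch of the round trip described above, assuming a binary double whose exponent range covers ULONG_MAX + 1 (true for IEEE 754 doubles). The helper name ulong_round_trips is hypothetical. It reports whether a value survives unsigned long -> double -> unsigned long, and it checks the range first so that it never performs the conversion whose behaviour is undefined:

```c
#include <limits.h>

/* Returns 1 if v survives unsigned long -> double -> unsigned long
   unchanged, 0 otherwise. Never converts an out-of-range double back. */
int ulong_round_trips(unsigned long v)
{
    /* ULONG_MAX + 1 as a double. ULONG_MAX / 2 + 1 is a power of two,
       so both the conversion and the doubling are exact. */
    const double limit = 2.0 * (double)(ULONG_MAX / 2 + 1);
    double d = (double)v;       /* may round to a neighbouring value */

    if (d >= limit)             /* rounded up past ULONG_MAX */
        return 0;               /* converting back would be undefined */
    return (unsigned long)d == v;
}
```

With a 64-bit unsigned long and a 53-bit significand, ulong_round_trips(ULONG_MAX) returns 0; with a 32-bit unsigned long it returns 1.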

--
Ben.

Eric Sosman

Feb 13, 2012, 8:17:22 AM

On 2/13/2012 8:14 AM, Eric Sosman wrote:
> [...] Since hardware that offers 64 bits of precision in the

> floating-point format used for `double', some loss of precision in
> `b = a' must be expected.

Oh, drat. There was supposed to be an "is fairly rare" just
before the comma ...

--
Eric Sosman
eso...@ieee-dot-org.invalid

Ben Bacarisse

Feb 13, 2012, 8:27:26 AM

Goran <goran...@gmail.com> writes:

> On Feb 13, 1:41 pm, Alex Vinokur <alex.vino...@gmail.com> wrote:
>> Hi,
>>
>> unsigned long a = -1;
>> double b = a;
>> unsigned long c = b;
>>
>> Model 32-bits: a == c
>> Model 64-bits: a != c
>>
>> It is not a compiler bug for 64-bits.
>>
>> Is it a design bug of the C/C++ language for 64-bits?
>
> No. Your code snippet will produce "c" that equals "a" on both 32- and
> 64-bit systems.

Not always. His (and your) use of "64-bit systems" hides the fact that
they are not all the same:

$ cat t.c
#include <stdio.h>

int main(void)
{

unsigned long a = -1;

puts(a == (unsigned long)(double)a ? "same" : "different");
}
$ gcc -o t -std=c99 -pedantic t.c
$ ./t
different

(g++ will do the same, here).

<snip>
--
Ben.

Noob

Feb 13, 2012, 9:22:21 AM

Nicely done. You've hit all the right nails.

Remaining conspicuously vague, conflating C and C++ while cross-posting
to both groups, claiming a defect in the language, conjuring the ever
so misunderstood floating-point format, ...

You would make Kenny so proud!

If you're bored, you could read Goldberg's paper (all of it!)
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Goran

Feb 13, 2012, 10:14:59 AM

On Feb 13, 1:54 pm, Goran <goran.pu...@gmail.com> wrote:
> On Feb 13, 1:41 pm, Alex Vinokur <alex.vino...@gmail.com> wrote:
>
> > Hi,
>
> > unsigned long a = -1;
> > double b = a;
> > unsigned long c = b;
>
> > Model 32-bits: a == c
> > Model 64-bits: a != c
>
> > It is not a compiler bug for 64-bits.
>
> > Is it a design bug of the C/C++ language for 64-bits?
>
> No. Your code snippet will produce "c" that equals "a" on both 32- and
> 64-bit systems.

Here I stand ashamed. I overlooked the "unsigned" part. :-(

It's what Noob says, first and foremost. max of size_t is likely
2^64-1. That's more digits than number of significant digits "double"
can carry.

Goran.

BGB

Feb 13, 2012, 1:18:23 PM

yeah.

also, even though double has more bits than, say, an integer, does not
mean it will reliably encode an integer's value (it can do so in theory,
and will most often do so, but whether or not it will actually always do
so is more "up for grabs").

it is much less reliable with float (since float has only a 24-bit
significand to hold an integer's value, vs the 53 bits in double).

hence, float can't reliably hold the entire integer range, and double
can't reliably hold the entire long-long range (the size of long is
target specific, even for the same CPU architecture and operating mode,
it may still vary between the OS and compiler in use).

the most common behavior seems to be:
int -> float or double, may produce a value slightly below the integer;
float or double to int, will generally truncate the value, yielding the
integer representation as rounded towards 0.

the result then is a tendency for an int->double->int conversion to have
a small chance to drop the integer value towards 0 (why? I don't know
exactly, but I have observed it before).

one can counteract this by fudging the value with a small epsilon prior
to converting back into an integer.

say, for example (untested, from memory):
(v>=0)?((int)(v+0.0001)):((int)(v-0.0001));

can't say it will always work, but similar seems to work fairly well IME
(at least on generic x86 based targets).

or such...

Richard

Feb 13, 2012, 1:49:31 PM

[Please do not mail me a copy of your followup]

Alex Vinokur <alex.v...@gmail.com> spake the secret code
<2d5ceb71-98bb-440a...@s13g2000yqe.googlegroups.com> thusly:

>unsigned long a = -1;

Isn't this undefined behavior right from the get-go?
--
"The Direct3D Graphics Pipeline" -- DirectX 9 version available for download
<http://legalizeadulthood.wordpress.com/the-direct3d-graphics-pipeline/>

Legalize Adulthood! <http://legalizeadulthood.wordpress.com>

James Kuyper

Feb 13, 2012, 1:55:56 PM

On 02/13/2012 01:49 PM, Richard wrote:
...

> Alex Vinokur <alex.v...@gmail.com> spake the secret code
> <2d5ceb71-98bb-440a...@s13g2000yqe.googlegroups.com> thusly:
>
>> unsigned long a = -1;
>
> Isn't this undefined behavior right from the get-go?

No, why do you think so?

The behavior is defined by 6.3.1.3p2 in the C standard, which has been
quoted several times already in this thread. Do you have any reason to
doubt the accuracy or applicability of that section to this code, or
have you simply not been paying attention?

Since this is cross-posted to comp.lang.c++, section 4.7p2 is the
appropriate location in the C++ standard; it says essentially the same
thing, but with different language.
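
As a small illustration of that rule (a sketch; the function name minus_one_as_ulong is hypothetical), converting -1 to an unsigned type is well defined and yields the type's maximum value:

```c
#include <limits.h>

/* Converting -1 to an unsigned type is well defined: the value is
   reduced modulo ULONG_MAX + 1, which yields ULONG_MAX. */
unsigned long minus_one_as_ulong(void)
{
    unsigned long a = -1;
    return a;
}
```

The result compares equal to both ULONG_MAX and ~0UL, whatever the width of unsigned long.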

Juha Nieminen

Feb 13, 2012, 2:14:57 PM

In comp.lang.c++ James Kuyper <james...@verizon.net> wrote:
> On 02/13/2012 01:49 PM, Richard wrote:
> ...
>> Alex Vinokur <alex.v...@gmail.com> spake the secret code
>> <2d5ceb71-98bb-440a...@s13g2000yqe.googlegroups.com> thusly:
>>
>>> unsigned long a = -1;
>>
>> Isn't this undefined behavior right from the get-go?
>
> No, why do you think so?

Even if it were undefined, you could simply change it to:

unsigned long a = ~0UL;

Barry Schwarz

Feb 13, 2012, 3:14:15 PM

It is simply an artifact of the number of significant digits in a
double and an unsigned long which I think the implementation must
document.

--
Remove del for email

MikeWhy

Feb 13, 2012, 4:18:49 PM

BGB wrote:
> ... even though double has more bits than, say, an integer, does not

> mean it will reliably encode an integer's value (it can do so in
> theory, and will most often do so, but whether or not it will
> actually always do so is more "up for grabs").

What circumstances are those? Integers hold integer values. AFAIK, all
integer values encode correctly in FP of higher precision. It's a simple
matter of normalization. (On Intel, a bit-scan operation.)

James Kuyper

Feb 13, 2012, 4:26:27 PM

You've covered the correct point, but have apparently not realized that
it was relevant. He's talking about converting an integer value to a
floating point type when the floating point type has insufficient
precision to encode the value correctly. This is less than clear,
because he's making all kinds of implementation-specific assumptions
about the sizes of various types, and (inconsistently) using 'integer'
as if it were synonymous with 'int'.

The key point is that, for instance, a 32-bit integer type can represent
values too large to be converted without loss of precision to a 32-bit
floating point type, because the floating point type uses some of those
bits for the exponent. The same thing applies to 64 bit integer types
and 64 bit floating point types.
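
The 32-bit case can be sketched as follows, assuming float is IEEE 754 single precision (24-bit significand) and double can hold every int32_t exactly. The helper name int32_survives_float is hypothetical. The comparison is done in double so that no out-of-range float-to-integer conversion is ever attempted:

```c
#include <stdint.h>

/* Returns 1 if v is exactly representable as a float, 0 if the
   conversion to float had to round. */
int int32_survives_float(int32_t v)
{
    /* volatile forces the value to be stored at float precision,
       guarding against excess precision in registers. */
    volatile float f = (float)v;

    /* Both conversions to double are exact under the stated assumptions. */
    return (double)f == (double)v;
}
```

Under these assumptions 16777216 (2^24) survives, while 16777217 and INT32_MAX do not.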

glen herrmannsfeldt

Feb 13, 2012, 4:52:29 PM

In comp.lang.c++ Eric Sosman <eso...@ieee-dot-org.invalid> wrote:
> On 2/13/2012 8:14 AM, Eric Sosman wrote:
>> [...] Since hardware that offers 64 bits of precision in the
>> floating-point format used for `double', some loss of precision in
>> `b = a' must be expected.

> Oh, drat. There was supposed to be an "is fairly rare" just
> before the comma ...

x87 hardware isn't that rare. Depending on the implementation,
the compiler might do the calculation in temporary real form,
with all 64 bits.

But, yes, the usual double is 64 total bits, so fewer than 64
for the significand. As far as I know, though, there is no
restriction in C or C++ against a larger double, such as
a 64 bit float and 128 bit double.

-- glen

Ben Bacarisse

Feb 13, 2012, 5:00:13 PM

BGB <cr8...@hotmail.com> writes:
<snip>

> also, even though double has more bits than, say, an integer, does not
> mean it will reliably encode an integer's value (it can do so in
> theory, and will most often do so, but whether or not it will actually
> always do so is more "up for grabs").

It's not up for grabs in C (and C++ is essentially the same in this
regard). If the integer can be represented exactly in the floating
point type, it must be.

<snip>

> float or double to int, will generally truncate the value, yielding
> the integer representation as rounded towards 0.

That's true if the truncated value can be represented as an int. If
not, the behaviour is undefined. For example, in the example that
triggered this thread my implementation produces zero as the result.

> the result then is a tendency for an int->double->int conversion to
> have a small chance to drop the integer value towards 0 (why? I don't
> know exactly, but I have observed it before).

If the int is "in range" you don't have a conforming C implementation.

> one can counteract this by fudging the value with a small epsilon
> prior to converting back into an integer.
>
> say, for example (untested, from memory):
> (v>=0)?((int)(v+0.0001)):((int)(v-0.0001));

I can't see how this helps. If v is representable exactly as a double,
the round trip has no effect so this code is not needed. Can you give me
a use-case?

> can't say it will always work, but similar seems to work fairly well
> IME (at least on generic x86 based targets).

--
Ben.

MikeWhy

Feb 13, 2012, 5:05:57 PM

Point taken, which is the OP point of 64-bit ULL in a 53 bit mantissa (or
32-bit int in a single precision float). This is easily understood and
documented for the architecture in numeric_limits. Still wondering here
about the "up for grabs" part. It seems to imply some edge condition that
isn't accounted for.

James Kuyper

Feb 13, 2012, 5:08:08 PM

On 02/13/2012 04:52 PM, glen herrmannsfeldt wrote:
...

> for the significand. As far as I know, though, there is no
> restriction in C or C++ against a larger double, such as
> a 64 bit float and 128 bit double.

True. Such a restriction applies only to C implementations that
pre-#define __STDC_IEC_559__, in which case "The double type matches the
IEC 60559 double format" (F.2p1).

It's not clear to me that the C++ standard makes any such requirement if
std::numeric_limits<double>::is_iec559 is true. It seems to me that
is_iec559 could still be true if double is an extended double IEC 559 type.

James Kuyper

Feb 13, 2012, 5:13:57 PM

On 02/13/2012 05:05 PM, MikeWhy wrote:
> James Kuyper wrote:
>> On 02/13/2012 04:18 PM, MikeWhy wrote:
>>> BGB wrote:
>>>> ... even though double has more bits than, say, an integer, does not
>>>> mean it will reliably encode an integer's value (it can do so in
>>>> theory, and will most often do so, but whether or not it will
>>>> actually always do so is more "up for grabs").

...

> documented for the architecture in numeric_limits. Still wondering here
> about the "up for grabs" part. It seems to imply some edge condition that
> isn't accounted for.

It sounds to me like he's not sure how to detect situations where the
conversion would be unsafe.

Ben Bacarisse

Feb 13, 2012, 6:35:13 PM

That's not how I took it. I think the condition "even though double has
more bits than, say, an integer" is meant to suggest that exact
conversion is possible. Of course, that's not a literal consequence of
double having more bits (it must have a nearly equal number of mantissa
bits) but it's the only way I could make sense of "it can do so in
theory". All such things are guesses, of course, so I anticipate being
wrong.

--
Ben.

BGB

Feb 13, 2012, 7:04:07 PM

On 2/13/2012 3:00 PM, Ben Bacarisse wrote:
> BGB<cr8...@hotmail.com> writes:
> <snip>
>> also, even though double has more bits than, say, an integer, does not
>> mean it will reliably encode an integer's value (it can do so in
>> theory, and will most often do so, but whether or not it will actually
>> always do so is more "up for grabs").
>
> It's not up for grabs in C (and C++ is essentially the same in this
> regard). If the integer can be represented exactly in the floating
> point type, it must be.
>

theoretically, yes.

given a double has 52 bits, and int is 32 bits, and it is possible to
convert exactly, it should always be reliable.

but, I have seen it not work exactly right, albeit in rare cases (IME,
usually on 64-bit Linux systems, generally fairly rare and when using
AMD chips IME).

I haven't seen the issue on Win64 though that I can remember, nor with
32-bit code, so I don't know.

it "might" have something to do with SSE for all I know (since 32-bit
code typically uses x87, and 64-bit typically uses SSE), or maybe
something to do with GCC, or similar.

might require researching, like trying to figure how exactly AMD chips
implement the "cvtsi2sd" and "cvtsd2si" instructions or similar... (I am
not even particularly sure which side of the conversion would have been
introducing a loss of accuracy, or if the cause could be something else
"in the middle" somewhere).

> <snip>
>> float or double to int, will generally truncate the value, yielding
>> the integer representation as rounded towards 0.
>
> That's true if the truncated value can be represented as an int. If
> not, the behaviour is undefined. For example, in the example that
> triggered this thread my implementation produces zero as the result.
>

I didn't notice that part of the thread until after I posted.

>> the result then is a tendency for an int->double->int conversion to
>> have a small chance to drop the integer value towards 0 (why? I don't
>> know exactly, but I have observed it before).
>
> If the int is "in range" you don't have a conforming C implementation.
>

could be.

all I know is I am fairly sure I have seen it happen in the past (unless
I remember seeing an issue probably with other causes, like maybe
arithmetic was being done somewhere was messing it up or similar).

I never really went and did an in depth investigation of the problem, as
it was fixed easily enough when I ran into it (and is not too much
different than other epsilon-type stuff when working with floating-point
types).

like, the whole: "if(fabs(a-b)<0.000001)" thing to compare for equality
or similar.

>> one can counteract this by fudging the value with a small epsilon
>> prior to converting back into an integer.
>>
>> say, for example (untested, from memory):
>> (v>=0)?((int)(v+0.0001)):((int)(v-0.0001));
>
> I can't see how this helps. If v is representable exactly as a double,
> the round trip has no effect so this code is not needed. Can you give me
> a use-case?
>

generally, I had seen it in my 3D engine, where in some cases integers
ended up getting converted to doubles and back, and sometimes they would
get "bumped" in this way. adding a small adjustment seemed to fix the
problem.

mostly since then I have been working under the assumption of trying to
avoid conversions to/from floating point types when possible (partly
also as I had made the past observation that these conversions have also
tended to be costly).

or such...

Malcolm McLean

Feb 13, 2012, 7:15:30 PM

On Feb 14, 12:04 am, BGB <cr88...@hotmail.com> wrote:
> On 2/13/2012 3:00 PM, Ben Bacarisse wrote:

> > BGB<cr88...@hotmail.com> writes:

>
> >> the result then is a tendency for an int->double->int conversion to
> >> have a small chance to drop the integer value towards 0 (why? I don't
> >> know exactly, but I have observed it before).
>
> > If the int is "in range" you don't have a conforming C implementation.
>

> all I know is I am fairly sure I have seen it happen in the past (unless
> I remember seeing an issue probably with other causes, like maybe
> arithmetic was being done somewhere was messing it up or similar).
>

Sounds like a hardware bug. The C compiler can't necessarily work
round those.
--
C programming material on my website
http://www.malcolmmclean.site11.com/www

BGB

Feb 13, 2012, 7:25:06 PM

as I understand it, the entire range of 32 bit integers can be exactly
represented by a double.

theoretically, it should be a matter of sticking the bits into the
mantissa and adjusting the exponent as needed (so that the value is
normalized).

the issue is that, assuming my memory is correct, I had seen systems
where it didn't always work, but it was more like "once in a great
while", rather than the conversion being consistently wrong.

this wasn't being an issue with large values either, but more like a
value of "1000" would occasionally end up as "999" and similar, but it
did seem to always tend towards 0, so it wasn't like it was getting
"1001" or similar.

IIRC, when measuring, it was typically off by a tiny amount.
I am not certain whether or not any arithmetic was being performed on
the values.

what I remember about the configuration I saw it on:
Linux x86-64 (Fedora 11 IIRC), compiling with GCC;
CPU: AMD Athlon X2 (I forget which core).

at the time I was also compiling for Win64 ("Windows XP x64") using
MSVC, but did not see the issue with this configuration.

IIRC, there was a difference, namely that GCC was doing conversions
directly using "cvtsi2sd" and "cvtsd2si", whereas MSVC was doing the
conversion via internal function calls (this particular difference
seemed fairly common between GCC and MSVC, where GCC would typically
directly use math instructions, but MSVC would call functions to do
stuff like this, even with compiler optimizations turned on).

but, this is not to say my memory is being entirely accurate though (all
this was several years ago).

Ben Bacarisse

Feb 13, 2012, 7:46:48 PM

Since this behaviour is required for C implementations to be conforming,
deviation from it is important. Was there perhaps a bug report filed?

<snip>

>>> one can counteract this by fudging the value with a small epsilon
>>> prior to converting back into an integer.
>>>
>>> say, for example (untested, from memory):
>>> (v>=0)?((int)(v+0.0001)):((int)(v-0.0001));
>>
>> I can't see how this helps. If v is representable exactly as a double,
>> the round trip has no effect so this code is not needed. Can you give me
>> a use-case?
>>
>
> generally, I had seen it in my 3D engine, where in some cases integers
> ended up getting converted to doubles and back, and sometimes they
> would get "bumped" in this way. adding a small adjustment seemed to
> fix the problem.

Can you add a test to the code to print v when

(int)(double)v != v &&
(v >= 0 ? (int)(v+0.0001) : (int)(v-0.0001)) == v

? That way we might get an example of the problem you are reporting.

<snip>
--
Ben.

BGB

Feb 13, 2012, 8:31:48 PM

not at the time, I merely thought of it as an interesting occurrence and
worked around it.

> <snip>
>>>> one can counteract this by fudging the value with a small epsilon
>>>> prior to converting back into an integer.
>>>>
>>>> say, for example (untested, from memory):
>>>> (v>=0)?((int)(v+0.0001)):((int)(v-0.0001));
>>>
>>> I can't see how this helps. If v is representable exactly as a double,
>>> the round trip has no effect so this code is not needed. Can you give me
>>> a use-case?
>>>
>>
>> generally, I had seen it in my 3D engine, where in some cases integers
>> ended up getting converted to doubles and back, and sometimes they
>> would get "bumped" in this way. adding a small adjustment seemed to
>> fix the problem.
>
> Can you add a test to the code to print v when
>
> (int)(double)v != v&&
> (v>= 0 ? (int)(v+0.0001) : (int)(v-0.0001)) == v
>
> ? That way we might get an example of the problem you are reporting.
>

I would have to find an example of it again...

I remember seeing the problem a few years ago in some code of mine, but
don't have any recent memory of bugs resulting from it (but, then again,
this could also be due to code paranoia...).

just went and tried to recreate it, with mixed results:
a raw conversion does not show any issues (seems to always be reliable);
if I add a value to the double, and subtract the same value, then it
starts acting up.

testing the code below in Fedora 13 x86-64 within VMware (yes, not the
raw HW, but I would otherwise have to reboot).

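#include <stdio.h>
#include <stdlib.h>   /* rand() */

int main()
{
    double d;
    int i, j, k;

    for(i=0; i<100000000; i++)
    {
        j=rand()*rand()*i;   /* note: this product can overflow int */
        d=j;
        d=d+1.0; //(1)
        d=d-1.0; //(1)
        k=d; //(2) first form: plain truncation
        k=(d>=0?(int)(d+0.0001):(int)(d-0.0001)); //(2) second form: fudged
        if(j!=k)
            printf("%d %d\n", j, k);
    }
}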

1: if these lines are commented out, then the printf is never called,
but if uncommented (along with using different constant values), then I
start seeing messages (with it off-by-one, rounded towards 0).

2: if I switch to the second form, which makes the fudging, then the
messages disappear (they still appear with the first form).

so, it would seem to be mostly an issue in this case of whether or not
one does any arithmetic on the values (not sure whether or not this
still counts). CPU is an "AMD Athlon II X4 630".

or, at least, this is what I am seeing here...

here is the inner part of the loop (in ASM):
.L4:
movl $0, %eax
call rand
movl %eax, %ebx
movl $0, %eax
call rand
imull %ebx, %eax
imull -20(%rbp), %eax
movl %eax, -24(%rbp)
cvtsi2sd -24(%rbp), %xmm0
movsd %xmm0, -32(%rbp)
movsd -32(%rbp), %xmm1
movsd .LC0(%rip), %xmm0
addsd %xmm1, %xmm0
movsd %xmm0, -32(%rbp)
movsd -32(%rbp), %xmm0
movsd .LC0(%rip), %xmm1
subsd %xmm1, %xmm0
movsd %xmm0, -32(%rbp)
movsd -32(%rbp), %xmm0
cvttsd2si %xmm0, %eax
movl %eax, -36(%rbp)
movl -24(%rbp), %eax
cmpl -36(%rbp), %eax
je .L3
movl $.LC1, %eax
movl -36(%rbp), %edx
movl -24(%rbp), %ecx
movl %ecx, %esi
movq %rax, %rdi
movl $0, %eax
call printf
.L3:

...

.LC0:
.long 0
.long 1072693248

BGB

Feb 13, 2012, 8:41:24 PM

oh yeah, here is an example of the output (with a slight tweak to show
the value held by the double):
1073741824 1073741823 41CFFFFFFFFFFFFF
1073741824 1073741823 41CFFFFFFFFFFFFF
262144 262143 410FFFFFFFFFFFFF
262144 262143 410FFFFFFFFFFFFF
262144 262143 410FFFFFFFFFFFFF
67108864 67108863 418FFFFFFFFFFFFF
4 3 400FFFFFFFFFFFFF
1073741824 1073741823 41CFFFFFFFFFFFFF
1073741824 1073741823 41CFFFFFFFFFFFFF
1073741824 1073741823 41CFFFFFFFFFFFFF
4194304 4194303 414FFFFFFFFFFFFF
4194304 4194303 414FFFFFFFFFFFFF
16384 16383 40CFFFFFFFFFFFFF
67108864 67108863 418FFFFFFFFFFFFF
16384 16383 40CFFFFFFFFFFFFF

hmm, a lot of the same values seem to repeat...
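
The hex column can be decoded with a short sketch, assuming double is a 64-bit IEEE 754 binary64 value stored with the same byte order as uint64_t (the case on x86-64). The helper name double_from_bits is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double. Assumes sizeof(double) == 8
   and matching byte order for integers and doubles. */
double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);   /* well-defined way to copy the bytes */
    return d;
}
```

Under those assumptions 0x41CFFFFFFFFFFFFF is the largest double below 2^30 (1073741824), and 0x400FFFFFFFFFFFFF is the largest double below 4. Each logged double therefore sits one unit in the last place under a power of two, and truncation toward zero gives the off-by-one integers in the second column.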

Eric Sosman

Feb 13, 2012, 8:44:41 PM

On 2/13/2012 4:52 PM, glen herrmannsfeldt wrote:
> In comp.lang.c++ Eric Sosman<eso...@ieee-dot-org.invalid> wrote:
>> On 2/13/2012 8:14 AM, Eric Sosman wrote:
>>> [...] Since hardware that offers 64 bits of precision in the
>>> floating-point format used for `double', some loss of precision in
>>> `b = a' must be expected.
>
>> Oh, drat. There was supposed to be an "is fairly rare" just
>> before the comma ...
>
> x87 hardware isn't that rare. Depending on the implementation,
> the compiler might do the calculation in temporary real form,
> with all 64 bits.

... which wouldn't help, as the eventual result must be
converted to plain `double'. Yes, there are systems that support
a floating-point format with >64 bits' precision, but are there
any that use such a wide precision for `double'? `long double',
maybe, but plain `double'?

Hands up: Who's got a C implementation where

sizeof(double) * CHAR_BIT > 64

? Or, more accurately to the O.P.'s question, where

DBL_MANT_DIG * log(FLT_RADIX) / log(2) >= 64

?
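
That test can be sketched without the math library. The function names below are hypothetical, and the loop computes the floor of log2(FLT_RADIX), so the result is exact for power-of-two radixes and conservative for others such as 10:

```c
#include <float.h>
#include <limits.h>

/* Bits of precision in double: DBL_MANT_DIG digits of base FLT_RADIX.
   Uses floor(log2(FLT_RADIX)), so it never overestimates. */
int double_precision_bits(void)
{
    int bits_per_digit = 0;
    int r;

    for (r = FLT_RADIX; r > 1; r /= 2)
        ++bits_per_digit;
    return DBL_MANT_DIG * bits_per_digit;
}

/* 1 if every unsigned long value converts to double without rounding. */
int double_holds_every_ulong(void)
{
    return double_precision_bits() >= (int)(sizeof(unsigned long) * CHAR_BIT);
}
```

On an implementation with IEEE 754 doubles, double_precision_bits() returns 53, so double_holds_every_ulong() returns 0 wherever unsigned long is 64 bits wide.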

--
Eric Sosman
eso...@ieee-dot-org.invalid

Ben Bacarisse

Feb 13, 2012, 8:45:47 PM

In the architecture in question, yes.

> theoretically, it should be a matter of sticking the bits into the
> mantissa and adjusting the exponent as needed (so that the value is
> normalized).
>
> the issue is that, assuming my memory is correct, I had seen systems
> where it didn't always work, but it was more like "once in a great
> while", rather than the conversion being consistently wrong.
>
> this wasn't being an issue with large values either, but more like a
> value of "1000" would occasionally end up as "999" and similar, but it
> did seem to always tend towards 0, so it wasn't like it was getting
> "1001" or similar.

It would seem you are talking about a hardware bug. I'd say it was one
if it weren't for the fact that you are not sure you recall correctly.

Had it been me, I'd have documented it. You can get famous for finding
Intel floating points bugs! Maybe it's not too late (see my other
post).

Can you recall which part of the round-trip was going wrong? Did
cvtsi2sd turn 1000 into something less than 1000 or did cvtsd2si turn
1000 into 999?

> IIRC, when measuring, it was typically off by a tiny amount.
> I am not certain whether or not any arithmetic was being performed on
> the values.

Oh, if there might have been arithmetic being done, how do you know the
conversion was not being done as it should? Maybe the arithmetic was
rounding in some way you did not expect?

<snip>
--
Ben.

BGB

Feb 13, 2012, 8:55:43 PM

I don't remember, I think my thoughts at the time were "well, I am
getting values which are off by a tiny amount, oh well, I will fudge it".

it was a situation roughly along the lines of integers being converted
to doubles, maybe having arithmetic done on them (mostly still with
integer values), and converted back to integers later.

since it was off by a tiny amount, I just added code to fix it.

>> IIRC, when measuring, it was typically off by a tiny amount.
>> I am not certain whether or not any arithmetic was being performed on
>> the values.
>
> Oh, if there might have been arithmetic being done, how do you know the
> conversion was not being done as it should? Maybe the arithmetic was
> rounding in some way you did not expect?
>

it is possible, in my test elsewhere, it seems I can only really
recreate the issue if a value is added and then subtracted again from
the same value (in double form).

so, this may not be a conversion bug, but more of an "integer arithmetic
with doubles isn't exact" issue (leads to values ever slightly smaller
than what they would need to be).

fudging it does fix the problem, which was either-way, the original
intent of the "add a tiny amount to fudge it to the correct value"
kludge (I was not worried about the exact cause of the inexactness, I
just added something to compensate for it).

theoretically, the epsilon could probably be a bit smaller though...
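
An alternative to a hand-picked epsilon is to round to the nearest integer instead of truncating. This is a sketch only, assuming a 32-bit int and a 53-bit double significand. The name nearest_int_or and its fallback parameter are hypothetical. Inputs lying within one rounding step of an exact half may round either way, which does not matter for values that are meant to be whole numbers:

```c
#include <limits.h>

/* Round d to the nearest int. Returns fallback when d is NaN or the
   rounded result would not fit in an int, so the cast is never undefined. */
int nearest_int_or(double d, int fallback)
{
    if (!(d > (double)INT_MIN - 0.5 && d < (double)INT_MAX + 0.5))
        return fallback;
    /* Shift by half a unit away from zero, then truncate toward zero. */
    return (int)(d >= 0.0 ? d + 0.5 : d - 0.5);
}
```

For a double just below 4 such as 3.9999999999999996, a plain (int) cast gives 3 while nearest_int_or gives 4.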

Ben Bacarisse

Feb 13, 2012, 9:33:19 PM

I've set followup-to: since the code is all C.

BGB <cr8...@hotmail.com> writes:

> On 2/13/2012 6:31 PM, BGB wrote:

<snip>

>> just went and tried to recreate it, with mixed results:
>> a raw conversion does not show any issues (seems to always be reliable);
>> if I add a value to the double, and subtract the same value, then it
>> starts acting up.

Yes, it looks like conversion is not the issue.

>> testing the code below in Fedora 13 x86-64 within VMware (yes, not the
>> raw HW, but I would otherwise have to reboot).
>>
>> #include <stdio.h>
>>
>> int main()
>> {
>> double d;
>> int i, j, k;
>>
>> for(i=0; i<100000000; i++)
>> {
>> j=rand()*rand()*i;
>> d=j;
>> d=d+1.0; //(1)
>> d=d-1.0; //(1)
>> k=d; //(2)
>> k=(d>=0?(int)(d+0.0001):(int)(d-0.0001)); //(2)
>> if(j!=k)
>> printf("%d %d\n", j, k);
>> }
>> }
>>
>> 1: if these lines are commented out, then the printf is never called,
>> but if uncommented (along with using different constant values), then I
>> start seeing messages (with it off-by-one, rounded towards 0).

That suggests that the round-trip conversion is happening as expected.

>> 2: if I switch to the second form, which makes the fudging, then the
>> messages disappear (they still appear with the first form).

And this is no longer a mystery. If the +1.0 and -1.0 is producing a
non-integer result, then, yes, this fudge factor will repair it.
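
To make the effect of the fudge concrete, here is a minimal sketch, assuming IEEE-754 doubles (so that DBL_EPSILON is 2^-52). The helper names are made up for illustration. It compares plain truncation, the 0.0001 fudge from the quoted program, and the more common add-0.5 rounding on a value that sits one ulp below 4.0.

```c
#include <float.h>

/* Plain C conversion: discards the fractional part (truncates toward zero). */
static int to_int_truncate(double d)
{
    return (int)d;
}

/* The fudge from the quoted program: nudge away from zero, then truncate. */
static int to_int_fudged(double d)
{
    return d >= 0 ? (int)(d + 0.0001) : (int)(d - 0.0001);
}

/* Add-0.5 rounding: good enough for values that are within a small error
   of an integer, and only valid while the result fits in int. */
static int to_int_nearest(double d)
{
    return d >= 0 ? (int)(d + 0.5) : (int)(d - 0.5);
}

/* Largest double below 4.0, assuming IEEE-754 binary64 (DBL_EPSILON == 2^-52). */
static double just_below_four(void)
{
    return 4.0 - 2.0 * DBL_EPSILON;
}
```

With d = just_below_four(), truncation yields 3 while both the fudged and the add-0.5 variants yield 4. The add-0.5 form tolerates errors of nearly half a unit, whereas the 0.0001 fudge only repairs errors smaller than 0.0001.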

>> so, it would seem to be mostly an issue in this case of whether or not
>> one does any arithmetic on the values (not sure whether or not this
>> still counts). CPU is an "AMD Athlon II X4 630".

I see nothing on my Intel hardware (gcc version 4.6.1).

>> or, at least, this is what I am seeing here...
>>
>
> oh yeah, here is an example of the output (with a slight tweak to show
> the value held by the double):
> 1073741824 1073741823 41CFFFFFFFFFFFFF
> 1073741824 1073741823 41CFFFFFFFFFFFFF
> 262144 262143 410FFFFFFFFFFFFF
> 262144 262143 410FFFFFFFFFFFFF
> 262144 262143 410FFFFFFFFFFFFF
> 67108864 67108863 418FFFFFFFFFFFFF
> 4 3 400FFFFFFFFFFFFF
> 1073741824 1073741823 41CFFFFFFFFFFFFF
> 1073741824 1073741823 41CFFFFFFFFFFFFF
> 1073741824 1073741823 41CFFFFFFFFFFFFF
> 4194304 4194303 414FFFFFFFFFFFFF
> 4194304 4194303 414FFFFFFFFFFFFF
> 16384 16383 40CFFFFFFFFFFFFF
> 67108864 67108863 418FFFFFFFFFFFFF
> 16384 16383 40CFFFFFFFFFFFFF
>
> hmm, a lot of the same values seem to repeat...

They are more interesting in hex. (0x4, 0x4000, 0x40000 and so on).

If the result is reliable with 4 (i.e. if you take the loop out, set
j = 4 and find that j != k) it will be simple to see if it is the +1.0 or
the -1.0 that leads to the loss of precision.
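
A minimal sketch of such an isolated test, assuming double is IEEE-754 binary64 and unsigned long long is 64 bits wide. The helper names dump and roundtrip_steps are hypothetical. It prints the value after each step both as a hex float and as the raw bit pattern, in the same style as the dump above. On an IEEE-754 system 4.0 has the bit pattern 4010000000000000, so the 400FFFFFFFFFFFFF in the quoted output is the largest double below 4.0.

```c
#include <stdio.h>
#include <string.h>

/* Print a double as a %a hex float and as its raw 64-bit pattern.
   Assumes sizeof(double) == sizeof(unsigned long long) == 8. */
static void dump(const char *label, double d)
{
    unsigned long long bits;
    memcpy(&bits, &d, sizeof bits);
    printf("%-10s %a  %016llX\n", label, d, bits);
}

/* Run the three steps for one value, reporting the double after each. */
static int roundtrip_steps(int j)
{
    volatile double d = j;   /* volatile: force a store to a real double each step */
    dump("converted", d);
    d = d + 1.0;
    dump("after +1.0", d);
    d = d - 1.0;
    dump("after -1.0", d);
    return (int)d;           /* plain truncating conversion, as in form (2) without the fudge */
}
```

Calling roundtrip_steps(4) and comparing the three printed lines shows which step, if any, first departs from an exact integer value.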

However, this is not a problem for the C implementation because the
accuracy of floating point arithmetic is implementation-defined. It may
be a problem for the chip, in that the specification might be that this
should not happen, but such errors are very rare, so it's more likely to
be by design.

You said elsewhere that there might have been arithmetic being done.
Had that come right away, we could have cut the whole discussion!

--
Ben.

Miles Bader

Feb 13, 2012, 9:51:23 PM

BGB <cr8...@hotmail.com> writes:
> so, this may not be a conversion bug, but more of an "integer
> arithmetic with doubles isn't exact" issue (leads to values ever
> slightly smaller than what they would need to be).

Integer arithmetic with doubles _is_ exact though, if the integers can
be exactly-represented as doubles (which appears to be the case here).

-miles

--
Custard, n. A vile concoction produced by a malevolent conspiracy of the hen,
the cow, and the cook.

Ben Bacarisse

Feb 13, 2012, 10:13:13 PM

Miles Bader <mi...@gnu.org> writes:

> BGB <cr8...@hotmail.com> writes:
>> so, this may not be a conversion bug, but more of an "integer
>> arithmetic with doubles isn't exact" issue (leads to values ever
>> slightly smaller than what they would need to be).
>
> Integer arithmetic with doubles _is_ exact though, if the integers can
> be exactly-represented as doubles (which appears to be the case here).

I agree it should be, but is it a bug if it isn't? IEEE floating-point
mandates that the results of the basic arithmetic operators be exactly
rounded (i.e. the result is the closest representable number to the
mathematical result) but maybe the hardware in question does not claim
to conform to the IEEE spec.

Neither C nor C++ require such accuracy on their own (though an
implementation can claim to be using IEEE conforming floating-point) so
it's not a bug as far as the language is concerned either.

--
Ben.

BGB

Feb 13, 2012, 10:20:49 PM

yep, that was the point...

>>> so, it would seem to be mostly an issue in this case of whether or not
>>> one does any arithmetic on the values (not sure whether or not this
>>> still counts). CPU is an "AMD Athlon II X4 630".
>
> I see nothing on my Intel hardware (gcc version 4.6.1).
>

GCC version: "GCC: (GNU) 4.5.1 20100924 (Red Hat 4.5.1-4)"

could be a HW specific issue.

I had to test to find out whether or not it was due to arithmetic.

I remembered seeing the issue, but didn't entirely remember the cause,
but originally it didn't seem particularly relevant, just something
worth fixing...

Ben Bacarisse

Feb 13, 2012, 10:26:24 PM

BGB <cr8...@hotmail.com> writes:
<snip>

>> And this is no longer a mystery. If the +1.0 and -1.0 is producing a
>> non-integer result, then, yes, this fudge factor will repair it.
>>
>
> yep, that was the point...

Not originally. We've gone from "conversions aren't always exact"
(i.e. implementing C would be hard on this machine) to "arithmetic is
not as accurate as I'd like", which is not an issue for C implementations.

<snip>
--
Ben.

BGB

Feb 13, 2012, 10:39:29 PM

On 2/13/2012 7:51 PM, Miles Bader wrote:
> BGB<cr8...@hotmail.com> writes:
>> so, this may not be a conversion bug, but more of an "integer
>> arithmetic with doubles isn't exact" issue (leads to values ever
>> slightly smaller than what they would need to be).
>
> Integer arithmetic with doubles _is_ exact though, if the integers can
> be exactly-represented as doubles (which appears to be the case here).
>

yes.

the issue may be partly a matter of HW though, as I am seeing it on my
HW (using an AMD chip), but apparently someone else is not seeing it
(with an Intel chip), but there does seem to be a pattern in the values
(apparently: 0x4, 0x40, 0x400, 0x4000, ..., so for whatever reason
integer results which should land on one of these values is off by a
tiny amount...).

it could be a minor issue of the "arithmetic with doubles may not be
exact even if the doubles represent integers" variety, which is odd, but
whatever (it can be compensated for by fudging the value).

in theory though, these sorts of calculations should probably be exact.

> -miles
>

BGB

Feb 13, 2012, 10:49:10 PM

ok then, I failed to note the difference, and probably applied a
specific / special-case observation to a more general case than it
actually covers.

either way...

Eric Sosman

Feb 13, 2012, 11:04:23 PM

On 2/13/2012 9:51 PM, Miles Bader wrote:
> BGB<cr8...@hotmail.com> writes:
>> so, this may not be a conversion bug, but more of an "integer
>> arithmetic with doubles isn't exact" issue (leads to values ever
>> slightly smaller than what they would need to be).
>
> Integer arithmetic with doubles _is_ exact though, if the integers can
> be exactly-represented as doubles (which appears to be the case here).

C doesn't actually guarantee this. It guarantees exact
conversion to an F-P type for all values the type can represent
exactly (for example, 42 must convert to exactly 42.0, not to
42.0000000000000010173 or some such), but it does not guarantee
that 42.0 (exact) plus 1.0 (exact) equals 43.0 (exact).

5.2.4.2.2p4: "The accuracy of the floating-point
operations (+, -, *, /) [...] is implementation-
defined. The implementation may state that the
accuracy is unknown."

C implementations that define __STDC_IEC_559__ provide
additional guarantees that may make your statement true -- for
those implementations. But as far as I can tell, it is not a
guarantee for C with "J. Random Floating-Point."
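
A minimal sketch of how a program might check for that claim, with hypothetical helper names. Note the hedge: a missing macro only means the implementation makes no promise, not that its arithmetic is actually inexact.

```c
/* Returns 1 if the implementation claims Annex F (IEC 60559 / IEEE 754)
   conformance by defining __STDC_IEC_559__, 0 otherwise. */
static int claims_iec_559(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* Runtime check of the 42.0 + 1.0 example; volatile keeps the compiler
   from folding the sum at translation time. */
static int small_sum_is_exact(void)
{
    volatile double a = 42.0, b = 1.0;
    volatile double s = a + b;
    return s == 43.0;
}
```

Where claims_iec_559() returns 1, small_sum_is_exact() is expected to return 1 as well; where it returns 0, the result is whatever the implementation documents.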

--
Eric Sosman
eso...@ieee-dot-org.invalid

James Kuyper

Feb 13, 2012, 11:05:45 PM

On 02/13/2012 09:51 PM, Miles Bader wrote:
> BGB <cr8...@hotmail.com> writes:
>> so, this may not be a conversion bug, but more of an "integer
>> arithmetic with doubles isn't exact" issue (leads to values ever
>> slightly smaller than what they would need to be).
>
> Integer arithmetic with doubles _is_ exact though, if the integers can

> be exactly-represented as doubles ...

For IEEE double precision, 2^100 and 1 are both exactly representable,
so is the result of multiplying or dividing them, but their sum and
difference are not exactly representable.
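
A minimal sketch of this point, assuming a binary floating-point double with round-to-nearest (the default on IEEE-754 systems). The helper names are hypothetical; pow2 avoids a libm dependency by doubling in a loop.

```c
/* Build 2^n by repeated doubling; each step is exact in a binary format
   as long as the result does not overflow. */
static double pow2(int n)
{
    double r = 1.0;
    while (n-- > 0)
        r *= 2.0;
    return r;
}

/* 1 if both big + 1.0 and big - 1.0 round back to big, i.e. neither the
   sum nor the difference is representable. */
static int sum_and_difference_absorbed(double big)
{
    volatile double s = big + 1.0;
    volatile double t = big - 1.0;
    return s == big && t == big;
}

/* 1 if multiplying and dividing by 1.0 reproduce big exactly. */
static int product_and_quotient_exact(double big)
{
    volatile double p = big * 1.0;
    volatile double q = big / 1.0;
    return p == big && q == big;
}
```

For pow2(100) the first check reports that the 1 is absorbed, while for a small value such as pow2(10) it is not.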
--
James Kuyper

Miles Bader

Feb 13, 2012, 11:25:34 PM

BGB <cr8...@hotmail.com> writes:
>> Integer arithmetic with doubles _is_ exact though, if the integers can
>> be exactly-represented as doubles (which appears to be the case here).
>
> yes.
>
> the issue may be partly a matter of HW though, as I am seeing it on my
> HW (using an AMD chip), but apparently someone else is not seeing it
> (with an Intel chip), but there does seem to be a pattern in the values
> (apparently: 0x4, 0x40, 0x400, 0x4000, ..., so for whatever reason
> integer results which should land on one of these values is off by a
> tiny amount...).

Any conventional PC-type system these days is going to use IEEE FP,
and if the system claims to support IEEE FP, it has to be exact. If
it isn't, it's a bug.

I did run your program on my AMD system (phenom I), and it showed no
output. It would be interesting to see somebody with an identical CPU
to yours try it...

> it could be a minor issue of the "arithmetic with doubles may not be
> exact even if the doubles represent integers" variety, which is odd, but
> whatever (it can be compensated for by fudging the value).

I'm not sure you could call it a minor issue. A lot of software
assumes that FP arithmetic is exact for integer values within a
certain range, and isn't going to do any fudging (because it shouldn't
be necessary, and would have a severe performance impact), so such a
system where fudging is necessary would have ... problems.
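
For reference, a minimal sketch of what "a certain range" works out to, assuming the <float.h> model (for IEEE-754 binary64 the limit is 2^53). The helper names are hypothetical.

```c
#include <float.h>

/* Largest n such that every integer in [-n, n] is exactly representable:
   FLT_RADIX^DBL_MANT_DIG, which is 2^53 for IEEE-754 binary64. */
static double exact_integer_limit(void)
{
    double limit = 1.0;
    for (int i = 0; i < DBL_MANT_DIG; i++)
        limit *= FLT_RADIX;
    return limit;
}

/* 1 if x + 1.0 is a distinct value and subtracting 1.0 gets x back. */
static int add_one_is_exact(double x)
{
    volatile double y = x + 1.0;
    volatile double back = y - 1.0;
    return back == x && y != x;
}
```

Below the limit, add_one_is_exact holds for integer-valued x; at the limit itself it fails because limit + 1.0 rounds back to limit. The values in the test program above (at most about 2^31) are far inside this range.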

-miles

--
Clarionet, n. An instrument of torture operated by a person with cotton in his
ears. There are two instruments that are worse than a clarionet -- two
clarionets.

Miles Bader

Feb 14, 2012, 12:00:47 AM

Eric Sosman <eso...@ieee-dot-org.invalid> writes:
>> Integer arithmetic with doubles _is_ exact though, if the integers can
>> be exactly-represented as doubles (which appears to be the case here).
>
> C doesn't actually guarantee this. It guarantees exact
> conversion to an F-P type for all values the type can represent
> exactly (for example, 42 must convert to exactly 42.0, not to
> 42.0000000000000010173 or some such), but it does not guarantee
> that 42.0 (exact) plus 1.0 (exact) equals 43.0 (exact).

Not C, but C-on-a-system-using-IEEE-FP, which is basically everything
mainstream. [In practice it's a pretty good bet that even wackier FP
hardware actually maintains the same constraint.]

Although C-the-language hedges its bets for extreme portability (and
to some degree, history: things were a lot more wild-n-wooly when C
was created), people writing the actual applications tend to be a bit
more practical, and _do_ assume things that aren't guaranteed by the
language, if the likelihood of that assumption being violated is
infinitesimal. I think this is a reasonable stance where the cost of
not making such assumptions is non-trivial.

-miles

--
XML is like violence. If it doesn't solve your problem, you're not
using enough of it.

Miles Bader

Feb 14, 2012, 12:01:26 AM

James Kuyper <james...@verizon.net> writes:
>>> so, this may not be a conversion bug, but more of an "integer
>>> arithmetic with doubles isn't exact" issue (leads to values ever
>>> slightly smaller than what they would need to be).
>>
>> Integer arithmetic with doubles _is_ exact though, if the integers can
>> be exactly-represented as doubles ...
>
> For IEEE double precision, 2^100 and 1 are both exactly representable,
> so is the result of multiplying or dividing them, but their sum and
> difference are not exactly representable.

Well "the integers" should include the answer of course!

-miles

--
Accordion, n. An instrument in harmony with the sentiments of an assassin.

BGB

Feb 14, 2012, 1:22:04 AM

On 2/13/2012 9:25 PM, Miles Bader wrote:
> BGB<cr8...@hotmail.com> writes:
>>> Integer arithmetic with doubles _is_ exact though, if the integers can
>>> be exactly-represented as doubles (which appears to be the case here).
>>
>> yes.
>>
>> the issue may be partly a matter of HW though, as I am seeing it on my
>> HW (using an AMD chip), but apparently someone else is not seeing it
>> (with an Intel chip), but there does seem to be a pattern in the values
>> (apparently: 0x4, 0x40, 0x400, 0x4000, ..., so for whatever reason
>> integer results which should land on one of these values is off by a
>> tiny amount...).
>
> Any conventional PC-type system these days is going to use IEEE FP,
> and if the system claims to support IEEE FP, it has to be exact. If
> it isn't, it's a bug.
>
> I did run your program on my AMD system (phenom I), and it showed no
> output. It would be interesting to see somebody with an identical CPU
> to yours try it...
>

I don't know.

in the past, the issue seemed specific to Linux x86-64 (and SSE), but I
am also currently getting the same results from a 32-bit Windows program
(using the x87 FPU, compiled with GCC).

testing with MSVC, the results are slightly different (only 4 different
values show up), but the basic issue is still present.

>> it could be a minor issue of the "arithmetic with doubles may not be
>> exact even if the doubles represent integers" variety, which is odd, but
>> whatever (it can be compensated for by fudging the value).
>
> I'm not sure you could call it a minor issue. A lot of software
> assumes that FP arithmetic is exact for integer values within a
> certain range, and isn't going to do any fudging (because it shouldn't
> be necessary, and would have a severe performance impact), so such a
> system where fudging is necessary would have ... problems.
>

potentially, yes.

however, other than this, the computer seems to work fine (nothing is
obviously acting buggy or crashing...).

Robert Wessel

Feb 14, 2012, 1:24:09 AM

The explanation for this, at least, seems simple. You're overflowing
the dynamic range of a double. 32767*32767*100000000 needs a bit more
than 56 bits to represent exactly. That doesn't fit in a double
(assuming an IEEE double, of course). Nor do the results of that
number plus or minus 1.0. If the ints on your system are 32 bit,
there should be no problem, as the number will be reduced to 32 bits
first, if they're 64 bits, you'll have round off error.

I've not looked through all the combinations, but the compilers could
be setting rounding differently, and even for x87 and SSE2
instructions rounding can be set differently at the same time.
Rounding can be different between the x87 store-as-int (FIST rounds
according to the selected x87 rounding mode), SSE2 convert-to-int
(CVTTSD2SI always truncates towards zero) and software convert-to-int
(which can do whatever the implementers fancied). Those different
rounding modes could to a lesser or greater degree be masking the
problem. Another variable is that on some implementations, the x87 FP
calculations are done in 80-bit format, at least some of the time, so
you might see no round off errors at all in those cases.

BGB

Feb 14, 2012, 1:42:33 AM

this is on x86 and x86-64, and in both cases "int" is 32 bits.
storing the expression into an int essentially truncates it to 32-bits.

if it were a "long long", it would be a different matter.

> I've not looks through all the combinations, but the compilers could
> be setting rounding differently, and even for x87 and SSE2
> instructions rounding can be set differently at the same time.
> Rounding can be different between the x87 store-as-int (FIST rounds
> according to the selected x87 rounding mode), SSE2 convert-to-int
> (CVTTSD2S always truncates towards zero) and software convert-to-int
> (which can do whatever the implementers fancied). Those different
> rounding modes could to a lesser or greater degree be masking the
> problem. Another variable is that on some implementations, the x87 FP
> calculations are done in 80-bit format, at least some of the time, so
> you might see no round off errors at all in those cases.

yeah.

dunno what exactly the issue is, but it is probably fairly minor
given the lack of any obvious problems...

as noted, when I had seen it before, I had simply assumed that it was
some basic property of floating-point behavior and fudged it. if other
people have been seeing similar behavior, it is possible maybe such
fudging is fairly common?...

Robert Wessel

Feb 14, 2012, 1:45:17 AM

On Tue, 14 Feb 2012 14:00:47 +0900, Miles Bader <mi...@gnu.org> wrote:

>Eric Sosman <eso...@ieee-dot-org.invalid> writes:
>>> Integer arithmetic with doubles _is_ exact though, if the integers can
>>> be exactly-represented as doubles (which appears to be the case here).
>>
>> C doesn't actually guarantee this. It guarantees exact
>> conversion to an F-P type for all values the type can represent
>> exactly (for example, 42 must convert to exactly 42.0, not to
>> 42.0000000000000010173 or some such), but it does not guarantee
>> that 42.0 (exact) plus 1.0 (exact) equals 43.0 (exact).
>
>Not C, but C-on-a-system-using-IEEE-FP, which is basically everything
>mainstream. [In practice it's a pretty good bet that even wackier FP
>hardware actually maintains the same constraint.]

Apparently you never tried to multiply or divide on any early Crays.
;-)

Multiplication could be off by a full ULP. So products that exactly
filled all 48 bits with significant bits often produced slightly odd
results.

The divides tended to curdle the low few bits of the quotient. And
there were definitely divisions where the dividend, divisor and
quotient all had exact representations, but the quotient was several
ULPs off.

The basic problem was that division was implemented with the usual
reciprocal approximation (hardware) and an iteration of Newton-Raphson
(software). That produced a bit less than 47 bits of precision. In a
float with a 48 bit mantissa.

An extra iteration of N-R wouldn't have helped, since rounding was
definitely odd on the Cray, and multiplication had issues in those bit
ranges too.

It's been said that numerical analysis on the early Crays was a full
contact sport.

MikeWhy

Feb 14, 2012, 3:56:05 AM

The MSVC warnings are rather explicit:
"... possible loss of data".
'initializing' : truncation from 'const uint_64' to 'const double'
'initializing' : conversion from 'const double' to 'const uint_64', possible
loss of data

Z:\Foo>fooFP.exe
sizeof(uint_64) ==> 8
std::numeric_limits<double>::digits = 53
std::numeric_limits<uint_64>::digits = 64
(18446744073709551615 == 9223372036854775808) ==> false
(18446744073709551615 == 9223372036854775808) ==> false

On gcc:

mikey@boatVPM-Linux:~/Foo$ g++ --version
g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

mikey@boatVPM-Linux:~/Foo$ g++ -std=c++0x -Wall -pedantic fooFP.cpp
fooFP.cpp: In function ‘int main()’:
fooFP.cpp:21:30: warning: overflow in implicit constant conversion
[-Woverflow]
mikey@boatVPM-Linux:~/Foo$ ./a.out
sizeof(uint_64) ==> 8
std::numeric_limits<double>::digits = 53
std::numeric_limits<uint_64>::digits = 64
(18446744073709551615 == 0) ==> false
(18446744073709551615 == 18446744073709551615) ==> true
mikey@boatVPM-Linux:~/Foo$

The first case was similar to how you had written it. The second case used
'const uint_64' and const double as hints to the compiler. gcc actually
managed to get the second case right. I don't know what MSVC was thinking.
The integer value evaluated to (2^64 - 1) as expected. Converting back to
ULL from double produced 2^63. It apparently stored 1/2 * 2^64 as the FP
value, which is a surprising conversion (but still within the definition of
undefined behavior).
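
A minimal sketch of a guarded conversion, in C rather than C++ and with hypothetical helper names. It assumes a binary double narrower than 64 bits of precision, so that the largest 64-bit value cannot be stored exactly and is rounded on the way in; on typical round-to-nearest systems it becomes 2^64, which is out of range for the conversion back.

```c
#include <stdint.h>

/* 2^64 as a double, written as a C99 hex float so it is exact. */
#define TWO_POW_64 0x1p64

/* Convert only when the value is in range; otherwise report failure.
   Converting an out-of-range double to an integer type is undefined
   behaviour. The check is conservative: it also rejects (-1, 0) and NaN. */
static int double_to_u64(double d, uint64_t *out)
{
    if (!(d >= 0.0 && d < TWO_POW_64))
        return 0;
    *out = (uint64_t)d;
    return 1;
}

/* 1 if v survives uint64_t -> double -> uint64_t unchanged. */
static int roundtrips(uint64_t v)
{
    volatile double d = (double)v;
    uint64_t back;
    return double_to_u64(d, &back) && back == v;
}
```

roundtrips(UINT64_MAX) reports failure without ever executing the undefined conversion, which is why the two compilers above are free to disagree on what the unguarded version prints.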

> however, other than this, the computer seems to work fine (nothing is
> obviously acting buggy or crashing...).

??!! It's a coding error. The compiler recognized the error and printed
diagnostics, warning about truncation or overflow in the conversion.
Comparing std::numeric_limits<double>::digits against
std::numeric_limits<unsigned long long>::digits makes clear the nature of
the problem.

Keith Thompson

Feb 14, 2012, 6:01:18 AM

Ben Bacarisse <ben.u...@bsb.me.uk> writes:
> BGB <cr8...@hotmail.com> writes:
[...]

>> as I understand it, the entire range of 32 bit integers can be exactly
>> represented by a double.
>
> In the architecture in question, yes.

[...]

In any conforming C implementation; the requirements on double are such
that it can represent any 32-bit integer value exactly.
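
One way to read that requirement (an interpretation, not a quotation): <float.h> must report DBL_DIG of at least 10, and a 32-bit value never needs more than 10 decimal digits. A minimal sketch that spot-checks the round trip, with hypothetical helper names:

```c
#include <stdint.h>

/* 1 if v survives int32_t -> double -> int32_t unchanged. */
static int survives_double(int32_t v)
{
    volatile double d = v;
    return (int32_t)d == v;
}

/* Check the extremes and a sparse sample of the 32-bit range.
   The loop counter is 64 bits wide so the stride cannot overflow it. */
static int check_int32_range(void)
{
    if (!survives_double(INT32_MIN) || !survives_double(INT32_MAX))
        return 0;
    for (int64_t v = INT32_MIN; v <= INT32_MAX; v += 65521)  /* prime stride */
        if (!survives_double((int32_t)v))
            return 0;
    return 1;
}
```

This only samples the range; an exhaustive loop over all 2^32 values is feasible too, just slower.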

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

James Kuyper

Feb 14, 2012, 6:48:04 AM

On 02/14/2012 01:24 AM, Robert Wessel wrote:
> On Mon, 13 Feb 2012 18:31:48 -0700, BGB <cr8...@hotmail.com> wrote:

...

>> #include <stdio.h>
>>
>> int main()
>> {
>> double d;
>> int i, j, k;
>>
>> for(i=0; i<100000000; i++)
>> {
>> j=rand()*rand()*i;
>> d=j;
>> d=d+1.0; //(1)
>> d=d-1.0; //(1)
>> k=d; //(2)
>> k=(d>=0?(int)(d+0.0001):(int)(d-0.0001)); //(2)
>> if(j!=k)
>> printf("%d %d\n", j, k);
>> }
>> }
>>
>> 1: if these lines are commented out, then the printf is never called,
>> but if uncommented (along with using different constant values), then I
>> start seeing messages (with it off-by-one, rounded towards 0).

That's to be expected. As i gets larger, for some of the values that can
be returned by rand(),

>>
>> 2: if I switch to the second form, which makes the fudging, then the
>> messages disappear (they still appear with the first form).
>>
>> so, it would seem to be mostly an issue in this case of whether or not
>> one does any arithmetic on the values (not sure whether or not this
>> still counts). CPU is an "AMD Athlon II X4 630".
>>
>> or, at least, this is what I am seeing here...

...

> The explanation for this, at least, seems simple. You're overflowing
> the dynamic range of a double. 32767*32767*100000000 needs a bit more
> than 56 bits to represent exactly.

That number is 32767*32767*5^8 * 2^8. Representing the first part
exactly only requires 48 bits, and the 2^8 just shifts the exponent.
However, the actual maximum value that d can have is
32767*32767*99999999, and that does require 56 bits to represent exactly.

> ... That doesn't fit in a double

> (assuming an IEEE double, of course). Nor do the results of that

> number plus or minus 1.0. ...

That part is true, even when 'i' has a large power of 2 among its factors.

There's no mystery about these results anymore - he's looking at integer
values that are too large for N and N+1 to both be exactly representable.
--
James Kuyper

Ben Bacarisse

Feb 14, 2012, 8:51:19 AM

James Kuyper <james...@verizon.net> writes:

> On 02/14/2012 01:24 AM, Robert Wessel wrote:
>> On Mon, 13 Feb 2012 18:31:48 -0700, BGB <cr8...@hotmail.com> wrote:
> ...
>>> #include <stdio.h>
>>>
>>> int main()
>>> {
>>> double d;
>>> int i, j, k;
>>>
>>> for(i=0; i<100000000; i++)
>>> {
>>> j=rand()*rand()*i;
>>> d=j;
>>> d=d+1.0; //(1)
>>> d=d-1.0; //(1)
>>> k=d; //(2)
>>> k=(d>=0?(int)(d+0.0001):(int)(d-0.0001)); //(2)

[The reported errors come when the above line is commented out]

>>> if(j!=k)
>>> printf("%d %d\n", j, k);
>>> }
>>> }

<snip>

>> ... That doesn't fit in a double
>> (assuming an IEEE double, of course). Nor do the results of that
>> number plus or minus 1.0. ...
>
> That part is true, even when 'i' has a large power of 2 among it's factors.
>
> There's no mystery about these results anymore - he's looking at integer
> values that are too large for N and N+1 to both be exactly
> representable.

That's not what's happening. On the system in question, int has only
32 bits. The large and possibly overflowing value is assigned to an
int first. This conversion is implementation-defined, but it can't make
j too large to be represented exactly in d. In fact, one of his reported
error cases was caused when j == 4. This *must* set d to 4.0 (and it
did). The subsequent +1.0 and -1.0 and conversion back to int produces
a value not == 4.

I agree there is no mystery, but it's not using integers too large for
double -- it's that floating-point arithmetic is not exact.

[Aside: since the integer arithmetic can overflow, technically the
program has undefined behaviour. So while d may be assigned 4.0, say,
it is quite permissible (from C's point of view) for it to become 5.0 or
109.2 or any other value at any time. I'm assuming a sane
implementation.]
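
A minimal sketch of the same test with the undefined behaviour removed, assuming int is at most 32 bits and double has at least 53 bits of precision. The helper names are hypothetical, and the set of values tested differs from the original program because the product is reduced differently.

```c
#include <limits.h>
#include <stdlib.h>

/* Product computed in unsigned arithmetic, which wraps modulo 2^N instead
   of overflowing; the mask keeps the result a non-negative int. */
static int random_test_value(int i)
{
    unsigned u = (unsigned)rand() * (unsigned)rand() * (unsigned)i;
    return (int)(u & (unsigned)INT_MAX);
}

/* Count values that fail int -> double -> (+1.0, -1.0) -> int. */
static long count_mismatches(int iterations)
{
    long bad = 0;
    for (int i = 0; i < iterations; i++) {
        int j = random_test_value(i);
        volatile double d = j;   /* volatile: store a real double each step */
        d = d + 1.0;
        d = d - 1.0;
        if ((int)d != j)
            bad++;
    }
    return bad;
}
```

On an implementation with correctly rounded IEEE arithmetic the count should be zero; a non-zero count would point at the arithmetic rather than at the conversions.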

--
Ben.

glen herrmannsfeldt

Feb 14, 2012, 9:22:25 AM

In comp.lang.c++ Robert Wessel <robert...@yahoo.com> wrote:

(snip, someone wrote)

>>Not C, but C-on-a-system-using-IEEE-FP, which is basically everything
>>mainstream. [In practice it's a pretty good bet that even wackier FP
>>hardware actually maintains the same constraint.]

> Apparently you never tried to multiply or divide on any early Crays.
> ;-)

> Multiplication could be off by a full ULP. So products that exactly
> filled all 48 bits with significant bits often produced slightly odd
> results.

Is that also the one with the non-commutative multiplication?

-- glen

glen herrmannsfeldt

Feb 14, 2012, 9:27:08 AM

In comp.lang.c++ Robert Wessel <robert...@yahoo.com> wrote:

(snip)

> I've not looks through all the combinations, but the compilers could
> be setting rounding differently, and even for x87 and SSE2
> instructions rounding can be set differently at the same time.
> Rounding can be different between the x87 store-as-int (FIST rounds
> according to the selected x87 rounding mode), SSE2 convert-to-int
> (CVTTSD2S always truncates towards zero) and software convert-to-int
> (which can do whatever the implementers fancied). Those different
> rounding modes could to a lesser or greater degree be masking the
> problem. Another variable is that on some implementations, the x87 FP
> calculations are done in 80-bit format, at least some of the time, so
> you might see no round off errors at all in those cases.

The x87 registers are always 80 bit, with 64 bit significand.

The result is that intermediate values are often kept to 64
significant bits, while those stored to memory are (usually
rounded) to 53. In higher optimization modes, compilers will
keep values in registers longer.

There are mode bits that specify the precision, but they don't
apply to all operations. The extra precision is supposed to be
good, but the variability (not knowing which are stored in
between and which aren't) can be surprising.
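
A minimal sketch of how to ask a C99 implementation what it does here, using the standard <float.h> macros; the helper names are hypothetical. FLT_EVAL_METHOD is 0 when operations are evaluated in the declared type, 2 when everything is evaluated as long double (typical of x87 code generation), and -1 when it is indeterminable.

```c
#include <float.h>
#include <stdio.h>

/* Report the evaluation method, or -1 if <float.h> predates C99. */
static int eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;
#endif
}

/* Print the evaluation method and the significand widths involved. */
static void report_fp_format(void)
{
    printf("FLT_EVAL_METHOD = %d\n", eval_method());
    printf("DBL_MANT_DIG    = %d\n", DBL_MANT_DIG);
    printf("LDBL_MANT_DIG   = %d\n", LDBL_MANT_DIG);
}
```

A build that reports 2 with LDBL_MANT_DIG of 64 is one where intermediates may carry the extra bits described above; a build that reports 0 rounds every operation to the declared type.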

-- glen

BGB

Feb 14, 2012, 10:42:18 AM

but, if you store it into a 32 bit integer (first), all the high-order
bits end up cut off anyways. a desktop PC doesn't have any such 64-bit
"int" type, so the above can't be the issue.

>> ... That doesn't fit in a double
>> (assuming an IEEE double, of course). Nor do the results of that
>> number plus or minus 1.0. ...
>
> That part is true, even when 'i' has a large power of 2 among it's factors.
>
> There's no mystery about these results anymore - he's looking at integer
> values that are too large for N and N+1 to both be exactly representable.

no, that is not the problem here.

I suspect results would be far more obvious if something like this was
going on (not just a few misses, but an absurdly long list of misses,
all with huge values...).

BGB

Feb 14, 2012, 11:10:25 AM

On 2/14/2012 7:27 AM, glen herrmannsfeldt wrote:
> In comp.lang.c++ Robert Wessel<robert...@yahoo.com> wrote:
>
> (snip)
>> I've not looks through all the combinations, but the compilers could
>> be setting rounding differently, and even for x87 and SSE2
>> instructions rounding can be set differently at the same time.
>> Rounding can be different between the x87 store-as-int (FIST rounds
>> according to the selected x87 rounding mode), SSE2 convert-to-int
>> (CVTTSD2S always truncates towards zero) and software convert-to-int
>> (which can do whatever the implementers fancied). Those different
>> rounding modes could to a lesser or greater degree be masking the
>> problem. Another variable is that on some implementations, the x87 FP
>> calculations are done in 80-bit format, at least some of the time, so
>> you might see no round off errors at all in those cases.
>
> The x87 registers are always 80 bit, with 64 bit significand.
>

yes.

the mystery here is this:
on an older core (an "Athlon 64 X2"), I was only seeing this issue when
using SSE for the arithmetic (on Linux x86-64). this was because on
64-bit targets, people said NO to x87, and switched over to using almost
entirely SSE for floating-point math.

on my newer CPU ("Athlon II X4"), it appears I am seeing the same
behaviors from both SSE and x87.

someone is not seeing any issues with an "AMD Phenom", which is more
curious as both chips use an "AMD K10" based core (in my case, the
"Propus" core).

apparently, it still has separate x87 and SSE (unlike, say, the newer
"Bulldozer" core), so it is mildly curious.

> The result is that intermediate values are often kept to 64
> significant bits, while those stored to memory are (usually
> rounded) to 53. In higher optimization modes, compilers will
> keep values in registers longer.
>
> There are mode bits that specify the precision, but they don't
> apply to all operations. The extra precision is supposed to be
> good, but the variability (not knowing which are stored in
> between and which aren't) can be surprising.
>

possibly, but theory says it "should fit" in the intermediate forms as
well (when saved out to double, say, both "16384" and "16385" should be
exactly representable).

it apparently does fit for the vast majority of values, but for whatever
reason for certain values it comes out "slightly off".

but, whatever, I am ending up wasting too much time here thinking about
it...

BGB

Feb 14, 2012, 11:13:44 AM

you sure we are talking about the same piece of code?...
my example wasn't using any "uint_64", just a plain "int" type (which is
32 bits).

MikeWhy

Feb 14, 2012, 12:16:08 PM

"BGB" <cr8...@hotmail.com> wrote in message
news:jhe19u$fi7$1...@news.albasani.net...

> On 2/14/2012 1:56 AM, MikeWhy wrote:
>> The MSVC warnings are rather explicit:
>> "... possible loss of data".
>> 'initializing' : truncation from 'const uint_64' to 'const double'
>> 'initializing' : conversion from 'const double' to 'const uint_64',
>> possible loss of data
>>
>
> you sure we are talking about the same piece of code?...
> my example wasn't using any "uint_64", just a plain "int" type (which is
> 32 bits).

No, I'm not sure at all. At some point, the conversation turned to
64-bit unsigned, possibly only in my head.

Both gcc and MSVC get this right:

const int_64 ifoo = -1LL;
{
double foox = ifoo;
uint_64 foo2 = foox;

std::cout << '(' << (uint_64)ifoo << " == " << foo2 << ") ==> "
<< ((uint_64)ifoo == foo2 ? "true" : "false")
<< '\n';
}

... but, both get the following wrong:
{
uint_64 foo = ifoo;
double foox = foo;
uint_64 foo2 = foox;

std::cout << '(' << foo << " == " << foo2 << ") ==> "
<< (foo == foo2 ? "true" : "false")
<< '\n';
}

gcc: (18446744073709551615 == 0) ==> false
MSVC: (18446744073709551615 == 9223372036854775808) ==> false

... and gcc gets it right if the values are marked const:
{
const uint_64 foo = ifoo;
const double foox = foo;
const uint_64 foo2 = foox;

std::cout << '(' << foo << " == " << foo2 << ") ==> "
<< (foo == foo2 ? "true" : "false")
<< '\n';
}

gcc: (18446744073709551615 == 18446744073709551615) ==> true
MSVC: (18446744073709551615 == 9223372036854775808) ==> false

gcc apparently keeps enough precision for the const double initializer
value, possibly as a long double. MSVC obviously does not.

All the same, both compilers print diagnostics about signed/unsigned
mismatches; possible loss of data in conversions; truncation/overflow in
conversion.
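A minimal sketch of why the non-const case misbehaves, assuming an IEEE 754 binary64 `double` with round-to-nearest: the largest 64-bit unsigned value is not representable and converts to 2^64, which is out of range for the conversion back, so that conversion is undefined rather than merely imprecise. `to_u64_checked` and `max_u64_rounds_up` are hypothetical helper names, not anything from the compilers discussed here.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: converts d to a 64-bit unsigned value only when d is
// inside [0, 2^64); returns false (and leaves out untouched) otherwise.
// Values in (-1, 0) are rejected too, which is stricter than necessary.
inline bool to_u64_checked(double d, std::uint64_t& out) {
    const double two64 = std::ldexp(1.0, 64);   // 2^64, exact in binary64
    if (!(d >= 0.0) || d >= two64) {            // the negated test also rejects NaN
        return false;
    }
    out = static_cast<std::uint64_t>(d);
    return true;
}

// True when the largest 64-bit unsigned value converts to exactly 2^64,
// the usual outcome with a 53-bit significand and round-to-nearest.
inline bool max_u64_rounds_up() {
    const std::uint64_t a = std::numeric_limits<std::uint64_t>::max();
    return static_cast<double>(a) == std::ldexp(1.0, 64);
}
```

With a check like this, the round trip from 18446744073709551615 is reported as a failed conversion instead of producing 0 on one compiler and 9223372036854775808 on another.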

Lowell Gilbert

Feb 14, 2012, 12:53:38 PM

Keith Thompson <ks...@mib.org> writes:

> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>> BGB <cr8...@hotmail.com> writes:
> [...]
>>> as I understand it, the entire range of 32 bit integers can be exactly
>>> represented by a double.
>>
>> In the architecture in question, yes.
> [...]
>
> In any conforming C implementation; the requirements on double are such
> that it can represent any 32-bit integer value exactly.

Because of the Usual Arithmetic Conversions,
or is there some other reason?

--
Lowell Gilbert, embedded/networking software engineer
http://be-well.ilk.org/~lowell/

Robert Wessel

Feb 14, 2012, 2:06:25 PM

Yep. Reverse the operands, get a different "round off" error.

BGB

Feb 14, 2012, 2:27:22 PM

On 2/14/2012 10:16 AM, MikeWhy wrote:
>
> "BGB" <cr8...@hotmail.com> wrote in message
> news:jhe19u$fi7$1...@news.albasani.net...
>> On 2/14/2012 1:56 AM, MikeWhy wrote:
>>> The MSVC warnings are rather explicit:
>>> "... possible loss of data".
>>> 'initializing' : truncation from 'const uint_64' to 'const double'
>>> 'initializing' : conversion from 'const double' to 'const uint_64',
>>> possible loss of data
>>>
>>
>> you sure we are talking about the same piece of code?...
>> my example wasn't using any "uint_64", just a plain "int" type (which
>> is 32 bits).
>
> No, I'm not sure at all. At some point, the conversation turned it into
> 64-bit unsigned, possibly only in my head.
>

fair enough, we were talking about different pieces of code with
different behaviors (the "64-bit unsigned" was from a different part of
the thread).

James Kuyper

Feb 14, 2012, 2:39:58 PM

On 02/14/2012 12:53 PM, Lowell Gilbert wrote:
> Keith Thompson <ks...@mib.org> writes:
>
>> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>>> BGB <cr8...@hotmail.com> writes:
>> [...]
>>>> as I understand it, the entire range of 32 bit integers can be exactly
>>>> represented by a double.
>>>
>>> In the architecture in question, yes.
>> [...]
>>
>> In any conforming C implementation; the requirements on double are such
>> that it can represent any 32-bit integer value exactly.
>
> Because of the Usual Arithmetic Conversions,
> or is there some other reason?

Those rules tell you what conversions must occur; they don't tell you
what the results must be, and in particular they don't tell you anything
about which values can be represented exactly.

Section 5.2.4.2.2, "Characteristics of floating types" describes a model
for a floating point representation. An implementation is not required
to use precisely that representation, but it must use one that's
sufficiently similar that its characteristics can be described in terms
of that model. In that model, the smallest integer that cannot be
represented exactly is pow(b,p+1)+1. Paragraph 11 requires that
FLT_RADIX, (corresponding to 'b' in the model) must be an integer with a
value of at least 2. Paragraph 13 requires that DBL_EPSILON
(corresponding to pow(b,1-p) in the model) must be 1e-9 or smaller,
which imposes a minimum value of p for any given b.

The description of 'p' implies, but does not explicitly require, that it
is an integer. I can't figure out how a non-integral value for 'p' could
be implemented. With integer values for both b and p, the smallest
value you can get for an unrepresentable integer occurs for b==2 and
p==31, for which DBL_EPSILON is 2^-30; the corresponding smallest
unrepresentable integer is 2^32+1.
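For anyone who wants to see what their own implementation reports, here is a small sketch that reads the model parameters from `<cfloat>` and round-trips 32-bit values through `double`. The function names are hypothetical; `FLT_RADIX`, `DBL_MANT_DIG` and `DBL_EPSILON` are the standard macros corresponding to b, p and pow(b,1-p).

```cpp
#include <cfloat>
#include <cmath>
#include <cstdint>

// Model parameters b and p as reported by this implementation.
inline int radix_b()     { return FLT_RADIX; }
inline int precision_p() { return DBL_MANT_DIG; }

// DBL_EPSILON corresponds to b^(1-p); std::scalbn scales by powers of
// FLT_RADIX, so this comparison is exact when the model applies.
inline bool epsilon_matches_model() {
    return std::scalbn(1.0, 1 - DBL_MANT_DIG) == DBL_EPSILON;
}

// Round-trip an unsigned 32-bit value through double.
inline bool survives_double(std::uint32_t v) {
    const double d = v;
    return static_cast<std::uint32_t>(d) == v;
}

// Round-trip a signed 32-bit value through double.
inline bool survives_double_signed(std::int32_t v) {
    const double d = v;
    return static_cast<std::int32_t>(d) == v;
}
```

Checking the extremes (the minimum and maximum signed values and the maximum unsigned value) is enough to spot an implementation where the guarantee would not hold.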

BGB

Feb 14, 2012, 2:40:23 PM

seems like fun...

also reminds me of a trick used to implement fixed-point multiplies:
if the intermediate result won't fit in the word, shift one or both of
the arguments right some to make the result smaller.

this was a way to do fast/simple fixed point math without needing a
larger register to hold the intermediate results, but would also cost
some in terms of accuracy.

makes me wonder if maybe Cray was doing something similar?... (say,
essentially just discarding low-order bits from the operands in an
uneven manner?).
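A rough sketch of that fixed-point trick, assuming an unsigned 16.16 layout held in 32 bits (the layout and the names `q16_16`, `mul_wide` and `mul_preshift` are illustrative assumptions, not from any particular codebase):

```cpp
#include <cstdint>

// Unsigned fixed point: 16 integer bits, 16 fraction bits (assumed layout).
using q16_16 = std::uint32_t;

// Reference version: widen to 64 bits, multiply, then drop 16 fraction bits.
inline q16_16 mul_wide(q16_16 a, q16_16 b) {
    return static_cast<q16_16>((static_cast<std::uint64_t>(a) * b) >> 16);
}

// Pre-shift version: discard the low 8 bits of each operand first, so the
// product already carries 16 fraction bits and fits in 32 bits whenever the
// true result does. The discarded bits are where the accuracy goes.
inline q16_16 mul_preshift(q16_16 a, q16_16 b) {
    return (a >> 8) * (b >> 8);
}
```

For 1.5 * 2.25 (0x18000 and 0x24000) both versions give 0x36000, but for 0x10001 * 0x10000 the pre-shifted version returns 0x10000 while the wide version keeps the low bit and returns 0x10001.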

glen herrmannsfeldt

Feb 14, 2012, 2:58:33 PM

In comp.lang.c++ BGB <cr8...@hotmail.com> wrote:

(snip)

> but, if you store it into a 32 bit integer (first), all the high-order
> bits end up cut off anyways. a desktop PC doesn't have any such 64-bit
> "int" type, so the above can't be the issue.

IA32 doesn't do it in one instruction, but many compilers targeting
the 32 bit IA32 have a 64 bit (long long) data type. Add and
subtract can usually be done inline, with two instructions.
Multiply and divide might be done as subroutine call, but many
will recognize the case of 32 bit operands cast to (long long),
and use a single multiply instruction with 64 bit product.

I never tested for it, but they might also recognize the case of
a 64 bit dividend and 32 bit divisor, generating a 32 bit quotient.
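The source patterns in question look roughly like the sketch below. Whether a given compiler actually reduces them to a single multiply or divide instruction is not something this code demonstrates; it depends on the compiler and options. The function names are made up for illustration.

```cpp
#include <cstdint>

// 32-bit operands, one of them cast up, giving a full 64-bit product.
inline std::uint64_t mul_32x32(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint64_t>(a) * b;
}

// 64-bit dividend, 32-bit divisor, 32-bit quotient. The caller must ensure
// divisor != 0 and that the quotient fits in 32 bits; otherwise the cast
// silently drops the high bits.
inline std::uint32_t div_64by32(std::uint64_t dividend, std::uint32_t divisor) {
    return static_cast<std::uint32_t>(dividend / divisor);
}
```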

(But a large fraction of desktop PCs now have 64 bit processors.)

-- glen

glen herrmannsfeldt

Feb 14, 2012, 3:02:50 PM

In comp.lang.c++ BGB <cr8...@hotmail.com> wrote:

(snip, I wrote)

>> The x87 registers are always 80 bit, with 64 bit significand.

> yes.

> the mystery here is this:
> on an older core (an "Athlon 64 X2"), I was only seeing this
> issue when using SSE for the arithmetic (on Linux x86-64).
> this was because on 64-bit targets, people said NO to x87,
> and switched over to using almost entirely SSE for
> floating-point math.

There is another effect to watch for, which I remember became
obvious when testing for the Pentium FDIV bug. Many compilers
evaluate constant expressions at compile time using different
arithmetic than they do at run time. Specifically, they did
not use FDIV, either at compile time or run time when given
a constant expression.

Many of the expressions in this thread were constants known
at compile time, which the compiler could be evaluating.
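One way to compare the two paths, sketched under the assumption that reading operands through `volatile` is enough to keep the compiler from folding the division. The operands are the pair commonly used to check for the FDIV bug; the function names are hypothetical.

```cpp
#include <cmath>

// Constant expression: the compiler is free to evaluate this itself,
// using whatever arithmetic it likes at compile time.
inline double folded_quotient() {
    return 4195835.0 / 3145727.0;
}

// Operands are read through volatile objects, so the division has to be
// carried out by the target's floating-point hardware at run time.
inline double runtime_quotient() {
    volatile double x = 4195835.0;
    volatile double y = 3145727.0;
    return x / y;
}

// Absolute difference between the two results; zero when both paths use
// the same correctly rounded double arithmetic.
inline double quotient_gap() {
    return std::fabs(folded_quotient() - runtime_quotient());
}
```

A test that only uses literal constants may never exercise the run-time path at all, which is the trap described above.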

-- glen

Joe keane

Feb 14, 2012, 3:45:45 PM

In article <jhc1g6$afj$1...@dont-email.me>,
MikeWhy <boat042...@yahoo.com> wrote:
>Still wondering here about the "up for grabs" part. It seems to imply
>some edge condition that isn't accounted for.

a)

machine has internal format that is superset of 'long' and 'double'

b)

compiler notice the code does
long -> double -> long
and say 'screw that, don't do nothing'[1]

[1] in general, people don't get mad if you give -more- precision than
that required, just supply a 'yes i'm a masochist' option
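To see what skipping the round trip would change, here is a small sketch that forces the value through a real `double`. It assumes a 53-bit significand and is only meant for inputs well inside the 64-bit signed range (say, magnitude at most 2^62); `via_double` is a hypothetical name.

```cpp
#include <cstdint>

// Convert to double and back. The volatile object forces an actual store
// in double format, so no wider internal format can be substituted.
inline std::int64_t via_double(std::int64_t v) {
    volatile double d = static_cast<double>(v);
    return static_cast<std::int64_t>(d);
}
```

With round-to-nearest-even, 2^53 + 1 comes back as 2^53, whereas a compiler that elided the conversions would hand back 2^53 + 1 unchanged.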

glen herrmannsfeldt

Feb 14, 2012, 5:57:20 PM

In comp.lang.c++ Joe keane <j...@panix.com> wrote:

(snip)

> a) machine has internal format that is superset of 'long' and 'double'

> b) compiler notice the code does: long -> double -> long
> and say 'screw that, don't do nothing'[1]

> [1] in general, people don't get mad if you give -more- precision than
> that required, just supply a 'yes i'm a masochist' option

gcc has the -ffloat-store option, requiring it to store intermediate
values and refetch them (hopefully in cache) to avoid the excess
precision problem. I don't know if that also applies to compile time
expression evaluation, though.

There are some algorithms that require a consistent precision,
and that will fail if sometimes given more.
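A source-level approximation of the same idea, for code that cannot rely on a compiler option: push each intermediate through a `volatile double`, and query `FLT_EVAL_METHOD` to see how the implementation says it evaluates intermediates. This is only a sketch; `store_as_double` and `sum3_consistent` are hypothetical names.

```cpp
#include <cfloat>

// Force x to be stored as (and reloaded from) a double-sized object,
// discarding any excess precision held in a wider register.
inline double store_as_double(double x) {
    volatile double v = x;
    return v;
}

// FLT_EVAL_METHOD: 0 means evaluate in the declared type, 2 means evaluate
// in long double (typical of x87), negative means indeterminate.
inline int eval_method() {
    return FLT_EVAL_METHOD;
}

// Sum three values with every intermediate rounded to double.
inline double sum3_consistent(double a, double b, double c) {
    return store_as_double(store_as_double(a + b) + c);
}
```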

-- glen

Keith Thompson

Feb 14, 2012, 6:38:09 PM

j...@panix.com (Joe keane) writes:
[...]

> [1] in general, people don't get mad if you give -more- precision than
> that required, just supply a 'yes i'm a masochist' option

People will sometimes get quite mad if you give them *inconsistent*
precision.

glen herrmannsfeldt

Feb 14, 2012, 7:22:03 PM

In comp.lang.c++ Keith Thompson <ks...@mib.org> wrote:

> j...@panix.com (Joe keane) writes:

>> [1] in general, people don't get mad if you give -more- precision than
>> that required, just supply a 'yes i'm a masochist' option

> People will sometimes get quite mad if you give them *inconsistent*
> precision.

Well, for the long story, when the 8087 was developed, the idea
was for an infinite register stack. When it was full, there
would be an interrupt, some would be spilled to memory, and
then continue on. The chip was designed and built before anyone
tried to write the software to do it, when it was found
not to be possible.

Why it wasn't fixed for the 80287 and 80387, I don't know.

That would, at least, have allowed for intermediate values
to always have extra precision.

Now, when you add optimizations, such as common subexpression
elimination, and keep values in registers between statements,
then again you have inconsistent precision. The infinite
stack would have allowed for more consistency.

As an example of what is possible, without actually seeing
a compiler do it, if you:

X=A*B+C*D;
Y=A*B-C*D;

The compiler might keep C*D with extra precision, but not A*B.

Now, if A=C and B=D, Y should be zero but it isn't.

With an infinite stack, one might hope that the intermediates
for A*B and C*D had the same precision, extra or not.
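The effect can be reproduced without x87 registers. In the sketch below, `std::fma` stands in for "one product kept at extra precision" (it feeds the unrounded A*B into the subtraction), while C*D is rounded to double first. This is an analogy under that assumption, not what any particular compiler generates; the function names are hypothetical.

```cpp
#include <cmath>

// Both products are rounded to double before the subtraction.
inline double diff_consistent(double a, double b, double c, double d) {
    volatile double ab = a * b;
    volatile double cd = c * d;
    return ab - cd;
}

// A*B enters the subtraction unrounded (via std::fma); C*D is rounded
// to double first. The two products no longer have the same precision.
inline double diff_mixed(double a, double b, double c, double d) {
    volatile double cd = c * d;
    return std::fma(a, b, -cd);
}
```

Taking A = B = C = D = 1 + 2^-30, the exact product is 1 + 2^-29 + 2^-60, and the last term does not fit in a 53-bit significand. `diff_consistent` returns 0, while `diff_mixed` returns 2^-60.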

-- glen

Robert Wessel

Feb 15, 2012, 1:39:38 AM

On Wed, 15 Feb 2012 00:22:03 +0000 (UTC), glen herrmannsfeldt
<g...@ugcs.caltech.edu> wrote:

>In comp.lang.c++ Keith Thompson <ks...@mib.org> wrote:
>> j...@panix.com (Joe keane) writes:
>
>>> [1] in general, people don't get mad if you give -more- precision than
>>> that required, just supply a 'yes i'm a masochist' option
>
>> People will sometimes get quite mad if you give them *inconsistent*
>> precision.
>
>Well, for the long story, when the 8087 was developed, the idea
>was for an infinite register stack. When it was full, there
>would be an interrupt, some would be spilled to memory, and
>then continue on. The chip was designed and built before anyone
>tried to write the software to do it, when it was found
>not to be possible.
>
>Why it wasn't fixed for the 80287 and 80387, I don't know.
>
>That would, at least, have allowed for intermediate values
>to always have extra precision.

I believe the original plan was to do the spills in hardware, but they
didn't have the die space, so they decided to punt to software. But
as you said, they never tried doing it until it was too late. The
problems center around being able to handle multiple exceptional
conditions at once - basically you can't quite figure out the exact
state of the 8087 at the point where it generates the stack overflow.
Once the interrupt happens, several possible exceptions have gotten
merged together.

And no, it was never fixed.

glen herrmannsfeldt

Feb 15, 2012, 3:00:13 AM

In comp.lang.c++ Robert Wessel <robert...@yahoo.com> wrote:

(snip on 8087 stack overflow detection)

> I believe the original plan was to do the spills in hardware, but they
> didn't have the die space, so they decided to punt to software. But
> as you said, they never tried doing it until it was too late. The
> problems center around being able to handle multiple exceptional
> conditions at once - basically you can't quite figure out the exact
> state of the 8087 at the point where it generates the stack overflow.
> Once the interrupt happens, several possible exceptions have gotten
> merged together.

> And no, it was never fixed.

Multiple imprecise interrupts. The IBM 360/91 allows for out of
order execution. If a program interrupt occurs, it has to finish
all instructions that are in the pipeline, possibly generating
more exceptions. Also, the PSW might be nowhere near the actual
cause. Probably too long ago for C++, though. I know one was
running at least until 1981.

This was 'fixed' in S/370, as it is hard to do virtual
storage if you don't know where the page fault occurred.

-- glen