
A New Hardware/Software Pentium FDIV Workaround

Cleve Moler

Dec 1, 1994, 12:12:02 AM

Greetings.

Last week, I suggested a hardware/software workaround for the Pentium
FDIV bug. And, I promised that we would release a Pentium-aware
version of MATLAB that incorporated the workaround. Since then,
I have heard from several people suggesting improvements. In
particular, Terje Mathisen and Tim Coe have made such good suggestions
that I want to consider a different algorithm. I have enough
confidence in this new algorithm that we are now in the process
of using it for the Pentium-aware MATLAB.

Mathisen is the PC programming expert from Norsk Hydro in Norway
who confirmed Thomas Nicely's original bug report, and who wrote
P87TEST. Tim Coe is the semiconductor design engineer from
Vitesse Semiconductor whose model of the chip's behavior led him
to examples of the worst-case behavior, including the
4195835/3145727 example.

With this posting, the three of us are providing a status report
on a more ambitious project. We believe it should be possible
to modify software so that it can run on defective Pentiums
and provide IEEE compliant floating point arithmetic with very
little, if any, degradation in speed.

This is work in progress. We're not absolutely sure yet that it
is guaranteed to do everything we want. We will continue to
study, refine and check the ideas and the code. (I particularly
want to thank Audrey Benevento and Marc Ullman for their
contributions to our work here at the MathWorks.)

The algorithm that I was so enthusiastic about a week ago starts
with an FDIV to compute z = x/y. Then the residual, r = x - y*z,
is computed. If the residual is "small", z is deemed to be OK.
If not, x and y are both scaled by 3/4, and the process is repeated.
It is very difficult to define "small" in such a way that the
resulting z is certainly the correctly rounded result. Our
new algorithm completely avoids such delicate decisions.
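
In outline, that earlier scheme looks like this (a sketch only; the names,
and in particular the constant SMALL below, are illustrative, and this is
not the code we shipped -- SMALL is exactly the quantity that is hard to
choose safely):

#include <math.h>

#define SMALL 1e-9   /* illustrative only; picking this safely is the hard part */

/* Sketch of the earlier check-and-rescale idea.  On a correct chip the
   residual is tiny and the loop exits immediately; on a flawed chip a
   bad quotient leaves a large residual, and scaling both operands by
   3/4 moves the divisor out of the trouble spot before retrying. */
double fdiv_residual_workaround(double x, double y)
{
    double z = x / y;                        /* possibly flawed FDIV */
    while (fabs(x - y * z) > SMALL * fabs(x)) {
        x *= 0.75;                           /* scale both operands  */
        y *= 0.75;
        z = x / y;                           /* and divide again     */
    }
    return z;
}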

One personal comment: As far as I am concerned, the activity in
the comp.sys.intel newsgroup over the past three weeks has been
a mixed blessing. It brought this problem to the attention of
a wide audience quickly. It hooked me up with experts like
Mathisen and Coe whom I would never have met otherwise. It definitely
contributed to the creation of the solution we are proposing.
But now I am finding it impossible to read everything being
posted in the group. In fact, it has become counterproductive.
I may be missing some important posts, buried among all the
stuff of questionable technical merit.

Years ago, Alex Chorin, a math professor at Berkeley, began a
book review with one of my all-time favorite quotes,

"This book detracts from the sum total of human knowledge"

I am afraid we may have now reached that point with the activity
in comp.sys.intel. I will be looking to comp.soft-sys.matlab
and sci.math.num-analysis for future discussion of the aspects
of this problem that I am interested in.

-- Cleve Moler

-----------------

A New Hardware/Software Pentium FDIV Workaround

Tim Coe, c...@vitsemi.com
Terje Mathisen, Terje.M...@hda.hydro.com
Cleve Moler, mo...@mathworks.com

One of us (Coe) has developed a model which explains all the known
examples of FDIV errors. We believe we now understand the bit
patterns well enough that we can identify certain bands in the space
of floating point numbers which are "at risk". The test involves
looking only at the denominator, although the numerator also affects
whether or not an error actually occurs. If the denominator is found
to be outside the at-risk bands, the FDIV instruction can be safely
used to produce the correctly rounded IEEE result. If the denominator
is in one of the at-risk bands, then scaling both the numerator and
denominator by 15/16 eliminates the risk. Moreover, if the scaling
and subsequent FDIV instruction are carefully carried out in the
extended precision format, it is still possible to produce the
correctly rounded result.

No attempt is made to predict before the FDIV whether or not an error
will actually occur, and no attempt is made after the FDIV to assess
the size of the residual.

Where, exactly, are the bands of at-risk denominators used in our
new algorithm? Between each two powers of two, Coe specifies five
small intervals to avoid. Consider the interval 4096 <= y < 8192.
For values of y in this range, the intervals to avoid have width one.

Here they are, in decimal and in hex.

4607. 4607.999999...
5375. 5375.999999...
6143. 6143.999999...
6911. 6911.999999...
7679. 7679.999999...

40b1ff0000000000 40b1ffffffffffff
40b4ff0000000000 40b4ffffffffffff
40b7ff0000000000 40b7ffffffffffff
40baff0000000000 40baffffffffffff
40bdff0000000000 40bdffffffffffff

A picture of the interval would look something like this. Each of
the five *'s represents a subinterval to be avoided. The length of
each subinterval is 1/4096-th of the overall length of the interval.

[.....*........*........*........*........*.......)

It is interesting to note that scaling the third subinterval by 3/4
sends it into the first subinterval. This is another weakness of
Moler's original proposal. But scaling by 15/16 shifts each
subinterval to a safe region.
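
A small numeric illustration of that point (the value is ours to pick; any
number inside the third band works):

#include <stdio.h>

/* Take a denominator from the third band, 6143 <= y < 6144.  Scaling by
   3/4 lands it back inside the first band, 4607 <= y < 4608, while
   scaling by 15/16 lands it between bands. */
int main(void)
{
    double y = 6143.5;
    printf("3/4   * y = %.5f\n", 0.75   * y);   /* 4607.62500, still at risk */
    printf("15/16 * y = %.5f\n", 0.9375 * y);   /* 5759.53125, safe          */
    return 0;
}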

Think of the 64 bit floating point representation of the denominator as
consisting of 16 four-bit half bytes, or "nibbles." These are represented
by the 16 characters in the hex printout. Also, think of the binary
point as being just after the third byte or sixth nibble. That's between
the pair of f's and the string of zeros in the hex numbers given above.
Then, with the hidden bit and the binary point shown explicitly, the
numbers are of the form:

1aff.xxxxxxxxxx * 2^e

The algorithm says to avoid denominators which have eight consecutive
ones in the positions indicated by ff and a = 1, 4, 7, 10, or 13 in the
first nibble of the fraction. The exponent and the low order 40 bits
of the fraction are not involved in the decision.

Here is some pseudo C. The input is a pair of 64 bit double precision
floating point numbers, x and y. The output is intended to be the 64 bit
double precision floating point number which is closest to the exact
quotient x/y.

double x, y, z
long double xx, yy, zz
char ychar[8]
nibble ynib[16], a
/* Use the same 64-bit storage for y, ychar and ynib */

/* Check for 8 consecutive ones */

if (ychar[3] == 0xff) {

    /* Check for the five at-risk bands */

    a = ynib[4]
    if ((a == 1) || (a == 4) || (a == 7) || (a == 10) || (a == 13)) {
        xx = 0.9375*x
        yy = 0.9375*y
        zz = xx/yy
        z = zz
        return
    }
}

z = x/y
return


This is probably best implemented in assembly language, both because we
want it to be fast, and because some C compilers don't give us proper
access to the long double format.

Here is today's (11/29) version of the assembly code written by one of
us (Mathisen). It is intended to be used with the in-lining facility of
the WATCOM 10 C compiler. Its unique feature is a 32 entry table which
drives the test. The table can be set to all zeros for chips without
the bug and to the appropriate nonzero quantities if a defective
chip is detected at startup. The code also deals with denormals,
which we have so far ignored in this discussion.

We include the code here primarily to show our approach. It is not
expected to be the final, complete solution. Anybody interested in
details should contact Terje.M...@hda.hydro.com via email.


#include <math.h>
#include <stdio.h>

static float _fdiv_scale = 0.9375;
static char _fdiv_risc[32] = {0};

long double lfdiv(long double x, long double y);
#pragma aux lfdiv = \
" sub esp,12"\
" fld st"\
" fstp tbyte ptr [esp]"\
" mov eax,[esp+4]"\
" shr eax,15"\
" cmp al,255"\
" jne ok"\
" mov al,ah"\
" and eax,31"\
" cmp _fdiv_risc[eax],ah"\
" jz ok"\
" fld [_fdiv_scale]"\
" fmul st(2),st"\
" fmulp st(1),st"\
"ok:"\
" fdivp st(1),st"\
" add esp,12"\
parm reverse [8087] [8087]\
modify nomemory exact [eax 8087]\
value [8087];

/* fdiv_fail() is the startup test for the bug (not shown here). */
extern int fdiv_fail(void);

void _fdiv_risc_init(void)
{
    int i;
    if (fdiv_fail()) {
        for (i = 1; i < 16; i += 3)
            _fdiv_risc[i+16] = i;
    }
}

void main(void)
{
    double z;

    _fdiv_risc_init();
    z = lfdiv(425678.0, 3142567.0);
    printf("z = %G\n", z);
}

Terje Mathisen

Dec 1, 1994, 4:01:16 AM

In <3bjlv2$r...@acoma.mathworks.com>, mo...@mathworks.com (Cleve Moler) writes:
>Greetings.
>
>Last week, I suggested a hardware/software workaround for the Pentium
>FDIV bug. And, I promised that we would release a Pentium-aware
>version of MATLAB that incorporated the workaround. Since then,
>I have heard from several people suggesting improvements. In
>particular, Terje Mathisen and Tim Coe have made such good suggestions
>that I want to consider a different algorithm. I have enough
>confidence in this new algorithm that we are now in the process
>of using it for the Pentium-aware MATLAB.
>
>Mathisen is the PC programming expert from Norsk Hydro in Norway
>who confirmed Thomas Nicely's original bug report, and who wrote
>P87TEST. Tim Coe is the semiconductor design engineer from
>Vitesse Semiconductor whose model of the chip's behavior led him
>to examples of the worse-case behavior, including the
>4195835/3145727 example.
>

Here is my currently best (in a numerical sense) version of the FDIV
workaround: It will handle all single and double precision numbers exactly,
including Nan, Inf, Denormals and Zero. With long double (80-bit) numbers,
it can be maximally 1ulp off from the IEEE spec. It will also handle
denormalized long doubles, with the same maximum error term.

A key part of the algorithm is a 16-byte table that contains non-zero
entries for the five critical nibble values that must occur directly
after the binary point in the mantissa for the divisor to be at risk.
This table is initially zeroed.

During startup, the chip is tested. If the FDIV bug is found, entries
1, 4, 7, 10 and 13 in the table are made non-zero, which will trigger
the rescaling of at-risk numbers.

If the chip is determined to be bug-free, the table is left empty, and no
scaling will occur. This means that the same code can be used on all
CPUs, with very little overhead.
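
For readability, here is the same test written as plain C on the bit
pattern of a 64-bit divisor (a sketch only, with made-up names; it shows
just the band check, while the real code below does the scaling and the
division in extended precision to keep the result correctly rounded):

#include <stdint.h>
#include <string.h>

static char fdiv_risk_table[16];         /* stands in for _fdiv_risc_table;
                                            entries 1,4,7,10,13 get set when
                                            the startup test finds the bug  */
static const double fdiv_scale = 0.9375; /* 15/16 */

double checked_div(double x, double y)
{
    uint64_t bits;
    memcpy(&bits, &y, sizeof bits);      /* reinterpret the IEEE double    */

    unsigned ff = (unsigned)((bits >> 40) & 0xff); /* fraction bits 47..40  */
    unsigned a  = (unsigned)((bits >> 48) & 0x0f); /* first fraction nibble */

    if (ff == 0xff && fdiv_risk_table[a])          /* eight ones + risky nibble */
        return (x * fdiv_scale) / (y * fdiv_scale);

    return x / y;
}

For the Coe pair, y = 3145727 has the fraction 7ffff8..., so a = 7 and the
rescaling path is taken.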

Here are the results from a test run:

I:\C\FDIV>fdivtest
You have the FDIV bug, installing patch table:
Testing 5678901.000000 / 4789012.000000:
FDIV : 39 S = 1.185818912126342 3ff2f91d4068f99b
FDIV + 1 iteration: 56 S = 1.185818912126342 3ff2f91d4068f99b
FDIV + 2 iterations: 68 S = 1.185818912126342 3ff2f91d4068f99b
FDIV + Moler test: 78 S = 1.185818912126342 3ff2f91d4068f99b
fdiv + Coe test: 60 S = 1.185818912126342 3ff2f91d4068f99b
FDIV + inline Coe: 48 S = 1.185818912126342 3ff2f91d4068f99b
LFDIV 49 S = 1.185818912126342 3ff2f91d4068f99b
Testing 4195835.000000 / 3145727.000000:
FDIV : 39 S = 1.333739068902038 3ff556fec7254ed1
FDIV + 1 iteration: 56 S = 1.333820449136241 3ff557541c7c6b43
FDIV + 2 iterations: 68 S = 1.333820449136241 3ff557541c7c6b43
FDIV + Moler test: 192 S = 1.333820449136241 3ff557541c7c6b43
fdiv + Coe test: 68 S = 1.333820449136241 3ff557541c7c6b43
FDIV + inline Coe: 62 S = 1.333820449136241 3ff557541c7c6b43
LFDIV 60 S = 1.333820449136241 3ff557541c7c6b43
Testing 1.000000 / 824633702449.000000:
FDIV : 39 S = 1.212659624879394e-012 3d7555555bfa8e3b
FDIV + 1 iteration: 56 S = 1.212659629396902e-012 3d7555555d4fe391
FDIV + 2 iterations: 68 S = 1.212659629396902e-012 3d7555555d4fe391
FDIV + Moler test: 192 S = 1.212659629396902e-012 3d7555555d4fe391
fdiv + Coe test: 68 S = 1.212659629396902e-012 3d7555555d4fe391
FDIV + inline Coe: 62 S = 1.212659629396902e-012 3d7555555d4fe391
LFDIV 59 S = 1.212659629396902e-012 3d7555555d4fe391

The different labels correspond to:

FDIV : Compiler-generated x/y
FDIV + 1 iteration: One Newton-Raphson iteration (sketched below)
FDIV + 2 iterations: Two NR iterations
FDIV + Moler test: Cleve Moler's test: check x/y by back-multiplication
fdiv + Coe test: Coe test, written in C + asm (Does not handle denormals)
FDIV + inline Coe: #define version of Coe test
LFDIV Long double and denormal-capable inline asm version
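
The "1 iteration" refinement is the standard residual-correction trick,
roughly as below (a sketch only; the timing test may implement it
differently):

/* One residual-correction (Newton-style) step.  The residual x - y*z is
   formed with FMUL/FSUB, which are unaffected by the bug, so even if the
   second FDIV is also slightly off, the error it contributes is of
   second order. */
double fdiv_refine(double x, double y)
{
    double z = x / y;        /* possibly flawed FDIV */
    double r = x - y * z;    /* residual             */
    return z + r / y;        /* correction           */
}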

The LFDIV code is included below. This is what I currently recommend people
use instead of FDIV to work around the bug.

The first pair of numbers are not in one of the critical regions, so the
fixup code is not invoked.

As seen from the timing column, a naked FDIV does take 39 cycles, as
shown in the Pentium manuals. For normal numbers, not needing rescaling,
or when running on a bug-free CPU, the LFDIV code has a 10-cycle total
overhead.

If the divisor is found to be at risk, I rescale by 15/16, which increases
the running time by another 10 cycles. This means that the workaround
has a normal overhead of about 25%, increasing to about 50% for
worst-case numbers.

======== Pentium FDIV bug workaround (Watcom inline version) ============

#include <stdio.h>   /* for the DEBUG printf in test_fdiv() below */

/* Lookup table for critical nibble patterns, initially empty: */
static char _fdiv_risc_table[16] = {0};

/* Rescaling factor for at-risk numbers: 15/16 */


static float _fdiv_scale = 0.9375;

/* Rescaling factor for long double denormals, 2^51, defined as a long: */
static long two_to_the_power_of_51 = 0x72000000;

/* Denormal-capable long double fdiv: */


long double lfdiv(long double x, long double y);

#pragma aux lfdiv = \
" sub esp,12"\
"restart:"\
" fld st"\
" fstp tbyte ptr [esp]"\
" mov eax,[esp+4]"\
" add eax,eax"\
" jnc denormal"\
" shr eax,20"\
" cmp al,255"\
" jne ok"\
" mov al,ah"\
" and eax,15"\
" cmp _fdiv_risc_table[eax],ah"\
" jz ok"\
" fld [_fdiv_scale]"\
" fmul st(2),st"\
" fmulp st(1),st"\
" jmp ok"\
"denormal:"\
" or eax,[esp]"\
" jz zero"\
" fld [two_to_the_power_of_51]"\
" fmul st(2),st"\
" fmulp st(1),st"\
" jmp restart"\
"zero:"\
"ok:"\
" fdivp st(1),st"\
" add esp,12"\
parm reverse [8087] [8087]\
modify nomemory exact [eax 8087]\
value [8087];

/* Define a single FDIV instruction as an inline asm macro, to avoid any
problems with the C optimizer messing up the check for faulty FDIV's:
*/
double _fdiv_test(double x, double y);
#pragma aux _fdiv_test = \
" fdivp st(1),st"\
parm reverse [8087] [8087]\
modify nomemory exact []\
value [8087];

#define a_bug 4195835
#define b_bug 3145727

/* Startup code to check for FDIV bug, and install nibble table if needed */
void test_fdiv(void)
{
    double z;
    z = _fdiv_test(a_bug, b_bug);
    if (a_bug - z*b_bug > 1e-10) {
        int i;
        /* Error found, so init the pattern table! */
#ifdef DEBUG
        printf("You have the FDIV bug, installing patch table:\n");
#endif
        for (i = 1; i < sizeof(_fdiv_risc_table); i += 3)
            _fdiv_risc_table[i] = i;
    }
}

-Terje Mathisen (include std disclaimer) <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"


Michael C. Grant

Dec 1, 1994, 1:34:00 PM

I'm somewhat concerned about the cost of even this new FDIV workaround.
Can the authors post the relative speeds of a single division in
the following circumstances:

1) an FDIV call
2) a call to the FDIV replacement on a bug-free chip
3) a call to the FDIV replacement on a buggy chip, with a
non-offending denominator
4) a call to the FDIV replacement on a buggy chip, with an
offending denominator

I'm aware that Amdahl's law tells me that, since the number of divides
in a typical program is so small, the slowdown due to this new FDIV
will be minimal.

But, for example, if you're using MATLAB more than occasionally, I
certainly think you qualify for a free replacement! If Intel says no
to such a person, they're probably going to get into hot water.

Mike


--
Michael C. Grant Information Systems Laboratory, Stanford University
mcg...@isl.stanford.edu http://www-isl.stanford.edu/~mcgrant
------------------------------------------------------------------------------
"When you get right down to it, your average pervert is really quite
thoughtful." (David Letterman)
"Long hair, short hair---what's the difference once the head's blowed
off?" (Nat'l Lampoon)

Craig Milo Rogers

Dec 1, 1994, 7:55:11 PM

In article <3bk3cs$5...@vkhdsu01.hda.hydro.com> Terje.M...@hda.hydro.com (Terje Mathisen) writes:
>If the divisor is found to be at risk, I rescale by 15/16, which increase
>the running time by another 10 cycles. This means that the workaround
>have a normal overhead of about 25%, increasing to about 50% for worst-
>case numbers.

Thank you for the excellent work you have done for all of us.
I would, however, like to point out that the overhead may be higher
than your post indicates:

1) If you place this code inline, there will be a (probably minor) loss
in cache efficiency, depending, of course, upon the ratio of
FDIVs to other instructions.

2) If you use this code via a subroutine call, when a single FDIV
	would have been inline, the call overhead needs to be considered.


Also, there may be a problem with the conditional
initialization of the table in some shared libraries -- they should
use a table that assumes the bug is present, or use a table mapped
into a known fixed location and guaranteed to be initialized, etc.

Thanks again.

Craig Milo Rogers

Peter McGavin

Dec 1, 1994, 11:10:06 PM

Terje.M...@hda.hydro.com (Terje Mathisen) writes:
>Here is my currently best (in a numerical sense) version of the FDIV
>workaround: It will handle all single and double precision numbers exactly,
>including Nan, Inf, Denormals and Zero. With long double (80-bit) numbers,
>it can be maximally 1ulp off from the IEEE spec. It will also handle
>denormalized long doubles, with the same maximum error term.

Thanks, nice workaround!

Is it known whether the FDIV bug also affects transcendental
instructions? If any are affected, then the workaround is not a
complete solution.

One suspects that transcendental instructions might be implemented
internally as rational approximation functions using the same buggy
FDIV microcode for the division. However I have no information to
confirm or deny this.
--
Peter McGavin. (pet...@maths.grace.cri.nz)

Cleve Moler

Dec 2, 1994, 3:17:50 AM

In article <PETERM.94...@whio.grace.cri.nz>,
Peter McGavin <pet...@maths.grace.cri.nz> wrote:
>...

>Is it known whether the FDIV bug also affects transcendental
>instructions? If any are affected, then the workaround is not a
>complete solution.
>One suspects that transcendental instructions might be implemented
>internally as rational approximation functions using the same buggy
>FDIV microcode for the division. However I have no information to
>confirm or deny this.

We will check on this as well. Thanks.

-- Cleve

Terje Mathisen

Dec 2, 1994, 4:02:23 AM

Good thinking. I forgot to ask Intel during our conference call yesterday,
but I suspect that all the approximation functions are defined purely in
terms of FADD and FMUL operations, using reciprocals when needed, since this
is much faster.

Terje Mathisen

Dec 2, 1994, 3:53:20 AM

In <MCGRANT.94...@rascals.stanford.edu>, mcg...@rascals.stanford.edu (Michael C. Grant) writes:
>I'm somewhat concerned about the cost of even this new FDIV workaround.
>Can the authors post the relative speeds of a single division in
>the following circumstances:
>
>1) an FDIV call
>2) a call to the FDIV replacement on a bug-free chip
>3) a call to the FDIV replacement on a buggy chip, with a
> non-offending denominator
>4) a call to the FDIV replacement on a buggy chip, with an
> offending denominator

Intel set up a world-wide conference call yesterday, where we discussed the
possible workarounds. The hardware gurus, Tim Coe & Peter Tang, agreed that
we should rescale more numbers, since they hadn't been able to prove
conformance for all the bit patterns cleared by my initial test.

The current version just checks the first 6 bits of the mantissa, including
the sign bit, via a direct lookup table. Passing numbers (27 out of 32)
are sent on immediately, while those at risk are rescaled. Denormal
numbers are handled separately.

Intel are hard at work putting this code into a format that is directly
usable by compiler vendors. They will send out the official version.

My latest tests, using an inline Watcom asm macro, indicate the following
timings:

1 Naked FDIV: 40 cycles
2 FDIV, no bugs: 44 cycles
3 FDIV, bug, no fix needed: 58 cycles
4 FDIV, bug, fix needed: 64 cycles

So, we're talking about a 10% slowdown for bug-free chips, to be able to
use the same code on all CPUs. The average running time on a buggy P5 will
be below 60 cycles, which corresponds to a 50% slowdown.

If I special-case the inner loop code for the bug-free and buggy FDIVs, I
can get the overhead down by about 3-4 cycles in both cases.

>
>I'm aware that Amdahl's law tells me that, since the number of divides
>in a typical program is so small, the slowdown due to this new FDIV
>will be minimal.

I bet you won't be able to measure any differences, unless you use the
internal Pentium cycle counters.

>
>But, for example, if you're using MATLAB more than occaisonally, I
>certainly think you qualify for a free replacement! If Intel says no
>to such a person, they're probably going to get into hot water.

MATLAB will probably have an FDIV-bug-aware version available within a
few days; they intended to put my fixup code into their inner loops
yesterday so they could start verifying its correctness.

I do agree though that Intel will probably have to bite the bullet and
replace all the Pentiums for those who demand it.

Terje Mathisen

Dec 2, 1994, 3:59:56 AM

In <3blr9f$e...@drax.isi.edu>, rog...@drax.isi.edu (Craig Milo Rogers) writes:
>In article <3bk3cs$5...@vkhdsu01.hda.hydro.com> Terje.M...@hda.hydro.com (Terje Mathisen) writes:
>>If the divisor is found to be at risk, I rescale by 15/16, which increase
>>the running time by another 10 cycles. This means that the workaround
>>have a normal overhead of about 25%, increasing to about 50% for worst-
>>case numbers.
>
> Thank you for the excellent work you have done for all of us.
>I would, however, like to point out that the overhead may be higher
>then your post indicates:
>
>1) If you place this code inline, there will be a (probably minor) loss
> in cache efficiency, depending, of course, upon the ratio of
> FDIVs to other instructions.

My inline fixup code uses about 70 bytes of code, so yes, it would impact
the cache a bit. Divisions are so rare, however (even MATLAB only has
about 70 in all its source code), that the total cost would be
very small for most programs.

>
>2) If you use this code via a subroutine call, when a single DIV
> would have been inline, the call overhead needs to be considered.
>

When optimizing for size, I'd put a test of a global flag in front
of all FDIVs, and then skip the call to the fixup code if it was ok.

The overhead for this would be the same for bug-free chips (3-4 cycles),
and add about 5 cycles on buggy machines.

>
> Also, there may be a problem with the conditional
>initialization of the table in some shared libraries -- they should
>use a table that assumes the bug is present, or use a table mapped
>into a known fixed location and guaranteed to be initialized, etc.

With the global-flag approach, the table could be statically initialized
at compile time, and just the flag set according to an initial test.
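
Something like this, in outline (a sketch only, with made-up names; the
table mirrors _fdiv_risc_table above, and lfdiv is the inline routine
posted earlier):

/* Global-flag variant, sketched: the nibble table is baked in at compile
   time (same convention as the runtime init above: at-risk entries hold
   their own index), and a single flag, set once by the startup test,
   decides whether a division goes through the fixup routine at all. */
static const char fdiv_risk_table_static[16] =
    { 0, 1, 0, 0, 4, 0, 0, 7, 0, 0, 10, 0, 0, 13, 0, 0 };
static int _fdiv_is_buggy = 0;            /* set by the startup FDIV test */

long double lfdiv(long double x, long double y);   /* the routine above */

/* Bug-free chips pay only for one flag test and a predictable branch. */
#define SAFE_DIV(x, y)  (_fdiv_is_buggy ? lfdiv((x), (y)) : (x) / (y))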

Hendrik Boshoff

Dec 2, 1994, 6:35:46 AM

Cleve Moler (mo...@mathworks.com) wrote:

<snip>


> that I want to consider a different algorithm. I have enough
> confidence in this new algorithm that we are now in the process
> of using it for the Pentium-aware MATLAB.

<snip>

Would there be a way to patch a workaround like this into the OS,
so that everyone can stop worrying about errors, and need only be
concerned about the slowdown?

Hendrik Boshoff

Terje Mathisen

Dec 2, 1994, 9:12:59 AM

No, FDIV is a regular user-mode opcode, so there is no way to trap it.

On a DOS system, using emulation plus back-patching when the hardware is
available, you could special-case the divisions so that they always trap,
and then fix the problem in your interrupt handler. This would require the
main program to be compiled with emulation enabled, though.

It might be possible to write a (not-so-small) TSR that would re-grab the
emulation vectors after fp library initialization in the running program,
allow all regular operations to be handled normally, but fix up the FDIVs.

Do you think there would be a market for such a program? Remember, it would
not work for anything but pure DOS programs, compiled with emulation enabled!

Craig Milo Rogers

Dec 2, 1994, 2:41:25 PM

In article <3bmna0$n...@vkhdsu01.hda.hydro.com> Terje.M...@hda.hydro.com (Terje Mathisen) writes:
>My latest tests, using an inline Watcom asm macro, indicate the followings
>timings:
>
>1 Naked FDIV: 40 cycles
>2 FDIV, no bugs: 44 cycles
>3 FDIV, bug, no fix needed: 58 cycles
>4 FDIV, bug, fix needed: 64 cycles
>
>So, we're talking about a 10% slowdown for bug-free chips, to be able to
>use the same code on all cpus. The average running time on a buggy P5 will
>be below 60 cycles, which corresponds to a 50% slowdown.

Thank you for all the work you're doing. May I suggest a
little more? :-)

By "same code on all cpus", I assume you meant the same code
on Pentium(tm)s, buggy or not. What about someone compiling software
that they want to run on Pentium(tm)s, 486s, 386s, or (gasp!) earlier
CPUs? Some people might even want to run their code on non-Intel
CPUs!

So, for completeness, some kind person should put together a
table showing the cost in cycles of each of the four cases above on
each of the major x86-type CPUs. I expect that with the global-flag
approach the cost won't be very high, particularly considering
how slow FDIV is on pre-Pentium(tm)s, but we should have these facts
on hand for comparison.

I would also like to speculate that on a multi-issue pipelined
architecture, it may be possible for the global-flag check to be
nearly cost-free, particularly if a standard instruction sequence is
agreed upon and optimized during instruction decode processing in
future processor designs.

Craig Milo Rogers

George White

Dec 3, 1994, 12:06:55 PM

> We will check on this as well. Thanks.
>
> -- Cleve

Please do -- even Dell seems pretty confused about this:

>From: sup...@us.dell.com (John Richards)
>Organization: Dell Computer Corporation
...

>Point # 1 - The inaccuracy that can occur is rare and is based on the
>combination of a small set of very large prime numbers, occurring
>usually in division operations.

As you note in your posting of the revised algorithm, there is so much
noise on usenet by now that it is hard to avoid the nonsense. Many
organizations won't act until Intel provides a definitive summary of
the problem in a non-restricted form (all I have officially from Intel
is a fax labelled "Intel confidential" that gives some estimates of
how many years different types of users might go before seeing an error).

If you adopt a fixup that minimizes the overhead in all but a very small
number of cases, you can afford to add some data collection in the code
that handles the special cases. In the long run, this sort of data might
help verify Intel's own estimates for the impact of the bug and could be
useful documentation for a user who feels they need a replacement CPU.
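
In outline, the bookkeeping could be as simple as this (a sketch of the
idea only, with made-up names; where the counters get incremented is up
to whoever owns the fixup code):

#include <stdio.h>

/* Sketch only: count how often the fixup path fires and remember the
   most recent offending operands, so the numbers can be reported back
   to the user (or to Intel) later. */
static unsigned long fdiv_divisions;    /* all divisions seen          */
static unsigned long fdiv_corrections;  /* divisions that were at risk */
static double fdiv_last_x, fdiv_last_y; /* most recent at-risk pair    */

void fdiv_note_division(void)
{
    fdiv_divisions++;
}

void fdiv_note_correction(double x, double y)
{
    fdiv_corrections++;
    fdiv_last_x = x;
    fdiv_last_y = y;
}

void fdiv_report(void)
{
    printf("%lu of %lu divisions hit an at-risk denominator\n",
           fdiv_corrections, fdiv_divisions);
    if (fdiv_corrections)
        printf("most recent: %.17g / %.17g\n", fdiv_last_x, fdiv_last_y);
}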

Despite the noise, it appears that usenet has facilitated the development
of a rational solution to a problem where Intel appeared to be floundering.
--
George White, Biological Oceanography, 6-8509

Cleve Moler

Dec 3, 1994, 3:39:53 PM

In article <GWHITE.94...@cabot.bio.dfo.ca>,
George White <gwh...@cabot.bio.dfo.ca> wrote:
> ....

> If you adopt a fixup that minimizes the overhead in all but a very small
> number of cases, you can afford to add some data collection in the code
> that handles the special cases. In the long run, this sort of data might
> help verify Intel's own estimates for the impact of the bug and could be
> useful documentation for a user who feels they need a replacement CPU.
> ...

> George White, Biological Oceanography, 6-8509

Hey, that's a really good idea. It just so happens that's exactly
what we're doing. Here is a sneak preview of the Pentium-aware
MATLAB. This is partial output from our test suite of problems
specifically intended to trigger the bug, running on a MATLAB
built within the last hour (about 3:00 pm, EST, 12/3).

-- Cleve

< MATLAB command window output >

You have the FDIV bug, installing patch table.

Commands to get started: intro, demo, help help
Commands for more information: help, whatsnew, info, subscribe

>> xpent
>> format long

% (x/y)*y should = x
>> y = 3145727

y =

3145727

>> x = 4195835

x =

4195835

>> (x/y)*y

Correcting Pentium division flaw. Detected in function /.
x = 4.1958350000000000e+006 = 04150017ec0000000
y = 3.1457270000000000e+006 = 04147ffff80000000
x/y ~ 1.3337390689020380e+000 = 03ff556fec7254ed1
x/y = 1.3338204491362410e+000 = 03ff557541c7c6b43

ans =

4195835


>> s = 3 - 18391/2^38

s =

2.99999993309393

>> A = [s 1; 1 0]

A =

2.99999993309393 1.00000000000000
1.00000000000000 0

% det(A) should = -1
>> det(A)

Correcting Pentium division flaw. Detected in function det.
x = 1.0000000000000000e+000 = 03ff0000000000000
y = -2.9999999330939320e+000 = 0c007fffff7052000
x/y ~ -3.3333333952557760e-001 = 0bfd555555bfb71ca
x/y = -3.3333334076734100e-001 = 0bfd555555d50c71f

ans =

-1.00000000000000

% A*inv(A) should = I
>> A*inv(A)

Correcting Pentium division flaw. Detected in function inv.
x = 1.0000000000000000e+000 = 03ff0000000000000
y = -2.9999999330939320e+000 = 0c007fffff7052000
x/y ~ -3.3333333952557760e-001 = 0bfd555555bfb71ca
x/y = -3.3333334076734100e-001 = 0bfd555555d50c71f

Correcting Pentium division flaw. Detected in function inv.
x = 1.0000000000000000e+000 = 03ff0000000000000
y = 2.9999999330939320e+000 = 04007fffff7052000
x/y ~ 3.3333333952557760e-001 = 03fd555555bfb71ca
x/y = 3.3333334076734100e-001 = 03fd555555d50c71f


ans =

1.00000000000000 0.00000000000000
0.00000000000000 1.00000000000000


How's that?

-- Cleve

Terje Mathisen

Dec 3, 1994, 5:13:35 PM

In <GWHITE.94...@cabot.bio.dfo.ca>, gwh...@cabot.bio.dfo.ca (George White) writes:
>In article <3bml7e$n...@acoma.mathworks.com> mo...@mathworks.com (Cleve Moler) writes:
>
> In article <PETERM.94...@whio.grace.cri.nz>,
> Peter McGavin <pet...@maths.grace.cri.nz> wrote:
> >...
> >Is it known whether the FDIV bug also affects transcendental
> >instructions? If any are affected, then the workaround is not a
> >complete solution.
> >One suspects that transcendental instructions might be implemented
> >internally as rational approximation functions using the same buggy
> >FDIV microcode for the division. However I have no information to
> >confirm or deny this.
>
> We will check on this as well. Thanks.
>
> -- Cleve
>

Bad news: FPATAN is definitely broken! :-(

Cleve confirmed that using a scaled version of the Coe pair did indeed fail.

Sin and cos work; the jury is still out on the rest.

Vaughan R. Pratt

Dec 4, 1994, 6:13:13 PM

In article <3bjlv2$r...@acoma.mathworks.com>,

Cleve Moler <mo...@mathworks.com> wrote:
>Where, exactly, are the bands of at-risk denominators used in our
>new algorithm? Between each two powers of two, Coe specifies five
>small intervals to avoid. Consider the interval 4096 <= y < 8192.
>For values of y in this range, the intervals to avoid have width one.
>
>Here they are, in decimal and in hex.
>
> 4607. 4607.999999...
> 5375. 5375.999999...
> 6143. 6143.999999...
> 6911. 6911.999999...
> 7679. 7679.999999...
>
> 40b1ff0000000000 40b1ffffffffffff
> 40b4ff0000000000 40b4ffffffffffff
> 40b7ff0000000000 40b7ffffffffffff
> 40baff0000000000 40baffffffffffff
> 40bdff0000000000 40bdffffffffffff

Just to gild this lily, here they are also as a single formula:

3*(6..10)*2^8 - (0,1]

Here 6..10 denotes the set {6,7,8,9,10} (so 3*(6..10) =
{18,21,24,27,30}) while (0,1] denotes the unit interval of reals
excluding 0. The - is arithmetic rather than set subtraction.

APL no doubt has an even shorter expression.
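
In C, a quick check that the formula reproduces the table above (the
program and its names are just an illustration):

#include <stdio.h>

/* Expand 3*(6..10)*2^8 - (0,1] back into the five bands for 4096 <= y < 8192. */
int main(void)
{
    int k;
    for (k = 6; k <= 10; k++) {
        int upper = 3 * k * 256;                /* 3*k*2^8           */
        printf("[%d, %d)\n", upper - 1, upper); /* e.g. [4607, 4608) */
    }
    return 0;
}
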
--
Vaughan Pratt http://boole.stanford.edu/boole.html

Ralf B. Lukner

Dec 5, 1994, 8:45:06 AM

In article <3bqqif$n...@vkhdsu01.hda.hydro.com>,
Terje.M...@hda.hydro.com (Terje Mathisen) wrote:

> > In article <PETERM.94...@whio.grace.cri.nz>,
> > Peter McGavin <pet...@maths.grace.cri.nz> wrote:
> > >...
> > >Is it known whether the FDIV bug also affects transcendental
> > >instructions? If any are affected, then the workaround is not a
> > >complete solution.

> Bad news: FPATAN is definitely broken! :-(

> Cleve confirmed that using a scaled version of the Coe pair did indeed fail.

> Sin & Cos works, the jury is still out on the rest.

Where the heck is Intel on this? They have known for months about FDIV,
and they didn't bother to check transcendentals? Of course, maybe they
did and are trying to hide this and other bugs.

--Ralf

M. Schmidt

Dec 5, 1994, 10:47:12 AM

In article <3bmna0$n...@vkhdsu01.hda.hydro.com>, Terje.M...@hda.hydro.com (Terje Mathisen) writes:

> I do agree though that Intel will probably have to bite the bullet and
> replace all the Pentiums for those who demand it.

NO.
Intel has to ship guaranteed bug-free Pentium CPUs to all affected
users without their having to ask.

Another point:
Doesn't Intel guarantee (or at least claim) that the Pentium conforms
to the IEEE standard?
If so, they MUST be forced to initiate a complete replacement
(coming from Intel themselves, not from the users/customers) of
all buggy Pentium CPUs because of such an IEEE non-conformance.

Such a complete replacement would be one of the largest replacements
(or the largest?) in computer history.

All other arguments from Intel sound like the stupid stuff of a losing
:-( company.

--

Michael Schmidt msch...@koblenz.fh-rpl.de

Geert Uytterhoeven

Dec 6, 1994, 3:50:36 AM

In article <MSCHMIDT.9...@sparc504.koblenz.fh-rpl.de>, msch...@koblenz.fh-rpl.de (M. Schmidt) writes:
|> NO.
|> Intel has to ship guaranteed bug-free Pentium CPUs to all involved
|> users without that those have to ask.

TRUE! If there's something wrong with your car, they advertise that you can
have it fixed for free... (At least here in Europe)

+--------------------------------------------------------------------+
| Geert Uytterhoeven -->> Wavelets, Amiga, MultiUser, Linux/68k,... |
| Geert.Uyt...@CS.kuleuven.ac.be |
| Dept. of Computer Science, Katholieke Universiteit Leuven, Belgium |
+--------------------------------------------------------------------+
http://www.cs.kuleuven.ac.be/~geert/

Jim Ewald

Dec 7, 1994, 5:07:35 AM

>> I do agree though that Intel will probably have to bite the bullet and
>> replace all the Pentiums for those who demand it.

> NO.
> Intel has to ship guaranteed bug-free Pentium CPUs to all involved
> users without that those have to ask.

Intel has already acknowledged that the FDIV problem exists. It should streamline the
process its customers use to get the chip replaced. The inquisition one must go through
now is probably Intel's way of saying, "We don't have many corrected chips (yet), and we
need to get them into the hands of those who need them most for public safety reasons."

I would certainly hope that they will loosen up when the corrected chip is in production, and
replace defective chips for anyone who simply asks for one. I don't see Intel, or any other
manufacturer, replacing 4 million chips based on the severity of this error. Software
companies typically tell me, "Yeah, that bug has been fixed in the new release," but only
offer to send the fix after I complain about the bug. Perhaps it would be too much to
expect Intel to behave any differently.

- Jim


Harvey J. Stein

Dec 7, 1994, 6:19:29 AM

In article <3bv5ld$s...@netnews.upenn.edu> ac...@mail1.sas.upenn.edu (Alex Chun) writes:

> Ralf B. Lukner (luk...@che.utexas.edu) wrote:
> : In article <3bqqif$n...@vkhdsu01.hda.hydro.com>,
> : > Bad news: FPATAN is definitely broken! :-(
>
> : Where the heck is Intel on this?
> : --Ralf
>
> Intel owns up to this problem. Check out http://www.intel.com.

What are you talking about? I just looked it up & saw no mention of
an FPATAN bug. The home page says:

Welcome to Intel!

Intel's mission is to lead consumers into the digital age by providing
the computing power and the engines for the new interactive services
for business, home, entertainment, education and multimedia. To
support this mission, Intel, the world's largest chip maker, is also a
leading manufacturer of personal computer networking and
communications products.

Introduction to Intel
Intel Product and Support Information
Developer Support
Latest information about FDIV Pentium Processor Flaw
What's New
Web Tools & Utilities
About This Server
Pointers to The Net


I didn't find it under "Latest info about FDIV...", nor under "What's
New", nor under "Developer Support", nor under "Intel Product and
Support Information".

So, where is this "owning up to the problem"?

Thanks,

--
Dr. Harvey J. Stein
Berger Financial Research
hjs...@math.huji.ac.il

Taylor Brady

Dec 7, 1994, 9:27:31 AM

In article <3c41h7$e...@maple.enet.net>, Jim Ewald <ew...@enet.net> says:
>
>>> I do agree though that Intel will probably have to bite the bullet and
>>> replace all the Pentiums for those who demand it.
>
>> NO.
>> Intel has to ship guaranteed bug-free Pentium CPUs to all involved
>> users without that those have to ask.
>
>Intel has already acknowleged that the FDIV problem exists. It should streamline the
>process it's customers use to get the chip replaced. The inquisition one must go through
>now is probably Intel's way of saying, "We don't have many corrected chips (yet), and we
>need to get them into the hands of those who need them most for public saftey reasons."
>
>I would certainly hope that they will loosen up when the corrected chip is production, and
>replace defective chips for anyone who simply asks for one.

I would be happy to wait, as long as I was assured that I would get what I paid for
someday, even if I had to wait for 6 months. But right now, I only have the promise
of a broken $3,200 calculator.

zin...@oxav6.enet.dec.com

Dec 8, 1994, 10:35:24 AM

In article <3c41h7$e...@maple.enet.net>, Jim Ewald <ew...@enet.net> writes:
>>> I do agree though that Intel will probably have to bite the bullet and
>>> replace all the Pentiums for those who demand it.
>
>> NO.
>> Intel has to ship guaranteed bug-free Pentium CPUs to all involved
>> users without that those have to ask.
>
> ...

>I would certainly hope that they will loosen up when the corrected chip
>is production, and
>replace defective chips for anyone who simply asks for one. I don't see
>Intel, or any other
>manufacturer, replacing 4 million chips based on the severity of this
>error. Software
>companies typically tell me, "Yeah, that bug has been fixed in the new
>release.", but only
>offer to send the fix after I complain about the bug. Perhaps it would
>be too much to
>expect Intel to behave any differently.
>
>- Jim
>

'New Scientist' of 10 December 1994 reports that the Pentium chips JPL
are using for stress analysis on the space shuttle were accepted as
worthy of replacement, but that Paul Henson at JPL, who was about to use
two Pentium machines for simulations, was turned down. If true, that is
outrageous. I accept that it may not be necessary for someone who uses
their machine for word processing to get a new chip, but anyone who does
serious numerical work should not have to argue. Intel have to replace
the chip for anyone who asks. I hope they will do
this once the furore has died down and they have an adequate supply of
corrected chips. If they eventually deny bug-free chips to any
scientific/numerical users, my 486 will be my last Intel chip.


Martin.

Valerie Monbet, These Marc Prevosto 31 10 96, Ifremer DITI GO S

Dec 9, 1994, 11:13:07 AM

Hi

Has anyone already written some routines to estimate the probability density (sample estimate, kernel estimate...) of a sample or a time series?

Please reply by e-mail.
Thanks

Valerie

*************************
Valerie Monbet
Ifremer France
vmo...@ifremer.fr
---

Richard Spencer Rhodes

Dec 12, 1994, 10:15:15 PM

I can't even get them to return my phone call!!!

If they piss enough people off, somebody's going to file a
class action. Keep an eye on the papers for ads...

