Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

The Pentium FDIV bug explained (long)

0 views
Skip to first unread message

del...@netcom.com

unread,
Oct 24, 1995, 3:00:00 AM10/24/95
to
I don't win any prizes for punctuality, but here now is my completed
detailed explanation of Radix 4 SRT division as used by the Pentium,
including a fully detailed example of encountering the flaw which
was first made public almost exactly a year ago.
{Approx 1800 lines}
David Deley
del...@netcom.com


THE PENTIUM DIVISION FLAW
by
David Deley

The following paper describes the details behind the floating point
division bug in the hardware divide unit of Intel Corporation's
Pentium(TM) processor. The Pentium is Intel's next generation of
IBM-PC compatible microprocessors following the i486 CPU family. The
original Pentium processor was introduced into the market in May of
1993, and an estimated two million were sold. A year later it was
discovered that in certain very rare instances a division operation
returned a result that was slightly incorrect.

Intel corrected the bug in June of 1994, and any new Pentium computer
sold after January of 1995 probably has the newer corrected chip in it.
Intel will also replace upon request any Pentium chip which has the
floating point division flaw with a new one, just ask. I myself have
never bothered replacing my flawed Pentium chip. The flaw is too
insignificant to affect everyday users like myself. All my software
runs just fine with no problems, and quite fast I might add. Actually
that part about no problems is a lie -- I have endless problems with
bugs in software for IBM-PC compatible computers, but none of the bugs
are the result of the Pentium's floating point division flaw explained
here.

For most people the flaw itself is just an entertaining curiosity since
it's rare and the error in precision when it does occur is small. The
flaw does arouse one's curiosity though, and it is quite interesting to
study the details behind the flaw, as it brings together many aspects
of computer science. Like any technical field, computer science can be
made easy to understand, or it can be made difficult to understand. I
hope most people find this paper easy to understand.

-David W. Deley June 1995

------------------------------------------------------------------------

CHAPTER 1: REVIEW
1.1 INTERPRETING THE VALUE OF A DECIMAL NUMBER
1.2 INTERPRETING THE VALUE OF A BINARY NUMBER
1.3 MOVING THE DECIMAL POINT OF A BASE 10 NUMBER
1.4 MOVING THE RADIX POINT OF A BINARY NUMBER
1.5 NORMALIZED NOTATION: BASE 10
1.6 NORMALIZED NOTATION: BASE 2
1.7 DIVISION NOMENCLATURE
1.8 DIVISION OF NORMALIZED NUMBERS: BASE 10
1.9 DIVISION OF NORMALIZED NUMBERS: BASE 2
1.10 PENTIUM REPRESENTATION OF FLOATING POINT NUMBERS
1.10 CARRY-SAVE ADDITION
1.11 PARTIAL ADDITION

CHAPTER 2: DIVISION
2.1 EXAMPLE
2.2 ABSTRACT DIVISION
2.3 THE ITERATIVE FORMULA FOR DIVISION
2.4 BOUNDS ON RESULTING REMAINDER
2.5 LOOKUP TABLE

CHAPTER 3: RADIX 4 SRT DIVISION
3.1 THE SRT DIVISION ALGORITHM
3.2 THE ITERATIVE FORMULA FOR DIVISION BASE 4
3.3 BOUNDS ON RESULTING REMAINDER
3.4 THE PENTIUM LOOKUP TABLE (P-D PLOT)
3.5 ITERATION USING THE P-D PLOT AS A LOOKUP TABLE

CHAPTER 4: EXAMPLE OF BUG

CHAPTER 5: OBSERVATIONS ON HITTING THE ERROR CELL

CHAPTER 6: TOPICS FOR FURTHER STUDY

CHAPTER 7: HISTORY OF THE BUG'S DISCOVERY

CHAPTER 8: HOW TO TEST FOR THE PENTIUM BUG.

CHAPTER 9: PENTIUM JOKES

REFERENCES
------------------------------------------------------------------------

CHAPTER 1: REVIEW

Here we briefly review some simple concepts you probably already know.

1.1 INTERPRETING THE VALUE OF A DECIMAL NUMBER:

(10^0) ones-+ +-tenths (10^-1)
(10^1) tens-+| |+-hundredths (10^-2)
(10^2) hundreds-+|| ||+-thousandths (10^-3)
419.583
^-decimal point


1.2 INTERPRETING THE VALUE OF A BINARY NUMBER:

(2^0) ones-+ +-halves (1/2)
(2^1) twos-+| |+-fourths (1/4)
(2^2) fours-+|| ||+-eighths (1/8)
(2^3) eights-+||| |||+sixteenths (1/16)
|||| ||||
11.875 = 1011.1011
^-radix point


1.3 MOVING THE DECIMAL POINT OF A BASE 10 NUMBER:

Given an ordinary decimal number such as

14.195835

we can move the decimal point to the right or left by multiplying
or dividing by powers of ten. For example, to move the decimal
point right two places, we multiply by 10^2 = 100

14.195835 * 100 = 419.5835


1.4 MOVING THE RADIX POINT OF A BINARY NUMBER:

Given an ordinary binary number such as

1011.1011

we can move the radix point to the right or left by multiplying or
dividing by powers of two. For example, to move the radix point
right two places, we multiply by 2^2 = 4

1011.1011 * 4 = 101110.11


1.5 NORMALIZED NOTATION: BASE 10

Given an ordinary decimal number such as

14.195835

we can move the decimal point so it lies directly after the first
digit by multiplying or dividing by an appropriate power of 10. In
our example:

14.195835 = 1.4195835 * 10^1

In this representation 1.4195835 is called the mantissa, and 10^1
is called the exponent.


1.6 NORMALIZED NOTATION: BASE 2

Given an ordinary binary number such as

1011.1011

we can move the radix point so it lies directly after the first
digit by multiplying or dividing by an appropriate power of 2. In
our example:

1011.1011 = 1.0111011 * 2^3

In this representation 1.0111011 is called the mantissa, and 2^3 is
called the exponent.


1.7 DIVISION NOMENCLATURE

dividend numerator 12
quotient = -------- = ----------- Example: 3 = --
divisor denominator 4


quotient 3
+--------- Example: +---
divisor | dividend 4 | 12


1.8 DIVISION OF NORMALIZED NUMBERS: BASE 10

Given any two decimal numbers that we wish to divide, such as:

14.195835
----------
119.716320

we can normalize the values using exponential notation to obtain:

1.4195835 * 10^1 1.4195835 10^1 1.4195835
----------------- = ---------- * ---- = ---------- * 10(1-2)
1.19716320 * 10^2 1.19716320 10^2 1.19716320

Thus the problem has been reduced to the division of two
normalized numbers followed by a shifting of the decimal point.


1.9 DIVISION OF NORMALIZED NUMBERS: BASE 2

Given any two binary numbers that we wish to divide, such as:

1011.1011
---------
11.001000

we can normalize the values using exponential notation to obtain:

1.0111011 * 2^3 1.0111011 2^3 1.0111011
--------------- = --------- * --- = --------- * 2(3-1)
1.1001000 * 2^1 1.1001000 2^1 1.1001000

Thus the problem has been reduced to the division of two
normalized numbers followed by a shifting of the radix point.

The important point to note is that any division operation can be
reduced to dividing normalized mantissas. The exponents and signs
can be handled separately.


1.10 PENTIUM REPRESENTATION OF FLOATING POINT NUMBERS
The Pentium uses IEEE standard 594 for representing floating point
numbers. In this standard a single precision floating point variable
is represented using a 24 bit mantissa and an exponent which can take
on values between +127 and -126.

The Pentium also uses two's compliment notation to represent negative
numbers. A positive number is negated by complementing all the bits
(one's complement, i.e. "flip the bits") and then adding 1 to the
result.


1.10 CARRY-SAVE ADDITION
Carry-Save addition is a method of quickly reducing the sum of three
variables A, B, and C, down to the sum of two variables PARTIAL_SUM and
CARRY, using only bitwise logical operations (AND, OR, Exclusive-OR).
With Carry-Save addition we operate on all of the columns at once,
placing the partial sum of that column at the bottom, and saving any
overflow as a carry digit at the top. The bottom is our PARTIAL_SUM,
and the carry digits at the top become our CARRY. The true sum of
(A + B + C) is thus reduced to the sum (PARTIAL_SUM + CARRY).

EXAMPLE #1:

CARRY= 0010.1100010
------------
A=101.11010011
B=001.01100110 CARRY= 0010.1100010
C=000.01100000 PARTIAL_SUM= 100.11010101
------------ -------------
PARTIAL_SUM= 100.11010101 TOTAL= 0111.10011001

Note that:

PARTIAL_SUM = (A ^ B ^ C)
CARRY = (A & B) | (A & C) | (B & C)

'^' represents Exclusive-OR
'&' represents AND
'|' represents OR

Thus both PARTIAL_SUM and CARRY are calculated quickly using nothing
but bitwise logical operations.


1.11 PARTIAL ADDITION:
The one drawback to Carry-Save addition is if we want to know the true
sum of (A + B + C) we still have to add (PARTIAL_SUM + CARRY), and this
can take time. The classic method of adding two numbers is to start
with the right most column and add it up, placing the partial sum for
that column below and any overflow as carry digits above the next
column to the left. Then move left one column and repeat. Continue in
this manner, adding one column at a time, until all the columns have
been added and the calculation is complete. This method of addition is
slow since each number may have 64 bits, and we must loop through all
of the columns, working on only one column at a time, propagating the
overflow on to the next column as carry bits.

However, it may be that we don't need an exact answer but that an
approximate answer will do fine. We can get an approximate answer by
adding just the first few columns and ignoring the rest. For Example,
we can calculate an approximate answer to example #1 above
by considering just the first 7 columns:

CARRY= 0010.110
PARTIAL_SUM= 100.110
-------------
approx total= 0111.100

Here our total is approximate, but it is very close to the actual
answer. In this case the total dropped down from the true answer to
the first integral multiple of 0.001 below the true answer.

Consider now example #2 below which shows a slightly different
PARTIAL_SUM and CARRY. Notice the actual sum is identical to the sum
we had before, but the approximate total obtained by adding the first
seven bits only is lower this time:

EXAMPLE #2:

CARRY= 0010.1011101 CARRY= 0010.101
PARTIAL_SUM= 100.11011111 PARTIAL_SUM= 100.110
------------- -------------
actual total= 0111.10011001 approx total= 0111.011

This time our approximate total dropped down to the _second_ integral
multiple of 0.001 rather than the _first_ integral multiple as it did
before. The difference is in example #1 the sum of the truncated parts
is less than 0.001, whereas in example #2 the sum of the truncated
parts is greater than 0.001:

Example #1: Example #2:
|truncated |truncated
|part |part
| |
CARRY= . 0010 CARRY= . 1101
PARTIAL_SUM= . 10101 PARTIAL_SUM= . 01111
------------- -------------
TOTAL= .___11001 TOTAL= .__101001

Mathematically our approximate total is equal to the true total minus
the sum of the truncated parts:

approx total = total - (sum of truncated parts)

Consider now a worst case scenario in which all the truncated bits are
1's. Convince yourself that even in this case the sum of the truncated
parts will always be less than (2 x 0.001). Consider now the other
extreme scenario where all the truncated bits are 0's. Convince
yourself that in this trivial case the sum of the truncated parts is 0.

We have thus determined the bounds on our approximate total:

total >= approx total > total - (2 x 0.001)

Our approximate total will always be equal to or lower than the true
answer, and the error will always be less than (2 x 0.001). This is a
crucial result so make sure you understand it.


-----------------------------------------------------------------------

CHAPTER 2: DIVISION

2.1 EXAMPLE
Recall how we divide by hand:
<-- quotient
+---------
3 | 7.203125 <-- Dividend (starting Remainder R)
|
Divisor (D)


Step 1.
Choose a quotient digit:

+-- quotient digit q
2
+---------
3 | 7.203125


Step 2.
Multiply quotient digit by Divisor.
Subtract result from Remainder to get New Remainder:

2
+---------
3 | 7.203125 <-- Remainder R
6 <-- (2 * 3 = 6)
---------
1.203125 <-- New Remainder R


Step 3.
To prepare for the next iteration, move the decimal point right one
place (multiply by 10):

2
+---------
3 | 7.203125
6
---------
1.203125 <-- New Remainder R
========
12.03125 <-- R = 10 * R

At this point our remainder R must be within the following acceptable
limits:

0 <= R < 10*D ( D = Divisor, 3 in this example)

If the remainder R is below zero, then the quotient digit we chose in
step 1 was too high. If R is greater than or equal to 10*D, then the
quotient digit we chose in step 1 was too low.


2.2 ABSTRACT DIVISION
We now repeat those three steps above for division in a more general
abstract setting:

q - quotient digit
D - Divisor
R[j] - Remainder after the j'th iteration.
(The starting remainder R[0] is the Dividend.)


+---------
D | R[j] <-- Remainder
/
Divisor

Step 1.
Choose a quotient digit:

+-- quotient digit q
q
+---------
D | R[j]


Step 2.
Multiply quotient digit by Divisor.
Subtract result from Remainder to get New Remainder:

q
+---------
D | R[j]
-qD
---------
Rnew[j] <-- Rnew[j] = (R[j] - qD)


Step 3.
To prepare for the next iteration, move radix point right one place.
(In base 10 the radix point is more familiarly known as the decimal
point, and we multiply by 10. In base 4, which the Pentium uses for
division, we multiply by 4.)

q
+---------
D | R[j]
-qD
---------
Rnew[j] <-- Rnew[j] = (R[j] - qD)
========
R[j+1] = base * Rnew[j] (base = 10 for decimal
4 for radix 4)

Repeat the above 3 steps until the desired number of quotient digits
have been calculated.


2.3 THE ITERATIVE FORMULA FOR DIVISION:
Note steps 2 and 3 can be combined to give the following iterative
formula (this is THE iterative formula for performing division that we
will be referencing throughout the rest of this paper):

R[j+1] = base * (R[j] - q[j]*D)


2.4 BOUNDS ON RESULTING REMAINDER
Note that the minimum we can make the final quotient answer at this
point in the iteration is to have all subsequent quotient digits be
the smallest quotient digit possible. Likewise the maximum we can make
the final quotient answer at this point in the iteration is to have all
subsequent quotient digits be the largest quotient digit possible.
>From this observation we can conclude that our remainder R[J=1] at this
point must be within the following range:

(D * n.nnnnn...) <= R[j+1] < (D * m.mmmmm...)

where: n = lowest quotient digit we can use
m = highest quotient digit we can use

If our remainder R[j+1] at this point is less than (D * n.nnnnn...),
then the correct quotient answer is less than the minimum we can make
our final quotient answer. Likewise if our remainder R[j+1] at this
point is greater than (D * m.mmmmm...) then the correct quotient answer
is greater than the maximum we can make our final quotient answer. In
the first instance our quotient answer will always be too high; in the
second instance our quotient answer will always be too low.


2.5 LOOKUP TABLE
To aid us in determining what quotient digit to choose at each
iteration we can make ourselves a table showing remainder R[j] vs.
divisor D. Here is a simple table for doing base 10 (decimal)
division:

9 | 9 5 3 2 1 1 1 1 1
8 | 8 4 2 2 1 1 1 1 0
7 | 7 3 2 1 1 1 1 0 0
6 | 6 3 2 1 1 1 0 0 0
Current 5 | 5 2 1 1 1 0 0 0 0
Remainder 4 | 4 2 1 1 0 0 0 0 0
3 | 3 1 1 0 0 0 0 0 0
2 | 2 1 0 0 0 0 0 0 0
1 | 1 0 0 0 0 0 0 0 0
0 +------------------
0 1 2 3 4 5 6 7 8 9
Divisor

For example if we have a divisor of 3 and a current remainder of 7, we
can look at our table and see that 2 is the correct next quotient digit
to choose. The Pentium uses a table such as this to determine what
quotient digit to choose next. The bug is some of the table entries
are incorrect. We'll look more closely at the Pentium's lookup table in
a moment.


NEGATIVE QUOTIENT DIGITS
We now introduce one last concept which may seem confusing at first but
it's actually fully legitimate and it results in a faster computer
division implementation, which of course is the whole point. For our
base 10 division examples above we have implicitly restricted our
quotient digits to the usual set {0,1,...,8,9}. Consider what would
happen now if we allowed our quotient digits to also be negative,
{-9,-8,...,8,9} This would allow for some numbers to be represented in
more than one way. For example (using a bar over a number to represent
negation):
_
16 = 24 (i.e. 1*10 + 6 = 2*10 + -4)


========================================================================
CHAPTER 3: RADIX 4 SRT DIVISION

3.1 THE SRT DIVISION ALGORITHM

The SRT division algorithm was named after the 3 scientists who
discovered it independently at about the same time: D. Sweeney of IBM,
J.E. Robertson of the University of Illinois, and T.D. Tocher of the
Imperial College of London.

The Pentium uses a Radix 4 SRT Division algorithm for floating point
divisions. It's simply the division we've been doing above, except
it's done in base 4, and the quotient digits we can choose are
{-2,-1,0,1,2}.


3.2 THE ITERATIVE FORMULA FOR DIVISION BASE 4

Our iterative formula for base 4 is:

R[j+1] = 4 * (R[j] - q[j]*D) (3-1)


3.3 BOUNDS ON RESULTING REMAINDER

At the beginning of each iteration we must choose a quotient digit
which will result in the next remainder R[j+1] being
within the acceptable limits of:
_ _____
(D * 2.22222...) < R[j+1] < (D * 2.22222...)
[base 4] [base 4]
_
where: 2 = lowest quotient digit we can use (-2)
2 = highest quotient digit we can use


Using a little calculus we can show that:

2.22222... (base 4) = 8/3 (decimal)

Proof: 2.22222... = 2*(x^0 + x^1 + x^2 + x^3 +...) [x = 1/4]
(base 4) (decimal)

Recall from calculus if |x| < 1 then:

1
(x^0 + x^1 + x^2 + x^3 +...) = -----
1 - x

1 8
= 2 * ------- = ---
1 - 1/4 3
Q.E.D.


Similarly, it can be show that:
_ _____
2.22222... (base 4) = -8/3 (decimal)


Our restriction on the remainder R[j+1] thus becomes:

(D * -8/3) < R[j+1] < (D * 8/3)

Substituting Eq. (3-1) for R[j+1] gives:

(-8/3)*D < 4*(R[j] - qD) < (8/3)*D (3-2)

Given a remainder R[j] at the beginning of an iteration, we can
now determine what values of q will satisfy the above restraint.

FOR q=+2 (-8/3)*D < 4*(R[j] - 2*D) < (8/3)*D
becomes: (4/3)*D < R[j] < (8/3)*D

We may thus choose quotient digit 2 when our remainder R[j] is between
(4/3)*D and (8/3)*D.


FOR q=+1: (-8/3)*D < 4*(R[j] - 1*D) < (8/3)*D
becomes: (1/3)*D < R[j] < (5/3)*D

We may thus choose quotient digit 1 when our remainder R[j] is between
(1/3)*D and (5/3)*D.


FOR q=0: (-8/3)*D < 4*(R[j] - 0*D) < (8/3)*D
becomes: (-2/3)*D < R[j] < (2/3)*D

We may thus choose quotient digit 0 when our remainder R[j] is between
(-2/3)*D and (2/3)*D.


FOR q=-1: (-8/3)*D < 4*(R[j] - (-1)*D) < (8/3)*D
becomes: (-5/3)*D < R[j] < (-1/3)*D

We may thus choose quotient digit -1 when our remainder R[j] is between
(-5/3)*D and (-1/3)*D.


FOR q=-2: (-8/3)*D < 4*(R[j] - (-2)*D) < (8/3)*D
becomes: (-8/3)*D < R[j] < (-4/3)*D

We may thus choose quotient digit -2 when our remainder R is between
(-8/3)*D and (-4/3)*D.

Notice there is some overlap in the choices of q:
If R is between (5/3)*D and (4/3)*D we can choose either q=2 or q=1.
If R is between (2/3)*D and (1/3)*D we can choose either q=1 or q=0.
If R is between (-1/3)*D and (-2/3)*D we can choose either q=0 or q=-1.
If R is between (-4/3)*D and (-5/3)*D we can choose either q=-1 or q=-2.


3.4 THE PENTIUM LOOKUP TABLE (P-D plot)
The above information leads to the following plot, called the Partial
remainder vs. Divisor plot, or P-D plot for short. Dividing the plot
up into cells generates a lookup table that we can use to aid us in
choosing the next quotient digit q given the current remainder R[j].
The divisions up the vertical axis are 0.001 [binary] apart,
representing R[j] values. The divisions across the horizontal axis are
0.0001 [binary] apart, representing Divisor values. (The divisor
values range from 1 to 1.11111..., since a normalized divisor mantissa
will always be in this range).

There are five cells marked with a '!' near the top of the graph
running along the (8/3)D line. These are the error cells that should
specify a quotient digit of 2, but instead specify a quotient digit of
0. Selecting a quotient digit of 0 when R[j] is within this range is
unacceptable -- the digit value is too low, and no matter what
subsequent quotient digits are chosen the total quotient answer will
always be lower than the correct answer from this point on. This is
the Pentium flaw. Those five cells failed to get the proper value
loaded into them when Intel made the chip.

These 5 error cells are very rarely used being positioned where they
are and with less than half of each error cell actually below the (8/3)D
line. It is also impossible to hit any of these error cells early on
in the division process, so even if an error cell is encountered the
generated result will still be close to the correct answer.

(The upper half of this plot is much more neatly presented in
Postscript format in Intel's white paper on the Pentium flaw as Figure
4-3 "Missing Terms in P-D Plot")

THE PENTIUM P-D PLOT:

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
. . . . . . . . . . . . . . . .
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
5.3125 0101.011 + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - +
| .|(8/3)D
5.25 0101.010 + + + + + + + + + + + + + + + + / +
| / |
5.0625 0101.001 + + + + + + + + + + + + + + + /+ +
| / |
5.0 0101.000 + + + + + + + + + + + + + +---/ + +
| |!/!| |
4.8125 0100.111 + + + + + + + + + + + + + +/--+ + +
| / |
4.75 0100.110 + + + + + + + + + + + + + / + + + +
| / |
4.5625 0100.101 + + + + + + + + + + + + /+ + + + +
| / |
4.5 0100.100 + + + + + + + + + + +---/ + + + + +
| |!/!| |
4.3125 0100.011 + + + + + + + + + + +/--+ + + + + +
| / |
4.25 0100.010 + + + + + + + + + + / + + + + + + +
| / |
4.0625 0100.001 + + + + + + + + + /+ + + + + + + +
| / |
4.0 0100.000-+ - + - + - + - + - + - + - +---/ - + - + - + - + - + - + - + - +
| |!/!| |
3.8125 0011.111 + + + + + + + +/--+ + + + + + + + +
| / |
3.75 0011.110 + + + + + + + / + + + + + + + + + +
| / |
3.5625 0011.101 + + + + + + /+ + + + + + + + + + +
| / |
3.5 0011.100 + + + + +---/ + + + + + + + + + + +
| |!/!| |
3.3125 0011.011 + + + + +/--+ + + + + + + + + + + +
| / (q = 2) /|(5/3)D -----------
3.25 0011.010 + + + + / + + + + + + + + + + + +/ + ^
| / / | |
3.0625 0011.001 + + + /+ + + + + + + + + + + / + + |
| / / |
3.0 0011.000 + +---/ + + + + + + + + + + /+ + + + q=(1 or 2)
| |!/!| / |
2.8125 0010.111 + +/--+ + + + + + + + + + / + + +-------+ |
| / / | ? ? | |
2.75 0010.110 + / + + + + + + + + + + / + + +=======+pentium v
|/ / | |(4/3)D -----------
2.5625 0010.101 + + + + + + + + + +/ + +-----------+ + / +
| / | ? ? ? | / |
2.5 0010.100 + + + + + + + + / + + +===========+ + +
| / | / |
2.3125 0010.011 + + + + + + + /+ +-----------+ + / + + + +
| / | ? ? ? | / |
2.25 0010.010 + + + + + + /+ + +===========+ + + + + +
| / | / |
2.0625 0010.001 + + + + + / +-----------+ + / + + + + + + +
| / | ? ? ? | / |
2.0 0010.000 + + + +/ + +===========+ + + + + + + + +
| / | / |
1.8125 0001.111 + + +-----------+ + / + + + + + + + + + +
| / | ? ? ? | / |
1.75 0001.110 + /+ +===========+ + + + + + + + + + + +
|/ | / |
1.5625 0001.101 +-------+ + / + + + + + + + + + + + + +
| ? ? | / (q = 1) |
1.5 0001.100 +=======+ + + + + + + + + + + + + + +
| / |
1.3125 0001.011 + / + + + + + + + + + + + + + + + +
/ _ - |(2/3)D ------------
1.25 0001.010 + + + + + + + + + + + + + + _ - ~ + + ^
| _ - ~ | |
1.0625 0001.001 + + + + + + + + + + + _ - ~ + + + + + |
| _ - ~ |
1.0 0001.000 + + + + + + + + _ +-~ + + + + + + + + q=(0 or 1)
| _ - ~ |
0.8125 0000.111 + + + + + _ +-~ + + + + + + + + +-------+ |
| _ - ~ | ? ? | |
0.75 0000.110 + + _ + ~ + + + + + +-----------------------+=======+pentium v
| - ~ | ? ? ? ? ? ? | ___.|=(1/3)D -----------
0.5625 0000.101 + + +-----------------------+=======================+/~~+ +
| | ? ? ? ? ? ? | ___....-----~~~~~~~ |
0.5 0000.100 +-------+=======================+/~~+ + + + + + + +
| ? ? | ___....-----~~~~~~~ |
0.3125 0000.011 +=======+/~~~ + + + + + + + + + + + + +
|--~~~~~ (q = 0) |
0.25 0000.010 + + + + + + + + + + + + + + + + +
| |
0.0625 0000.001 + + + + + + + + + + + + + + + + +
| |
0. 0000.000 + = + = + = + = + = + = + = + = + = + = + = + = + = + = + = + = +=0D
| |
-0.0625 1111.111 + + + + + + + + + + + + + + + + +
| |
-0.25 1111.110 + + + + + + + + + + + + + + + + +
|--..__ (q = 0) |
-0.3125 1111.101 + + + ..+__ + + + + + + + + + + + + +
| ~~~~---....___ ! |
-0.5 1111.100 +-------+ + + + + + + ..+__ + + + + + + +
| ? ? | ! ~~~~---....___ |
-0.5625 1111.011 +=======+-----------------------+ + + + + + +-..+ +
|~ -._ | ? ? ? ? ? ? | ~~-|(-1/3)D ------------
-0.75 1111.010 + + ~+=======================+-----------------------+ + + ^
| ~ - . _ | ? ? ? ? ? ? | | |
-0.8125 1111.001 + + + + + ~-._ + + +=======================+-------+ |
| ~ - ._ | ? ? | |
-1.0 1111.000 + + + + + + + + ~-._ + + + + + +=======+pentium q=(0 or -1)
| ~ - ._ | |
-1.0625 1110.111 + + + + + + + + + + + ~- _ + + + + + |
| ~ - ._ | |
-1.25 1110.110 + + + + + + + + + + + + + + ~- _ + + v
| ~ -.|(-2/3)D ------------
-1.3125 1110.101 + \ + + + + + + + + + + + + + + + +
| \ |
-1.5 1110.100 +-------+ + + + + + + + + + + + + + +
| ? ? | \ (q = -1) |
-1.5625 1110.011 +=======+ + \ + + + + + + + + + + + + +
|\ | \ |
-1.75 1110.010 + \ + +-----------+ + + + + + + + + + + +
| ~ \ | ? ? ? | \ |
-1.8125 1110.001 + + +===========+ + \ + + + + + + + + + +
| \ | \ |
-2.0 1110.000 + + + +\ + +-----------+ + + + + + + + +
| \ | ? ? ? | \ |
-2.0625 1101.111 + + + + +\ +===========+ + \ + + + + + + +
| \ | \ |
-2.25 1101.110 + + + + + + \+ + +-----------+ + + + + +
| \ | ? ? ? | \ |
-2.3125 1101.101 + + + + + + + \+ +===========+ + \ + + + +
| \ | \ |
-2.5 1101.100 + + + + + + + + \ + + +-----------+ + +
| \ | ? ? ? | \ |
-2.5625 1101.011 + + + + + + + + + +\ + +===========+ + \ +
|\ \ | |(-4/3)D -----------
-2.75 1101.010 + \ + + + + + + + + + +\ + + + +-------+ ^
| \ \ | ? ? | |
-2.8125 1101.001 + +\ + + + + + + + + + + \+ + +=======+pentium |
| \ \ | |
-3.0 1101.000 + + \ + + + + + + + + + + \+ + + + q=(-1 or -2)
| \ \ | |
-3.0625 1100.111 + + + \+ + + + + + + + + + + \ + |
| \ (q = -2) \ | |
-3.25 1100.110 + + + + \ + + + + + + + + + + + +\ + v
| \ ~|(-5/3)D -----------
-3.3125 1100.101 + + + + +\ + + + + + + + + + + + +
| \ |
-3.5 1100.100 + + + + + \ + + + + + + + + + + +
| \ |
-3.5625 1100.011 + + + + + + \+ + + + + + + + + + +
| \ |
-3.75 1100.010 + + + + + + + \ + + + + + + + + + +
| \ |
-3.8125 1100.001 + + + + + + + +\ + + + + + + + + +
| \ |
-4.0 1100.000 + + + + + + + + \ + + + + + + + +
| \ |
-4.0625 1011.111 + + + + + + + + + \+ + + + + + + +
| \ |
-4.25 1011.110 + + + + + + + + + + \ + + + + + + +
| \ |
-4.3125 1011.101 + + + + + + + + + + +\ + + + + + +
| \ |
-4.5 1011.100 + + + + + + + + + + + \ + + + + +
| \ |
-4.5625 1011.011 + + + + + + + + + + + + \+ + + + +
| \ |
-4.75 1011.010 + + + + + + + + + + + + + \ + + + +
| \ |
-4.8125 1011.001 + + + + + + + + + + + + + +\ + + +
| \ |
-5.0 1011.000 + + + + + + + + + + + + + + \ + +
| \ |
-5.0625 1010.111 + + + + + + + + + + + + + + + \+ +
| \ |
-5.25 1010.110 + + + + + + + + + + + + + + + + \ +
| |(-8/3)D
-5.3125 1010.101 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
. . . . . . . . . . . . . . . .
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

Divisor


Let's look at this crude graph for a moment. There are diagonal lines
(8/3)D, (5/3)D, (4/3)D, (2/3)D, (1/3)D, (-1/3)D, (-2/3)D, (-4/3)D,
(-5/3)D, (-8/3)D. These lines would pass through the origin if we
extended the graph to the left. These are the theoretical boundaries
discussed above.

Above (8/3)D is out of bounds. If we ever end up there,
something has gone wrong. The Pentium returns a value
of q=0 if we ever hit this area.

Between (8/3)D and (5/3)D we must have q=2. Each cell which
is in or partly overlaps this area must have the value 2.
The five error cells marked with '!' have a value of 0
instead of 2.

Between (5/3)D and (4/3)D we can have either q=2 or q=1. The
actual division line (as best as we know) is the staircase
drawn using '==='. Cells above that staircase have q=2,
cells below that staircase have q=1. Cells immediately
above the division line, marked with a '?', are cells which
might return either q=2 or q=1 depending upon other factors
that will be discussed in chapter 4.

Between (4/3)D and (2/3)D we must have q=1. Each cell which is
in or partly overlaps this area must have the value 1.

Between (2/3)D and (1/3)D we can have either q=1 or q=0. The
actual division line (as best as we know) is the staircase
drawn using '==='. Cells above that staircase have q=1, cells
below that staircase have q=0. Cells immediately above the
division line, marked with a '?', are cells which might return
either q=1 or q=0 depending upon other factors that will be
discussed in chapter 4.

Between (1/3)D and (-1/3)D we must have q=0. Each cell which
is in or partly overlaps this area must have the value 0.

Between (-1/3)D and (-2/3)D we can have either q=0 or q=-1. The
actual division line (as best as we know) is the staircase
drawn using '==='. Cells above that staircase have q=0, cells
below that staircase have q=-1. Note this division staircase
is not a mirror image of the one in the positive plane, but is
instead shifted down by two cells. Cells immediately above the
division line, marked with a '?', are cells which might return
either q=0 or q=-1 depending upon other factors that will be
discussed in chapter 4.

Between (-2/3) and (-4/3) we must have q=-1. Each cell which
is in or partly overlaps this area must have the value -1.

Between (-4/3)D and (-5/3)D we can have either q=-1 or q=-2.
The actual division line (as best as we know) is the staircase
shown there. Cells above that staircase have q=-1, cells below
that staircase have q=-2. Note again this division staircase
is not a mirror image of the one in the positive plane, but is
instead shifted down by two cells. Cells immediately above the
division line, marked with a '?', are cells which might return
either q=-1 or q=-2 depending upon other factors that will be
discussed in chapter 4.

Between (-5/3)D and (-8/3)D we must have q=-2. Each cell which
is in or partly overlaps this area must have the value -2.
Note there are no error cells down here in the negative plane.
The 5 error cells are all in the positive plane.

Below (-8/3)D is out of bounds. If we ever end up there,
something has gone wrong. The Pentium returns a value of q=0
if we ever hit this area. (We actually do hit this area after
hitting the error cell.)


3.5 ITERATION USING THE P-D PLOT AS A LOOKUP TABLE
The cells in the above graph define the lookup table used during the
Pentium's SRT division process. The iteration goes something like
this:

Given:
D - Divisor (note this value remains fixed)
R - Dividend (initial Remainder)

Step 1:
Using D and R, look in P-D table above for next quotient digit q. Note
that we need only know the first 4 digits after the radix point of the
Divisor, and we need only know the first 3 digits after the radix point
of the Remainder.

Step 2:
Do the math:
R[j+1] = 4 * (R[j] - q*D)

Repeat steps 1 & 2 until enough quotient digits have been determined.


************************************************************************

CHAPTER 4: EXAMPLE OF BUG

And now for an example where we reproduce the Pentium bug. In the
Microsoft windows calculator try 5506153/294911

Correct: 18.67055823621364
Pentium: 18.66990719233938

If you get the correct answer then either you have the newer Pentium
chip with the bug corrected, or you're using a calculator program which
is "Pentium aware". "Pentium aware" software looks for and catches any
incorrect division result caused by the bug and recalculates the result
using some other method so you still get the right answer.

To demonstrate the division here we must first convert our decimal
divisor and dividend to binary numbers. This can easily be done
using the Microsoft windows calculator with View set to Scientific.
Enter 5506153, then click on BIN to switch to binary mode. The result
is:

5506153 = 10101000000010001101001.0

= 1.01010000000100011010010 * 2^22

We set the exponent to 2^-22 so the radix point falls right after the
leading 1. This gives us our normalized mantissa for the dividend.
The divisor, 294911, is similarly converted:

294911 = 1000111111111111111.00000

= 1.00011111111111111100000 * 2^18


Our division becomes:

1.01010000000100011010010 * 2^22
quotient = -----------------------------------
1.00011111111111111100000 * 2^18


1.01010000000100011010010
= ------------------------- * 2^(22-18)
1.00011111111111111100000


We will perform the division on the mantissa values, and then multiply
the result by 2^(22-18) = 2^4 = 16 at the very end to get our quotient
answer.

To perform the division we will be calculating:

R[j+1] = 4 * (R[j] - qD)

which we will rewrite as:

R[j+1] = 4 * (R[j] + -qD)

The steps are:
a: determine q from our table given R[j] and D
b: calculate -qD
c: add (R[j] + -qD)
d: multiply by 4 (move binary point right by two bits) to get R[j+1]

Note how easy it is to calculate -qD when the possible values of q are
(2,1,0,-1,-2}. To multiply by 2 we just shift the bits left one
position. To negate the value we just flip the bits and add one to the
lowest bit, except we don't add that lowest bit to -qD, because that
would take time to perform the addition, instead we place it
add it in slightly later when it's more convenient.

Nomenclature:
ps - partial sum
pps - previous partial sum
c - carry
pc - previous carry
D - Divisor
D~ - Approximate Divisor. In our example below this is 1.0001 The
divisor truncated after the fourth bit following the radix
point.
R~ - Approximate Remainder.
_ _ _
2 - A bar over a number indicates negation. 1 = -1 and 2 = -2

Our initial conditions are:

Divisor (D)
= 0001.000111111111111111000000000000000000000000000000000000000000

Dividend (initial remainder R)
= 0001.010100000001000110100100000000000000000000000000000000000000


ITERATION #1:
1a: Choose next quotient digit q.
Given R~=0001.010 (and D~=1.0001), examine the lookup table to find
the cell corresponding to these values. D~ runs across the bottom
of the table, and R~ runs up and down the side of the table. The
lookup table says next q digit is 1.

q digits (base 4): 1. = 1.0 (decimal)

1b: Calculate -qD
q=1, so this is just -D. Negation is performed by flipping the
bits and adding 1 to the lowest bit. However, instead of adding
that 1 to the lowest bit of -qD as would normally be done, we
instead add it to the carry (c) result. Since we know the lowest
bit of the carry (c) result will always be clear (since there's no
lower bits to set it), adding that extra low bit here means just
setting the bit, whereas adding that extra low bit to -qD, as would
normally be done, could take time, requiring many clock cycles to
figure out the new -qD after adding in that lowest bit.

-qD= 1110.111000000000000000111111111111111111111111111111111111111111

1c: Add (R + -qD)
R for this step is our initial Dividend, and we just calculated
-qD. Here we start using our Carry-Save format. We add the bits,
putting the sum bits in partial_sum (ps) below, and saving the
carry bits in carry (c) above. Our answer, the new remainder, can
be obtained by adding ps + c, but that would take time, so for now
we'll leave the two values separate.

Note also we added to carry (c) the lowest bit leftover from step
1b when we negated D. This will only be necessary when calculating
-qD involves negation, which occurs only when q=1 or q=2. Note that
this carry bit, being the lowest one, will always be clear, so there
is no time wasted adding the bit here.

lowest bit added--+
from step 1b v
c= 0000.100000000000000001001000000000000000000000000000000000000001
-----------------------------------------------------------------
R= 0001.010100000001000110100100000000000000000000000000000000000000
-qD= 1110.111000000000000000111111111111111111111111111111111111111111
-----------------------------------------------------------------
ps= 1111.101100000001000110011011111111111111111111111111111111111111


1d: Multiply by 4. We do this by shifting the bits left two places.
c*4= 0010.000000000000000100100000000000000000000000000000000000000100
ps*4=1110.110000000100011001101111111111111111111111111111111111111100


We have now calculated R[1] = 4*(R[0] - qD) as intended. This ends
the first iteration.

========================================================================
ITERATION #2:

2a: Choose next quotient digit q.
Let us call c*4 from the previous iteration 'pc' for
previous_carry, and let us call ps*4 from the previous iteration
'pps' for previous_partial_sum. Our new remainder R[1] at this
moment is pps + pc. To save time the Pentium adds only the first
eight bits of pps and pc, obtaining an approximate value R~[1] for
the current remainder which is good enough for using with the
lookup table to determine what next quotient digit to choose.

pc= 0010.000 000000000000100100000000000000000000000000000000000000100
pps= 1110.110 000000100011001101111111111111111111111111111111111111100
--------
R~= 0000.110

Note the R~ value obtained here will be either the first or the
second integral multiple of 0.001 below the true value R. Because
R~ might be the second integral multiple of 0.001 below the true
value R, it is important that the actual staircase divisions within
the lookup table be chosen so the q value chosen in this case is
still an acceptable q value.

Now examine the lookup table and find the cell corresponding to
divisor D=1.001 and remainder R=0.110 From the table our next
quotient digit q is 1.

q digits (base 4): 1.1 = 1.25 (decimal)

2b: Calculate -qD. q=1, so -qD is the same as in iteration #1. Again,
as explained in iteration #1, since we perform a negation here we
also set the lowest bit of the carry (c) result.

-qD= 1110.111000000000000000111111111111111111111111111111111111111111

2c: Add (R + -qD). We have -qD from previous step. R is pps + pc.
We create new partial sum (ps) and new carry (c).

lowest bit added--+
from step 2b v

c= 0000.100000000000000001001000000000000000000000000000000000000001
-----------------------------------------------------------------
pc= 0010.000000000000000100100000000000000000000000000000000000000100
pps= 1110.110000000100011001101111111111111111111111111111111111111100
-qD= 1110.111000000000000000111111111111111111111111111111111111111111
-----------------------------------------------------------------
ps= 0010.001000000100011101110000000000000000000000000000000000000111

2d: Multiply by 4 (shift the bits left two places).

c*4= 0110.000000000000000101111111111111111111111111111111111111100100
ps*4=1000.100000010001110111000000000000000000000000000000000000011100


========================================================================
ITERATION #3:
The steps are the same from here on out.
pc = c*4 from previous iteration.
pps = pps*4 from previous iteration.

3a: Add first 8 bits of pc and pps to get R~:
pc= 0110.000 000000000000101111111111111111111111111111111111111100100
pps=1000.100 000010001110111000000000000000000000000000000000000011100
--------
R~= 1110.100

Given R~=1110.100 (and D~=1.0001), lookup table says next q
digit is -1.

ones-+ +-(1/4)
| |+-(1/16)
v vv
_
q digits (base 4): 1.11 = 1.1875 (decimal)

3b: Calculate -qD. q=-1, so this is just D, and there is no low bit
to add to carry (c) since there was no negation.

-qD= 0001.000111111111111111000000000000000000000000000000000000000000

3c: Add (R + -qD). (R = pps + pc)

c= 0000.00000010001110111000000000000000000000000000000000000000100
-----------------------------------------------------------------
pc= 0110.000000000000000101111111111111111111111111111111111111100100
pps= 1000.100000010001110111000000000000000000000000000000000000011100
-qD= 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps= 1111.100111101110001101111111111111111111111111111111111111111000

3d: Multiply by 4 (shift the bits left two places).

c*4 0000.000010001110111000000000000000000000000000000000000000100000
ps*4=1110.011110111000110111111111111111111111111111111111111111100000

========================================================================
ITERATION #4:
4a: Choose next quotient digit q.
Add first 8 bits of pc and pps to get R~:

pc= 0000.000 010001110111000000000000000000000000000000000000000100000
pps= 1110.011 110111000110111111111111111111111111111111111111111100000
--------
R~= 1110.011

Given R~=1110.011 (and D~=1.0001), lookup table says next q
digit is -1.

ones-+ +-(1/4)
| |+-(1/16)
| ||+-(1/64)
v vvv
__
q digits (base 4): 1.111 = 1.171875 (decimal)

4b: Calculate -qD. q=-1, so this is just D, and there is no low bit
to add to carry (c) since there was no negation.

-qD= 0001.000111111111111111000000000000000000000000000000000000000000

4c: Add (R + -qD). (R = pps + pc)

c= 0000.00110111110111111000000000000000000000000000000000000100000
-----------------------------------------------------------------
pc= 0000.000010001110111000000000000000000000000000000000000000100000
pps= 1110.011110111000110111111111111111111111111111111111111111100000
-qD= 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps= 1111.011011001001110000111111111111111111111111111111111111000000

4d: Multiply by 4 (shift the bits left two places)

c*4= 0000.110111110111111000000000000000000000000000000000000100000000
ps*4=1101.101100100111000011111111111111111111111111111111111100000000

========================================================================
ITERATION #5:
5a: Add first 8 bits of pc and pps to get R~:

pc= 0000.110 111110111111000000000000000000000000000000000000100000000
pps= 1101.101 100100111000011111111111111111111111111111111111100000000
--------
R~= 1110.011

Given R~=1110.011 (and D~=1.0001), lookup table says next q
digit is -1.

ones-+ +-(1/4)
| |+-(1/16)
| ||+-(1/64)
| |||+-(1/256)
v vvvv
___
q digits (base 4): 1.1111 = 1.16796875 (decimal)

5b: Calculate -qD:

-qD= 0001.000111111111111111000000000000000000000000000000000000000000
(It's just D)

5c: Add (R + -qD)
R = pps + pc, so this becomes (pc + pps + -qD), which we add using
Carry-Save format, creating a partial sum (ps) and a carry (c):

c= 0011.00111110111111011000000000000000000000000000000000100000000
-----------------------------------------------------------------
pc= 0000.110111110111111000000000000000000000000000000000000100000000
pps= 1101.101100100111000011111111111111111111111111111111111100000000
-qD= 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps= 1100.011100101111000100111111111111111111111111111111111000000000

5d: Multiply by 4 (shift the bits left two places)
5d: Multiply partial sum (ps) and carry (c) by 4 (there is no lb to add
this time):

c*4= 1100.111110111111011000000000000000000000000000000000100000000000
ps*4=0001.110010111100010011111111111111111111111111111111100000000000


========================================================================
ITERATION #6:

6a: Add first 8 bits of pc and pps to get R~:

pc= 1100.111 110111111011000000000000000000000000000000000100000000000
pps= 0001.110 010111100010011111111111111111111111111111111100000000000
--------
R~= 1110.101

Given R~=1110.101 (and D=1.0001), lookup table says next q
digit is -1.

ones-+ +-(1/4)
| |+-(1/16)
| ||+-(1/64)
| |||+-(1/256)
| ||||+-(1/1024)
v vvvvv
____
q digits (base 4): 1.11111 = 1.1669921875 (decimal)

6b: Calculate -qD:
-qD= 0001.000111111111111111000000000000000000000000000000000000000000
(It's just D, and there is no low bit to add in later.)

6c: Add pc + pps + -qD, creating partial sum ps and carry c:

c= 0011.10110111111011011000000000000000000000000000000100000000000
-----------------------------------------------------------------
pc= 1100.111110111111011000000000000000000000000000000000100000000000
pps= 0001.110010111100010011111111111111111111111111111111100000000000
-qD= 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps= 1100.001011111100110100111111111111111111111111111111000000000000

6d: Multiply psum and carry by 4 (there is no lb to add this time)

c*4= 1110.110111111011011000000000000000000000000000000100000000000000
ps*4=0000.101111110011010011111111111111111111111111111100000000000000

========================================================================
ITERATION #7:

7a: Add first 8 bits of pc and pps to get R~:

pc= 1110.110 111111011011000000000000000000000000000000100000000000000
pps= 0000.101 111110011010011111111111111111111111111111100000000000000
--------
R~= 1111.011

Given R~=1111.011 (and D~=1.0001), lookup table says next q
digit is -1.

_____
q digits (base 4): 1.111111 = 1.166748046875 (decimal)

7b: Calculate -qD:

-qD= 0001.000111111111111111000000000000000000000000000000000000000000
(It's just D, and there is no low bit lb to add in later.)

7c: Add prev carry, prev psum, and -qD, creating psum and carry:

c= 0001.00111111011011011000000000000000000000000000100000000000000
-----------------------------------------------------------------
pc= 1110.110111111011011000000000000000000000000000000100000000000000
pps= 0000.101111110011010011111111111111111111111111111100000000000000
-qD= 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps= 1111.011111110111110100111111111111111111111111111000000000000000

7d: Multiply psum and carry by 4 (there is no low bit lb to add this
time)

c*4= 0100.111111011011011000000000000000000000000000100000000000000000
ps*4=1101.111111011111010011111111111111111111111111100000000000000000

========================================================================
ITERATION #8:

8a: Add first 8 bits of pc and pps to get R~:

pc= 0100.111 111011011011000000000000000000000000000100000000000000000
pps= 1101.111 111011111010011111111111111111111111111100000000000000000
--------
R~= 0010.110

Given R~=0111.011 (and D~=1.0001), lookup table says next q
digit is 2.

_____
q digits (base 4): 1.1111112 = 1.1668701171875 (decimal)

8b: Calculate -qD:
-qD= 1101.110000000000000001111111111111111111111111111111111111111111

8c: Add prev carry, prev psum, and -qD, creating psum and carry:

c= 1011.11111011011010001111111111111111111111111100000000000000000
-----------------------------------------------------------------
pc= 0100.111111011011011000000000000000000000000000100000000000000000
pps= 1101.111111011111010011111111111111111111111111100000000000000000
-qD= 1101.110000000000000001111111111111111111111111111111111111111111
-----------------------------------------------------------------
ps= 0100.110000000100001010000000000000000000000000111111111111111111

8d: Multiply psum and carry by 4 (there is no low bit lb to add this
time)

c*4= 1111.111011011010001111111111111111111111111100000000000000000100
ps*4=0011.000000010000101000000000000000000000000011111111111111111100

========================================================================
ITERATION #9: (HITS THE ERROR CELL)

9a: Add first 8 bits of pc and pps to get R~:

pc= 1111.111 011011010001111111111111111111111111100000000000000000100
pps =0011.000 000010000101000000000000000000000000011111111111111111100
--------
0010.111

Given R~=00010.111 (and Divisor~=1.0001), the next quotient digit
should be 2, but lookup table hits error cell, incorrectly
specifying a quotient digit of 0 instead. There is no recovery
from this point on. The resulting quotient value for the division
will be less than it should be. This is the Pentium bug.

_____
q digits (base 4): 1.11111120 = 1.1668701171875 (decimal)
^+should be 2

9b: Calculate -qD:
-qD= 0000.000000000000000000000000000000000000000000000000000000000000
(since q=0, -qD=0)

9c: Add prev carry, prev psum, and -qD, creating psum and carry:

c= 0110.00000010000001000000000000000000000000000000000000000000100
-----------------------------------------------------------------
pc= 1111.111011011010001111111111111111111111111100000000000000000100
pps= 0011.000000010000101000000000000000000000000011111111111111111100
-qD= 0000.000000000000000000000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps= 1100.111011001010100111111111111111111111111111111111111111111000

9d: Multiply psum and carry by 4 (there is no low bit lb to add this
time)
c*4= 1000.000010000001000000000000000000000000000000000000000000100000
ps*4=0011.101100101010011111111111111111111111111111111111111111100000

========================================================================
ITERATION #10:

10a: Add first 8 bits of pc and pps to get R~:

pc= 1000.000 010000001000000000000000000000000000000000000000000100000
pps=0011.101 100101010011111111111111111111111111111111111111111100000
--------
R~= 1011.101

Given R~=1011.101 (and D~=1.0001), lookup table says next q digit is
out of bounds, below the -8/3D line. Experimental results suggests
the Pentium returns q=0 when R~ is out of bounds.

_____
q digits (base 4): 1.111111200 = 1.1668701171875 (decimal)

10b: Calculate -qD:
-qD= 0000.000000000000000000000000000000000000000000000000000000000000
(since q=0, -qD=0)

10c: Add prev carry, prev psum, and -qD, creating psum and carry:

c= 0000.00000000000000000000000000000000000000000000000000000100000
-----------------------------------------------------------------
pc= 1000.000010000001000000000000000000000000000000000000000000100000
pps= 0011.101100101010011111111111111111111111111111111111111111100000
-qD= 0000.000000000000000000000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps= 1011.101110101011011111111111111111111111111111111111111111000000

10d: Multiply psum and carry by 4 (there is no low bit lb to add this
time)
c*4= 0000.000000000000000000000000000000000000000000000000000100000000
ps*4=1110.111010101101111111111111111111111111111111111111111100000000

========================================================================
ITERATION #11:

11a: Add first 8 bits of pc and pps to get R~:

pc= 0000.000 000000000000000000000000000000000000000000000000100000000
pps= 1110.111 010101101111111111111111111111111111111111111111100000000
--------
1110.111

Given R~=1110.111 (and D~=1.0001), lookup table says next q
digit is -1.

_____ _
q digits (base 4): 1.1111112001 = 1.1668691635131836 (decimal)
|
+--error cell hit. All further q
digits are wrong.

11b: Calculate -qD:
-qD= 0001.000111111111111111000000000000000000000000000000000000000000

11c: Add prev carry, prev psum, and -qD, creating psum and carry:

c= 0000.00010101101111111000000000000000000000000000000000100000000
-----------------------------------------------------------------
pc= 0000.000000000000000000000000000000000000000000000000000100000000
pps= 1110.111010101101111111111111111111111111111111111111111100000000
-qD= 0001.000111111111111111000000000000000000000000000000000000000000
-----------------------------------------------------------------
ps= 1111.111101010010000000111111111111111111111111111111111000000000

11d: Multiply psum and carry by 4

c*4= 0000.010101101111111000000000000000000000000000000000100000000000
ps*4=1111.110101001000000011111111111111111111111111111111100000000000

========================================================================
ITERATION #12:

12a: Add first 8 bits of pc and pps to get R~:

pc= 0000.010 101101111111000000000000000000000000000000000100000000000
pps= 1111.110 101001000000011111111111111111111111111111111100000000000
--------
0000.000

Given R~=0000.000 (and D~=1.0001), lookup table says next q
digit is 0.
_____ _
q digits (base 4): 1.11111120010 = 1.1668691635131836 (decimal)

12b: Calculate -qD:
-qD= 0000.000000000000000000000000000000000000000000000000000000000000
(since q=0, -qD=0)

12c: Add prev carry, prev psum, and -qD, creating psum and carry:

c= 0000.10101001000000000000000000000000000000000000000100000000000
pc= 0000.010101101111111000000000000000000000000000000000100000000000
pps= 1111.110101001000000011111111111111111111111111111111100000000000
-qD= 0000.000000000000000000000000000000000000000000000000000000000000
ps= 1111.100000100111111011111111111111111111111111111111000000000000

12d: Multiply psum and carry by 4

c*4= 0010.101001000000000000000000000000000000000000000100000000000000
ps*4=1110.000010011111101111111111111111111111111111111100000000000000

========================================================================

The remaining iterations continue in a similar manner. A total of 28
iterations are required for a double precision answer. The final
quotient digits are:
_____ _ _ ___ _ __
q digits: 1.111111200101221122211001110 = 1.1668691995212115 (decimal)
(base 4)

_ _ _ _ _ _ _ _ _ _ _ _ _
q digits: 1. 1 1 1 1 1 1 2 0 0 1 0 1 2 2 1 1 2 2 2 1 1 0 0 1 1 1 0

pos q digits: 1.010000000000100000000001001000000010100001000000000100
neg q digits: 0.000101010101000000010000100001011000000100000001010000
--------------------------------------------------------
quotient: 1.001010101011011111110000100110101010011100111110110100

Finally, we multiply our quotient value here by 2^(22-18) to obtain our
final answer to the original problem (this is equivalent to just moving
the radix point over 4 places):

answer = 10010.10101011011111110000100110101010011100111110110100

Converting our binary answer to decimal we have:

answer = 18.66990719233938 (decimal)

This is exactly the answer the defective Pentium gives.


========================================================================
CHAPTER 5: OBSERVATIONS ON HITTING THE ERROR CELL

In July 1995 Tim Coe of Vitesse Semiconductor Corporation and Ping Tak
Tang of Argonne National Laboratory published a paper titled "It Takes
Six Ones To Reach a Flaw" in which they proved the binary divisor must
have a run of at least six ones starting at the 5th bit after the radix
point if it is to hit an error cell. i.e. the divisor must be one of
the following:

1.0001 1111 11__ ...
1.0100 1111 11__ ...
1.0111 1111 11__ ...
1.1010 1111 11__ ...
1.1101 1111 11__ ...

They also prove that any of the 5 error cells can not be hit earlier
than the 9th iteration. This means the first 8 quotient digits will
always be correct.

Although a run of at least 6 ones required is required to hit an error
cell, it appears the earliest

It appears there is a relationship between how long the run of ones is
(starting at the 5th bit after the radix), and what the earliest
iteration is an error cell can be hit. Tim Coe believes the following
relationship to be true but has not rigorously proved it:

# of 1's earliest iteration
error cell can be hit
------ ---------------------
6 ones -- 15 iterations
7 ones -- cannot address flaw
8 ones -- 10 iterations
9 ones -- 10 iterations
10 ones -- 10 iterations
11 ones -- 9 iterations

========================================================================
CHAPTER 6: TOPICS FOR FURTHER STUDY

Some unanswered questions for further research:

3. Before releasing the Pentium chip to the world one would assume
Intel did all they could to test the floating point division
instruction by trying out numerous divisions. A carefully chosen
division operation might prove that a certain cell in the lookup
table has the correct value in it. Can every cell in the lookup
table be verified this way? Determine if it's possible to
synthesize a Dividend/Divisor pair of numbers that when divided
will hit one of the error cells.

========================================================================
CHAPTER 7: HISTORY OF THE BUG'S DISCOVERY

What is the history of the bug's discovery?

Outside of Intel, the bug was first found by Dr. Thomas R. Nicely,
Professor of Mathematics at Lynchburg College, Lynchburg, Virginia
(nic...@acavax.lynchburg.edu). He was pursuing a research project in
an area of pure mathematics called computational number theory. A half
dozen computers had been running simultaneously for over a year on a
project involving the reciprocal of large twin primes. Most of the
computers were 486's, but one Pentium was added in March of 1994.

On 13 June 1994 Dr. Nicely noticed a discrepancy in his results.
Extensive searching eventually isolated the problem to the twin primes
824633702441 and 824633702443, the double precision floating point
reciprocals of which were incorrect by about three ten-millionths of
one percent.

During the week of October 16-22 Dr. Nicely tested the calculation on
several other Pentiums and found they also produced the error. On
Monday, 24 October, he contacted Intel tech support regarding this bug.
The contact person later reported that the bug was observed on a 66-MHz
system at Intel, but had no further information or explanation, other
than the fact that no such bug had been previously reported or
observed, which is surprising since Intel had discovered the bug months
earlier and had already begun manufacturing new corrected chips.

In the absence of any meaningful response from Intel tech support, Dr.
Nicely sent a now famous e-mail message to a dozen people on October
30, 1994 describing the apparent bug in the Pentium's FPU, and citing
his findings that the result of calculating 1/x was incorrect for most
x values in the range:

824633702418 <= x <= 824633702449

(Note this range includes the twin primes that started it all.)

By November 3rd the news of the bug had found it's way to the Internet
newsgroup comp.sys.intel, where extensive discussions followed. People
began searching for other instances of the bug. Some early reports on
the net, using random searches of double-precision numbers, found a
couple dozen more cases of the bug. Around November 10, Andreas Kaiser
(a...@ananke.s.bawue.de) posted, in comp.sys.intel, a list of 23 cases
where a number of the form 1/x failed.

Tim Coe of Vitesse Semiconductor (c...@vitsemi.com) is credited with
reverse engineering the Pentium's division algorithm based on the 23
known cases supplied by Andreas Kaiser. He wrote a program which
implemented his tentative model of the Pentium's division process and
posted it to comp.sys.intel on November 16. Since then his program has
successfully predicted every known instance of the bug. Most of the
information in this paper comes from examining his program code.

On December 20, 1994, under pressure by consumers, Intel announced it
would replace any flawed Pentium chip with a new one upon request.


=======================================================================
CHAPTER 8: HOW TO TEST FOR THE PENTIUM BUG.

SIMPLE HAND TEST:
In the Microsoft windows calculator try 5506153/294911

Correct: 18.67055823621
Pentium: 18.66990719234

If you get the incorrect answer then you definitely have the bug. If
you get the correct answer then either you have the newer Pentium chip
with the bug corrected, or you have a calculator which is "Pentium
aware". "Pentium aware" software looks for and catches any incorrect
division result caused by the bug and recalculates the result using
some other method so you still get the right answer.


CPU ID PROGRAM:
The only sure way to determine if your CPU does or does not have the
FDIV flaw is to identify the CPU's family, model, and stepping, and
compare that against a list of ones known to have the bug. Pentium
chips have a special instruction, CPUID, which returns chip
identification information. Intel has an official program which uses
the CPUID instruction to identify CPU chips, including which ones have
the FDIV bug. The program is available at:

ftp.intel.com:/pub/IAL/pentium/cpuidf.exe - executable
ftp.intel.com:/pub/IAL/pentium/cpuidf.txt - instructions

World Wide Web (http://www.intel.com under Pentium)
CompuServe (GO: Intel)
America On-Line (Keyword:Intel)


=======================================================================
CHAPTER 9: PENTIUM JOKES

"Intel Inside? Don't Divide!"

"Sure it gives the wrong answer, but look how FAST it is!"

Ever wonder why the "Intel inside" emblem is in a circle that doesn't
quite meet?

P - Produces
E - Erroneous
N - Numbers
T - Through
I - Incorrect
U - Understanding of
M - Mathematics

"I heard that Intel lost one of its divisions today..."

"Redefining the PC -- and Mathematics As Well..."

"Nearly 300 Correct Opcodes"

-------------------------------------------------------
TOP .9999999998 REASONS TO BUY A FLAWED PENTIUM MACHINE
=======================================================
9.9999973251 -- YOUR CURRENT COMPUTER IS TOO ACCURATE
8.9999163362 -- YOU WANT TO GET INTO Guinness Book of World Records
AS "OWNER OF THE MOST EXPENSIVE PAPERWEIGHT"
7.9999414610 -- MATH ERRORS ADD ZEST TO LIFE
6.9999831538 -- YOU NEED AN ALIBI FOR THE I.R.S.
5.9999835137 -- THE "INTEL INSIDE" LOGO MATCHES YOUR DECOR PERFECTLY
4.9999999021 -- YOU NO LONGER HAVE TO WORRY ABOUT THE CPU OVERHEATING
3.9998245917 -- THE PENTIUM CHIP WILL MAKE A LOVELY PENDANT
2.9991523619 -- YOU WANT TO LEARN MORE ABOUT THIS 'NEW MATH'
1.9999103517 -- YOU GOT A GREAT DEAL FROM JPL
And the #0.9999999998 reason to buy a flawed Pentium machine:
-- YOUR Doom II GAME DOESN'T SEEM TO CARE

-------------------------------------------------------

Q: What does the pentium assembler command JCA mean?
A: Jump on Correct Answer ....

Q: What is Intel's follow-on to the Pentium?
A: Repentium.

Q: What does the element Pentium decay into?
A: Jewelry with the emission of a press release.

(Yes it's true, a number of the flawed Pentium chips were converted
into jewelry and keychains. But creating art and jewelry out of man
made junk originally intended for some other use is nothing new.)

The #1 favorite Pentium joke seems to be:

Q: Why didn't Intel call the Pentium the 586?
A: Intel added 486 and 100 on the first Pentium and got
585.999983605

This joke actually doesn't make sense, since it's a division flaw, not
an addition flaw. Still the joke was published in P.C. Magazine's
March 14, 1995 issue "Abort, Retry, Fail" humor section as their
favorite.

Humor gives us insight into human nature. When the Pentium flaw was
discovered it became a fad to peel off the "Intel Inside" sticker and
place it on some other appliance or item of lesser worth. "Intel
Inside" stickers have been found on 8-track tape recorders, toilets,
washing machines, telephones, coffeepots, a vacuum cleaner (it sucks
*much* more than before), and on the trash bin.

Of course there's nothing new about peeling off stickers and placing
them elsewhere. One person has a refrigerator that purports to offer
eight events over a three week period and Hi-Fi stereo, some clip-on
sun glasses that are frost-free and 23 cubic feet, and a radio receiver
that is apparently made by Cuisinart and boasts a copper bottom and a
stay-cool handle.

------------------------

Back in the days of slide rules a person would ask an engineer what 8/2
was and the engineer would wip out his trusty slide rule, slide the
hairline back and forth a little, and answer "about 3.99" Nowadays
when someone asks an engineer what 8 divided by 2 is, the engineer
turns to his trusty Pentium computer, enters a quick calculation, and
answers "about 3.99".

==============================================================
REFERENCES

[1] "Architecture of the Pentium Microprocessor", by Donald Alpert and
Dror Avnon, IEEE Micro,V-13:11-21(June 1993).

[2] "Higher-Radix Division Using Estimates of the Divisor and Partial
Remainders", Daniel E. Atkins, IEEE Transactions on Computers,
C-17:925-935(1968)

THE INTEL WHITE PAPER ON THE PENTIUM FDIV FLAW
ftp.intel.com /pub/IAL/pentium/white11.ps
http://www.intel.com/procs/support/pentium/fdiv/white11.ps
(The Postscript code is wrapped at 255 at several places and may
require manual editing to unwrap it.)

ftp.microsoft.com
./Softlib/MSLFILES:
3/27/95 S15242 VCFDIV.EXE Pentium FDIV Patch for Visual C++ 2.0
12/22/94 S15039 WE1136.EXE Pentium FPU Patch for Use with Microsoft Excel
1/25/95 S15075 WW1138.EXE Updated Calculator Accessory for Windows
2/ 3/95 S15157 WW1140.EXE Windows Solution for Faulty Pentium FPU

oak.oakland.edu
SimTel/msdos/sysutil
586test4.zip B 10504 950418 NAiS Pentium Flaw Detector v4.00
prctst1d.zip B 2877 941207 Detects CPU-type and Pentium FDIV BUG

LIST OF 1738 SINGLE PRECISION DIVISIONS WHICH HIT BUG
ftp.isi.edu /pub/carlton/pentium/

PENTIUM FDIV BUG HISTORY
ftp.mathworks.com /pub/pentium/

MS-WINDOWS CALCULATOR UPDATE:
An updated version of the Microsoft Windows calculator which fixes
the rounding bug [2.01 - 2.00 = 0.0] is available from the following
site. This bug is unrelated to the Pentium division bug. This
version of the Microsoft Windows calculator is apparently also not
"Pentium Aware":

ftp.microsoft.com
./Softlib/MSLFILES:
1/25/95 S15075 WW1138.EXE Updated Calculator Accessory for Windows


OTHER PAPERS BY DAVID W. DELEY:

"Computer Random Number Generators", 1991.
ftp.netcom.com /pub/de/deleyd/random_numbers.doc
http://www.phantom.com/~dsharp/rn.html

Jon Jenkins

unread,
Oct 30, 1995, 3:00:00 AM10/30/95
to del...@netcom.com
Hi

>For most people the flaw itself is just an entertaining curiosity since
>it's rare and the error in precision when it does occur is small.

^^^^^^^^

Although I agree with the first part viz the
"entertaining" I disagree with the second
part viz "rare" and "small"! As for rare
it is only "rare" *IF* you assume a random
usage of divisors. If you happen to hit
on a divisor pattern which causes the lookup
error the error is anything but "rare".

As to "small" it really depends upon what precision
you require. If you are talking either about single
calculations an error in the 4-8th sig digit
may or may not considered "small". But when the
subsequent answer may be scaled, is an intermediate
answer in a series such as matrix solution or FEA
the resultant error can be quite "large".
This is the reason I found the error because a
simulation package on UNIX workstation gave an
answer several orders of magnitude different
from the Pentium port. This is not a
theoretical consideration but a practical result
from a real world simulation. In this case the
error was not "small" by any definition.

Although your article is obviously directed at the
technical aspects you do comment on the "psychology"
of the event. So too will I. As I said at the time
the problem was not the bug itself but the non
disclosure. Intel seem to have learnt a lesson
about disclosure, witness the recent posting
regarding Windows 95 problems. How many hours
of frustration were saved by that single post!!

Overall I enjoyed your article, thank you for
sharing it.


Jon

--
----------------------------------------------------------------------
Name: Dr Jon Jenkins Location: Digital Equipment Corporation NaC
Voice/Fax: 61-7-55-75-0151/100 Burnett Place, Research Park,
Inet: jenk...@ozy.dec.com Bond University, Gold Coast
Close Proximity: "HEY YOU !!!" QLD, AUSTRALIA 4229
"Daddy, what's outside the Universe?" (My 5 year old.....)
-----------------------------------------------------------------------


0 new messages