Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

270 views

Skip to first unread message

Aug 7, 1987, 9:34:59 PM8/7/87

to

In article <3...@astroatc.UUCP>, jo...@astroatc.UUCP (John F. Wardale) writes:

> A couple weeks ago, I asked about the infamous Intel 386 multiply

> bug... Does anyone REALLY KNOW, or is the world in as much

> darkness as I am? Should I request the "test/demo" program and

> try to grok it? (I don't speak Intel assembler, but this may be

> the only way!)

>

> Can anyone shed any more light on this? We're involved in

> designing a large computer thru extensive use of simulation, as

> I'm sure Intel did. I'm interested in the following:

>

> * i386 multiply bug details

> -- do all the failing parts ALWAYS fail the same way, or

> is it a transient failure

> * why the simulations missed it (tricky 2nd order effect???)

> * pitfalls of simulations (in general)

>

The July, 1987 issue of COMPUTER DESIGN magazine has an article on page 22

"80386 Multiplier Problem Spotlights VLSI Testability Issues"

which you might find illuminating. The following excerpt is taken

without permission:

{beginning of excerpt}

`Contrary to industry speculation, the 80386 multiplier errors result

from a layout problem, not from an error in logic design. "We didn't

allow enough margin to catch the worst-case pattern in the multiplier

at the corners of our process," explains Dana Krelle, Intel's 80386

marketing manager. "As a result, some chips, at some temperature/

voltage/frequency points, will produce errors from particular combinations

of 32-bit operands." The error, apparently due to unintentional

coupling between adjacent cells in the multiplier, escaped Intel's

simulation and chip verification process until it was spotted in a

subsequent stress-testing program.' {end of excerpt}

Another interesting facet of the problem hasn't been mentioned

yet --- the 80386 doesn't have a multiplier (!!). The 386 uses

its general-purpose ALU, plus a shift-and-add microcode routine,

to perform multiplication operations. Similarly, divide operations

use the ALU plus a shift-and-subtract microcode routine.

The confusing thing is, why do multiplys fail but divides and

adds (apparently) work?? Isn't it possible to supply the

appropriate "bad" ALU operands for a regular add, or to create

them during some step of a divide?

--

-Mark Johnson *** DISCLAIMER: The opinions above are personal. ***

UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mark TEL: 408-720-1700 x208

US mail: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

Aug 21, 1987, 1:29:29 PM8/21/87

to

In article <5...@obiwan.UUCP>, ma...@mips.UUCP (Mark G. Johnson) writes:

>

> The July, 1987 issue of COMPUTER DESIGN magazine has an article on page 22

> "80386 Multiplier Problem Spotlights VLSI Testability Issues"

> which you might find illuminating. The following excerpt is taken

> without permission:

>

> {beginning of excerpt}

> `Contrary to industry speculation, the 80386 multiplier errors result

> from a layout problem, not from an error in logic design. "We didn't

> allow enough margin to catch the worst-case pattern in the multiplier

> at the corners of our process," explains Dana Krelle, Intel's 80386 [...]>

> The July, 1987 issue of COMPUTER DESIGN magazine has an article on page 22

> "80386 Multiplier Problem Spotlights VLSI Testability Issues"

> which you might find illuminating. The following excerpt is taken

> without permission:

>

> {beginning of excerpt}

> `Contrary to industry speculation, the 80386 multiplier errors result

> from a layout problem, not from an error in logic design. "We didn't

> allow enough margin to catch the worst-case pattern in the multiplier

>

> Another interesting facet of the problem hasn't been mentioned

> yet --- the 80386 doesn't have a multiplier (!!). The 386 uses

> its general-purpose ALU, plus a shift-and-add microcode routine,

> to perform multiplication operations. Similarly, divide operations

> use the ALU plus a shift-and-subtract microcode routine.

>

> The confusing thing is, why do multiplys fail but divides and

> adds (apparently) work?? Isn't it possible to supply the

> appropriate "bad" ALU operands for a regular add, or to create

> them during some step of a divide?

This is just speculation, since I know nothing about the 386 design.

If they are using a shift and add for multiplication, then they may have a

special shift-by-two-bits latch. This is useful in doing a what I think is

called a "modified Booth algorithm", where two bits worth of a multiply can

be done each clock cycle, if you can make 1x, 2x and 4x of one of the

inputs available. As far as I know, the shift by 2 bits isn't too useful

for much else. The divide algorithms I know of which don't use parallel

multipliers or groups of adders only shift by 1 bit at a time, so such a

divide wouldn't use the shift-by-two latch either. So perhaps the pattern

sensitivity lies in this shift register.

By the way, I think you can make this multiplication algorithm work for

N bits per clock cycle, if you can provide all the following multiples of one

of the inputs (call it Y) : Y, 2Y, ... (2^^N)Y .

Ken Karakotsios ken!sci

Silicon Compiler Systems

Disclaimer: These are my personal views and opinions, and don't

reflect those of my employer.

Aug 25, 1987, 7:02:13 PM8/25/87

to

In article <79...@sci.UUCP>, k...@sci.UUCP (Ken Karakotsios) writes:

[...]

> called a "modified Booth algorithm", where two bits worth of a multiply can

[...]

> By the way, I think you can make this multiplication algorithm work for

> N bits per clock cycle, if you can provide all the following multiples of one

> of the inputs (call it Y) : Y, 2Y, ... (2^^N)Y .

>

If'n I remember correctly, in fact you need to generate fairly nasty

values (like 3Y and 5Y) in order to get the 'N' bit modified Booth

algorithm to work. This is why you don't typically see orders higher

than 2 bits.

-----------------------

Bron Nelson {ihnp4, lll-lcc}!ohlone!nelson

Not the opinions of Cray Research

[...]

> called a "modified Booth algorithm", where two bits worth of a multiply can

> By the way, I think you can make this multiplication algorithm work for

> N bits per clock cycle, if you can provide all the following multiples of one

> of the inputs (call it Y) : Y, 2Y, ... (2^^N)Y .

>

values (like 3Y and 5Y) in order to get the 'N' bit modified Booth

algorithm to work. This is why you don't typically see orders higher

than 2 bits.

-----------------------

Bron Nelson {ihnp4, lll-lcc}!ohlone!nelson

Not the opinions of Cray Research

Aug 25, 1987, 7:11:40 PM8/25/87

to

In article <79...@sci.UUCP> k...@sci.UUCP (Ken Karakotsios) writes:

>In article <5...@obiwan.UUCP>, ma...@mips.UUCP (Mark G. Johnson) writes:

>>

>> Another interesting facet of the problem hasn't been mentioned

>> yet --- the 80386 doesn't have a multiplier (!!). The 386 uses

>> its general-purpose ALU, plus a shift-and-add microcode routine,

>> to perform multiplication operations. Similarly, divide operations

>> use the ALU plus a shift-and-subtract microcode routine.

>>

>> The confusing thing is, why do multiplys fail but divides and

>> adds (apparently) work?? Isn't it possible to supply the

>> appropriate "bad" ALU operands for a regular add, or to create

>> them during some step of a divide?

>

> This is just speculation, since I know nothing about the 386 design.

>If they are using a shift and add for multiplication, then they may have a

>special shift-by-two-bits latch. [..."modified Booth algorithm"...]>In article <5...@obiwan.UUCP>, ma...@mips.UUCP (Mark G. Johnson) writes:

>>

>> Another interesting facet of the problem hasn't been mentioned

>> yet --- the 80386 doesn't have a multiplier (!!). The 386 uses

>> its general-purpose ALU, plus a shift-and-add microcode routine,

>> to perform multiplication operations. Similarly, divide operations

>> use the ALU plus a shift-and-subtract microcode routine.

>>

>> The confusing thing is, why do multiplys fail but divides and

>> adds (apparently) work?? Isn't it possible to supply the

>> appropriate "bad" ALU operands for a regular add, or to create

>> them during some step of a divide?

>

> This is just speculation, since I know nothing about the 386 design.

>If they are using a shift and add for multiplication, then they may have a

The 386 need not use a shift-by-two-bits latch. One thing that needs to be

taken into account is that during normal instruction execution, time is needed

to decode the instruction and fetch the operands (even in a pipelined machine).

A microcode routine, on the other hand, needs no time to fetch and decode

instructions, so the ALU can be run full-tilt.

While I don't know the internals of the 286, the instruction timings for

MUL and IMUL (and the following experience) lead me to believe that it uses a

simple shift-and-add routine in microcode, with one shift and one add per

clock cycle. My AT had an 8MHz 80286 being run at 9.83 :-) MHz, which went

slightly flaky after about 6.5 months (I started seeing one line in the Turbo

Pascal and Turbo C editors at the wrong place on the screen). I tracked this

down to an 8-bit MUL instruction which lost one or two carry bits when the

high bit of AL was set and the other operand had sufficient 1 bits to cause

a cascade of carries on adding the last term (corresponding to the high bit)

to the result. The CPU has since been replaced [under warrantee, this system

was sold as a 10 MHz system].

BTW, the specific operands that caused the problems in the Turbo editors were

0A0h in AL and 0Fh as the other operand. (line 15 on the screen--wound up in the

middle of line 4 when the MUL yielded 160h instead of 960h) And yes, the

problem was temperature-dependent, at least initially--cranking up the air

conditioner made it go away. Later it got to the point where, after the first

six or seven minutes while the CPU was warming up, even cranking up the air

conditioner or opening up the system didn't help.

--

-=-=-=-=-=-=-=-= {harvard,seismo,ucbvax}!b.gp.cs.cmu.edu!ralf =-=-=-=-=-=-=-=-

ARPAnet: RA...@B.GP.CS.CMU.EDU BITnet: RALF%B.GP.CS.CMU.EDU@CMUCCVMA

AT&Tnet: (412) 268-3053 (school) FIDOnet: Ralf Brown at 129/31

DISCLAIMER? Who ever said I claimed anything?

"I do not fear computers. I fear the lack of them..." -- Isaac Asimov

Aug 26, 1987, 3:00:32 PM8/26/87

to

In article <3...@ohlone.UUCP>, nel...@ohlone.UUCP (Bron Nelson) writes:

> In article <79...@sci.UUCP>, k...@sci.UUCP (Ken Karakotsios) writes:

> In article <79...@sci.UUCP>, k...@sci.UUCP (Ken Karakotsios) writes:

> > By the way, I think you can make this multiplication algorithm

> > work for N bits per clock cycle, if you can provide all the

> > following multiples of one of the inputs (call it Y) : Y, 2Y, ...

> > (2^^N)Y .

> > work for N bits per clock cycle, if you can provide all the

> > following multiples of one of the inputs (call it Y) : Y, 2Y, ...

> > (2^^N)Y .

> If'n I remember correctly, in fact you need to generate fairly nasty

> values (like 3Y and 5Y) in order to get the 'N' bit modified Booth

> algorithm to work. This is why you don't typically see orders higher

> than 2 bits.

> values (like 3Y and 5Y) in order to get the 'N' bit modified Booth

> algorithm to work. This is why you don't typically see orders higher

> than 2 bits.

Oh no, we're about to get another long series of messages saying "X

does Y-bit booth product" just like the "Z has a W-bit word" series.

I doubt that higher order booth products are uncommon. To illustrate

how unspecial it is (and be the first in the long series :-) let me

point out that even a RISC microprocessor, the MIPS R2000, uses a

3-bit algorithm for integer multiply. This requires computing 3Y

first, but that's no big deal. Integer multiply takes 12 cycles. A

little slow, but acceptable. At least it's not 33 cycles, as in

architectures with multiply step instructions.

0 new messages

Search

Clear search

Close search

Google apps

Main menu