However, there are strong indications that the P4 FPU will actually be
significantly slower than the FPU on today's P-III or Athlon. So if you
are targeting that processor, a return to fixed point might be in order.
--
Paul Hsieh
http://www.pobox.com/~qed/
If you mean more traditional fixed point, where each word has a fixed
binary point at, say, bit 15 -- then fixed point multiplication can be
slower than the equivalent FP multiplication. The reason is that the
fixed-point result has to be readjusted after the operation (via shifts
or byte swaps), and the operation may have to be performed with paired
registers to preserve accuracy.
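In C the difference is roughly this (a minimal sketch, assuming a 16.16
format; the type and function names are just for illustration):

#include <stdint.h>

/* 16.16 fixed-point multiply: the 32x32 product is a 32.32 value, so
   it needs a 64-bit (paired-register) intermediate and a shift to put
   the binary point back at bit 16. */
typedef int32_t fix16_16;

static fix16_16 fixmul(fix16_16 a, fix16_16 b)
{
    int64_t wide = (int64_t)a * (int64_t)b;   /* full-width product    */
    return (fix16_16)(wide >> 16);            /* readjust binary point */
}

The FP multiply, by contrast, needs no readjustment afterwards, which is
where the difference comes from.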
With SSE, many floating point operations are very fast, and with SSE2,
double precision FP will be very fast. However, SSE is not completely
IEEE-compliant FP.
"Chris Chan" <ch...@ghostchip.com> wrote in message
news:F9TP5.78620$x6.15...@news20.bellglobal.com...
And don't forget MMX; I'd imagine that might still be the absolute
fastest way to do arithmetic on numbers in 16-bit fixed-point formats.
And it gets much faster as the integers get narrower; up to 64-wide if
you're only interested in bit operations, and make that 128-wide on the
P4 (spot the man with the cellular-automaton fixation :))
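For the 16-bit case the multiply is pleasant, too; something like this
sketch with the MMX intrinsics (an 8.8 format assumed, untested, names
made up):

#include <mmintrin.h>

/* Four 8.8 fixed-point multiplies at once.  PMULHW/PMULLW give the high
   and low halves of each 16x16 -> 32 product; recombining them with an
   8-bit shift puts the binary point back where it belongs. */
static __m64 fixmul8_8_x4(__m64 a, __m64 b)
{
    __m64 hi = _mm_mulhi_pi16(a, b);           /* bits 16..31 of products */
    __m64 lo = _mm_mullo_pi16(a, b);           /* bits 0..15 of products  */
    return _mm_or_si64(_mm_slli_pi16(hi, 8),   /* (hi << 8) | (lo >> 8)   */
                       _mm_srli_pi16(lo, 8));
}

(The caller still has to _mm_empty() before touching x87 FP again, of
course.)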
> If you mean more traditional fixed point, where each word has a fixed
> binary point at, say, bit 15 -- then fixed point multiplication can be
> slower than the equivalent FP multiplication. The reason is that the
> fixed-point result has to be readjusted after the operation (via shifts
> or byte swaps), and the operation may have to be performed with paired
> registers to preserve accuracy.
MUL does the paired-register multiply anyway, and SHRD does the shift on
EDX:EAX to go with it (though I suppose you usually use a 16.16 format and
just look at EDX). The P3 optimisation guide says little of use, but I
vaguely recall that P6 and Athlon don't bother computing the top 32 bits of
the multiply if you don't look at them, and have fairly nasty latencies to
compute them if you do.
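Spelled out by hand, the pairing looks something like this (a GCC
inline-asm sketch for ia32, untested; in plain C you'd just widen to
64 bits and let the compiler choose the instructions):

#include <stdint.h>

/* 16.16 multiply: IMUL leaves the 64-bit product in EDX:EAX, and SHRD
   then pulls bits 16..47 down into EAX. */
static int32_t fixmul_16_16(int32_t a, int32_t b)
{
    int32_t result;
    __asm__ ("imull %2\n\t"
             "shrdl $16, %%edx, %%eax"
             : "=a" (result)
             : "0" (a), "rm" (b)
             : "edx", "cc");
    return result;
}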
> With SSE, many floating point operations are very fast, and with SSE2,
> double precision FP will be very fast. However, SSE is not completely
> IEEE-compliant FP.
I'd be surprised if there were tasks where you're prepared to use
fixed point, with its complete nonsense in the event of overflow and
unpleasant numerical properties, and where minor deviations from the
IEEE standard in a floating-point implementation mattered in the least.
I wonder vaguely if it's possible to do four-wide parallel 32-bit fixed
arithmetic using SSE2's widened-MMX extensions; I suppose the multiply
would be ugly: you'd have to pull the int<4>s apart, shift using that
weird byte-wide shift and put everything back together. Might still be
faster than the P4 ALU for multiply, though double-pumped add is nice.
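For what it's worth, the ugly multiply might come out something like
this with the SSE2 intrinsics (unsigned 16.16 values assumed, untested,
names made up):

#include <emmintrin.h>

/* Four-wide 16.16 fixed-point multiply (unsigned).  PMULUDQ only
   multiplies the even 32-bit lanes into 64-bit products, so the odd
   lanes need a second multiply; each product is then rescaled by 16
   and the low halves are interleaved back into one register. */
static __m128i fixmul16_16_x4(__m128i a, __m128i b)
{
    __m128i even = _mm_mul_epu32(a, b);                   /* lanes 0, 2 */
    __m128i odd  = _mm_mul_epu32(_mm_srli_epi64(a, 32),
                                 _mm_srli_epi64(b, 32));  /* lanes 1, 3 */
    even = _mm_srli_epi64(even, 16);      /* readjust the binary point */
    odd  = _mm_srli_epi64(odd, 16);
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));
    return _mm_unpacklo_epi32(even, odd);
}

Two multiplies, four shifts and three shuffles for four results -- ugly
enough, as predicted.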
Grr, P4 systems will be around within a couple of weeks, but even saving
carefully I don't think I'll be able to afford one before the middle of next
year ... I can't see P4/1400, pair of 128M PC800 RIMMs and motherboard
costing much less than £1000, whilst a K7/1000, pair of 128M PC133 DIMMs and
motherboard are well under £500 already. And I'm not optimistic that even
grotesquely gnarly SSE2 assembler can run the P4 twice as fast as twiddly
3dNow assembler can drive the K7.
Tom
Last I checked, SHRD was quite slow on newer Pentia. In some cases, the
breakdown of registers into (e.g.) AL, AX and EAX can be used to effect
a shift.
MUL will do a 32x32 -> 64 multiply; nonetheless, the paired registers do
have to be preserved and carried through any complicated operation, to
preserve precision, since there are no implicit guard bits.
It has been a long time since I've been concerned with this issue, but I
published the fixed point code for the Mandelbrot set in MicroCornucopia,
Sept.-Oct. issue, 1988, pp. 22-29 (magazine now defunct). Back then, 387s
were in short supply and cost ~$500 (like about $800 nowadays), and the
integer fixed point was about 4 or 5 times faster than 387 code. The
basic idea went on to be used in Fractint, but by the time the 486 came
out, the advantage of the integer code was severely eroded, even though
the 486 still had ~10-20 cycle times for most FP ops (not including
loads).
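For anyone who never saw that style of code, the inner loop was
essentially the following (a from-memory sketch in present-day C, not
the MicroCornucopia listing; a 6.26 format is assumed so the escape test
at |z|^2 > 4 still has headroom):

#include <stdint.h>

#define FRAC_BITS 26                      /* 6.26 fixed point */
typedef int32_t fix;

/* One Mandelbrot point: iterate z = z^2 + c until |z|^2 > 4 or the
   iteration limit.  The 32x32 -> 64 products are kept at full width
   and only rescaled once per term, since there are no guard bits to
   hide the rounding in. */
static int mandel(fix cx, fix cy, int max_iter)
{
    fix zx = 0, zy = 0;
    int i;
    for (i = 0; i < max_iter; i++) {
        int64_t xx = (int64_t)zx * zx;
        int64_t yy = (int64_t)zy * zy;
        int64_t xy = (int64_t)zx * zy;
        if (xx + yy > (int64_t)4 << (2 * FRAC_BITS))
            break;                        /* escaped */
        zx = (fix)((xx - yy) >> FRAC_BITS) + cx;
        zy = (fix)((xy + xy) >> FRAC_BITS) + cy;
    }
    return i;
}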
I'd bet the price of a good beer that you don't know the difference
between integer and fixed point arithmetic. Float is almost always
faster than fixed point, but only faster than integer in some nice
special cases where you can make it look like vector arithmetic.
--
bill davidsen <davi...@tmr.com> CTO, TMR Associates, Inc
Make the rules? I don't make the rules. I don't even FOLLOW the rules!