
Floating point vs fixed-point arithmetic (signed 64-bit)


kishor

Mar 26, 2012, 5:22:21 AM
Hi friends,
I am working on a Stellaris LM3S6965 (Cortex-M3) with Keil 4.20 for data
acquisition. The ADC is signed 24-bit.

To perform software gain calibration I have two options:

1. 64-bit fixed-width arithmetic

uint16_t Gain;         // 0x8000 means gain is 1
int32_t ADC_Reading;   // Contains the signed 24-bit ADC reading

ADC_Reading = ((int64_t)ADC_Reading * Gain) / 0x8000;   // Gain calibration

// As the product of a signed 24-bit and an unsigned 16-bit value will not
// fit into a 32-bit variable, I typecast it to int64_t.

2. Single-precision float

float Gain;
int32_t ADC_Reading;   // Contains the signed 24-bit ADC reading

ADC_Reading = ADC_Reading * Gain;   // Gain calibration

Which is better performance-wise?

Thanks,
Kishore.

Boudewijn Dijkstra

Mar 26, 2012, 6:08:47 AM
On Mon, 26 Mar 2012 11:22:21 +0200, kishor <kii...@gmail.com> wrote:
> Hi friends,
> I am working on stellaris LM3s6965 (cortex-m3) & Keil 4.20 for data
> acquisition. ADC is signed 24-bit.
>
> To perform software Gain calibration I have two options,
>
> 1. 64-bit fixed width arithmetic
> ADC_Reading = ((int64_t)ADC_Reading * Gain) / 0x8000;
> 2. Single precision Float
> ADC_Reading = ADC_Reading * Gain;
>
> Which is better for performance wise[?]

An (u)int64_t multiplication is always faster than a float multiplication,
assuming you don't have a hardware FPU.

Also, if you can deal with some loss of precision, you can pre-divide both
operands enough to be able to use 32-bit multiplication.
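
For illustration, a minimal sketch of that pre-divide idea (the function name
and the choice of dropping 8 LSBs are mine, not from this thread), assuming
the reading can tolerate the lost resolution:

#include <stdint.h>

/* Hypothetical sketch: drop 8 LSBs of the 24-bit reading so that the
 * product with the 16-bit gain (0x8000 == 1.0) fits in 32 bits.      */
int32_t gain_cal_32bit(int32_t adc_reading, uint16_t gain)
{
    int32_t reduced = adc_reading / 256;        /* now fits in 16 bits; 8 LSBs lost */
    int32_t product = reduced * (int32_t)gain;  /* <= 16 x 16 bits, fits in 32 bits */
    return product / 128;                       /* remaining part of the /0x8000    */
}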


--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/
(Remove the obvious prefix to reply privately.)

Arlet Ottens

Mar 26, 2012, 7:14:12 AM
On 03/26/2012 11:22 AM, kishor wrote:
> Hi friends,
> I am working on stellaris LM3s6965 (cortex-m3) & Keil 4.20 for data
> acquisition. ADC
> is signed 24-bit.
>
> To perform software Gain calibration I have two options,
>
> 1. 64-bit fixed width arithmetic
> uint16_t Gain; // 0x8000 means gain is 1
> int32_t ADC_Reading; // It contains 24-bit signed integer ADC
> reading
>
> ADC_Reading = ((int64_t)ADC_Reading * Gain) / 0x8000; //
> Gain calibration

Cortex-M3 has a 32x32->64 bit multiply instruction, so if your compiler
is smart enough, it might use that. Check the generated assembly output.

If not, write your own assembly version.

David Brown

Mar 26, 2012, 7:24:26 AM
Unless you require the absolutely fastest performance (and someone
asking the original question clearly is not - or he would already have
found the answer), do not write your own assembly code. It's just
pointless optimisation for optimisation's sake.

By all means, look at the generated assembly and see if it uses the
ideal instruction. If it doesn't, then file a report or support request
with the compiler supplier if you want.

Don't do inline assembly unless you really have a reason for it,
especially if you are not used to it.

kishor

Mar 26, 2012, 8:24:28 AM
Thanks for the reply.

The compiler generated a "UMULL" instruction (32-bit * 32-bit).
As it is a signed multiplication, it generated another three instructions.

The assembly listing is as below:

r1 - ADC_Reading (signed)
r2 - Gain (unsigned)

UMULL r0,r5,r1,r2 ; unsigned multiply 32 * 32
ASRS r3,r1,#31 ; Arithmetic Shift Right
MLA r2,r3,r2,r5 ; Multiply & Accumulate
MLA r1,r1,r12,r2 ; Multiply & Accumulate

MOV r2,#0x8000
MOV r3,r12

BL __aeabi_ldivmod ; 64-bit divide function

So the multiplication is not a big deal; the signed 64-bit divide is what takes time.

So is it still better than float?

Thanks,
Kishore.



Fredrik Östman

Mar 26, 2012, 8:38:00 AM
>-----< kishor >
> So multiplication is not a big deal. signed 64-bit divide takes time.

You don't need the division. You need to shift down 31 bits to get back
to 32 bits.

Perhaps the ALU can do it all for you in one step. Check the compiler for
a built-in function for 32-bit signed fractional multiplication.

Read up on fractional arithmetic.

--
Fredrik Östman

kishor

Mar 26, 2012, 9:33:48 AM
On Monday, March 26, 2012 6:08:00 PM UTC+5:30, Fredrik Östman wrote:

> You don't need the division. You need to shift down 31 bits to get back
> to 32 bits.

I don't understand your point. With signed values we can't simply shift down the bits.

> Perhaps the ALU can do to all for you in one step. Check the compiler for
> a built-in function for 32-bit signed fractional multiplication.
>
> Read up on fractional arithmetic.

Is there another method which avoids the 64-bit division?


Kishore.

Arlet Ottens

Mar 26, 2012, 9:34:37 AM
On 03/26/2012 02:24 PM, kishor wrote:
> Thanks for reply,
>
> Compiler has generated "UMULL" instruction, (32-bit * 32-bit)
> As it is signed multiplication it generated another three instructions.
>
> Assembly listing is as below,
>
> r1 - ADC_Reading (signed)
> r2 - Gain (unsigned)
>
> UMULL r0,r5,r1,r2 ; unsigned multiply 32 * 32
> ASRS r3,r1,#31 ; Arithmetic Shift Right
> MLA r2,r3,r2,r5 ; Multiply & Accumulate
> MLA r1,r1,r12,r2 ; Multiply & Accumulate
>
> MOV r2,#0x8000
> MOV r3,r12
>
> BL __aeabi_ldivmod ; 64-bits divider function
>
> So multiplication is not a big deal. signed 64-bit divide takes time.
>
> So still is it better than float?

Try doing a >> 15 instead of a / 0x8000 in your C code.



Arlet Ottens

Mar 26, 2012, 9:49:54 AM
On 03/26/2012 03:33 PM, kishor wrote:
> On Monday, March 26, 2012 6:08:00 PM UTC+5:30, Fredrik Östman wrote:
>
>> You don't need the division. You need to shift down 31 bits to get back
>> to 32 bits.
>
> I don't understand your point. In signed division we can't shift down bits simply.

The difference is at most 1 LSB due to rounding differences, which is
probably less than your ADC noise.

You can make the difference even smaller by using a 32 bit gain variable.

You could also find out if ADC value is negative, reverse the sign,
perform unsigned arithmetic, and reverse the sign of the result.

David Brown

Mar 26, 2012, 9:45:24 AM
On 26/03/2012 15:33, kishor wrote:
> On Monday, March 26, 2012 6:08:00 PM UTC+5:30, Fredrik Östman wrote:
>
>> You don't need the division. You need to shift down 31 bits to get
>> back to 32 bits.
>
> I don't understand your point. In signed division we can't shift down
> bits simply.

Correct.

But the compiler should do the strength reduction for you - take note of
the sign, do everything unsigned, then restore the sign. If it doesn't,
then check your optimisation settings and/or complain to the supplier.

>
>> Perhaps the ALU can do to all for you in one step. Check the
>> compiler for a built-in function for 32-bit signed fractional
>> multiplication.
>>
>> Read up on fractional arithmetic.
>
> Is there other method which avoids 64-bit division?
>

Yes - do everything unsigned. First think if the incoming data really
is signed - in most cases it is not. But if you have signed data (say
from a differential input), first note the sign then convert to a
positive value if needed. Then do your scaling and division (and if the
compiler can't convert an unsigned divide by 0x8000 to a shift, it's a
poor compiler - and you can do the shift by hand). Then restore the sign.
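
As a rough sketch of that approach in portable C (the function name is mine,
not from this thread), assuming a Q15 gain where 0x8000 means 1.0:

#include <stdint.h>

/* Sketch only: split off the sign, scale the magnitude with unsigned
 * arithmetic, shift instead of divide, then restore the sign.        */
int32_t gain_calibrate(int32_t adc_reading, uint16_t gain)
{
    int neg = (adc_reading < 0);
    uint32_t mag = neg ? (uint32_t)-(int64_t)adc_reading : (uint32_t)adc_reading;

    /* 24-bit magnitude times 16-bit gain needs at most 40 bits; on a
     * Cortex-M3 this multiply should become a UMULL plus shifts, with
     * no library call.                                                */
    uint32_t scaled = (uint32_t)(((uint64_t)mag * gain) >> 15);

    return neg ? -(int32_t)scaled : (int32_t)scaled;
}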

Fredrik Östman

Mar 26, 2012, 10:34:42 AM
>-----< kishor >
> I don't understand your point. In signed division we can't shift down
> bits simply.

In fractional representation (all numbers are between -1 and 1) we can
and we do.

--
Fredrik Östman

Tim Wescott

Mar 26, 2012, 1:25:22 PM
But note that ANSI C leaves the contents of the most significant bits of
a right-shift of a negative number up to the implementor -- it is equally
valid within ANSI-C specs to shift in zeros (affecting both sign and
magnitude) as it is to shift in ones.

The only reliable way to do this across compilers (and even major
revisions) is to convert to unsigned, shift (unsigned right shifts always
shift in zeros), then restore the sign as necessary.

It should be habit. Never right-shift a signed number when it might be
negative.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com

Arlet Ottens

Mar 26, 2012, 2:19:56 PM
On 03/26/2012 07:25 PM, Tim Wescott wrote:
> On Mon, 26 Mar 2012 15:34:37 +0200, Arlet Ottens wrote:
>
>> On 03/26/2012 02:24 PM, kishor wrote:
>>> Thanks for reply,
>>>
>>> Compiler has generated "UMULL" instruction, (32-bit * 32-bit) As it is
>>> signed multiplication it generated another three instructions.
>>>
>>> Assembly listing is as below,
>>>
>>> r1 - ADC_Reading (signed)
>>> r2 - Gain (unsigned)
>>>
>>> UMULL r0,r5,r1,r2 ; unsigned multiply 32 * 32
>>> ASRS r3,r1,#31 ; Arithmetic Shift Right
>>> MLA r2,r3,r2,r5 ; Multiply & Accumulate
>>> MLA r1,r1,r12,r2 ; Multiply & Accumulate
>>>
>>> MOV r2,#0x8000
>>> MOV r3,r12
>>>
>>> BL __aeabi_ldivmod ; 64-bits divider function
>>>
>>> So multiplication is not a big deal. signed 64-bit divide takes time.
>>>
>>> So still is it better than float?
>>
>> Try doing a >> 15 instead of a / 0x8000 in your C code.
>
> But note that ANSI C leaves the contents of the most significant bits of
> a right-shift of a negative number up to the implementor -- it is equally
> valid within ANSI-C specs to shift in zeros (affecting both sign and
> magnitude) as it is to shift in ones.

I agree that a right shift on a negative value is implementation
defined, but it's very unlikely that a compiler for Cortex M3 would not
use the arithmetic shift right instruction.

If you're really paranoid, you could build in a run-time check at
program initialization for expected right shift behavior.
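
Such a check might look something like this (just a sketch; the function
name is mine):

#include <assert.h>
#include <stdint.h>

/* Run-time sanity check: fail at start-up unless signed right shift is
 * an arithmetic shift (ones shifted in for negative values).          */
void check_right_shift(void)
{
    volatile int32_t x = -2;
    assert((x >> 1) == -1);
}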

Rich Webb

Mar 26, 2012, 4:45:27 PM
On Mon, 26 Mar 2012 20:19:56 +0200, Arlet Ottens <usen...@c-scape.nl>
wrote:
Why not do an explicit divide by 2 (or 4 or 8 or ...)? Surely most
modern compilers are smart enough to recognize that this can be
accomplished via an arithmetic right shift if the processor supports
such an instruction or a "logical right shift" (zero filled) followed by
stuffing ones up topside if the old MSB was set and the type was signed.

Just curious.

--
Rich Webb Norfolk, VA

Tim Wescott

Mar 26, 2012, 6:15:44 PM
Actually, your answer was in the context you included: the OP's machine
code shows a divide function being called, with 0x8000 as the denominator.

Rich Webb

Mar 26, 2012, 7:09:30 PM
On Mon, 26 Mar 2012 17:15:44 -0500, Tim Wescott <t...@seemywebsite.com> wrote:
Aha! Damn that automatic quote-folding feature...

Paul E. Bennett

Mar 27, 2012, 6:28:11 AM
kishor wrote:

> Hi friends,
> I am working on stellaris LM3s6965 (cortex-m3) & Keil 4.20 for data
> acquisition. ADC
> is signed 24-bit.

Unless the numbers really do get too ridiculously large or wide-ranging to
handle, doing the calculations with long fractions will tend to be faster
for most instrumentation needs.

It is always best to look closely at the various ways you can implement your
calculations. Gains are easy by fractions (width-limited multiplies that
discard the least significant bits). Calibration curves can often be done by
polynomial approximations (see the Hastings reference).

"Approximations for Digital Computers" by Cecil Hastings Jr., T. Hayward,
James P. Wong Jr. ISBN 0-691-07914-5
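
For what it's worth, a minimal fixed-point sketch of such a polynomial
evaluation (Horner's rule); the Q15 format and the function name are my
choices, and the coefficients would come from a fit such as those in the
Hastings book:

#include <stdint.h>

/* Evaluate coef[order]*x^order + ... + coef[0], everything in Q15.
 * Assumes the intermediate results stay in Q15 range and that >> on a
 * negative value is an arithmetic shift (see elsewhere in this thread). */
int32_t poly_q15(int32_t x, const int32_t *coef, int order)
{
    int32_t acc = coef[order];
    for (int i = order - 1; i >= 0; i--)
        acc = (int32_t)(((int64_t)acc * x) >> 15) + coef[i];
    return acc;
}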

--
********************************************************************
Paul E. Bennett...............<email://Paul_E....@topmail.co.uk>
Forth based HIDECS Consultancy
Mob: +44 (0)7811-639972
Tel: +44 (0)1235-510979
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
********************************************************************

kishor

Mar 27, 2012, 7:59:22 AM
Sorry for the late reply.

I have experimented with a few things:

1. Signed 32-bit divide by a constant of 2^n
The compiler uses 2 ASR (arithmetic shift right) & 1 ADD instruction instead of an SDIV instruction.

2. Signed 32-bit divide by a constant other than 2^n
It uses SMLAL (signed multiply & accumulate long) & other instructions.

3. Unsigned 64-bit divide by a constant of 2^n
It uses 2 LSR & 1 ORR instruction.

4. Unsigned 64-bit divide by a constant other than 2^n
5. Signed 64-bit divide by a constant of 2^n
6. Signed 64-bit divide by a constant other than 2^n

These call the __aeabi_ldivmod / __aeabi_uldivmod functions.

I have two queries:

1. Are the hardware DIV / SDIV instructions slower than shift logic?
2. Is it possible to generate shift-based logic for case 5 mentioned above
(signed 64-bit divide by a constant of 2^n)?

Thanks,
Kishore.


David Brown

Mar 27, 2012, 9:25:21 AM
On 27/03/2012 13:59, kishor wrote:
> Sorry for late reply,
>
> I have experimented few things,
>
> 1. Signed 32-bit divide by constant of 2^n
> Compiler uses 2 ASR (Arithmetic shift right) & 1 ADD instruction instead of SDIV instruction
>
> 2. Signed 32-bit divide by constant other than 2^n
> It uses SMLAL (signed multiply & accumulate long) & other instructions
>
> 3. Unsigned 64-bit divide by constant of 2^n
> It uses 2 LSR & 1 ORR instruction.
>
> 4. Unsigned 64-bit divide by constant other than 2^n
> 5. Signed 64-bit divide by constant of 2^n
> 6. Signed 64-bit divide by constant other than 2^n
>
> It calls __aeabi_ldivmod, __aeabi_uldivmod functions
>
> I have two queries,
>
> 1. Is hardware DIV / SDIV instructions are slower than shift logic?

Yes. DIV instructions take several cycles (I don't know how many for
the M3), and will cause pipeline stalls which reduce the throughput of
other instructions. Shifts are therefore faster, even if they need a
few other instructions around them. It is also faster to multiply by a
scaled pre-calculated reciprocal (case 2 above).

> 2. Is it possible to generate shift based logic to case 5 mentioned above?
> (Signed 64-bit divide by constant of 2^n)

Yes.

The easiest way to make sure you get signed division right is to
separate out the sign, then use unsigned arithmetic. That way you can't
go wrong, and the C code is portable.
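
A sketch of case 5 done that way (rounding toward zero like C division; the
function name is mine):

#include <stdint.h>

/* Signed 64-bit divide by 2^n using only shifts: separate the sign,
 * shift the unsigned magnitude, restore the sign. Assumes n >= 1 and
 * x != INT64_MIN.                                                    */
int64_t sdiv_pow2(int64_t x, unsigned n)
{
    int neg = (x < 0);
    uint64_t mag = neg ? -(uint64_t)x : (uint64_t)x;
    mag >>= n;
    return neg ? -(int64_t)mag : (int64_t)mag;
}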

>
> Thanks,
> Kishore.
>
>

David T. Ashley

Mar 27, 2012, 11:28:18 AM
On Mon, 26 Mar 2012 02:22:21 -0700 (PDT), kishor <kii...@gmail.com>
wrote:
Without FPU support, assuming that the processor has basic integer
multiplication instructions, integer operations are ALWAYS faster than
floating-point operations. Usually _far_ faster. And always more
precise.

The general nature of computers is that all data into the computer has
to be quantized in some way (the machine can only accept digital
data), and all data out has to be quantized in some way (again, the
machine can only output digital data).

There is already quantization error coming in because it is entering a
discrete system. How much error depends on the quality of the
hardware, which usually depends on how much one was willing to spend
on it.

One measure of "goodness" of calculations is whether, for a given set
of inputs (all integers), one can prove analytically that one is able
to select the best outputs (again, all integers). This confines any
error to the hardware rather than the software.

It ends up that for many types of calculations, using integer
operations, one can meet this measure of goodness. However, one
usually requires larger integers than development tools support in a
native way. Which means inline assembly or large integer libraries
which were written in assembly-language. Preferably the latter.

In the specific case of linearly scaling by a factor, generally what
one wants to do is select a rational number h/k close to the real
number to be multiplied by.

There are two subcases.

k = 2^q may be a power of two, in which case it is an integer
multiplication followed by a shift or a "byte pluck". It should be
obvious why this is extremely efficient.

k may be something other than a power of two, which is the general
case. In that case, you may find this web page helpful:

http://www.dtashley.com/howtos/2007/01/best_rational_approximation/

Finding the best rational approximation when k is not a power of 2 is
a topic from number theory, and all the information you are likely to
need is at the page above. Software is included.
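
As a small illustration of the k = 2^q case (the factor 1.2345 and the names
here are examples of mine, not from the page above): to scale by roughly
1.2345, pick q = 16 and h = round(1.2345 * 65536) = 80904.

#include <stdint.h>

#define SCALE_H 80904u   /* ~1.2345 in units of 1/65536 */
#define SCALE_Q 16

static uint32_t scale(uint32_t x)
{
    /* integer multiply followed by a shift */
    return (uint32_t)(((uint64_t)x * SCALE_H) >> SCALE_Q);
}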

You're welcome.

Dave Ashley

upsid...@downunder.com

Mar 27, 2012, 11:52:09 AM
On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
<das...@gmail.com> wrote:

>
>Without FPU support, assuming that the processor has basic integer
>multiplication instructions, integer operations are ALWAYS faster than
>floating-point operations. Usually _far_ faster. And always more
>precise.

Floating point instructions MUL/DIV are trivial, just multiply/divide
the mantissa and add/sub the exponent.

With FP add/sub you have to denormalize one operand and then normalize
the result, which can be quite time consuming, without sufficient HW
support.

This can be really time consuming, if the HW is designed by an idiot.

David T. Ashley

Mar 27, 2012, 1:02:19 PM
Your observations are valid. But I have yet to see a practical
example of something that can be done faster and with equal accuracy
in floating point vs. using integer operations.

I concur with your observations. After reading your first paragraph
... yeah, floating-point multiplication is pretty simple so long as
the floating point format is sane.

Before reading your post, my mental model was that floating-point
operations might be 20 times as slow as integer operations. Now I'm
thinking maybe 2-3 times.

DTA.

Walter Banks

Mar 27, 2012, 2:56:49 PM
I did a fixed point support package for our 8 bit embedded systems
compilers and one interesting metric came out of the project.

Given a number of bits in a number and similar error checking fixed
or float took very similar amounts of execution time and code size
in applications.

For example 32 bit float and 32 bit fixed point. They are not exact
but they are close. In the end much to my surprise the choice is
dynamic range or resolution.

There are other factors: IEEE 754 has potentially much more error
checking, but not all libraries are written to support it, and not all
applications need it.


Regards,


w..
--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com



Tim Wescott

Mar 27, 2012, 3:17:29 PM
That's interesting, because in my experience fixed-point fractional
arithmetic (i.e., 0x7fffffff = 1 - 2^-31, 0x80000001 = -1 + 2^-31), with
saturation-on-add, is significantly faster (3x to 10x) than floating
point on all the machines I've tried it except for those with floating-
point hardware.

I have a portable version that works on just about anything that's ANSI-C
compatible, and when I really need speed I rewrite the arithmetic
routines in assembly for about a 2x increase.
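
The core of such a package, in plain portable C, might look roughly like
this (a sketch, not Tim's library; saturation is symmetric to match the
format above, and -1 * -1 is not special-cased):

#include <stdint.h>

static int32_t q31_add(int32_t a, int32_t b)   /* saturating add */
{
    int64_t s = (int64_t)a + b;
    if (s >  INT32_MAX) return  INT32_MAX;     /* clip at  1 - 2^-31 */
    if (s < -INT32_MAX) return -INT32_MAX;     /* clip at -1 + 2^-31 */
    return (int32_t)s;
}

static int32_t q31_mul(int32_t a, int32_t b)   /* 1.31 x 1.31 -> 1.31 */
{
    /* relies on arithmetic right shift of negative values */
    return (int32_t)(((int64_t)a * b) >> 31);
}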

The only processor that came close to matching it was the TMS320F2812,
where we used the ANSI-C compatible version that was just about matched
by the floating-point package that came with the tool set (and I _know_
that TI cut corners with that floating point package). That's the _only_
processor in my experience where the floating point could keep up with
the ANSI-C version, and I would expect that had I written an assembly
version it would have been faster yet.

Walter Banks

Mar 27, 2012, 4:35:14 PM
What you saw was what I was expecting. My point in the post was to be
careful about assuming that fixed point is going to be dramatically better.
At least on 8-bit parts, the variable size in bits is a significant factor
when all math is multi-precision.

One of the keys in our metrics was that the target was 8-bit processors,
and there was an exchange between precision and dynamic range while the
bit sizes remained the same.

Real applications are probably dominated by scaling and precision, which
reduces the number of bits used by fixed point for the same application.

It didn't make sense until I realized that these were 8-bit processors
using software multiplies and divides: 32-bit floating point uses, for the
most part, 24-bit multiplies and divides plus a few adds/subtracts for the
exponents, while 32-bit fixed point uses full 32-bit multiplies and
divides, adding to the cycle count.

My experience with 32 bit processors is similar to yours although
I don't have metrics to back it up.


Walter..



Tim Wescott

Mar 27, 2012, 11:36:59 PM
Ah. I see your point. 9 multiplies and some shifting during addition
vs. 16 multiplies might well turn out to be a wash.

The first serious control loop I did was quite starved for clock cycles,
and used a 24-bit accumulator, but with an 8 x 16 (or 8 x 8) multiply,
and had 16-bit data paths other than that.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com

David Brown

Mar 28, 2012, 3:00:02 AM
That's not a big surprise - with floating point, the actual arithmetic
is 24-bit, which will be quite a lot faster than 32-bit on a small 8-bit
machine (especially if it doesn't have enough registers or data pointers).

David Brown

Mar 28, 2012, 3:17:14 AM
On 27/03/2012 19:02, David T. Ashley wrote:
> On Tue, 27 Mar 2012 18:52:09 +0300, upsid...@downunder.com wrote:
>
>> On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
>> <das...@gmail.com> wrote:
>>
>>>
>>> Without FPU support, assuming that the processor has basic integer
>>> multiplication instructions, integer operations are ALWAYS faster than
>>> floating-point operations. Usually _far_ faster. And always more
>>> precise.
>>
>> Floating point instructions MUL/DIV are trivial, just multiply/divide
>> the mantissa and add/sub the exponent.
>>
>> With FP add/sub you have to denormalize one operand and then normalize
>> the result, which can be quite time consuming, without sufficient HW
>> support.
>>
>> This can be really time consuming, if the HW is designed by an idiot.
>
> Your observations are valid. But I have yet to see a practical
> example of something that can be done faster and with equal accuracy
> in floating point vs. using integer operations.
>

It depends on the chip, the type of floating point hardware it has, the
operations you need, the compiler, and the code quality. For a lot of
heavy calculations done with integer arithmetic, you need a number of
"extra" instructions as well as the basic add, subtract, multiply and
divides. You might need shifts for scaling, mask operations, extra code
to get the signs right, etc. And the paths for these are likely to be
highly serialised, with each depending directly on the results of the
previous operation, which slows down pipelining. With hardware floating
point, you have a much simpler instruction stream, which can result in
faster throughput even if the actual latency for the calculations is the
same.

This effect increases with the size and complexity of the processor.
Obviously it is dependent on the processor having floating point
hardware for the precision needed (single or double), but once you have
any sort of hardware floating point you should re-check all your
assumptions about speed differences. You could be wrong in either
direction.

dp

Mar 28, 2012, 5:38:03 AM
On Mar 28, 10:17 am, David Brown <da...@westcontrol.removethisbit.com>
wrote:
> ... And the paths for these are likely to be
> highly serialised, with each depending directly on the results of the
> previous operation, which slows down pipelining.  With hardware floating
> point, you have a much simpler instruction stream, which can result in
> faster throughput even if the actual latency for the calculations is the
> same.

Hi David,
this reminds me of something I was through not so long ago.
On the MPC5200B one gets a good FPU, and I was beginning to use it
for DSP purposes (using the FMADD.D opcode, 64 bit FP MAC, that is
64*64+64) .
It is specified at 2 cycles per FMADD opcode.
I did a loop with just one FMADD inside and guess what, I got 25
(or was it 35) cycles... Data dependencies, obviously. I had to
spread the loop over 24+ FP registers in order to eliminate the
data dependencies (well, and hid some of the data & coeeficients
load/store as a bonus) and got an average of 5.5 cycles eventually
IIRC, well, somewhat <6 anyway (including memory accesses).

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Tim Wescott

Mar 28, 2012, 1:20:51 PM
The key point is "it is dependent on the processor having floating point
hardware for the precision needed". And, I might add, on other things --
see Walter Banks's comments in another sub-thread about 32-bit floating
point vs. 32-bit integer math.

In my experience with signal processing and control loops, having a
library that implements fixed-point, fractional arithmetic with
saturation on addition and shift-up is often faster than floating point
_or_ "pure" integer math, and sidesteps a number of problems with both.
It's at the cost of a learning curve with anyone using the package, but
it works well.

On all the processors I've tried it except for x86 processors, there's
been a 3-20x speedup once I've hand-written the assembly code to do the
computation (and that's without understanding or trying to accommodate
any pipelines that may exist).

But on the x86 -- which is the _only_ processor that I've tried it that
had floating point -- 32-bit fractional arithmetic is slower than 64-bit
floating point.

So, yes -- whether integer (or fixed point) arithmetic is going to be
faster than floating point depends _a lot_ on the processor. So instead
of automatically deciding to do everything "the hard way" and feeling
clever and virtuous thereby, you should _benchmark_ the performance of a
code sample with floating point vs. whatever fixed-point poison you
choose.

Then, even if fixed point is significantly faster, you should look at the
time consumed by floating point and ask if it's really necessary to save
that time: even cheapo 8-bit processors run pretty fast these days, and
can implement fairly complex control laws at 10 or even 100Hz using
double-precision floating point arithmetic. If floating point will do,
fixed point is a waste of effort. And if floating point is _faster_,
fixed point is just plain stupid.

So, benchmark, think, make an informed decision, and then that virtuous
glow that surrounds you after you make your decision will be earned.

upsid...@downunder.com

Mar 28, 2012, 3:59:23 PM
On Tue, 27 Mar 2012 18:52:09 +0300, upsid...@downunder.com wrote:

>On Tue, 27 Mar 2012 11:28:18 -0400, David T. Ashley
><das...@gmail.com> wrote:
>
>>
>>Without FPU support, assuming that the processor has basic integer
>>multiplication instructions, integer operations are ALWAYS faster than
>>floating-point operations. Usually _far_ faster. And always more
>>precise.
>
>Floating point instructions MUL/DIV are trivial, just multiply/divide
>the mantissa and add/sub the exponent.

Assuming we are doing 64-bit double precision mul/div on an 8-bit
processor, the mantissa is 48-56 bits and hence a single-cycle 8x8=16
bit multiply instruction helps a lot. In addition, the lowest part of
mantissa result (96-112 bits) is interesting only to see if this will
generate a carry to the most significant 48-56 bits.

>With FP add/sub you have to denormalize one operand and then normalize
>the result, which can be quite time consuming, without sufficient HW
>support.

The denormalization of the smaller value can be done quite effectively
if the hardware supports shift right by N bits in a single
instruction. In fact it makes sense to first perform the right shift
by multiple of 8 bits by byte copy and then do the 1..7 bit right
shift by shift right instructions.

Unfortunately, the normalization after FP add/sub gets ugly. While
you can do the multiple-of-8 shift with byte test and byte copying,
you still have to do the final left shift with a loop of 1-7 iterations,
shifting into carry and branching if carry is set.

Again, if the hardware supports something like FindFirstBitSet
instruction in a single cycle, this will help the normalization a lot.
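
For example, with GCC or Clang the normalisation step can lean on the
count-leading-zeros intrinsic (a sketch for a 32-bit mantissa; other
toolchains have their own equivalents):

#include <stdint.h>

/* One shift instead of a bit-by-bit loop: count the leading zeros,
 * shift the mantissa up, and adjust the exponent to compensate.    */
static uint32_t normalize(uint32_t mantissa, int *exponent)
{
    if (mantissa != 0) {
        int shift = __builtin_clz(mantissa);
        mantissa <<= shift;
        *exponent -= shift;
    }
    return mantissa;
}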

>This can be really time consuming, if the HW is designed by an idiot.

In the old days, I saw a lot of designs in which the design was
based on the available gates, not on the required functionality.

Andrew Reilly

Mar 28, 2012, 6:44:32 PM
Weren't you the one that said that your (tuned) ARM C code was generally
only a factor of 1.2 worse than the best hand-tweaked assembly code?
Maybe not, but I've seen it said in these parts. Certainly, my
experience is that that is quite good rule of thumb, and it is very
difficult to get more than a factor of two between assembler and C unless
the platform in question has a very poor C compiler or the assembly code
is actually implementing a different algorithm (which is sometimes
possible, but much rarer in these days of well-supplied intrinsic
function libraries.)

> But on the x86 -- which is the _only_ processor that I've tried it that
> had floating point -- 32-bit fractional arithmetic is slower than 64-bit
> floating point.

One thing that gives float a particular edge on the x86(32) (but which
can also apply to other processors) is that using floating point means
that you don't have to use the precious integer register set for data: it
can be used for pointers, counters and other control paraphernalia, leaving
the working "data state" in the FPU registers. Modern SIMD units can do
integer operations as well as floating point, so the "extra state"
argument might seem weaker, but I've never seen a compiler use SIMD
registers for integer calculations (unless forced to with intrinsic
functions).

> So, yes -- whether integer (or fixed point) arithmetic is going to be
> faster than floating point depends _a lot_ on the processor. So instead
> of automatically deciding to do everything "the hard way" and feeling
> clever and virtuous thereby, you should _benchmark_ the performance of a
> code sample with floating point vs. whatever fixed-point poison you
> choose.

Fast isn't always the only consideration, though. Floating point is
*always* going to be more power-hungry than fixed point, simply because
it is doing a bunch of extra work at run-time that fixed-point forces you
to hoist to compile-time.

The advice to benchmark is excellent, of course. Particularly because
the results won't necessarily be what you expect.

Cheers,

--
Andrew

Tim Wescott

Mar 28, 2012, 7:35:41 PM
When the compiler can figure out what I mean, yes, it is usually at least
almost as good as I can do, and sometimes better (I don't carry around
all the instruction reordering rules in my head: the compiler does).

With fixed-point arithmetic stuff, though, the compiler never seems to
"get it".

>> But on the x86 -- which is the _only_ processor that I've tried it that
>> had floating point -- 32-bit fractional arithmetic is slower than
>> 64-bit floating point.
>
> One thing that gives float a particualr edge on the x86(32) (but which
> can also apply to other processors) is that using floating point means
> that you don't have to use the precious integer register set for data:
> it can be used for pointers, counters and other control periphera,
> leaving the working "data state" in the FPU registers. Modern SIMD
> units can do integer operations as well as floating point, so the "extra
> state" argument might seem weaker, but I've never seen a compiler use
> SIMD registers for integer calculations (unless forced to with intrinsic
> functions).

So, the next time I try this on x86 I should use the SIMD registers.

Actually, if you know you're going to be doing things like vector dot
products, then you could probably get some significant speed-up by doing
a spot of assembly here and there. I haven't had occasion to try this on
an x86, though.

>> So, yes -- whether integer (or fixed point) arithmetic is going to be
>> faster than floating point depends _a lot_ on the processor. So
>> instead of automatically deciding to do everything "the hard way" and
>> feeling clever and virtuous thereby, you should _benchmark_ the
>> performance of a code sample with floating point vs. whatever
>> fixed-point poison you choose.
>
> Fast isn't always the only consideration, though. Floating point is
> *always* going to be more power-hungry than fixed point, simply because
> it is doing a bunch of extra work at run-time that fixed-point forces
> you to hoist to compile-time.

It'll be power hungry twice if you select a chip that has floating point
hardware. I never seem to have the budget -- either dollars or watts --
to use such processors.

> The advice to benchmark is excellent, of course. Particularly because
> the results won't necessarily be what you expect.

Yes. Even when I expect anti-intuitive results, I can still be
astonished by benchmarks.

upsid...@downunder.com

Mar 29, 2012, 12:19:02 AM
On 28 Mar 2012 22:44:32 GMT, Andrew Reilly <areil...@bigpond.net.au>
wrote:

>Weren't you the one that said that your (tuned) ARM C code was generally
>only a factor of 1.2 worse than the best hand-tweaked assembly code?
>Maybe not, but I've seen it said in these parts. Certainly, my
>experience is that that is quite good rule of thumb, and it is very
>difficult to get more than a factor of two between assembler and C unless
>the platform in question has a very poor C compiler or the assembly code
>is actually implementing a different algorithm (which is sometimes
>possible, but much rarer in these days of well-supplied intrinsic
>function libraries.)

The main problem trying to write _low_level_ math routines in C is
that you do not have access to the carry bit or to any rotate
instruction. The C compiler would have to be very clever to convert a
sequence of C statements into a single rotate instruction, or into a
shift of multiple bits across two registers.
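
For reference, the usual rotate idiom looks like this; as noted further down
in the thread, several compilers do recognise the shift/shift/OR pair and
emit a single rotate instruction:

#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));   /* the mask avoids a shift by 32 */
}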

David Brown

Mar 29, 2012, 4:09:54 AM
Yes (see my reply on that thread).

> In my experience with signal processing and control loops, having a
> library that implements fixed-point, fractional arithmetic with
> saturation on addition and shift-up is often faster that floating point
> _or_ "pure" integer math, and sidesteps a number of problems with both.
> It's at the cost of a learning curve with anyone using the package, but
> it works well.
>

When you add things like saturation into the mix, it gets more
complicated. That is going to be much less overhead for integer
arithmetic than for floating point (unless you have a processor that has
hardware support for floating point saturated instructions).

But yes, a well-written library is normally going to be better than
poorly written "direct" code, as well as saving you from having to get
all the little details correct (you shouldn't worry about your code
being fast until you are sure it is correct!). A lot of ready-made
libraries are not well written, however, or have other disadvantages.
I've seen libraries that were compiled without optimisation - and were
thus far slower than necessary. And many libraries are full of
hand-made assembly that is out of date, yet remains there for historic
reasons even when it now does more harm than good.

Like everything in this field, there are no simple answers.

> On all the processors I've tried it except for x86 processors, there's
> been a 3-20x speedup once I've hand-written the assembly code to do the
> computation (and that's without understanding or trying to accommodate
> any pipelines that may exist).

While x86 typically means "desktop" rather than "embedded", there are
steadily more powerful cpu's making their way into the embedded space.
I've been using some PowerPC cores recently, and see there's a large
number of factors that affect the real-world speed of the code. Often
floating point (when supported by hardware) will be faster than scaled
integer code, and C code will often be much faster than hand-written
assembly (because it is hard for the assembly programmer to track
pipelines or to make full use of the core's weirder instructions).

>
> But on the x86 -- which is the _only_ processor that I've tried it that
> had floating point -- 32-bit fractional arithmetic is slower than 64-bit
> floating point.
>
> So, yes -- whether integer (or fixed point) arithmetic is going to be
> faster than floating point depends _a lot_ on the processor. So instead
> of automatically deciding to do everything "the hard way" and feeling
> clever and virtuous thereby, you should _benchmark_ the performance of a
> code sample with floating point vs. whatever fixed-point poison you
> choose.

Absolutely.

>
> Then, even if fixed point is significantly faster, you should look at the
> time consumed by floating point and ask if it's really necessary to save
> that time: even cheapo 8-bit processors run pretty fast these days, and
> can implement fairly complex control laws at 10 or even 100Hz using
> double-precision floating point arithmetic. If floating point will do,
> fixed point is a waste of effort. And if floating point is _faster_,
> fixed point is just plain stupid.
>

It's always tempting to worry too much about speed, and work hard to get
the fastest solution. But if you've got code that works correctly, is
high quality (clear, reliable, maintainable, etc.), and runs fast enough
for the job - then you are finished. It doesn't matter if you could run
faster by switching to floating point or fixed point - good enough is
good enough.

> So, benchmark, think, make an informed decision, and then that virtuous
> glow that surrounds you after you make your decision will be earned.
>

Yes.

David Brown

Mar 29, 2012, 4:58:51 AM
This is one of the reasons why it is best to use a modern compiler for
big processors - it is hard to keep up with them when working by hand.
On small devices, you can learn all you need to know about the cpu - but
for modern x86 devices, it is just too much effort. And if you are
trying to generate the fastest possible code, it varies significantly
between different x86 models - your fine-tuned hand-coded assembly may
run optimally on the cpu you have on your machine today, but poorly on
another machine.

> Actually, if you know you're going to be doing things like vector dot
> products, then you could probably get some significant speed-up by doing
> a spot of assembly here and there. I haven't had occasion to try this on
> an x86, though.

For particularly complex vector work, hand-coding the SIMD instructions
is essential for optimal speed. But compilers are getting surprisingly
good at generating some of this stuff semi-automatically - it is worth
trying the compiler's SIMD support before doing it by hand. The other
option is libraries - Intel in particular provides optimised libraries
for this sort of stuff.

>
>>> So, yes -- whether integer (or fixed point) arithmetic is going to be
>>> faster than floating point depends _a lot_ on the processor. So
>>> instead of automatically deciding to do everything "the hard way" and
>>> feeling clever and virtuous thereby, you should _benchmark_ the
>>> performance of a code sample with floating point vs. whatever
>>> fixed-point poison you choose.
>>
>> Fast isn't always the only consideration, though. Floating point is
>> *always* going to be more power-hungry than fixed point, simply because
>> it is doing a bunch of extra work at run-time that fixed-point forces
>> you to hoist to compile-time.
>

That is a wildly inaccurate generalisation. For small processors, the
power consumption is going to depend on the speed of the calculations -
these cores are all-or-nothing in their power usage, so doing the work
faster means you can go to sleep sooner. So faster is lower power. For
larger processors, there may be dynamic clock enabling of different
parts - if the hardware floating point unit is not used, it can be
powered-down. Then there is a trade-off - do you spend extra time in
the integer units, or do you do the job faster with the power-hungry
floating point unit? The answer will vary there too, but typically
faster means less energy overall.

It is obviously correct that the more work that is done at compile time
the better - it is only run-time that takes power (on the target). But
I can think of no justification for claiming that fixed-point algorithms
will do more at compile-time than floating-point algorithms - I would
expect the floating-point code to do far more compile-time optimisation
and pre-calculation (since the compiler has a better understanding of
the code in question).

Andrew Reilly

Mar 29, 2012, 7:53:36 AM
It's a funny old world. I've seen several compilers recognise the pair
of shifts and an or combination as a rotate, and emit that instruction.
I've also replaced carefully asm-"optimised" maths routines (on x86) that
used the carry flag with "vanilla" C equivalents, and the overall effect
was a fairly dramatic performance improvement. Not sure whether it was a
side effect of the assembly code pinning registers that could otherwise
have been reassigned, or some subtle consequence of reduced dependency,
but the result was clear. Guessing performance on massively superscalar,
out-of-order processors like modern x86-64 is very difficult, IMO.

Intrinsic functions (to get access to things like clz and similar) also
help a lot.

Benchmarking is important.

Mileage will definitely vary with target and toolchain...

Cheers,

--
Andrew

Walter Banks

Mar 29, 2012, 10:28:20 AM
C compilers have been gaining performance in part because compiler
designers are working with both a target and a subset of applications
in mind.

Most compiler developers benchmark "real" applications, which tends to
direct the compiler toward optimizing those applications. The result is
that compilers used in the embedded systems market can often do some very
low-level optimizations very well that would not be available, or even
considered, for compilers used in other applications.

For embedded systems specifically, most if not all commercial compilers
have some mechanism to access the processor condition codes. Most
embedded system compilers do well at using the whole processor
instruction set.


Walter..

Walter Banks

Mar 29, 2012, 10:40:27 AM


Andrew Reilly wrote:

> Benchmarking is important.
>
> Milage will definitely vary with target and toolchain...
>

Nothing wakes me up faster than strong coffee -- except last
night's benchmark results. Benchmarking code fragments is
important, but benchmarking applications can be a real eye-opener.

There is nothing more humbling than adding a clever
optimization to a compiler and discovering that 75%
of the regression applications just got slower and
larger as a result.



Walter..





Mark Borgerson

Mar 29, 2012, 10:56:50 AM
In article <oKCdnSXl7OPQPe7S...@web-ster.com>,
t...@seemywebsite.com says...
>
> On Wed, 28 Mar 2012 22:44:32 +0000, Andrew Reilly wrote:
>
<<SNIP>>
> > Fast isn't always the only consideration, though. Floating point is
> > *always* going to be more power-hungry than fixed point, simply because
> > it is doing a bunch of extra work at run-time that fixed-point forces
> > you to hoist to compile-time.
>
> It'll be power hungry twice if you select a chip that has floating point
> hardware. I never seem to have the budget -- either dollars or watts --
> to use such processors.

Cortex-M4 chips, like the STM32F405, have lowered the bar quite a bit for
FPU availability. The STM32F405 is about $11.50 in qty 1 at DigiKey. The
STM32F205 Cortex-M3 is about the same price.

I've got one of the chips, and it's compatible with the F205 board I
designed, so I'll be trying it out soon. More RAM, more Flash, faster
clock----everything we look forward to in a new generation of chips.
(since I'm not using an OS or big USB or ethernet stacks, I'll have LOTS
of flash left over for things like lookup tables, etc.)

Right now, I'm just happy to read an SD card and send bit-banged data to
an FT232H at about 6MB/second. I can even use the same drivers and
host I use with the FT245 chips which do the same thing at about
200KB/s. The 4-bit SD interface on the STM chips can do multi-block
reads at upwards of 10MB/s. Hard to match that with SPI mode!

>
> > The advice to benchmark is excellent, of course. Particularly because
> > the results won't necessarily be what you expect.
>
> Yes. Even when I expect anti-intuitive results, I can still be
> astonished by benchmarks.

I think the FPU availability will greatly simplify coding of things like
Extended Kalman Filters and digital signal processing apps. You can
write and test code on a PC while specifying 32-bit floats and port
pretty easily to the MPU system.


Mark Borgerson




David Brown

Mar 29, 2012, 10:58:49 AM
I think it is a bit of an exaggeration to say this applies to "most"
commercial compilers - and it is certainly not all. I think it applies
to a /few/ commercial compilers targeted at particularly small processors.

For larger processors, you don't get access to the condition codes from
C - it would mess up the compiler code generation patterns too much.
For big processors, a compiler needs to track condition codes over
series of instructions - if the programmer can fiddle with condition
codes in the middle of an instruction stream, the compiler would lose track.

Also for larger processors, there are often many instruction codes (or
addressing modes) that are never generated by the compiler. Some
instructions are just too weird to map properly to C code, others cannot
be expressed in C at all. As a programmer, you access these using
library calls, inline assembly, or "intrinsic" functions (which are
simply ready-made inline assembly functions).

You write compilers targeted for small and limited processors, and have
very fine-tuned optimisations and extensions to let developers squeeze
maximum performance from such devices. But don't judge "most if not
all" commercial compilers by your own standards - most do not measure up
in those areas.

David T. Ashley

Mar 29, 2012, 1:17:04 PM
On Tue, 27 Mar 2012 15:25:21 +0200, David Brown
<da...@westcontrol.removethisbit.com> wrote:
>
>> 2. Is it possible to generate shift based logic to case 5 mentioned above?
>> (Signed 64-bit divide by constant of 2^n)
>
>Yes.
>
>The easiest way to make sure you get signed division right is to
>separate out the sign, then use unsigned arithmetic. That way you can't
>go wrong, and the C code is portable.

Additional note to the OP: comp.lang.c will point you in the right
direction as far as what is portable and what is not.

From memory, probably wrong ... I should look it up, but too lazy.

For unsigneds, no issues shifting in either direction. Works as
intuitively expected.

For signeds ...

Signed left shifts work as expected. 0 is always propagated into the
LSB.

Signed right shifts are, from memory, I believe, implementation
dependent. It isn't guaranteed how the MSB will be populated.

Again, this is from memory and possibly wrong.

The suggestion of separating out the sign is certainly prudent.

DTA

upsid...@downunder.com

Mar 29, 2012, 4:46:51 PM
On 29 Mar 2012 11:53:36 GMT, Andrew Reilly <areil...@bigpond.net.au>
wrote:

>> The main problem trying to write _low_level_ math routines in C is that
>> you do not have access to the carry bit or use any rotate instruction.
>> The C-compiler would have to be very clever to convert a sequence of
>> C-statement into a single rotate instruction or shifting multiple bits
>> into two registers.
>
>It's a funny old world. I've seen several compilers recognise the pair
>of shifts and an or combination as a rotate, and emit that instruction.
>I've also replaced carefully asm-"optimised" maths routines (on x86) that
>used the carry flag with "vanilla" C equivalents, and the overall effect
>was a fairly dramatic performance improvement. Not sure whether it was a
>side effect of the assembly code pinning registers that could otherwise
>have been reassigned, or some subtle consequence of reduced dependency,
>but the result was clear. Guessing performace on massively superscalar,
>out-of-order processors like modern x86-64 is very difficult, IMO.

The x86 family is a bit of a strange case. The number of cycles required
by trivial integer operations (adds, shifts) compared to more complex
instructions like integer mul/div is nearly 1:1, and the floating point
variants are not much worse. Even some complex cases such as floating
point sin/cos are handled quite quickly.

One might even argue that the relative performance of primitive
operations like shifts and adds is quite poor on x86 processors,
compared to computationally intensive operations like sin/cos
(requiring a 3rd-8th order polynomial).

Tim Wescott

Mar 29, 2012, 5:52:02 PM
Be careful of 32-bit floating point. It is insufficient for a number of
real-world tasks for which 32-bit fixed point is well suited. IEEE
single-precision floating point gives you (effectively) a 25- or 26-bit
mantissa (I can't remember how many bits it is, plus sign, plus implied
1). When integrator gains get low, that's not enough, where the extra
factor of 128 or 64 available from well-scaled fixed point will save the
day.

Be _very_ careful of 32-bit floating point in an Extended Kalman filter.
Particularly if you're not using a square-root algorithm for the
evolution of the variance matrix. You can run out of precision
astonishingly quickly.

Mark Borgerson

Mar 30, 2012, 12:19:03 AM
In article <rZydncI478wfROnS...@web-ster.com>,
t...@seemywebsite.com says...
IIRC, IEEE-754 single precision is 8 bits of exponent (offset by 127), one sign bit and
a 23-bit mantissa with an implied 1 bit as the 24th bit.

That's probably OK for FIR filters working on the results of 16-bit ADCs
as long as the number of terms is reasonable (<30 or so).
OTOH, I handled those calculations nicely on an MSP430 with the onboard
16x16-bit hardware multiply and accumulate. When I set up the
coefficients properly, I didn't even have to do a divide of the sum. I
just picked the high 16-bit word -- an effective divide by 65536.
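
A rough sketch of that scheme (not Mark's code): coefficients scaled so they
sum to about 65536, a 32-bit accumulator, and the high 16-bit word of the sum
taken as the output.

#include <stdint.h>

/* Assumes the coefficient sum is <= 65536 (so the accumulator cannot
 * overflow 32 bits) and that >> on a negative value is an arithmetic
 * shift, as discussed earlier in the thread.                         */
static int16_t fir_q16(const int16_t *x, const uint16_t *coef, int taps)
{
    int32_t acc = 0;
    for (int i = 0; i < taps; i++)
        acc += (int32_t)x[i] * coef[i];   /* 16 x 16 -> 32-bit MAC      */
    return (int16_t)(acc >> 16);          /* take the high word: /65536 */
}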

Matlab allows me to generate filters with 16 and 32 bit integers and 32
and 64-bit FP. If I translate from MSP430 to Cortex, I would probably
just translate the filters to 32-bit integer and save the FPU for
things that might exceed the dynamic range of the 32-bit integers.

>
> Be _very_ careful of 32-bit floating point in an Extended Kalman filter.
> Particularly if you're not using a square-root algorithm for the
> evolution of the variance matrix. You can run out of precision
> astonishingly quickly.

Thanks for the notes. I looked up the last time I ported someone else's
code to a StrongArm processor. They did use doubles (64-bit FP). The
chip didn't have an FPU and was running Linux. The standard FP library
implementation did all the floating point calculations with software
interrupts and performance truly sucked. We ended up revising all the
code to use a special library that didn't use SWIs. It was still
not as fast as we wanted. I'm not sure how much a 32-bit FPU will help
with 64-bit FP calculations. One of these days I'll take a closer look
at the IAR and STM signal processing libraries.

Mark Borgerson

Tim Wescott

Mar 30, 2012, 1:42:10 AM
It gets to be an issue when you're implementing IIR filters or PID
controllers where the bandwidth of the filter or loop is much smaller
than the sampling rate: in those circumstances, the difference between
the maximum size of an accumulator and the size of an increment that
needs to affect it can get to be a healthy portion of -- or more than --
2^25, and then you're screwed.

>
>> Be _very_ careful of 32-bit floating point in an Extended Kalman
>> filter. Particularly if you're not using a square-root algorithm for
>> the evolution of the variance matrix. You can run out of precision
>> astonishingly quickly.
>
> Thanks for the notes. I looked up the last time I ported someone else's
> code to a StrongArm processor. They did use doubles (64-bit FP). The
> chip didn't have an FPU and was running Linux. The standard FP library
> implementation did all the floating point calculations with software
> interrupts and performance truly sucked. We ended up revising all the
> code to use a special library that didn't use SWIs. It was still not
> as fast as we wanted. I'm not sure how much a 32-bit FPU will help with
> 64-bit FP calculations. One of these days I'll take a closer look at
> the IAR and STM signal processing libraries.

If I needed to implement a Kalman filter on a processor that would take a
significant speed hit going to 64-bit floating point I'd take a close
look at the square root algorithms. The basic idea is that you have to
do more computation to carry the square root of the variance, but because
it's a square root you pretty much cut your needed precision in half.

On a PC I rather suspect that using a square root algorithm would be a
stupid waste of time -- but if brand B can do 32-bit floating point 50
times faster than 64-bit, the square root algorithm would probably win
hands down.

j.m.gr...@gmail.com

Mar 30, 2012, 7:08:38 AM
to wal...@bytecraft.com
On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
> I did a fixed point support package for our 8 bit embedded systems
> compilers and one interesting metric came out of the project.
>
> Given a number of bits in a number and similar error checking fixed
> or float took very similar amounts of execution time and code size
> in applications.
>
> For example 32 bit float and 32 bit fixed point. They are not exact
> but they are close. In the end much to my surprise the choice is
> dynamic range or resolution.

That makes sense for 8-bit cores, but there is another issue besides speed that the OP may need to consider, and that is granularity.

We had one application where floating point was more convenient, but gave lower precision than a 32*32:64/32 because the float uses 23+1 bits to store the number. The other bits are exponent, and give dynamic range, but NOT precision.

With 24b ADCs that may start to matter and certainly with 32 bit ADCs, you would need to watch it very carefully.

Compiler suppliers for 32-bit cores really should provide optimised libraries for gain/scale type calibrations that use a 64-bit result in the intermediate steps.

Clifford Heath

Apr 1, 2012, 4:08:47 AM
On 03/29/12 03:20, Tim Wescott wrote:
> But on the x86 -- which is the _only_ processor that I've tried it that
> had floating point -- 32-bit fractional arithmetic is slower than 64-bit
> floating point.

I think I recall that transition point occurring around 1994.

I was writing a scalable vector graphics subsystem, and carefully using
integer (sometimes fixed-point) math wherever possible, only to find that,
when I changed the basic type of the coordinate to float (or double, I
can't recall) the system actually rendered *faster*.

The integer unit was busy computing addresses and array offsets, and
being interrupted with *coordinate* math, while the FPU lay idle.

This was still in the Pentium days, before even the 686 and PII.

On a modern note, has anyone tried to use the TI OMAP ARM CPUs?
I haven't looked at the DSP instruction set, but the hardware FP is sweet.

Clifford Heath.

Mark Borgerson

Apr 3, 2012, 1:52:36 AM
In article <18231389.1481.1333105718864.JavaMail.geo-discussion-
forums@yneo2>, j.m.gr...@gmail.com says...
>
> On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
> > I did a fixed point support package for our 8 bit embedded systems
> > compilers and one interesting metric came out of the project.
> >
> > Given a number of bits in a number and similar error checking fixed
> > or float took very similar amounts of execution time and code size
> > in applications.
> >
> > For example 32 bit float and 32 bit fixed point. They are not exact
> > but they are close. In the end much to my surprise the choice is
> > dynamic range or resolution.
>
> That makes sense for 8 bit cores, but there is another issue besides speed the OP may need to consider and that is granularity.
>
> We had one application where floating point was more convenient, but gave lower precision than a 32*32:64/32 because the float uses 23+1 bits to store the number. The other bits are exponent, and give dynamic range, but NOT precision.
>
> With 24b ADCs that may start to matter and certainly with 32 bit ADCs, you would need to watch it very carefully.
>
Have you actually found and used a 32-bit ADC? For an ADC with a 5V
range, that would mean only about a nanovolt per LSB!!!
> Compiler suppliers for 32 bit cores, really should provide optimised libraries for Gain/Scale type calibrates, that use a 64 bit result in the intermediate steps.


My experience is that I'm lucky to get 20 noise-free bits on any system
actually connected to an MPU (for a single conversion). Still, that
would push the limits on FP with only 24 bits in the mantissa if I were
to do any significant oversampling. I remember professors in
chemistry and physics warning me that the uncertainty in my final result
should have error limits corresponding to the precision of my inputs.
Even so, roundoff errors could eventually degrade the result past the
limits of the input for some calculations.

The reality of the oceanographic sensors I work with is that 16 bits
gets you right into the noise level of the real world for most
experiments.

However, if you are doing long-term integrations of variable inputs,
roundoff error could come back to haunt you.
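
A contrived illustration of that (the reading and the count are made up): accumulate a steady 20-bit-ish value a million times in a float and in an int64_t and compare.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const int32_t sample = 1047;        /* a fixed, made-up reading         */
    const long n = 1000000;             /* a million conversions            */

    float acc_f = 0.0f;
    int64_t acc_i = 0;

    for (long i = 0; i < n; i++) {
        acc_f += (float)sample;         /* loses low bits once |acc| > 2^24 */
        acc_i += sample;                /* exact                            */
    }

    printf("float accumulator: %.0f\n", acc_f);
    printf("int64 accumulator: %lld\n", (long long)acc_i);  /* exactly 1047000000 */
    return 0;
}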

Mark Borgerson



John Devereux

unread,
Apr 3, 2012, 6:33:59 AM4/3/12
to
Mark Borgerson <mborg...@comcast.net> writes:

> In article <18231389.1481.1333105718864.JavaMail.geo-discussion-
> forums@yneo2>, j.m.gr...@gmail.com says...
>>
>> On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
>> > I did a fixed point support package for our 8 bit embedded systems
>> > compilers and one interesting metric came out of the project.
>> >
>> > Given a number of bits in a number and similar error checking fixed
>> > or float took very similar amounts of execution time and code size
>> > in applications.
>> >
>> > For example 32 bit float and 32 bit fixed point. They are not exact
>> > but they are close. In the end much to my surprise the choice is
>> > dynamic range or resolution.
>>
>> That makes sense for 8 bit cores, but there is another issue besides speed the OP may need to consider and that is granularity.
>>
>> We had one application where floating point was more convenient, but gave lower precision than a 32*32:64/32 because the float uses 23+1 bits to store the number. The other bits are exponent, and give dynamic range, but NOT precision.
>>
>> With 24b ADCs that may start to matter and certainly with 32 bit ADCs, you would need to watch it very carefully.
>>
> Have you actually found and used a 32-bit ADC? For and ADC with a 5V
> range, that would mean just a few nanovolts per LSB!!!

The only actual chip I have heard of is a sigma-delta from TI. Of course
8-10 of these bits are marketing. I would look it up for you, but the
flash selection tool is still "initializing" for me on their site...

The best ADC I have seen is an HP 3458A meter, the equivalent of a 28-bit
chip ADC.

It might just be possible to make a 32-bit ADC using a Josephson
junction array, if you have a liquid helium supply handy :)

[...]


--

John Devereux

Anders....@kapsi.spam.stop.fi.invalid

unread,
Apr 3, 2012, 8:05:07 AM4/3/12
to
John Devereux <jo...@devereux.me.uk> wrote:

> Only actual chip I have heard of is a sigma-delta from TI. Of course
> 8-10 of these bit are marketing. I would look it up for you but the
> flash selection tool is still "initializing" for me on their site...

Off-topic, but as far as I can tell TI are not using Flash in any of
their selection tools, only HTML5. Unfortunately their backend sometimes
glitches out, usually when you need to look up one of their components.

Anyway, their ADS1281/1282 advertise a 31 bit resolution. The ADS1282-HT
high-temperature variant is even available in DIP packaging for the low,
low price of $218.75 ea.

-a

John Devereux

unread,
Apr 3, 2012, 11:34:47 AM4/3/12
to
Anders....@kapsi.spam.stop.fi.invalid writes:

> John Devereux <jo...@devereux.me.uk> wrote:
>
>> Only actual chip I have heard of is a sigma-delta from TI. Of course
>> 8-10 of these bit are marketing. I would look it up for you but the
>> flash selection tool is still "initializing" for me on their site...
>
> Off-topic, but as far as I can tell TI are not using Flash in any of
> their selection tools, only HTML5. Unfortunately their backend sometimes
> glitches out, usually when you need to look up one of their
> components.

Oh really? Good for them. I apologise to TI, I admit I was using quite
an old browser.

In fact it seems to work very well in a slightly more modern one. It is
one of the few such manufacturer "selection tools" that uses the whole
width of the browser window. Most are crippled to uselessness by some
stupid marketeer's desire to exactly control appearance.

> Anyway, their ADS1281/1282 advertise a 31 bit resolution. The ADS1282-HT
> high-temperature variant is even available in DIP packaging for the low,
> low price of $218.75 ea.
>
> -a

--

John Devereux

Tim Wescott

unread,
Apr 3, 2012, 2:52:57 PM4/3/12
to
On Fri, 30 Mar 2012 04:08:38 -0700, j.m.granville wrote:

> On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
>> I did a fixed point support package for our 8 bit embedded systems
>> compilers and one interesting metric came out of the project.
>>
>> Given a number of bits in a number and similar error checking fixed or
>> float took very similar amounts of execution time and code size in
>> applications.
>>
>> For example 32 bit float and 32 bit fixed point. They are not exact but
>> they are close. In the end much to my surprise the choice is dynamic
>> range or resolution.
>
> That makes sense for 8 bit cores, but there is another issue besides
> speed the OP may need to consider and that is granularity.
>
> We had one application where floating point was more convenient, but
> gave lower precision than a 32*32:64/32 because the float uses 23+1
> bits to store the number. The other bits are exponent, and give dynamic
> range, but NOT precision.
>
> With 24b ADCs that may start to matter and certainly with 32 bit ADCs,
> you would need to watch it very carefully.

If you do any filtering at all, the 25 bits of precision often matter
with a _16_ bit ADC, when they aren't a show-stopper altogether. It
wouldn't be sensible to even _think_ about filtering the output of a 24-
bit ADC with single-precision floating point data paths unless the ADC
had been exceedingly poorly chosen or applied, and had essentially
useless content in the last several bits.
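
A quick way to see why: run a slow first-order low-pass y += a*(x - y) on data near 24-bit full scale, once in single and once in double precision (the constants below are made up, chosen only to make the effect obvious).

#include <stdio.h>

int main(void)
{
    const float a = 1.0f / 4096.0f;   /* slow first-order low-pass          */
    const float x = 8388605.0f;       /* input 5 counts above the state     */
    float  yf = 8388600.0f;           /* state near 24-bit full scale       */
    double yd = 8388600.0;

    for (int i = 0; i < 100000; i++) {
        yf += a * (x - yf);           /* correction is below half a float
                                         ulp of y, so y never moves         */
        yd += a * (x - yd);           /* double keeps tracking              */
    }
    printf("float : %.3f\n", yf);     /* stuck at 8388600.000               */
    printf("double: %.3f\n", yd);     /* close to 8388605.000               */
    return 0;
}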

Paul

unread,
Apr 4, 2012, 4:35:31 AM4/4/12
to
In article <87sjgkj...@devereux.me.uk>, jo...@devereux.me.uk says...
>
> Anders....@kapsi.spam.stop.fi.invalid writes:
>
> > John Devereux <jo...@devereux.me.uk> wrote:
> >
> >> Only actual chip I have heard of is a sigma-delta from TI. Of course
> >> 8-10 of these bit are marketing. I would look it up for you but the
> >> flash selection tool is still "initializing" for me on their site...
> >
> > Off-topic, but as far as I can tell TI are not using Flash in any of
> > their selection tools, only HTML5. Unfortunately their backend sometimes
> > glitches out, usually when you need to look up one of their
> > components.
>
> Oh really? Good for them. I apologise to TI, I admit I was using quite
> an old browser.
>
> In fact it seems to work very well in a slightly more modern one. It is
> one of the few such manufacturer "selection tools" that uses the whole
> width of the browser window. Most are crippled to uselessness by some
> stupid marketeers desire to exactly control appearance.

Because the marketeer or developer believes everyone has the same system
and screen size as them. Then it looks right when printed out on a piece
of paper and handed to the board to look at. Don't even get me started on
fonts specified in pixels :)

--
Paul Carpenter | pa...@pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/> PC Services
<http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font
<http://www.gnuh8.org.uk/> GNU H8 - compiler & Renesas H8/H8S/H8 Tiny
<http://www.badweb.org.uk/> For those web sites you hate

Mark Borgerson

unread,
Apr 4, 2012, 7:50:45 PM4/4/12
to
In article <p72dndn_Y4-U2ubS...@web-ster.com>,
t...@seemywebsite.com says...
>
> On Fri, 30 Mar 2012 04:08:38 -0700, j.m.granville wrote:
>
> > On Wednesday, March 28, 2012 6:56:49 AM UTC+12, Walter Banks wrote:
> >> I did a fixed point support package for our 8 bit embedded systems
> >> compilers and one interesting metric came out of the project.
> >>
> >> Given a number of bits in a number and similar error checking fixed or
> >> float took very similar amounts of execution time and code size in
> >> applications.
> >>
> >> For example 32 bit float and 32 bit fixed point. They are not exact but
> >> they are close. In the end much to my surprise the choice is dynamic
> >> range or resolution.
> >
> > That makes sense for 8 bit cores, but there is another issue besides
> > speed the OP may need to consider and that is granularity.
> >
> > We had one application where floating point was more convenient, but
> > gave lower precision than a 32*32:64/32 because the float uses 23+1
> > bits to store the number. The other bits are exponent, and give dynamic
> > range, but NOT precision.
> >
> > With 24b ADCs that may start to matter and certainly with 32 bit ADCs,
> > you would need to watch it very carefully.
>
> If you do any filtering at all, the 25 bits of precision often matter
> with a _16_ bit ADC, when they aren't a show-stopper altogether. It
> wouldn't be sensible to even _think_ about filtering the output of a 24-
> bit ADC with single-precision floating point data paths unless the ADC
> had been exceedingly poorly chosen or applied, and had essentially
> useless content in the last several bits.

I agree with your point about filtering with 16-bit ADCs. I generally
implement FIRs with about 20 taps---which is easily done
with a 16 x 16 -> 32-bit MAC. There's no real advantage to floating
point there, and with 16-bit data inputs, dynamic range is not
a problem.
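
For example, a sketch of that structure (the tap count and Q15 scaling are just the usual assumptions; the coefficients are assumed normalised so the sum of their magnitudes is at most 1.0 in Q15, which keeps the 32-bit accumulator from overflowing):

#include <stdint.h>
#include <stddef.h>

#define NTAPS 20

static int16_t fir_q15(const int16_t x[NTAPS], const int16_t h[NTAPS])
{
    int32_t acc = 0;
    for (size_t i = 0; i < NTAPS; i++)
        acc += (int32_t)x[i] * h[i];    /* 16 x 16 -> 32-bit multiply-accumulate */
    return (int16_t)(acc >> 15);        /* drop the Q15 scaling on the way out   */
}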

I've usually found that getting the full 24 bits from a 24-bit ADC is
next to impossible. The CS5534 that I've used comes with a table that
lists the effective number of bits vs cycle time. IIRC, you need to go
down to 7-1/2 conversions per second to get over 20 bits. At 30 or 60
conversions per second, you're down in the 18-bit range. However, the
built-in 60 Hz rejection is quite helpful for some applications.

Floating point does have its uses though--where dynamic range is high
and some of the numbers start out very large--as in chemistry
calculations where you may start with constants like 6.022x10^23.
32-bit floating point may not be suitable for exactly counting the
hydrogen ions in a beaker of analyte, but it can give you reasonable
results within the limits of the chemical sensors you might use
(such as a pH meter with a 4-digit display).

Mark Borgerson


John Devereux

unread,
Apr 5, 2012, 6:48:08 AM4/5/12
to
I find it can be nice for generating the final "result" when a
complicated formula is involved, or even when the formula is not that
complicated but there is some horrible mixture of units involved:
convert everything to floating-point SI units and just do the
calculation, instead of carefully scaling everything and checking for
loss of precision and overflows at every sub-step.
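
For instance (the reference voltage, pull-up, Beta constants and even the choice of sensor below are made up purely to illustrate the style):

#include <math.h>
#include <stdint.h>

/* Hypothetical front end: thermistor at the bottom of a divider with a 10k
   pull-up to the ADC reference, Beta-model thermistor, unipolar 24-bit
   ratiometric result. Each raw quantity is converted to SI (volts, ohms,
   kelvin) once, then the formula is written the way it appears on paper. */
static double thermistor_celsius(int32_t adc_counts)
{
    const double vref = 2.5, r_pullup = 10e3;              /* volts, ohms */
    const double beta = 3435.0, r25 = 10e3, t25 = 298.15;  /* kelvin      */

    double v = (double)adc_counts * vref / (1 << 23);      /* counts -> volts   */
    double r = r_pullup * v / (vref - v);                  /* volts  -> ohms    */
    double t = 1.0 / (1.0 / t25 + log(r / r25) / beta);    /* Beta equation     */
    return t - 273.15;                                     /* kelvin -> Celsius */
}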



--

John Devereux