
Unsigned fixed point multiply refresher


bitrex

Mar 10, 2016, 7:53:41 PM
Forgive me, my code-fu is a little rusty.

The Teensy audio library (ARM Cortex M4) I'm writing some C++ code for
(a personal project; I'm definitely not, like, a professional) has a
simple fixed point math library the author put together. I can't seem to
find any references in the API to a fixed point _unsigned_ multiply, however.

I have an unsigned 16 bit fractional part number (Q15?) in a uint16_t,
and I need to multiply another uint16_t with no fractional part by this
to scale it.

Apparently what I have available is a routine that multiplies an int32_t
by an int16_t and stuffs the result in an int32_t. IIRC I should be able
to just treat the uint16_t as an int16_t, and with two's complement it
should work out OK, right?

What would be the appropriate way to achieve what I need with what I have?

David Brown

Mar 11, 2016, 3:25:26 AM
In practice that will probably work, but in theory it will not. In C,
signed and unsigned types have different overflow characteristics. It
is possible that you will hit undefined behaviour due to overflows for
some values. The compiler will probably generate code that will work as
expected anyway (since the cpu uses two's complement even though C does
not), but you should try to avoid the risk - weird things can happen if
the compiler uses inlining and constant propagation to pre-calculate
results.

> What would be the appropriate way to achieve what I need with what I have?

First, make sure you cast your operands to "unsigned int" before doing
the multiplication. That makes sure everything stays unsigned, there
are no overflows, and no risks of undefined behaviour. It is the
compiler's job to spot that it can do everything using 16-bit
arithmetic instructions - your job is to make sure the C code is correct.

Secondly, you probably want saturation on overflow, rather than simply
taking the lowest 16 bits. If you multiply 0.25 (0x4000) by 5 (0x0005)
you want the result to be 0.999984741 (0xffff), not 0.25 (after
truncating 0x14000 to 0x4000).

uint16_t multQ_U(uint16_t q, uint16_t u) {
    uint32_t x = (uint32_t) u * (uint32_t) q;  // full 32-bit unsigned product
    if (x <= 0xffff) return (uint16_t) x;
    return 0xffff;                             // saturate instead of wrapping
}

(That's for a UQ16, unsigned 16-bit format. If you have a 15-bit
unsigned format, the saturation will be at 0x7fff.)
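
To make the saturation concrete, here is a quick usage sketch of the
function above (my own example, reusing the 0.25 * 5 case from earlier):

uint16_t r = multQ_U(0x4000, 5);  // q = 0.25 in UQ16, u = 5
// The exact product is 1.25, which does not fit in UQ16, so the
// function saturates: r == 0xffff, i.e. about 0.99998.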

An alternative option is to use <stdfix.h> and types such as "unsigned
fract". This lets the compiler do the work of getting the code correct,
but the resulting code can often be sub-optimal. In my quick tests, ARM
gcc 4.8 did a good job of some things like multiplying two "fract"
types, but quickly resorted to library calls for many other combinations.
It is also not supported in C++.



bitrex

Mar 11, 2016, 8:41:02 AM
Thanks so much! This is an enormous help.

Öö Tiib

Mar 11, 2016, 6:48:37 PM
On Friday, 11 March 2016 10:25:26 UTC+2, David Brown wrote:
> On 11/03/16 01:53, bitrex wrote:

> >
> > Apparently what I have available is a routine that multiplies an int32_t
> > by an int16_t and stuffs the result in an int32_t. IIRC I should be able
> > to just treat the uint16_t as an int16_t, and with two's complement it
> > should work out OK, right?
> >
>
> In practice that will probably work, but in theory it will not. In C,
> signed and unsigned types have different overflow characteristics. It
> is possible that you will hit undefined behaviour due to overflows for
> some values. The compiler will probably generate code that will work as
> expected anyway (since the cpu uses two's complement even though C does
> not), but you should try to avoid the risk - weird things can happen if
> the compiler uses inlining and constant propagation to pre-calculate
> results.

C does use two's complement. Section 7.18.1.1 paragraph 1 of the C99
standard:
The typedef name intN_t designates a signed integer type with width
N, no padding bits, and a two's complement representation.

So if 'int32_t' compiles, then it must be two's complement in a
conforming implementation.

David Brown

Mar 12, 2016, 6:06:55 PM
C (and C++) can use several signed integer formats - the details are
implementation defined. But you are right that the intN_t types must
use two's complement. However, C (and C++) explicitly makes signed
arithmetical overflow undefined behaviour, even when you are using types
that must be two's complement. I was not very accurate in my wording,
but the point is that you should always avoid any possible signed
integer overflow to avoid undefined behaviour, even though the
instructions used by the compiler will likely be two's complement
arithmetic instructions that will probably give you the answer you expect.
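
A small illustration of what that can mean in practice (my own example,
with made-up function names, not anything from the library discussed above):

// With a signed int, the compiler may assume x + 1 never overflows and
// is allowed to fold this whole test to "return 0".
int wraps_if_incremented(int x) {
    return x + 1 < x;   // undefined behaviour when x == INT_MAX
}

// The unsigned version is well defined (arithmetic is modulo 2^N), so it
// returns 1 exactly when x == UINT_MAX.
int wraps_if_incremented_u(unsigned x) {
    return x + 1 < x;
}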

Alf P. Steinbach

Mar 12, 2016, 6:39:21 PM
On 13.03.2016 00:06, David Brown wrote:
Not quite. C++ only makes overflow UB for types where “the result is not
mathematically defined or not in the range of representable values for
its type”.

This does not apply to unsigned types because there the result is always
in range.

And more generally it does not apply to an integer type T where
numeric_limits<T>::is_modulo is true.

With Visual C++ (Windows) all integer types, signed and unsigned, have
is_modulo true.

With MinGW g++ 5.1.0 (a version of g++ for Windows), all signed integer
types have is_modulo false even with use of the flag `-fwrapv`, or at
least my hasty but not entirely complete testing right now says so.
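
A minimal sketch of that kind of check (not the exact test harness used;
the output naturally depends on the compiler and flags):

#include <iostream>
#include <limits>

int main() {
    std::cout << "int      is_modulo: "
              << std::numeric_limits<int>::is_modulo << '\n';
    std::cout << "unsigned is_modulo: "
              << std::numeric_limits<unsigned>::is_modulo << '\n';
}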

That said, due to an ill-conceived way of specifying optimizations (by
changing the semantics of floating point types via compilation options
instead of having different but layout-compatible types), the g++
standard library does not, in general, report the properties of floating
point types correctly. This means that with g++ std::numeric_limits
cannot be trusted in general, only in special cases. And I do not know
how to detect the presence of g++'s `-fwrapv` or `-ftrapv` semantics in
some more reliable compiler-specific way. :(


> even when you are using types
> that must be two's complement. I was not very accurate in my wording,
> but the point is that you should always avoid any possible signed
> integer overflow to avoid undefined behaviour, even though the
> instructions used by the compiler will likely be two's complement
> arithmetic instructions that will probably give you the answer you expect.

I agree, when the code must be portable between compilers.


Cheers!,

- Alf

Öö Tiib

Mar 12, 2016, 7:09:43 PM
On Sunday, 13 March 2016 01:06:55 UTC+2, David Brown wrote:
In the case of C++ the traits in 'numeric_limits' can indicate whether
it is undefined behavior or not.

> I was not very accurate in my wording,
> but the point is that you should always avoid any possible signed
> integer overflow to avoid undefined behaviour, even though the
> instructions used by the compiler will likely be two's complement
> arithmetic instructions that will probably give you the answer you expect.

Yes, that is true, but it is a controversial topic. A lot of C and C++
programmers would actually be somewhat happier with implementation-defined
behavior instead of undefined behavior in the case of arithmetic overflow
of a signed value.

Even implementation-defined behavior is a nuisance, of course. For
example, sign extending when the count of bits to extend is not known at
compile time. On most platforms this works:

int const bits_to_extend = 64 - bits_in_source;
int64_t result = ((int64_t)source << bits_to_extend) >> bits_to_extend;

Right shift of a negative signed number has implementation-defined
behavior by the standard, however. So it only works because most
platforms have a right shift that extends the sign. What really should
be done in portable code is something like this, I suspect:

int64_t const mask = 1ULL << (bits_in_source - 1);
int64_t result = (source ^ mask) - mask;

If the bits above 'bits_in_source' in 'source' might not be zero,
then additionally clearing them is needed:

int64_t tmp = (int64_t)source & ((1ULL << bits_in_source) - 1);
int64_t const mask = 1ULL << (bits_in_source - 1);
int64_t result = (tmp ^ mask) - mask;

Everybody likely agrees that it looks like cryptic garbage. I am not
100% sure that it takes care of all issues.
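
For what it's worth, the portable version can be packaged into a single
helper along these lines (my own sketch; the name sign_extend is made up,
and it assumes 1 <= bits_in_source <= 63):

#include <cstdint>

int64_t sign_extend(uint64_t source, int bits_in_source) {
    // Clear any bits above the source width first.
    int64_t tmp  = (int64_t)(source & ((1ULL << bits_in_source) - 1));
    // Flip the sign bit and subtract its weight: equivalent to sign
    // extension, but without right-shifting a negative value.
    int64_t mask = 1LL << (bits_in_source - 1);
    return (tmp ^ mask) - mask;
}

For example, sign_extend(0xFF, 8) gives -1 and sign_extend(0x7F, 8) gives 127.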

David Brown

Mar 13, 2016, 4:55:44 PM
True.

>
> This does not apply to unsigned types because there the result is always
> in range.

True.

>
> And more generally it does not apply to an integer type T where
> numeric_limits<T>::is_modulo is true.

OK.

>
> With Visual C++ (Windows) all integer types, signed and unsigned, have
> is_modulo true.
>
> With MinGW g++ 5.1.0 (a version of g++ for Windows), all signed integer
> types have is_modulo false even with use of the flag `-fwrapv`, or at
> least my hasty but not entirely complete testing right now says so.

An implementation is free to give specific behaviour to things that are
undefined in the standards. That does not change the standard. The
standard does not make any requirements about the overflow behaviour of
signed ints - the behaviour is undefined in standard C and C++. That is
the case even if it is defined in some particular implementations of the
language, such as MSVC or gcc with -fwrapv.

(It would surprise me a little if MSVC implements wrapping behaviour on
all signed integers, since that blocks certain types of optimisation.
But it would also surprise me if it sets is_modulo true but does not
follow those requirements.)

>
> That said, due to an ill-conceived way of specifying optimizations (by
> changing the semantics of floating point types via compilation options
> instead of having different but layout-compatible types), the g++
> standard library does not, in general, report the properties of floating
> point types correctly. This means that with g++ std::numeric_limits
> cannot be trusted in general, only in special cases. And I do not know
> how to detect the presence of g++'s `-fwrapv` or `-ftrapv` semantics in
> some more reliable compiler-specific way. :(
>
>
>> even when you are using types
>> that must be two's complement. I was not very accurate in my wording,
>> but the point is that you should always avoid any possible signed
>> integer overflow to avoid undefined behaviour, even though the
>> instructions used by the compiler will likely be two's complement
>> arithmetic instructions that will probably give you the answer you
>> expect.
>
> I agree, when the code must be portable between compilers.
>

Yes. If your compiler (or the flags you use) documents specific
behaviour for signed integer overflow, and your code is only targeted at
such compilers, then of course it is fine to rely on that behaviour.
I'd add a comment in the code, however!


David Brown

Mar 13, 2016, 5:50:30 PM
On 13/03/16 01:09, Öö Tiib wrote:
> On Sunday, 13 March 2016 01:06:55 UTC+2, David Brown wrote:
It is always possible for an implementation to define the behaviour when
the standards say "undefined behaviour". So it is a choice made by the
compiler developers to make integer overflow undefined, rather than
giving it a specific implementation-defined behaviour. The fact that
most compilers don't have implementation-defined signed overflow
behaviour by default suggests that programmers who want that are in the
minority - but there are enough to make it worth supporting the -fwrapv
flag in gcc and clang that /does/ define signed overflow behaviour.

Note that there is a large difference between the standards calling the
behaviour "undefined", and calling it "implementation-defined". If it
is implementation-defined, the compiler developer is free to make a
choice of how they handle overflow (wrapped/modulo behaviour,
saturation, trapping, etc.). But when they have made that choice and
documented it, they must stick to it. With undefined behaviour, the
compiler can use different methods in different circumstances - whatever
makes the code simplest and fastest when there is no overflow (which of
course is the normal behaviour).

What I think would be a better compromise that avoids worries of nasal
daemons while still letting the compiler optimise freely would be to
make signed overflow "unspecified behaviour" (divide by zero can still
be undefined). In other words, multiplying two valid "int" values will
always give you an "int" - but if the compiler can't give you the
correct int, it can give you any int it finds convenient. That might be
the result from two's complement wrapping, but it could be 0 or it could
be whatever happens to be in a register at the time.

>
> Even implementation-defined behavior is nuisance of course. For example
> sign extending on case when counts of bits to extend are not known
> compile time. On most platforms that works:
>
> int const bits_to_extend = 64 - bits_in_source;
> int64_t result = ((int64_t)source << bits_to_extend) >> bits_to_extend;
>
> Right shift of a negative signed number has implementation-defined
> behavior by standard however. So it only works because most platforms
> have right shift that extends sign. What really should be done in
> portable code is something like that I suspect:
>
> int64_t const mask = 1ULL << (bits_in_source - 1);
> int64_t result = (source ^ mask) - mask;
>
> If bits above 'bits_in_source' in 'source' may be are not zero
> then additionally clearing those is needed:
>
> int64_t tmp = (int64_t)source & ((1ULL << bits_in_source) - 1);
> int64_t const mask = 1ULL << (bits_in_source - 1);
> int64_t result = (tmp ^ mask) - mask;
>
> Everybody likely agrees that it looks like cryptic garbage. I am not
> 100% sure that it takes care of all issues.
>

The thing about implementation-defined behaviour is that you need to
check that it works on the compilers you use - but once you have checked
(in the documentation, not trial-and-error compilation!), you can rely
on it. If you know your code is targeted at gcc, clang, MSVC, icc, and
perhaps a few other major compilers, all of which describe how they
implement negative integer shifts, then you can be happy with code like
your first version.


Alf P. Steinbach

Mar 13, 2016, 6:46:55 PM
On 13.03.2016 22:50, David Brown wrote:
>
> What I think would be a better compromise that avoids worries of nasal
> daemons while still letting the compiler optimise freely would be to
> make signed overflow "unspecified behaviour" (divide by zero can still
> be undefined). In other words, multiplying two valid "int" values will
> always give you an "int" - but if the compiler can't give you the
> correct int, it can give you any int if finds convenient. That might be
> the result from two's complement rounding, but it could be 0 or it could
> be whatever happens to be in a register at the time.

Well, the main attraction of UB is that the compiler can assume that UB
does not occur, and optimize accordingly.

One main attraction of well-defined behavior is, as you write, the
absence of possible nasal daemons. And if that were the only advantage
then some unspecified behavior could be a useful compromise (I'm not
entirely clear on this, I worry about lost optimizations). However,
another main attraction, the one that trumps all else in my humble
opinion, is PREDICTABILITY: that given some source code, its effect for
given inputs can be ~100% reliably predicted by the programmer.

That means it can be reasoned about. :)

Java's quite acceptable numeric performance [1] – in some cases even
outperforming C++ (I think due to JIT compilation, which can optimize for
the environment at hand) – is evidence that well-defined wrapping
behavior for signed integer arithmetic need not incur significant cost.

Partially due to that evidence I regard the technically possible
optimizations as mostly premature micro- and nano-optimizations, doing
more harm than good – like a beginner “optimizing” loops with goto's,
who maybe shaves a nanosecond or two but pays an unreasonably high cost.


Cheers!,

- Alf

Notes:
[1] <url:
http://stackoverflow.com/questions/3001836/how-does-java-handle-integer-underflows-and-overflows-and-how-would-you-check-fo>
