It works thanks to 6.5.8 .
Any comments?
(q + r + ((q + r) & INT_MAX))>>1
Obviously broken on overflow, but also broken on some
perfectly ordinary cases. q==1, r==-1, for example.
--
Eric Sosman
eso...@ieee-dot-org.invalid
That does not work (try q = r = 1).
Perhaps you're thinking of (q + r + abs(q - r)) / 2 ?
Is there a non-overflowing way to do abs() as a one-liner?
Branch-free is from math class:
max(a,b) = $ \{ a + b + | a - b|} \over {2} $
I don't see that requiring limits.h is a problem. However, this solution
can overflow which is undefined behaviour (i.e. anything can happen) and
implementation defined behaviour if q and r are negative, so there is no
guarantee it will work.
--
Flash Gordon
Plus it suffers if the addition overflows.
I like the conditional variant because it takes me back to the good old
days, where -1 was true in C64 BASIC. :)
Now, to go OT:
Some timings, FWIW, with gcc 4.4.2 -O2, AMD Athlon64, 100 million
iterations (calculate the total of the maximum of two pseudo-random
numbers), runtime affected by the loading whims of my PC:
code runtime
------------------------------------------ -------
#define MYMAX1(a,b) ((a)<(b)?(b):(a)) 3.311
if (a < b) return b; else return a; 3.340
return ((a>b)*a)+((a<=b)*b); 3.328
return (a + b + ((a + b) & INT_MAX))>>1; 3.339
The first two generated identical assembly, unsurprisingly. They make
use of the "conditional move" instruction, so the assembly for computing
the max is basically:
CMP a,b
CMOV a,b
None of the assembly produced in any of the four tests had any explicit
branches--it was all compares and conditional instructions.
I think the only lesson to be drawn from the timings is, "Sometimes
optimizations don't do what you think they'll do." When it comes to
little things like this, I'm a big fan of just coding it straight and
then optimizing if its required.
Perhaps on a RISCier machine (one without Intel's DOIT instruction ;) ),
the differences would be... existent.
-Beej
> There is also a branch-free way, but it requires limits.h
>
> (q + r + ((q + r) & INT_MAX))>>1
With q==2, r==4, and INT_MAX==32767, that expression seems
to evaluate to 6, although the correct max(2,4) would be 4.
> Here is a ternary-free way of implementing max(q,r)
> int max(int q,int r) {return ((q>r)*q)+((q<=r)*r) ;}
Your C expression is more difficult for me to understand than
(q > r) ? q : r, and I doubt the generated code is faster either.
The ?: version also works with pointers (to elements of the same
array), which cannot be multiplied.
This kind of Boolean*integer multiplication made some sense in
old BASIC programs, but I don't see a purpose for it in C, where
we have ?: available. If you were implementing your own C
compiler and hadn't implemented ?: in the parser yet, I think
it would be rather simple to implement max(q,r) as a built-in
function. (A standard-conforming compiler would have to call it
e.g. __max but a compiler without ?: wouldn't conform anyway.)
> On Nov 7, 5:31�pm, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> > Andrey Vul wrote:
> > > There is also a branch-free way, but it requires limits.h
> >
> > > (q + r + ((q - r) & INT_MAX))>>1
> >
> > � � �Obviously broken on overflow, but also broken on some
> > perfectly ordinary cases. �q==1, r==-1, for example.
> Sorry, mixed + and -.
>
Only on sign&magnitude machines, which no longer exist in any
practical sense. Still broken on everything else. See below.
> Is there a non-overflowing way to do abs() as a one-liner?
>
> Branch-free is from math class:
> max(a,b) = $ \{ a + b + | a - b|} \over {2} $
With abs this works. But there is no standard computational
(conceptually branchless) operation that directly does abs.
What you can do is a + (a<b)*(b-a), and on *most* 2sC machines,
although not required by the standard, in the absence of overflow
this can be done by: a + -((a-b)>>INTbits) * (b-a)
or slightly 'simpler': a + ( (a-b)>>INTbits & (b-a) )
where usually INTbits is sizeof(INT)*CHAR_BIT-1
(unless padding bits are used, rarely if ever).
For abs start with a*( 1-(a<0)*2 ) and proceed similarly.
That said, I concur with the sentiment elsethread to just write the
ternary operator and let the compiler generate the code; if it screws
up such a basic thing, you need a better compiler anyway.