quickest way to round up to the nearest power of 2

Jeffrey Stedfast

unread,

Aug 23, 2002, 4:34:42 AM8/23/02

to

I was wondering if anyone had a quicker way of rounding an unsigned int
value up to the nearest power of 2.

obviously I can do:

unsigned int n = 1;

while (n < val)
n <<= 1;

but I figured there has to be a faster way...?

Kevin Easton

unread,

Aug 23, 2002, 6:52:36 AM8/23/02

to

Your loop never terminates if "unsigned int" is 32 bits and val is
greater than 2^31.

Here's my effort (note: it's got better O() complexity, but it may not
be *faster* for the ranges of numbers you're interested in). I make the
assumption that only numbers in the range 0..2^31 are passed in
(anything higher returns 0). It's basically an unrolled binary search:

unsigned long rop2(unsigned long val)
{
unsigned long n = 0;
unsigned long q = 1<<16;

if (val < 2)
return 1;

if (val <= q) {
n = q;
q >>= 8;
} else {
q <<= 8;
}

if (val <= q) {
n = q;
q >>= 4;
} else {
q <<= 4;
}

if (val <= q) {
n = q;
q >>= 2;
} else {
q <<= 2;
}

if (val <= q) {
n = q;
q >>= 1;
} else {
q <<= 1;
}

if (val <= q) {
n = q;
}

return n;
}

- Kevin.

Eric Sosman

unread,

Aug 23, 2002, 11:08:23 AM8/23/02

to

"Faster" is always guesswork until you actually implement
something and measure it, because the C language itself says
nothing about how much time any particular operation takes.
Even after you've implemented and measured, the results may
change when you recompile with different switches, or upgrade
your compiler, or move to a different machine, or allow the
moon to change its phase.

And, of course, you've got to need this computation a
*lot* of times to make any tweaking worth while. If it only
runs a couple thousand times, shaving a microsecond off the
execution time for each computation gives no observable net
improvement.

All that said, here's one approach:

unsigned int n1, n2;
for (n1 = val; (n2 = n1 & (n1 - 1)) != 0; n1 = n2)
;
return (n1 << (n1 < val)) + (val == 0);

The loop executes once for each 1-bit in the value of `val',
and there's an adjustment at the end just in case `val' itself
happens to be an exact power of two or equal to zero.

Another possibility is to do a binary search, either in an
explicit table or by writing a nested `if' chain with constants;
this will have an operation count proportional to the total number
of bits in an `unsigned int' (or, more generally, in the largest
`val' value you'll ever see). This has the disadvantage that you
need to know the maximum bit count when you write the code, and
you'll need to rewrite if you move to a machine with a
different value of UINT_MAX.

--
Eric....@sun.com

Kevin Easton

unread,

Aug 23, 2002, 11:57:59 AM8/23/02

to

Not true - the binary search has an operation count proportional to log
base 2 of the total number of bits in an unsigned int.

- Kevin.

Eric Sosman

unread,

Aug 23, 2002, 12:22:55 PM8/23/02

to

Kevin Easton wrote:
>
> Eric Sosman <Eric....@sun.com> wrote:
> >
> > Another possibility is to do a binary search, [...]

> > this will have an operation count proportional to the total number
> > of bits in an `unsigned int' (or, more generally, in the largest
> > `val' value you'll ever see).
>

> Not true - the binary search has an operation count proportional to log
> base 2 of the total number of bits in an unsigned int.

Yes, of course (gotta stop typing so fast; that's two blunders
in two days). Sorry for the error, and thanks for the correction.

--
Eric....@sun.com

Ralf Wildenhues

unread,

Aug 23, 2002, 2:02:57 PM8/23/02

to

* Kevin Easton wrote:
> Jeffrey Stedfast <fe...@ximian.com> wrote:
> > I was wondering if anyone had a quicker way of rounding an unsigned int
> > value up to the nearest power of 2.

*snip*

> Here's my effort (note: it's got better O() complexity, but it may not
> be *faster* for the ranges of numbers you're interested in). I make the
> assumption that only numbers in the range 0..2^31 are passed in
> (anything higher returns 0). It's basically an unrolled binary search:
>
> unsigned long rop2(unsigned long val)
> {
> unsigned long n = 0;
> unsigned long q = 1<<16;

nitpick: you might want to make that
unsigned long q = 1lu<<16;
otherwise it won't work portably for unsigned longs.

> if (val < 2)
> return 1;
>
> if (val <= q) {
> n = q;
> q >>= 8;
> } else {
> q <<= 8;
> }
>

*snip unrolling*
>
> return n;
> }

Thanks for a good hint towards a more portable solution rolling up your
code:

/*premise: no padding in unsigned long, but see below */
#include <limits.h>
#define BITS_IN_ULONG (CHAR_BIT * sizeof(unsigned long))

unsigned long rop2(unsigned long val)
{

unsigned b = BITS_IN_ULONG / 2;
unsigned long q = 1lu<<b;

unsigned long n = 0;

/* one could do
assert(q<<(b-1) != 0);
here to abort if above premise does not hold
*/

if (val < 2)
return 1;

while (b != 0) {
b >>= 1;

if (val <= q) {
n = q;

q >>= b;
} else {
q <<= b;
}
}

return n;
}

One could use Jeffreys solution on ULONG_MAX/2 to calculate
BITS_IN_ULONG-1 in initialization code, in order to make this really
portable.

offtopic part: on my system this sometimes beats Jeffreys idea when
using 64-bit wide variables, and is faster than the unrolled version.

Regards,
Ralf

Chris Torek

unread,

Aug 23, 2002, 5:33:41 PM8/23/02

to

[I am not sure who wrote which part here]

>* Kevin Easton wrote:
>> Jeffrey Stedfast <fe...@ximian.com> wrote:
>> > I was wondering if anyone had a quicker way of rounding an unsigned int
>> > value up to the nearest power of 2.
>*snip*

>> Here's my effort ... [that is] basically an unrolled binary search:
[more snippage]

In article <hv9i3x...@ID-34913.user.dfncis.de>

Ralf Wildenhues <ralfimn...@gmx.de> writes:
>Thanks for a good hint towards a more portable solution rolling up your
>code:

[more code snipped]

>One could use Jeffreys solution on ULONG_MAX/2 to calculate
>BITS_IN_ULONG-1 in initialization code, in order to make this really
>portable.
>
>offtopic part: on my system this sometimes beats Jeffreys idea when
>using 64-bit wide variables, and is faster than the unrolled version.

Yet another trick is to write a portable C program that writes
a nonportable, "unrolled-loop" version of the desired function.

Here is a method that relies on binary representation, an unrolled
loop for 32-bit "int"s, and an attempt (untested) at a portable C
program to write the unrolled loop as a non-portable C function.

Observe that if n is already a power of 2, it has only one bit
set in it:

000001000...000 (for some number of 0's)

and the answer is then just "n". Otherwise n has additional
bits set, e.g.:

00010001010...

and the answer is then whichever bit pattern has only a single
bit set one bit higher than the currently-topmost set bit:

00010001010... =>
00100000000...

i.e., we need to round up to the next power of two (as in the
original problem statement).

Let us begin by assuming that we have the latter case -- multiple
bits set. How can we find "the next power of 2"? Well, consider
the lowly bitwise-OR operator, and what happens if it is repeated.
Once a 1 is ORed in, it remains a 1 forever. Suppose we take the
initial value of n and shift it right one bit, introducing a zero
at the top and discarding the bottom-most bit, then mash them
together with bitwise-OR:

0001000101000001
| 0000100010100000x
----------------
0001100111100001

Now do this again but shift two bits:

0001100111100001
| 0000011001111000xx
----------------
0001111111111001

Repeat with a four-bit shift:

0001111111111001
| 0000000111111111xxxx
----------------
0001111111111111

Because I had 16 bits in the illustration, we really should do it
again with an 8-bit shift, but obviously this time we have reached
a stable point; any further shift-and-OR-ing will keep repeating
the same bit pattern. In fact, you always reach this point within
ceil(log2(nbits)) steps (proof left as an exercise :-) ).

Having done this shift-and-OR sequence log2(nbits) times, we have
effectively smeared *any* "1"-bits, from whatever is the topmost
set bit, on down through all the lower-order bits. Now we come to
the trick: we add 1.

By definition, 1+1 is 0 with a carry. By this time, the carries
propagate all the way up to the most-significant "1" bit. Thus
all the low order bits turn to 0 and we end up with a "1" in the
bit above the currently-MSbit:

0001111111111111
+ 0000000000000001
----------------
0010000000000000

which is just the value we wanted: the next higher power of two.

Of course, the next higher power of two may be unrepresentable;
for instance, if the bit pattern before adding 1 is all-ones,
adding 1 will cause it to wrap to 0 (unsigned arithmetic being
modular "clock" arithmetic). For now let us simply assume that
this is the desired result.

This suggests the following method:

/* given unsigned integral value "n" */
if (n has only one "1" bit in its representation)
return n;
else {
n |= n >> 1;
n |= n >> 2;
... as many n |= n >> k operations as needed ...
n++;
return n;
}

Of course, this requires a pesky (if easy-to-implement after all)
test: "does n have only one `1' bit in its representation?" But
look what happens if we "cheat" instead. Suppose we begin by
subtracting 1 from n. There are only two possible cases: either
n has *exactly* one 1 bit, or it has *more* than one 1 bit:

000100010000000000000000
- 000000000000000000000001
------------------------
000100001111111111111111

If n has *more* than one "1" bit set, the subtraction never touches
the topmost "1" bit, because at worst, it causes a borrow from all
the low-order "0"s up until we reach the first "1" bit, and then
it stops borrowing. And if n has *exactly* one "1" bit set, then
either the subtraction causes a borrow from that single "1" bit,
giving us all-1s in the lower-order bits:

000100000000000000000000
- 000000000000000000000001
------------------------
000011111111111111111111

or else n is initially 1 and subtracting 1 just gives 0. In all
three cases, the resulting bit pattern is just what we need to run
through our bit-smearing algorithm. Bit-smearing all-zeros has no
effect, and bit-smearing any "1", followed by some (or even no)
zeros, followed by some (or no) all-ones, does just what we want.

Thus, instead of the initial test "is there only one 1-bit", we
just subtract one, smear, then add one and return that:

/* assumes 16-bit "int"s */
unsigned int round_by_smearing(unsigned int n) {
n--; /* subtract one for power-of-2 input */
n |= n >> 1; /* "smear" steps, repeat log2(nbits) times */
n |= n >> 2;
n |= n >> 4;
n |= n >> 8;
n++; /* add 1 to reach next power of 2 (or 0) */
return n;
}

(Note, incidentally, that if the input is 0, the output is also 0
-- subtracting 1 gives all-1-bits, which are unchanged under
smearing, then adding 1 gives 0 again. So input-zero is
indistinguishable from overflow-out, which occurs whenever the
input exceeds the mathematical value of (UINT_MAX+1)/2. In this
case UINT_MAX is presumably 65535, and 32768 in => 32768 out => no
overflow; this is why we are OK with (UINT_MAX+1)/2 instead of
UINT_MAX/2.)

Of course, "round_by_smearing" above makes brash and unsupportable
assumptions about UINT_MAX, so now we get to the other tool in our
bag of tricks: C programs that write C translation units.

For some unsigned type T, we need to produce log2(nbits_in_T)
"smear" steps. We could examine Uwhatever_MAX, but on the other
hand, we can just do the same kind of smearing until we reach
all-one-bits and adding one overflows. So, here is a program to
do that:

/* define OUR_TYPE as desired, but note that it must be an
unsigned integral type! */
#ifndef OUR_TYPE
#define OUR_TYPE unsigned int
#endif

#define STR0(x) #x
#define STR1(x) STR0(x)

int main(void) {
OUR_TYPE v, vplus1;
int shift_factor;

printf("%s round_by_smearing(%s n) {\n n--;\n",
STR1(OUR_TYPE), STR1(OUR_TYPE));
for (shift_factor = 1, v = 1;; shift_factor *= 2) {
printf(" n |= n >> %d;\n", shift_factor);
v |= v << shift_factor;
vplus1 = v + 1;
if (vplus1 == 0)
break;
}
printf(" n++;\n return n;\n}\n");
return 0;
}

Although the computer I tested this on has only 8, 16, 32, and
64-bit integer types, I believe it would work correctly on an
18-and-36-bit Univac too.
--
In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)
Salt Lake City, UT, USA Domain: to...@bsdi.com
http://67.40.109.61/torek/ (for the moment) I report spam to abuse@.
"nos...@elf.eng.bsdi.com" *is* my address (one of many actually).

Chris Torek

unread,

Aug 25, 2002, 6:27:48 PM8/25/02

to

In article <ak69nl$fu$1...@elf.eng.bsdi.com> I wrote:
> /* assumes 16-bit "int"s */
> unsigned int round_by_smearing(unsigned int n) {
> n--; /* subtract one for power-of-2 input */
> n |= n >> 1; /* "smear" steps, repeat log2(nbits) times */
> n |= n >> 2;
> n |= n >> 4;
> n |= n >> 8;
> n++; /* add 1 to reach next power of 2 (or 0) */
> return n;
> }

(followed by a program to generate the proper function for however
many bits your "int"s are)

It just occurred to me that one can use a loop of the form
"smear until value does not change":

typedef unsigned int UT;
UT another_round(UT n) {
UT n1 = n - 1;
int shift_count = 1;

do {
n = n1 | (n1 >> shift_count);
shift_count *= 2;
} while (n != n1);
return n + 1;
}

Of course, "speed" is not something addressed by Standard C, and
one would have to test each version to see which one(s) are fastest
on your particular implementation.

Chris Torek

unread,

Aug 25, 2002, 6:33:51 PM8/25/02

to

In article <akbll4$3bv$1...@elf.eng.bsdi.com> I posted untested code:

> typedef unsigned int UT;
> UT another_round(UT n) {
> UT n1 = n - 1;
> int shift_count = 1;
>
> do {
> n = n1 | (n1 >> shift_count);
> shift_count *= 2;
> } while (n != n1);
> return n + 1;
> }

Exercises:

- spot the obvious bug.
- fix it.

:-)

Kevin Easton

unread,

Aug 26, 2002, 2:02:33 AM8/26/02

to

Chris Torek <nos...@elf.eng.bsdi.com> wrote:
> In article <akbll4$3bv$1...@elf.eng.bsdi.com> I posted untested code:
>> typedef unsigned int UT;
>> UT another_round(UT n) {
>> UT n1 = n - 1;
>> int shift_count = 1;
>>
>> do {
>> n = n1 | (n1 >> shift_count);
>> shift_count *= 2;
>> } while (n != n1);
>> return n + 1;
>> }
>
> Exercises:
>
> - spot the obvious bug.
> - fix it.
>
> :-)

typedef unsigned int UT;
UT another_round(UT n) {

UT n1;
int shift_count = 1;

n--;
do {
n1 = n;

n = n1 | (n1 >> shift_count);
shift_count *= 2;
} while (n != n1);
return n + 1;
}

(in the spirit of the thread, also untested :)

- Kevin.

Kevin D. Quitt

unread,

Aug 26, 2002, 2:09:37 PM8/26/02

to

Here's a routine that returns the highest bit number:

const char clz_table[32] =
{
0, 31, 9, 30, 3, 8, 18, 29, 2, 5, 7, 14, 12, 17,
22, 28, 1, 10, 4, 19, 6, 15, 13, 23, 11, 20, 16,
24, 21, 25, 26, 27
};

unsigned long clz(unsigned long n)
{
unsigned long c = 0x7dcd629; /* magic constant... */

n |= (n >> 1);
n |= (n >> 2);
n |= (n >> 4);
n |= (n >> 8);
n |= (n >> 16);
if (n == 0) return 32;
n = c + (c * n);
return 31 - clz_table[n >> 27]; /* For little endian */
}

--
#include <standard.disclaimer>
_
Kevin D Quitt USA 91351-4454 96.37% of all statistics are made up
Per the FCA, this email address may not be added to any commercial mail list

Joe Wright

unread,

Aug 27, 2002, 7:21:07 AM8/27/02

to

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(int argc, char *argv[]) {
unsigned int i, n = 250;
if (argc > 1)
n = atoi(argv[1]);
n &= INT_MAX;
for (i = 1; n > i; i <<= 1) ;
printf("the nearest power of 2 for %u is %u\n", n, i);
return 0;
}

--
Joe Wright mailto:joeww...@earthlink.net
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---