Well, these don't really save you from having to calculate the shift
(and more importantly don't save you from making a mistake doing so).
How about this one? It's kinda long but with gcc's -O2 flag it's just
as fast as the traditional (val >> sft) & msk. Anyone know of a clever
way to reduce this further? The key is to get the index of the lowest
bit thats on for the shift.
Mike
--8<--
#include <stdlib.h>
#include <stdio.h>
/* The index of the lowest bit that's on (used to shift)
*/
#define LB(m) ( \
((m) & 0x00000001L) ? 0 : \
((m) & 0x00000002L) ? 1 : \
((m) & 0x00000004L) ? 2 : \
((m) & 0x00000008L) ? 3 : \
((m) & 0x00000010L) ? 4 : \
((m) & 0x00000020L) ? 5 : \
((m) & 0x00000040L) ? 6 : \
((m) & 0x00000080L) ? 7 : \
((m) & 0x00000100L) ? 8 : \
((m) & 0x00000200L) ? 9 : \
((m) & 0x00000400L) ? 10 : \
((m) & 0x00000800L) ? 11 : \
((m) & 0x00001000L) ? 12 : \
((m) & 0x00002000L) ? 13 : \
((m) & 0x00004000L) ? 14 : \
((m) & 0x00008000L) ? 15 : \
((m) & 0x00010000L) ? 16 : \
((m) & 0x00020000L) ? 17 : \
((m) & 0x00040000L) ? 18 : \
((m) & 0x00080000L) ? 19 : \
((m) & 0x00100000L) ? 20 : \
((m) & 0x00200000L) ? 21 : \
((m) & 0x00400000L) ? 22 : \
((m) & 0x00800000L) ? 23 : \
((m) & 0x01000000L) ? 24 : \
((m) & 0x02000000L) ? 25 : \
((m) & 0x04000000L) ? 26 : \
((m) & 0x08000000L) ? 27 : \
((m) & 0x10000000L) ? 28 : \
((m) & 0x20000000L) ? 29 : \
((m) & 0x40000000L) ? 30 : \
((m) & 0x80000000L) ? 31 : 32)
/* Given an integer i containing flags, extract a value based on mask
* m for assignment to a bitfield
*/
#define FLD(i, m) ((i) >> LB(m)) & ((m) >> LB(m))
struct flds {
unsigned char a :4;
} f;
int
main(int argc, char *argv[])
{
unsigned long i;
for (i = 0x01000000L; i < 0x80000000L; i++) {
f.a = FLD(i, 0xF0000000L);
/* f.a = (i >> 28) & 0x0000000FL; */
}
printf("f.a=%u\n", f.a);
return EXIT_SUCCESS;
}
Standard (val >> sft) & msk way:
[miallen@nano miallen]$ gcc -Wall -O2 -o bits.o bits.c
[miallen@nano miallen]$ time ./bits.o
f.a=7
real 0m7.466s <---
user 0m7.420s
sys 0m0.000s
Using the FLD macro without optimization:
[miallen@nano miallen]$ gcc -Wall -o bits.o bits.c
[miallen@nano miallen]$ time ./bits.o
f.a=7
real 0m23.427s
user 0m23.160s
sys 0m0.000s
Using the FLD macro with -O2 flag:
[miallen@nano miallen]$ gcc -Wall -O2 -o bits.o bits.c
[miallen@nano miallen]$ time ./bits.o
f.a=7
real 0m7.464s <---
user 0m7.420s
sys 0m0.000s
>/* The index of the lowest bit that's on (used to shift)
> */
>#define LB(m) ( \
> ((m) & 0x00000001L) ? 0 : \
> ((m) & 0x00000002L) ? 1 : \
> ((m) & 0x00000004L) ? 2 : \
[etc -- snipped]
> ((m) & 0x40000000L) ? 30 : \
> ((m) & 0x80000000L) ? 31 : 32)
[actually that last should be "yikes! I don't know the answer!" but
this is not really expressable in C :-) ]
You can use a binary search, but of course, this is actually even
more code in the expansion, so again we must rely on the compiler
to do constant folding:
#define LB2(m) (((m) & 1) ? 0 : 1)
#define LB4(m) (((m) & 0x3) ? LB2(m) : LB2(m >> 2) + 2)
#define LB8(m) (((m) & 0xf) ? LB4(m) : LB4(m >> 4) + 4)
#define LB16(m) (((m) & 0xff) ? LB8(m) : LB8(m >> 8) + 8)
#define LB32(m) (((m) & 0xffff) ? LB16((unsigned)m) : \
LB16((unsigned)(m >> 16)) + 16)
#define LB(m) LB32(m) /* assumes m <= 0xffffffff */
The LBnn version is a bit easier to extend for larger (e.g.,
64-bit "long long") masks.
It might also be useful to check for 0 initially, as both
of these methods produce 32 for LB(0). For instance:
extern void die(const char *why); /* terminates execution, never returns */
#define LB(m) ((m) ? LB32(m) : (die("invalid mask to LB()"), 0))
You can even check for out-of-range inputs this way:
#define LB(m) \
((m) && (m) <= 0xffffffff ? LB32(m) : (die("invalid mask to LB()"), 0))
which might also be useful.
--
In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)
El Cerrito, CA, USA Domain: to...@bsdi.com +1 510 234 3167
http://63.193.109.35/torek/ (not always up) I report spam to abuse@.
"nos...@elf.eng.bsdi.com" *is* my address (one of many actually).
> In article <20011231.201443...@erols.com> Michael B. Allen
> <mba...@erols.com> writes:
>>How about this one? It's kinda long but with gcc's -O2 flag it's just as
>>fast as the traditional (val >> sft) & msk. Anyone know of a clever way
>>to reduce this further? The key is to get the index of the lowest bit
>>thats on for the shift.
>
>>/* The index of the lowest bit that's on (used to shift)
>> */
>>#define LB(m) ( \
>> ((m) & 0x00000001L) ? 0 : \
>> ((m) & 0x00000002L) ? 1 : \
>> ((m) & 0x00000004L) ? 2 : \
> [etc -- snipped]
>> ((m) & 0x40000000L) ? 30 : \
>> ((m) & 0x80000000L) ? 31 : 32)
>
> [actually that last should be "yikes! I don't know the answer!" but this
> is not really expressable in C :-) ]
Well, I reasoned that if the input were 0 it would return 32 which if
used to shift would also generate 0 appropriately.
>
> You can use a binary search, but of course, this is actually even more
> code in the expansion, so again we must rely on the compiler to do
> constant folding:
>
> #define LB2(m) (((m) & 1) ? 0 : 1)
> #define LB4(m) (((m) & 0x3) ? LB2(m) : LB2(m >> 2) + 2)
> #define LB8(m) (((m) & 0xf) ? LB4(m) : LB4(m >> 4) + 4)
> #define LB16(m) (((m) & 0xff) ? LB8(m) : LB8(m >> 8) + 8)
> #define LB32(m) (((m) & 0xffff) ? \
> LB16((unsigned)m) : LB16((unsigned)(m >> 16)) + 16)
I tried binary search too although I didn't do the shift and add stuff:
#define LB2(m) ( \
(m) & 0x0000FFFF ? \
(m) & 0x000000FF ? \
(m) & 0x0000000F ? \
(m) & 0x00000003 ? \
(m) & 0x00000001 ? 0 : 1 \
: (m) & 0x00000004 ? 2 : 3 \
: (m) & 0x00000030 ? \
(m) & 0x00000010 ? 4 : 5 \
: (m) & 0x00000040 ? 6 : 7 \
: (m) & 0x00000F00 ? \
(m) & 0x00000300 ? \
(m) & 0x00000100 ? 8 : 9 \
: (m) & 0x00000400 ? 10 : 11 \
: (m) & 0x00003000 ? \
(m) & 0x00001000 ? 12 : 13 \
: (m) & 0x00004000 ? 14 : 15 \
: (m) & 0x00FF0000 ? \
(m) & 0x000F0000 ? \
(m) & 0x00030000 ? \
(m) & 0x00010000 ? 16 : 17 \
: (m) & 0x00040000 ? 18 : 19 \
: (m) & 0x00300000 ? \
(m) & 0x00100000 ? 20 : 21 \
: (m) & 0x00400000 ? 22 : 23 \
: (m) & 0x0F000000 ? \
(m) & 0x03000000 ? \
(m) & 0x01000000 ? 24 : 25 \
: (m) & 0x04000000 ? 26 : 27 \
: (m) & 0x30000000 ? \
(m) & 0x10000000 ? 28 : 29 \
: (m) & 0x40000000 ? 30 : 31 )
Problem is, it doesn't seem to do any good in the no-optimization case.
It took about the same time. I tried your version too. It was only
slightly faster. With optimization on it looks like they all reduce to
(val >> sft) & msk time.
Another trick I've see to get the lowest bit alone is to do:
(m & (m - 1)) ^ m
But I'm not sure this helps especially considering it might need to be evaluated
many times in a macro expression.
> The LBnn version is a bit easier to extend for larger (e.g., 64-bit
> "long long") masks.
Well, for decoding network packets, binary file formats and such I doubt
we'd see long longs of flags. I hope not anyway :~)
If you don't mind I'd like to include your version in my encdec module
(http://auditorymodels.org/encdec/). Now I just need to do IEEE754
floating point encoding/decdoing.
Mike
> In article <a0rg6p$6ah$1...@elf.eng.bsdi.com>, "Chris Torek"
> <nos...@elf.eng.bsdi.com> wrote:
>
> > In article <20011231.201443...@erols.com> Michael B. Allen
> > <mba...@erols.com> writes:
> >>How about this one? It's kinda long but with gcc's -O2 flag it's just as
> >>fast as the traditional (val >> sft) & msk. Anyone know of a clever way
> >>to reduce this further? The key is to get the index of the lowest bit
> >>thats on for the shift.
> >
> > [actually that last should be "yikes! I don't know the answer!" but this
> > is not really expressable in C :-) ]
>
> Well, I reasoned that if the input were 0 it would return 32 which if
> used to shift would also generate 0 appropriately.
Not necessarily. Attempting to shift a 32-bit integer by 32 bits
in a single operation yields undefined behavior. There are
plenty of architectures, x86 among them IIRC, that ignore all
except the low log2(n) bits of a shift count, so that shifting a
32-bit integer by 32 bits would be a no-op. This is ugly,
admittedly.
>
> How about this one? It's kinda long but with gcc's -O2 flag it's
> just as fast as the traditional (val >> sft) & msk. Anyone know
> of a clever way to reduce this further? The key is to get the
> index of the lowest bit thats on for the shift.
>
[Snip code for a long macro to mask and shift a value, based on a
bit mask, e.g. to get 0x23 from FLD(0x12345678, 0x0FF00000)]
You can get m's lowest set bit with (~m+1)&m, and then divide by
that instead of working out a bit index and shifting. Like this:
#define FLD2(i, m) (((i) & (m)) / ((~(m) + 1) & (m)))
I don't suppose it will be any quicker, except maybe when m is not
constant. It will attempt to divide by zero if m is zero.
I find it useful to remember there are two ways to bit-shift in C -
using the shift operators, and with division or multiplication by a
value with a single bit set. Which to use depends on which is easier
to derive, the bit index or a value with that bit set.
Phil T
In article <20020101.014640...@erols.com>
Michael B. Allen <mba...@erols.com> writes:
>Another trick I've see to get the lowest bit alone is to do:
> (m & (m - 1)) ^ m
>But I'm not sure this helps especially considering it might need to
>be evaluated many times in a macro expression.
True; however, once you have the LSBit isolated, so that you have
a power of 2, you can use something like the "table lookup of
remainder" method. On machines with slow divides (most of them)
this turns out to be slow though, and the result is not a constant
in C -- you would need a more powerful compile-time language, such
as C++ templates. (Since you can encode Peano arithmetic in C++
templates, you can do arbitrary mathematical calculations at compile
time. This is not as big a win as it might seem at first, as the
compile times grow exponentially.)
>> The LBnn version is a bit easier to extend for larger (e.g., 64-bit
>> "long long") masks.
>Well, for decoding network packets, binary file formats and such I doubt
>we'd see long longs of flags. I hope not anyway :~)
Why not? Modern 64-bit computers can do 64-bit ALU operations
(except multiply and divide) in their 64-bit registers in a single
cycle.
>If you don't mind I'd like to include your version in my encdec module
>(http://auditorymodels.org/encdec/).
Not a problem...
...
>Another trick I've see to get the lowest bit alone is to do:
>
> (m & (m - 1)) ^ m
With unsigned arithmetic you can achieve the same result using
m & -m
>But I'm not sure this helps especially considering it might need to be evaluated
>many times in a macro expression.
If the macro expands to a constant expression it doesn't matter. If
it doesn't then it would make sense to use temporaries in the calculation.
--
-----------------------------------------
Lawrence Kirby | fr...@genesis.demon.co.uk
Wilts, England | 7073...@compuserve.com
-----------------------------------------
>>In article <a0rg6p$6ah$1...@elf.eng.bsdi.com>, "Chris Torek"
>><nos...@elf.eng.bsdi.com> wrote:
>>> You can use a binary search ...
>
>In article <20020101.014640...@erols.com>
>Michael B. Allen <mba...@erols.com> writes:
>>Another trick I've see to get the lowest bit alone is to do:
>> (m & (m - 1)) ^ m
>>But I'm not sure this helps especially considering it might need to
>>be evaluated many times in a macro expression.
>
>True; however, once you have the LSBit isolated, so that you have
>a power of 2, you can use something like the "table lookup of
>remainder" method. On machines with slow divides (most of them)
>this turns out to be slow though, and the result is not a constant
>in C
For constant expressions the ?: tree is as good as anything. I'm
assuming that since performance seems to be an issue that we're
talking about non-constant expressions. It is true that division is
usually slow, however multiplication is usually pretty fast these days.
Given that I did a bit of searching for some suitable numbers
and came up with the following:
#include <stdio.h>
static const int S_lowbit_lookup[] = { /* Could be an array of characters */
31, 0, 27, 1, 28, 18, 23, 2, 29, 21, 19, 12, 24, 9, 14, 3,
30, 26, 17, 22, 20, 11, 8, 13, 25, 16, 10, 7, 15, 6, 5, 4
};
#define NUMBITS 32
#define LOWBIT(x) S_lowbit_lookup[(((x) & -(x))*0x0fb5ca62 >> 27) & 0x1f]
int main(void)
{
int bit;
for (bit = 0; bit < NUMBITS; bit++) {
unsigned long value1 = 1UL << bit;
unsigned long value2 = value1 * 15;
printf("%08lx:%2d %08lx:%2d\n", value1, LOWBIT(value1),
value2, LOWBIT(value2));
}
return 0;
}
>-- you would need a more powerful compile-time language, such
>as C++ templates. (Since you can encode Peano arithmetic in C++
>templates, you can do arbitrary mathematical calculations at compile
>time. This is not as big a win as it might seem at first, as the
>compile times grow exponentially.)
>
>>> The LBnn version is a bit easier to extend for larger (e.g., 64-bit
>>> "long long") masks.
The code above can be adapted to 64 bits.
>[finding lowest set bit in an integer]
>
>#include <stdio.h>
>
>static const int S_lowbit_lookup[] = { /* Could be an array of characters */
> 31, 0, 27, 1, 28, 18, 23, 2, 29, 21, 19, 12, 24, 9, 14, 3,
> 30, 26, 17, 22, 20, 11, 8, 13, 25, 16, 10, 7, 15, 6, 5, 4
>};
>
>#define NUMBITS 32
>#define LOWBIT(x) S_lowbit_lookup[(((x) & -(x))*0x0fb5ca62 >> 27) & 0x1f]
LOWBIT(0) == 31 is a curious answer. How about:
/* value returned for lowbit_mh(0), adjust to taste */
#define LOWBIT_0 31
int lowbit_mh(unsigned long n) {
static const char tab[] = {
0,26,9,0,30,0,0,7,0,0,0,1,0,23,27,0,10,3,13,
17,31,0,0,0,0,0,25,8,29,0,6,0,0,22,2,12,16,
0,0,24,28,5,21,11,15,0,4,20,14,19,0,18,LOWBIT_0
};
return tab[((((n - 1) & ~n) * 0x2e9bbecdUL) >> 26) & 0x3f];
}
-- Mat.
>On Mon, 14 Jan 2002 20:36:17 +0000 (GMT), fr...@genesis.demon.co.uk
>(Lawrence Kirby) wrote:
>
>>[finding lowest set bit in an integer]
>>
>>#include <stdio.h>
>>
>>static const int S_lowbit_lookup[] = { /* Could be an array of characters */
>> 31, 0, 27, 1, 28, 18, 23, 2, 29, 21, 19, 12, 24, 9, 14, 3,
>> 30, 26, 17, 22, 20, 11, 8, 13, 25, 16, 10, 7, 15, 6, 5, 4
>>};
>>
>>#define NUMBITS 32
>>#define LOWBIT(x) S_lowbit_lookup[(((x) & -(x))*0x0fb5ca62 >> 27) & 0x1f]
>
>LOWBIT(0) == 31 is a curious answer. How about:
There is no valid result for LOWBIT(0). One possibility would be to give
some sort of error value such as -1. But that error value still needs to
be tested for by the subsequent code. If 0 is a possible value for the
argument it makes more sense to test for that *before* using LOWBIT().
Allright, I've been staring at this for a bit and haven't figured
out its genesis.
--
Chuck F (cbfal...@yahoo.com) (cbfal...@XXXXworldnet.att.net)
Available for consulting/temporary embedded and systems.
(Remove "XXXX" from reply address. yahoo works unmodified)
mailto:u...@ftc.gov (for spambots to harvest)
Assuming you're asking about how it works, not who was the very first author
of the technique [I wouldn't know]...
Consider 0x0FB5CA62 written out and multiplied by powers of two, then
truncate leaving only the top 5 bits...
1 = 00001 111101101011100101001100010
3 = 00011 11101101011100101001100010
7 = 00111 1101101011100101001100010
15 = 01111 101101011100101001100010
31 = 11111 01101011100101001100010
30 = 11110 1101011100101001100010
29 = 11101 101011100101001100010
27 = 11011 01011100101001100010
22 = 10110 1011100101001100010
13 = 01101 011100101001100010
26 = 11010 11100101001100010
21 = 10101 1100101001100010
11 = 01011 100101001100010
23 = 10111 00101001100010
14 = 01110 0101001100010
28 = 11100 101001100010
25 = 11001 01001100010
18 = 10010 1001100010 [?? 10011 ?]
5 = 00101 001100010
10 = 01010 01100010
20 = 10100 1100010
9 = 01001 100010
19 = 10011 00010
06 = 00110 0010
12 = 01100 010
24 = 11000 10
17 = 10001 0
2 = 00010
4 = 00100
8 = 01000
16 = 10000
0 = 00000
I can't quite figure out how Lawrence chose that particular multiplier
though. I got 0x0FB9AC52 with...
#include <stdio.h>
#define M 5 /* 2..5 */
#define N (1UL << M) /* 2^M-bit integer */
#define N_MASK (N - 1)
#define LOG2_2POWN(x) \
( magic_table[ ((x)*K >> N-M) & N_MASK] )
#define LOW_BIT(x) LOG2_2POWN( (x)&-(x) )
unsigned long magic_table[N];
unsigned long K;
int main() {
unsigned i;
unsigned long k;
/* The depths of Lawrence's mind... */
for (k = 0, i = 1; i <= N-M; i++) {
K <<= 1;
k = (k << 1) & N_MASK;
if (!magic_table[k+1])
k++, K++;
magic_table[k] = i;
}
for (; i <= N; i++) {
k = (k << 1) & N_MASK;
magic_table[k] = i;
}
for (i = 0; i < N; i++)
magic_table[i]--;
K <<= 1;
/* Dump the fruits of our labour... */
printf("/* %d-bit unsigned int. */\n\n", (int) N);
printf("unsigned int magic_table[] = {\n");
for (i = 0; i < N; i++) {
if (i % 8 == 0) printf(" ");
printf(" %3d", (int) magic_table[i]);
if (i != N - 1) putchar(',');
if (i % 8 == 7 || i == N - 1) putchar('\n');
}
printf("};\n\n");
printf("#define LOG2_2POWN(x) \\\n");
printf(
" ( magic_table[ ((x)*0x%0*lX >> %d) & 0x%02X ] )\n\n",
(int) (N/4), K,
(int) (N-M),
(unsigned) N_MASK
);
printf("#define LOW_BIT(x) LOG2_2POWN( (x)&-(x) )\n");
/* Let's take this baby for a spin... */
for (i = 0; i < N; i++)
if ( LOG2_2POWN(1UL << i) != i
|| LOW_BIT(1UL << i) != i )
{
printf("\n\nBTW, It doesn't work. :(\n");
break;
}
return 0;
}
--
Peter
: Lawrence Kirby wrote:
:
: > #include <stdio.h>
: >
: > static const int S_lowbit_lookup[] = { /* Could be an array of characters */
: > 31, 0, 27, 1, 28, 18, 23, 2, 29, 21, 19, 12, 24, 9, 14, 3,
: > 30, 26, 17, 22, 20, 11, 8, 13, 25, 16, 10, 7, 15, 6, 5, 4
: > };
: >
: > #define NUMBITS 32
: > #define LOWBIT(x) S_lowbit_lookup[(((x) & -(x))*0x0fb5ca62 >> 27) & 0x1f]
:
: Allright, I've been staring at this for a bit and haven't figured
: out its genesis.
(n * 0x0fb5ca62 >> 27) & 0x1f is a hash function; the multiplier 0x0fb5ca62
is designed to produce a collision-free hash. The following will print a
list of suitable multipliers
#include <stdio.h>
#include <string.h>
int main(void) {
unsigned char collide[32];
unsigned long n, mul, hash;
for (mul = 1UL << 27; mul != 0; ++mul, mul &= 0xffffffff) {
memset(collide, 0, sizeof collide);
for (n = 1; n != 0; n <<= 1, n &= 0xffffffff) {
hash = ((n * mul) >> 27) & 0x1f;
if (collide[hash])
break;
collide[hash] = 1;
}
if (n == 0)
printf("multiplier found: 0x%08lx\n", mul);
}
return 0;
}
Then, when you have chosen your multiplier, build a 32-entry lookup table to
associate the hash values with the desired results.
Clever though it is, this sort of approach tends to be slower than a boring
mask/shift/table lookup.
static const char tab[256] = { ... };
if (x & 0xff)
return tab[x & 0xff];
if (x & 0xff00)
return 8 + tab[(x & 0xff00) >> 8];
...
-- Mat.
: mathew...@hotmail.com "Mathew Hendry" wrote:
:
: >LOWBIT(0) == 31 is a curious answer.
:
: There is no valid result for LOWBIT(0). One possibility would be to give
: some sort of error value such as -1.
Plenty of possibilities, but a superficially valid result like 31 is
misleading, especially if the behaviour is undocumented. :)
: If 0 is a possible value for the
: argument it makes more sense to test for that *before* using LOWBIT().
Perhaps, depending on the application.
-- Mat.
>Lawrence Kirby wrote:
>>
>... snip ...
>>
>> For constant expressions the ?: tree is as good as anything. I'm
>> assuming that since performance seems to be an issue that we're
>> talking about non-constant expressions. It is true that division is
>> usually slow, however multiplication is usually pretty fast these days.
>> Given that I did a bit of searching for some suitable numbers
>> and came up with the following:
>>
>> #include <stdio.h>
>>
>> static const int S_lowbit_lookup[] = { /* Could be an array of characters */
>> 31, 0, 27, 1, 28, 18, 23, 2, 29, 21, 19, 12, 24, 9, 14, 3,
>> 30, 26, 17, 22, 20, 11, 8, 13, 25, 16, 10, 7, 15, 6, 5, 4
>> };
>>
>> #define NUMBITS 32
>> #define LOWBIT(x) S_lowbit_lookup[(((x) & -(x))*0x0fb5ca62 >> 27) & 0x1f]
>
>Allright, I've been staring at this for a bit and haven't figured
>out its genesis.
(x) & (-x) clears all bit the lowest order bit in the value leaving
a value that is a power of 2. Multiplying by this effectively left shifts
the other value by an amount that depends on the position of this bit.
Therefore different groups of 5 bits in the other value are shifted into
the top 5 bits of the result. All we need to do is find a multiplier
value with the property that every group of 5 bits in it (also including
an imaginary 4 zero bits to the right) has a different bit pattern. A
little backtracking search program produced the value above. After the
multiplication the top 5 bits are isolated and used to index a lookup
table which creates a permutation to the correct result.
I guess there may be some applications where producing a 0 result for
a 0 input value is OK. For those the values can be rearranged as
follows:
#include <stdio.h>
static const int S_lowbit_lookup[] = {
0, 1, 2, 6, 3, 11, 7, 16, 4, 14, 12, 21, 8, 23, 17, 26,
31, 5, 10, 15, 13, 20, 22, 25, 30, 9, 19, 24, 29, 18, 28, 27
};
#define NUMBITS 32
#define LOWBIT(x) S_lowbit_lookup[(((x) & -(x))*0x04653adf >> 27) & 0x1f]
int main(void)
{
int bit;
printf("%08lx:%2d\n", 0UL, LOWBIT(0UL));
for (bit = 0; bit < NUMBITS; bit++) {
unsigned long value1 = 1UL << bit;
unsigned long value2 = value1 * 15;
printf("%08lx:%2d %08lx:%2d\n", value1, LOWBIT(value1),
value2, LOWBIT(value2));
}
return 0;
}
Creating a separate value would require an extra test in the macro or
doubling the size of the lookup table. The later is possible but the
extra test is, again, better off in the caller code where it can
do something useful, with the condition.
So very simple once it is explained! Thanks.