Conjecture that arose implementing ctz128()

15 views

Skip to first unread message

David Sparks

unread,

Apr 20, 2026, 4:09:20 PMApr 20

to flint-devel

A little computer/mathematical puzzle I ran into recently,
sharing in case anyone finds it interesting (or knows of an obvious
proof).

The Chess Programming Wiki (a great collection of low-level
bit hacks) mentions "Matt Taylor's Folding trick" for implementing
a 2N-bit count trailing zeros using an N-bit multiply.

https://www.chessprogramming.org/BitScan#Matt_Taylor.27s_Folding_trick

int ctz_2N(uint2N_t x)
{
static const char table[2*N] = { ... };
assert(x != 0);
x ^= x-1; // Convert to 00001111, with b+1 trailing ones
uintN_t y = (x >> n) ^ x;
return table[y * MAGIC_CONSTANT >> (N + 1 - log2(N))];
}

The two XOR operations map a 2N-bit input to one
of 2N n-bit symbols, bases on the number of trailing
zeros. Using n=4 as an example:

abcdefg1 ^ abcdefg0 -> 00000001, 0000 ^ 0001 -> 0001
abcdef10 ^ abcdef01 -> 00000011, 0000 ^ 0011 -> 0011
abcde100 ^ abcde011 -> 00000111, 0000 ^ 0111 -> 0111
abcd1000 ^ abcd0111 -> 00001111, 0000 ^ 1111 -> 1111
abc10000 ^ abc01111 -> 00011111, 0001 ^ 1111 -> 1110
ab100000 ^ ab011111 -> 00111111, 0011 ^ 1111 -> 1100
a1000000 ^ a0111111 -> 01111111, 0111 ^ 1111 -> 1000
10000000 ^ 01111111 -> 11111111, 1111 ^ 1111 -> 0000
00000000 ^ 11111111 -> 11111111, 1111 ^ 1111 -> 0000 (usually forbidden)

The multiply by N and shift act as a minimal perfect hash
function, which the table than maps to the desired
return value.

Anyway, there is no need for a multiply if N=1 or N=2 (the folded
result is the same size as the output), there is one possible magic
constant (5) for N=4, one (0x63) for N=8, two for N=16, four for
N=32, and >= 200 for N=64.

But I noticed that they all have a Hamming weight of N/2.
For the first few sizes, this might just be luck, but
I've found 200 possible magic constants for N=64 and they
all have Hamming weight 32.

That seems unlikely unless there's a reason, but so far I've
been unable to prove it's required. If the symbols were all
of the form 2^k, I could use de Bruijn sequence theory, but
they're of the forms 2^N - 2^k and 2^k - 1. They are *close* to
each other's negative, but that -1 causes carry propagation which
messes up the symmetry. And the bit field extracted isn't log2(N)
bits wide, but 1+log2(N), so there's no requirement that all N-bit
values are included.

For example, the 64-bit multiplier 0x0x7c0b95b31d0d489f is not a
de Bruijn sequence; it does not include 011110 or 111111, and does include 011111 and 111110 twice each.

Reply all

Reply to author

Forward

0 new messages