A slightly faster binary Jacobi implementation.

13 views
Skip to first unread message

David Sparks

unread,
Jan 23, 2026, 4:04:19 PMJan 23
to flint...@googlegroups.com, Fredrik Johansson
This is a slight tweak to the binary _n_jacobi_unsigned()
to avoid the need for a double-word sub_ddmmss in the inner
loop.

Instead the state variables are shifted right 1 bit,
leaving the lsbit implicit, which frees up an msbit to
hold the sign of the difference x-y.

Attached is a self-contained timing test.
jacobi_ref is a reference implementation using divisions
in the loop.
jacobi_bin1 is basically the current FLINT code.
jacobi_bin2 is the revised version.
The letter suffixes distinguish different size
sign-flip variables.

On my computer, the timings are (cycles per call, including
some fixed test overhead):

jacobi_ref : min 871.176195 mean 891.792649 +/-2.295181
jacobi_bin1a: min 353.912791 mean 363.860152 +/-1.235397
jacobi_bin1b: min 353.534790 mean 362.314916 +/-0.914342
jacobi_bin1c: min 348.483994 mean 359.187334 +/-0.907285
jacobi_bin2a: min 336.413793 mean 344.434208 +/-0.783840
jacobi_bin2b: min 335.931995 mean 346.487625 +/-1.123585
jacobi_bin2c: min 383.778764 mean 394.974597 +/-0.853098

Maybe someone could test it on some different computers?
The timing is RDTSC on x86, the (untested, but StackExchange
says it should work) equivalent on ARM, and CLOCK_MONOTONIC
otherwise.

For me, bin2a seems to be the fastest. Why bin1c is the fastest
in its family while bin2c is the slowest is a mystery.
jacobi.c
Reply all
Reply to author
Forward
0 new messages