David Sparks
unread,Jan 23, 2026, 4:04:19 PMJan 23Sign in to reply to author
Sign in to forward
You do not have permission to delete messages in this group
Either email addresses are anonymous for this group or you need the view member email addresses permission to view the original message
to flint...@googlegroups.com, Fredrik Johansson
This is a slight tweak to the binary _n_jacobi_unsigned()
to avoid the need for a double-word sub_ddmmss in the inner
loop.
Instead the state variables are shifted right 1 bit,
leaving the lsbit implicit, which frees up an msbit to
hold the sign of the difference x-y.
Attached is a self-contained timing test.
jacobi_ref is a reference implementation using divisions
in the loop.
jacobi_bin1 is basically the current FLINT code.
jacobi_bin2 is the revised version.
The letter suffixes distinguish different size
sign-flip variables.
On my computer, the timings are (cycles per call, including
some fixed test overhead):
jacobi_ref : min 871.176195 mean 891.792649 +/-2.295181
jacobi_bin1a: min 353.912791 mean 363.860152 +/-1.235397
jacobi_bin1b: min 353.534790 mean 362.314916 +/-0.914342
jacobi_bin1c: min 348.483994 mean 359.187334 +/-0.907285
jacobi_bin2a: min 336.413793 mean 344.434208 +/-0.783840
jacobi_bin2b: min 335.931995 mean 346.487625 +/-1.123585
jacobi_bin2c: min 383.778764 mean 394.974597 +/-0.853098
Maybe someone could test it on some different computers?
The timing is RDTSC on x86, the (untested, but StackExchange
says it should work) equivalent on ARM, and CLOCK_MONOTONIC
otherwise.
For me, bin2a seems to be the fastest. Why bin1c is the fastest
in its family while bin2c is the slowest is a mystery.