On Tuesday, January 20th, 2026 at 9:26 AM, Fredrik Johansson <
fredrik....@gmail.com> wrote:
> On Mon, Jan 19, 2026 at 7:11 PM David Sparks <
spar...@proton.me> wrote:
>> A couple of questions about the recent commit e9c1838b541
>> "Clean up and optimize n_root, n_rootrem"
>>
>> Why is the inv_table[] initialized with decimal constants?
>> Wouldn't "1.0f/4, 1.0f/5, 1.0f/6, ..., 1.0f/40" be clearer and less
>> error-prone?
>
> That or hex float literals, though it doesn't really matter for this particular table.
The two issues are
- making the intent clear to the reader (readable code is better
than the best comment), and
- making the table impossible to corrupt with a typo. Are you sure
the 6th decimal place in 1/23 is correct?
While hex float literals protect against rounding errors in the
compiler's decimal parsing, an expression like 1.0/5 solves both
problems at once.
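For concreteness, here is roughly what I have in mind (the name
inv_table and the 4..40 range come from the commit under discussion;
the exact layout is illustrative). Each 1.0f/n is folded to a
correctly rounded constant at compile time, so there is no sixth
decimal place to get wrong:

    static const float inv_table[] = {
        1.0f / 4,  1.0f / 5,  1.0f / 6,  1.0f / 7,  1.0f / 8,
        1.0f / 9,  1.0f / 10, 1.0f / 11, 1.0f / 12, 1.0f / 13,
        /* ... same pattern through ... */
        1.0f / 38, 1.0f / 39, 1.0f / 40
    };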
>> If root == 4, you're allowed to use "sqrtf" rather than "sqrt";
>> you only need 16 bits of precision, and float gives 24.
>
> At least on my machine, sqrtf isn't faster than sqrt.
Huh! According to Agner Fog, here are the x86 timings:
                SQRTSS (float)         SQRTSD (double)
                Latency  Throughput    Latency  Throughput
Sandy Bridge    10-14    10-14         10-21    10-21
Ivy Bridge      11       7             16       8-14
Haswell         (same)                 (same)
Broadwell       11       4             15-16    4-8
Skylake         12       3             15-16    4-6
Coffee Lake     (same)                 (same)
Cannon Lake     (same)                 16       4-6   (probably just a measurement glitch)
Ice Lake        (same)                 15-16    4-6
Steamroller     12-13    4-9           26-29    4-18
Excavator       13       4             16       4-9
Zen1            9-10     4-5           14-15    4-8
Zen2            14       6             20       9
Zen3            14       5             20       9
Zen4            15       5             21       8
Do you have an Intel processor? They have a smaller
double-precision penalty.
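To make the root == 4 suggestion concrete, here is a hedged sketch
(the function name and correction style are mine, not FLINT's actual
code). Two nested float square roots give ~24 good bits, the true
answer needs at most 16, so a short exact-integer correction mops up
the last ULP or two:

    #include <math.h>
    #include <stdint.h>

    static uint64_t n_root4_sketch(uint64_t x)
    {
        uint64_t r = (uint64_t) sqrtf(sqrtf((float) x));

        /* 4th root of 2^64-1 is 65535; clamping also keeps r^4 from
           overflowing below */
        if (r > 65535)
            r = 65535;
        while (r * r * r * r > x)
            r--;
        while (r < 65535 && (r + 1) * (r + 1) * (r + 1) * (r + 1) <= x)
            r++;
        return r;
    }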
> Unfortunately, the libm logf and expf don't come with any accuracy guarantees, so we need to loop in both directions to verify the result. We could of course do this kind of thing if we shipped our own versions with known error bounds.
Actually, we do know some things about logf & expf. Most importantly,
they didn't exist until C99; K&R and C89 only defined log() and exp(),
and early implementations were just wrappers around the existing double-
precision code. Only later were the implementations tuned for speed
without sacrificing too much accuracy.
In general, it's far easier to produce a high-precision float function
if you have double available internally, so it is not credible that
there has *ever* been a logf or expf implementation with more than a
few ULPs of error (except maybe bugs with extreme values like
denormalized numbers, but that doesn't apply here).
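That history is easy to see in code; the classic wrapper shape is just

    #include <math.h>

    /* early-style logf: evaluate in double, round once to float */
    float logf_via_double(float x)
    {
        return (float) log((double) x);
    }

and anything of that shape is automatically good to essentially half a
float ULP, since log()'s own error is negligible at float precision.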
Here are some surveys of various implementations:
"Accuracy of Mathematical Functions in Single Precision"
https://members.loria.fr/PZimmermann/papers/glibc232-20200917.pdf
"Accuracy of Mathematical Functions in Single, Double, Double Extended, and Quadruple Precision"
https://members.loria.fr/PZimmermann/papers/glibc238-20230921.pdf
Note that the surveys don't find any implementation of logf or expf
with more than 1.0 ULP of error. glibc is migrating to the core-math
transcendental functions, which are correctly rounded (< 0.5 ULP
of error).
On the other hand, consider the largest case, n_root(UWORD_MAX,5) = 7131.
log(7132) = 8.87234697898... = 0x8.df522
log(7131) = 8.87220675602... = 0x8.df48f
a difference of 0x93 = 147 ULPs!
log(7132^5) = 5*log(7132) = 44.36173489491... = 0xb.1726bp+2
log(7131^5) = 5*log(7131) = 44.36103378014... = 0xb.171b3p+2
here, the difference is 0xb8 = 184 ULPs, because the mantissas are
larger.
In the exp() output, the difference between 7132 and 7131 is
even larger (0xd.ee000p+9 vs. 0xd.ed800p+9 is 0x800 = 2048 ULPs)
because more of the information is in the exponent.
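(Anyone who wants to double-check those constants can do so in a few
lines of standard C; note that %a prints the same bits in normalized
0x1.xxxxxxp+e form rather than the 0x8.xxxxx form above, and a libm
whose logf/expf round differently may be off by an ULP:)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        printf("logf(7132)         = %a\n", logf(7132.0f));
        printf("logf(7131)         = %a\n", logf(7131.0f));
        printf("5*logf(7132)       = %a\n", 5.0f * logf(7132.0f));
        printf("5*logf(7131)       = %a\n", 5.0f * logf(7131.0f));
        printf("expf(5*logf(7132)) = %a\n", expf(5.0f * logf(7132.0f)));
        printf("expf(5*logf(7131)) = %a\n", expf(5.0f * logf(7131.0f)));
        return 0;
    }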
The key point is that these errors are more than 100x larger than the
error of any plausible implementation, so we can easily make a
reasonable-but-conservative assumption of, say, 10 ULPs, and add a
test case to verify it. There are only a few thousand cases, basically
n^k-1 and n^k, and you could test the much tighter expected bounds
directly: examine n - expf(logf(n_pow(n,k)) * (1.0/k)) and generate a
self-test failure (and hopefully a bug report) on anything much over
1 ULP.
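A sketch of what that self-test could look like (standalone C, with
uint64_t standing in for FLINT's ulong, a local powering loop standing
in for n_pow, the n^k side only, and a placeholder tolerance of a few
ULPs of n):

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    void root_estimate_selftest(void)
    {
        for (unsigned k = 3; k <= 40; k++)
        {
            for (uint64_t n = 2; ; n++)
            {
                uint64_t p = 1;
                int overflow = 0;
                unsigned i;
                for (i = 0; i < k; i++)
                {
                    if (p > UINT64_MAX / n) { overflow = 1; break; }
                    p *= n;          /* exact n^k, as n_pow would compute */
                }
                if (overflow)
                    break;           /* n^k no longer fits; next k */

                float est = expf(logf((float) p) * (1.0f / (float) k));
                /* flag anything beyond ~3 ULPs of n (3 * 2^-23 relative) */
                if (fabsf(est - (float) n) > (float) n * (3.0f / 8388608.0f))
                    printf("suspicious: k=%u n=%llu est=%.9g\n",
                           k, (unsigned long long) n, est);
            }
        }
    }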
(The reason we care is that it affects how much we have to round up to
ensure the answer is never too low, which in turn affects how often a
correction is applied and a second exponentiation must be performed to
recompute the remainder.)
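For reference, the correction structure I mean looks something like
this (illustrative only, not FLINT's actual n_root; the 4e-5 fudge is
my translation of the 10-ULP assumption: 10 ULPs of logf near
log(2^64) ~ 44 is about 4e-5 absolute in the log, which expf turns
into about 4e-5 relative):

    #include <math.h>
    #include <stdint.h>

    /* does r^k exceed x, without forming r^k?
       uses r^k > x  <=>  r^(k-1) > floor(x/r) */
    static int pow_exceeds(uint64_t r, unsigned k, uint64_t x)
    {
        uint64_t p = 1;
        unsigned i;
        for (i = 0; i + 1 < k; i++)
        {
            if (p > UINT64_MAX / r)
                return 1;            /* r^(k-1) > x already, so r^k > x */
            p *= r;
        }
        return p > x / r;
    }

    static uint64_t n_root_sketch(uint64_t x, unsigned k)  /* k >= 3, x >= 1 */
    {
        float est = expf(logf((float) x) * (1.0f / (float) k));
        /* round up conservatively so r is never below the true root;
           the slack costs at most a handful of correction iterations */
        uint64_t r = (uint64_t) (est * (1.0f + 4e-5f)) + 1;
        while (pow_exceeds(r, k, x))   /* each pass redoes the powering */
            r--;
        return r;
    }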
>> (Don't get me started on the fact that n_pow is an iterative loop
>> and starts with a multiply by 1. Or the fact that the one divide
>> inv_table saves is lost in the noise compared to logf() & expf().)
>
> logf and expf in e.g. glibc are very fast; a division isn't negligible in comparison. However, instead of a one-off table just for n_root it would make sense to have a table of small floating-point reciprocals global to FLINT as it has various other uses.
I just looked at the glibc implementations of logf and expf, and was
surprised (but gratified) to see that they use no divisions; they break
each octave into 16 or 32 intervals, and use a polynomial approximation
within each.
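The structure is easy to show in miniature. Here is a toy version of
the scheme (nothing like glibc's actual tables or coefficients, just
the shape: split the mantissa range [1,2) into N intervals, store
1/c_i and log2(c_i) for each interval midpoint, and evaluate a short
polynomial in r = m/c_i - 1, which is at most ~1/(2N) in magnitude):

    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    #define N 16

    static double inv_c[N], log2_c[N];

    static void init_tables(void)   /* call once before toy_log2f */
    {
        int i;
        for (i = 0; i < N; i++)
        {
            double c = 1.0 + (i + 0.5) / N;   /* interval midpoint */
            inv_c[i]  = 1.0 / c;
            log2_c[i] = log2(c);
        }
    }

    static float toy_log2f(float x)   /* assumes x > 0 and normal */
    {
        uint32_t u, mu;
        float m;
        memcpy(&u, &x, sizeof u);
        int e = (int)(u >> 23) - 127;        /* unbiased exponent */
        int i = (u >> 19) & (N - 1);         /* top 4 mantissa bits */
        mu = (u & 0x007fffff) | 0x3f800000;  /* mantissa, in [1,2) */
        memcpy(&m, &mu, sizeof m);
        double r = (double) m * inv_c[i] - 1.0;    /* |r| <= ~1/33 */
        /* log2(1+r) ~= r/ln2 - r^2/(2 ln2) + r^3/(3 ln2); good to a
           few float ULPs at this interval width */
        double p = r * (1.4426950408889634
                 + r * (-0.7213475204444817
                 + r * 0.4808983469629878));
        return (float)(e + log2_c[i] + p);
    }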
The core-math correctly rounded versions are a bit more complex, but
still not bad; log2m uses 2*65*8 = 1040 bytes of tables, plus a few
words of polynomial coefficients. (Contrast that with the serious hair
in glibc/sysdeps/ieee754/flt-32/s_exp2m1f.c.)
One thing I just realized... C99 introduced both logf *and* log2f.
In practice (including glibc), log and logf are implemented in terms
of log2 and log2f, because the first step is to extract the binary
exponent. So using log2f/exp2f should be no less portable and might
be a tiny bit faster.
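In code, the substitution is just (same hedges as before; illustrative
name):

    #include <math.h>
    #include <stdint.h>

    /* same estimate as expf(logf(x) * (1.0f/k)), minus one scaling
       step on implementations that reduce through log2/exp2 anyway */
    static float kth_root_estimate(uint64_t x, unsigned k)
    {
        return exp2f(log2f((float) x) * (1.0f / (float) k));
    }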