Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

How can I get an implementation of fmod()?

45 views
Skip to first unread message

Megumin

unread,
Aug 25, 2023, 2:25:44 AM8/25/23
to
How does remainder() in cmath works and is there any simple but fairly precise fmod that can be implemented without using double or long double?
I'm currently trying to implement fmodf(float x,float y) myself in assembly, and seems like in IEEE754, if I try to do as follows, the precision will f-up badly if x>>y in fact if x/y is more than 2^23 this will just makes no sense at all.
r1 = f32(x);
r2 = f32(y);
r3 = div_f32(r1,r2); //precision:2ulp
r4 = trunc(r3)// this is where things go bad if r3 is more than 2^23 it does nothing and I will get no remainder.
r4 = multi_f32(r4,r2);
r4 = subtract_f32(r1,r4);
result = r4;

Bonita Montero

unread,
Aug 25, 2023, 3:19:56 AM8/25/23
to
Look at the /sysdeps/ieee754/dbl-64/e_fmod.c glibc code. They've changed
the code for the 8/1/2023 release and made the algorithm multiple times
faster. The fmod() result has no precision loss.
Don't bother with x87's FPREM. It's claimed to be excact but actally its
an approximation on the CPU's I've recently checked (Zen1, Zen2, Zen4,
Skylake).

Bonita Montero

unread,
Aug 25, 2023, 3:35:41 AM8/25/23
to
Am 25.08.2023 um 09:19 schrieb Bonita Montero:
> Am 25.08.2023 um 08:25 schrieb Megumin:
>> How does remainder() in cmath works and is there any simple but fairly
>> precise fmod that can be implemented without using double or long double?
>> I'm currently trying to implement fmodf(float x,float y) myself in
>> assembly, and seems like in IEEE754, if I try to do as follows, the
>> precision will f-up badly if x>>y in fact if x/y is more than 2^23
>> this will just makes no sense at all.
>> r1 = f32(x);
>> r2 = f32(y);
>> r3 = div_f32(r1,r2);        //precision:2ulp
>> r4 = trunc(r3)// this is where things go bad if r3 is more than 2^23
>> it does nothing and I will get no remainder.
>> r4 = multi_f32(r4,r2);
>> r4 = subtract_f32(r1,r4);
>> result = r4;
>>


The problem with your solution is that your div_f32 might loss
bits right from the mantissa but before the comma. This might
yield a result that is larger than the dividend.

Bonita Montero

unread,
Aug 25, 2023, 3:49:37 AM8/25/23
to
Am 25.08.2023 um 08:25 schrieb Megumin:
Here's some fmod()-code I implemented myself for float and double.
It always produces results without precision loss because the result
is always smaller than the dividend.

template<floating_point Float>
static Float myFmodT( Float counter, Float denom )
{
constexpr bool IS_DOUBLE = same_as<Float, double>;
static_assert(IS_DOUBLE || same_as<Float, float>);
constexpr unsigned
BITS = IS_DOUBLE ? 64 : 32,
MANT_BITS = IS_DOUBLE ? 52 : 23,
EXP_BITS = IS_DOUBLE ? 11 : 8;
static_assert(sizeof(Float) * 8 == BITS &&
numeric_limits<Float>::is_iec559);
using bin_t = conditional_t<IS_DOUBLE, uint64_t, uint32_t>;
constexpr int MAX_EXP = (1 << EXP_BITS) - 1;
constexpr bin_t
SIGN_BIT = (bin_t)1 << BITS - 1,
EXP_MASK = (bin_t)MAX_EXP << MANT_BITS,
IMPLCIT_BIT = (bin_t)1 << MANT_BITS,
MANT_MASK = IMPLCIT_BIT - 1;
static auto bin = []( Float f ) -> bin_t { return bit_cast<bin_t>( f ); };
static auto flt = []( bin_t b ) -> Float { return bit_cast<Float>( b ); };
bin_t
bCounter = bin( counter ),
bDenom = bin( denom ) & ~SIGN_BIT,
bSign = bCounter & SIGN_BIT;
bCounter &= ~SIGN_BIT;
static auto maxExp = []( bin_t b ) -> bool { return b >= EXP_MASK; };
static auto infMant = []( bin_t b ) -> bool { return !(b & MANT_MASK); };
auto floatingResult = [&]() -> Float { return (counter * denom) /
(counter * denom); };
if( !bDenom ) [[unlikely]]
// ... % +/-0.0
return floatingResult();
if( maxExp( bCounter ) ) [[unlikely]]
// [Inf|NaN|SNaN] % non-zero
return floatingResult();
if( maxExp( bDenom ) ) [[unlikely]]
if( infMant( bDenom ) ) [[likely]]
// finite % Inf
return counter;
else
// finite % [Inf|NaN|SNaN]
return floatingResult();
if( !bCounter ) [[unlikely]]
// +/-0.0 % non-zero-finite = +/-0.0
return counter;
int
counterExp = bCounter >> MANT_BITS & MAX_EXP,
denomExp = bDenom >> MANT_BITS & MAX_EXP;
bin_t
counterMant = (bin_t)(bool)counterExp << MANT_BITS | bCounter & MANT_MASK,
denomMant = (bin_t)(bool)denomExp << MANT_BITS | bDenom & MANT_MASK;
static auto normalize = []( bin_t &mant, int &exp )
{
unsigned bits = countl_zero( mant ) - EXP_BITS;
mant <<= bits;
exp -= bits;
};
if( !counterExp ) [[unlikely]]
// normalize counter
normalize( counterMant, counterExp ),
++counterExp;
if( !denomExp ) [[unlikely]]
// normalize denominator
normalize( denomMant, denomExp ),
++denomExp;
int remExp = counterExp;
bin_t remMant = counterMant;
for( ; ; )
{
int below = remMant < denomMant;
if( remExp - below < denomExp ) [[unlikely]]
break;
remExp -= below;
remMant <<= below;
if( !(remMant -= denomMant) ) [[unlikely]]
return flt( bSign );
normalize( remMant, remExp );
}
if( remExp <= 0 ) [[unlikely]]
// denormal result
remMant >>= -remExp + 1,
remExp = 0;
return flt( bSign | (bin_t)remExp << MANT_BITS | remMant & MANT_MASK );
}

James Kuyper

unread,
Aug 25, 2023, 4:35:51 AM8/25/23
to
On 8/25/23 02:25, Megumin wrote:
> How does remainder() in cmath works ...

"The remainder functions compute the remainder x REM y required by IEC
60559. 259)" (7.12.10.2p2). Note that ISO/IEC 60559 is equivalent to
IEEE 754. That "259" refers to footnote 259, which quote that standard
as saying:

"When y ̸= 0, the remainder r = x REM y is defined regardless of the
rounding mode by the mathematical relation r = x − ny, where n is the
integer nearest the exact value of x/y; whenever |n − x| = 1/2 , then n
is even. If r = 0, its sign shall be that of x."

"The roundeven functions round their argument to the nearest integer
value in floating-point format, rounding halfway cases to even (that is,
to the nearest value that is an even integer), regardless of the current
rounding direction." (7.12.9.8p2)

Therefore, an implementation could define

float remainderf(float x, thus y) {
return x - roundevenf(x/y)*y;
}

> and is there any simple but fairly precise fmod that can be
> implemented without using double or long double?

"The fmod functions return the value x − ny, for some integer n such
that, if y is nonzero, the result has the same sign as x and magnitude
less than the magnitude of y. If y is zero, whether a domain error
occurs or the fmod functions return zero is implementation-defined."
(7.12.10.1p3).

"The trunc functions round their argument to the integer value, in
floating format, nearest to but no larger in magnitude than the
argument." (7.12.9.1p2)

Therefore, an implementation could define

float fmodf(float x,float y) {
return x - truncf(x/y)*y;
}

The "but no larger in magnitude" specification for trunc() guarantees
that this subtraction has the same sign as x, as is required for fmod().

> I'm currently trying to implement fmodf(float x,float y) myself in
> assembly, and seems like in IEEE754, if I try to do as follows, the
> precision will f-up badly if x>>y in fact if x/y is more than 2^23
> this will just makes no sense at all.

You say that you're doing it in assembly, which I couldn't help you
with, but the following looks a lot more like C than assembly.

> r1 = f32(x);
> r2 = f32(y);
> r3 = div_f32(r1,r2); //precision:2ulp
> r4 = trunc(r3)// this is where things go bad if r3 is more than 2^23
> it does nothing and I will get no remainder.
> r4 = multi_f32(r4,r2); r4 = subtract_f32(r1,r4);
> result = r4;
I suspect that it is not possible, and therefore not required by IEC
754, to produce a more accurate result than that in those cases. On the
other hand, it's 03:42 here right now, and I couldn't find my copy of
IEEE 754, so I can't be sure that I'm remembering and thinking about
that correctly.

Bonita Montero

unread,
Aug 25, 2023, 4:37:47 AM8/25/23
to
Am 25.08.2023 um 10:34 schrieb James Kuyper:

> float fmodf(float x,float y) {
> return x - truncf(x/y)*y;
> }

This doesn't work because when the exponent difference of x and y is
too large there might be integer-bits before the fraction of the div
-result lost so that the difference becomes larger than y.
A proper solution is much more complex. Check the solution I've shown
in this thread or the recend 8/1/2023 glibc solution in /sysdepes
/ieee754/dbl-64/e_fmod.c.



Bonita Montero

unread,
Aug 25, 2023, 4:44:01 AM8/25/23
to
Check this code:

#include <iostream>
#include <cmath>
#include <bit>
#include <cstdint>

using namespace std;

int main()
{
double
a = bit_cast<double>( (uint64_t)0x7FE << 52 ),
b = bit_cast<double>( 1.0 );
cout << b - trunc( a / b ) << endl;
}

The result should be >= 0.0 and < 1.0 and my computer prints:

-8.98847e+307

James Kuyper

unread,
Aug 25, 2023, 4:45:15 AM8/25/23
to
On 8/25/23 04:34, James Kuyper wrote:
> On 8/25/23 02:25, Megumin wrote:
>> How does remainder() in cmath works ...

I just noticed that this message was posted on comp.lang.c++, whereas
the content left me with the incorrect impression that it had been
posted to comp.lang.c. All of the citations I gave were from the C
standard. That's still appropriate in comp.lang.c++, because the C++
standard doesn't say anything relevant itself about these issues, but
only cross-references the C standard - but I should have mentioned that
fact. Also, I should have specified C++ overloads of the math functions
for float arguments, rather than the corresponding C functions with 'f'
suffixes.

Bonita Montero

unread,
Aug 25, 2023, 5:20:51 AM8/25/23
to
The code before was bullshit.

#include <iostream>
#include <cmath>

using namespace std;

int main()
{
constexpr double
A = 0x1.FFFFFFFFFFFFFp+1023,
B = 3.14;
cout << A - trunc( A / B ) * B << endl;
}

Take this. For my computer this prints: 1.99584e+292

Megumin

unread,
Aug 31, 2023, 1:52:44 AM8/31/23
to
After researching I found something that seems quite correct but I ain't implement it yet.(for this is going to be running on a hardware which can process one operation for 1024 numbers at once so using indefinite loops is not recommended)
float temp = y;
while((temp<<1)<x)temp<<=1;
while(x>y){
if(x>temp)x-=temp;
temp/=2;
}
no matter what thank you all for your insights.

Bonita Montero

unread,
Aug 31, 2023, 5:01:10 AM8/31/23
to
Am 31.08.2023 um 07:52 schrieb Megumin:
> After researching I found something that seems quite correct but I ain't implement it yet.(for this is going to be running on a hardware which can process one operation for 1024 numbers at once so using indefinite loops is not recommended)
> float temp = y;
> while((temp<<1)<x)temp<<=1;

Trying to shift floats ?

Ben Bacarisse

unread,
Aug 31, 2023, 4:48:49 PM8/31/23
to
Megumin <chrisram...@gmail.com> writes:

> After researching I found something that seems quite correct but I
> ain't implement it yet.

Have you tried:

x - (int)(x/y) * y

? Obviously you might be worried about corner cases and accuracy, but
that is at least a starting point.

> float temp = y;
> while((temp<<1)<x)temp<<=1;
> while(x>y){
> if(x>temp)x-=temp;
> temp/=2;
> }

This is not valid C or C++. If can point to where you found it, maybe
we can help make some sense of what it is trying to achieve?

--
Ben.

Keith Thompson

unread,
Aug 31, 2023, 5:37:58 PM8/31/23
to
Suggestion: Always compile code before posting it.

The "<<" operator cannot be applied to floating-point operands.

And indentation makes code much more legible.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */
0 new messages