[go] cmd/compile: use FCLASSD for subnormal checks on riscv64

Michael Munday (Gerrit)

Nov 3, 2025, 5:53:19 PM

to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Michael Munday has uploaded the change for review

Commit message

cmd/compile: use FCLASSD for subnormal checks on riscv64

Only implemented for 64 bit floating point operations for now.

goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
                    │       sec/op        │   sec/op     vs base                │
Acos                          156.8n ± 2%   154.1n ± 0%   -1.72% (p=0.000 n=10)
Acosh                         214.1n ± 1%   212.0n ± 0%   -0.98% (p=0.000 n=10)
Asin                          150.4n ± 2%   149.6n ± 2%        ~ (p=0.897 n=10)
Asinh                         248.9n ± 1%   247.2n ± 2%        ~ (p=0.340 n=10)
Atan                          98.82n ± 2%   99.38n ± 2%        ~ (p=0.468 n=10)
Atanh                         234.2n ± 2%   234.6n ± 2%        ~ (p=0.608 n=10)
Atan2                         161.5n ± 2%   156.4n ± 2%   -3.13% (p=0.000 n=10)
Cbrt                          187.5n ± 1%   183.3n ± 2%   -2.21% (p=0.004 n=10)
Ceil                          36.96n ± 2%   37.67n ± 3%        ~ (p=0.218 n=10)
Copysign                      6.671n ± 2%   6.580n ± 2%        ~ (p=0.853 n=10)
Cos                           98.08n ± 2%   93.53n ± 3%   -4.64% (p=0.000 n=10)
Cosh                          232.7n ± 3%   221.4n ± 2%   -4.81% (p=0.000 n=10)
Erf                           122.9n ± 0%   113.1n ± 2%   -7.97% (p=0.000 n=10)
Erfc                          125.6n ± 0%   117.1n ± 2%   -6.73% (p=0.000 n=10)
Erfinv                        139.2n ± 0%   139.5n ± 2%        ~ (p=0.753 n=10)
Erfcinv                       139.7n ± 0%   137.7n ± 2%        ~ (p=0.135 n=10)
Exp                           194.9n ± 0%   182.1n ± 3%   -6.54% (p=0.000 n=10)
ExpGo                         204.9n ± 1%   192.6n ± 0%   -6.03% (p=0.000 n=10)
Expm1                         155.5n ± 2%   144.5n ± 0%   -7.10% (p=0.000 n=10)
Exp2                          173.1n ± 2%   162.0n ± 0%   -6.44% (p=0.000 n=10)
Exp2Go                        185.2n ± 2%   172.9n ± 0%   -6.64% (p=0.000 n=10)
Abs                           4.913n ± 2%   4.896n ± 0%   -0.36% (p=0.047 n=10)
Dim                           15.56n ± 0%   15.51n ± 0%   -0.35% (p=0.048 n=10)
Floor                         36.79n ± 0%   36.70n ± 0%        ~ (p=0.140 n=10)
Max                           31.02n ± 0%   31.02n ± 0%        ~ (p=0.486 n=10)
Min                           31.12n ± 0%   31.02n ± 0%        ~ (p=0.058 n=10)
Mod                           292.8n ± 0%   244.7n ± 0%  -16.43% (p=0.000 n=10)
Frexp                         45.08n ± 0%   35.11n ± 0%  -22.11% (p=0.000 n=10)
Gamma                         195.3n ± 1%   181.9n ± 0%   -6.86% (p=0.000 n=10)
Hypot                         85.09n ± 0%   83.26n ± 0%   -2.14% (p=0.000 n=10)
HypotGo                       96.60n ± 1%   95.49n ± 0%   -1.15% (p=0.000 n=10)
Ilogb                         44.86n ± 0%   35.07n ± 0%  -21.82% (p=0.000 n=10)
J0                            635.1n ± 0%   620.9n ± 0%   -2.24% (p=0.000 n=10)
J1                            644.8n ± 0%   634.8n ± 0%   -1.54% (p=0.000 n=10)
Jn                            1.361µ ± 0%   1.337µ ± 0%   -1.76% (p=0.000 n=10)
Ldexp                         49.98n ± 0%   39.99n ± 0%  -19.97% (p=0.000 n=10)
Lgamma                        185.9n ± 1%   177.0n ± 0%   -4.79% (p=0.000 n=10)
Log                           150.7n ± 0%   141.3n ± 0%   -6.24% (p=0.000 n=10)
Logb                          46.52n ± 0%   35.90n ± 0%  -22.83% (p=0.000 n=10)
Log1p                         164.6n ± 0%   151.9n ± 0%   -7.75% (p=0.000 n=10)
Log10                         152.9n ± 0%   143.6n ± 0%   -6.05% (p=0.000 n=10)
Log2                          58.84n ± 1%   49.77n ± 0%  -15.41% (p=0.000 n=10)
Modf                          40.97n ± 0%   40.01n ± 0%   -2.34% (p=0.000 n=10)
Nextafter32                   49.00n ± 0%   49.37n ± 0%   +0.74% (p=0.000 n=10)
Nextafter64                   43.33n ± 0%   43.65n ± 0%   +0.74% (p=0.000 n=10)
PowInt                        269.2n ± 0%   242.4n ± 0%   -9.95% (p=0.000 n=10)
PowFrac                       618.1n ± 0%   570.4n ± 0%   -7.71% (p=0.000 n=10)
Pow10Pos                      13.07n ± 0%   13.05n ± 0%        ~ (p=0.327 n=10)
Pow10Neg                      31.12n ± 0%   19.58n ± 0%  -37.08% (p=0.000 n=10)
Round                         23.73n ± 0%   23.69n ± 0%        ~ (p=0.122 n=10)
RoundToEven                   27.73n ± 1%   27.73n ± 0%        ~ (p=0.750 n=10)
Remainder                     283.4n ± 0%   249.8n ± 0%  -11.87% (p=0.000 n=10)
Signbit                       11.43n ± 0%   11.43n ± 0%        ~ (p=0.649 n=10)
Sin                           115.1n ± 1%   110.9n ± 0%   -3.65% (p=0.000 n=10)
Sincos                        140.7n ± 0%   137.2n ± 0%   -2.49% (p=0.000 n=10)
Sinh                          251.6n ± 0%   236.5n ± 0%   -5.96% (p=0.000 n=10)
SqrtIndirect                  4.905n ± 0%   4.899n ± 0%        ~ (p=0.443 n=10)
SqrtLatency                   19.68n ± 1%   19.58n ± 0%        ~ (p=0.065 n=10)
SqrtIndirectLatency           19.64n ± 0%   19.60n ± 0%        ~ (p=0.065 n=10)
SqrtGoLatency                 197.7n ± 0%   197.4n ± 0%        ~ (p=0.168 n=10)
SqrtPrime                     5.752µ ± 0%   5.731µ ± 0%        ~ (p=0.136 n=10)
Tan                           148.5n ± 1%   142.0n ± 0%   -4.38% (p=0.000 n=10)
Tanh                          249.1n ± 0%   236.6n ± 0%   -5.00% (p=0.000 n=10)
Trunc                         36.73n ± 0%   36.73n ± 0%        ~ (p=0.835 n=10)
Y0                            638.3n ± 0%   631.7n ± 0%   -1.05% (p=0.000 n=10)
Y1                            641.1n ± 0%   634.3n ± 0%   -1.07% (p=0.000 n=10)
Yn                            1.356µ ± 0%   1.340µ ± 0%   -1.18% (p=0.000 n=10)
Float64bits                   5.714n ± 0%   5.712n ± 0%        ~ (p=0.277 n=10)
Float64frombits               4.912n ± 0%   4.897n ± 0%        ~ (p=0.119 n=10)
Float32bits                   12.27n ± 0%   12.23n ± 0%        ~ (p=0.121 n=10)
Float32frombits               4.902n ± 0%   4.897n ± 0%        ~ (p=0.611 n=10)
FMA                           6.553n ± 0%   6.530n ± 0%        ~ (p=0.098 n=10)
geomean                       86.81n        82.69n        -4.74%

Change-Id: I522297a79646d76543d516accce291f5a3cea337

Change diff

diff --git a/src/cmd/compile/internal/ssa/_gen/RISCV64.rules b/src/cmd/compile/internal/ssa/_gen/RISCV64.rules
index 49bdbc8..b43b7e2 100644
--- a/src/cmd/compile/internal/ssa/_gen/RISCV64.rules
+++ b/src/cmd/compile/internal/ssa/_gen/RISCV64.rules
@@ -834,6 +834,16 @@
 (FEQD x (FMOVDconst [c])) && float64ExactBits(c, math.Inf(1)) => (SNEZ (ANDI <typ.Int64> [1<<7] (FCLASSD x)))
 (FNED x (FMOVDconst [c])) && float64ExactBits(c, math.Inf(1)) => (SEQZ (ANDI <typ.Int64> [1<<7] (FCLASSD x)))
 
+// Test for subnormal numbers using 64 bit classify instruction.
+(FLTD x (FMOVDconst [+0x1p-1022])) => (SNEZ (ANDI <typ.Int64> [0b00_0011_1111] (FCLASSD x)))
+(FLED (FMOVDconst [+0x1p-1022]) x) => (SNEZ (ANDI <typ.Int64> [0b00_1100_0000] (FCLASSD x)))
+(FLED x (FMOVDconst [-0x1p-1022])) => (SNEZ (ANDI <typ.Int64> [0b00_0000_0011] (FCLASSD x)))
+(FLTD (FMOVDconst [-0x1p-1022]) x) => (SNEZ (ANDI <typ.Int64> [0b00_1111_1100] (FCLASSD x)))
+
+// Absorb unary sign bit operations into 64 bit classify instruction.
+(ANDI [c] (FCLASSD (FNEGD x))) => (ANDI [(c&0b11_0000_0000)|int64(bits.Reverse8(uint8(c))&0b1111_1111)] (FCLASSD x))
+(ANDI [c] (FCLASSD (FABSD x))) => (ANDI [(c&0b11_1111_0000)|int64(bits.Reverse8(uint8(c))&0b0000_1111)] (FCLASSD x))
+
 //
 // Optimisations for rva22u64 and above.
 //
diff --git a/src/cmd/compile/internal/ssa/rewriteRISCV64.go b/src/cmd/compile/internal/ssa/rewriteRISCV64.go
index 8a390eb..96e8b55 100644
--- a/src/cmd/compile/internal/ssa/rewriteRISCV64.go
+++ b/src/cmd/compile/internal/ssa/rewriteRISCV64.go
@@ -4,6 +4,7 @@
 
 import "internal/buildcfg"
 import "math"
+import "math/bits"
 import "cmd/compile/internal/types"
 
 func rewriteValueRISCV64(v *Value) bool {
@@ -3479,6 +3480,8 @@
 }
 func rewriteValueRISCV64_OpRISCV64ANDI(v *Value) bool {
 	v_0 := v.Args[0]
+	b := v.Block
+	typ := &b.Func.Config.Types
 	// match: (ANDI [0] x)
 	// result: (MOVDconst [0])
 	for {
@@ -3525,6 +3528,44 @@
 		v.AddArg(z)
 		return true
 	}
+	// match: (ANDI [c] (FCLASSD (FNEGD x)))
+	// result: (ANDI [(c&0b11_0000_0000)|int64(bits.Reverse8(uint8(c))&0b1111_1111)] (FCLASSD x))
+	for {
+		c := auxIntToInt64(v.AuxInt)
+		if v_0.Op != OpRISCV64FCLASSD {
+			break
+		}
+		v_0_0 := v_0.Args[0]
+		if v_0_0.Op != OpRISCV64FNEGD {
+			break
+		}
+		x := v_0_0.Args[0]
+		v.reset(OpRISCV64ANDI)
+		v.AuxInt = int64ToAuxInt((c & 0b11_0000_0000) | int64(bits.Reverse8(uint8(c))&0b1111_1111))
+		v0 := b.NewValue0(v.Pos, OpRISCV64FCLASSD, typ.Int64)
+		v0.AddArg(x)
+		v.AddArg(v0)
+		return true
+	}
+	// match: (ANDI [c] (FCLASSD (FABSD x)))
+	// result: (ANDI [(c&0b11_1111_0000)|int64(bits.Reverse8(uint8(c))&0b0000_1111)] (FCLASSD x))
+	for {
+		c := auxIntToInt64(v.AuxInt)
+		if v_0.Op != OpRISCV64FCLASSD {
+			break
+		}
+		v_0_0 := v_0.Args[0]
+		if v_0_0.Op != OpRISCV64FABSD {
+			break
+		}
+		x := v_0_0.Args[0]
+		v.reset(OpRISCV64ANDI)
+		v.AuxInt = int64ToAuxInt((c & 0b11_1111_0000) | int64(bits.Reverse8(uint8(c))&0b0000_1111))
+		v0 := b.NewValue0(v.Pos, OpRISCV64FCLASSD, typ.Int64)
+		v0.AddArg(x)
+		v.AddArg(v0)
+		return true
+	}
 	return false
 }
 func rewriteValueRISCV64_OpRISCV64FADDD(v *Value) bool {
@@ -3677,6 +3718,38 @@
 		v.AddArg(v0)
 		return true
 	}
+	// match: (FLED (FMOVDconst [+0x1p-1022]) x)
+	// result: (SNEZ (ANDI <typ.Int64> [0b00_1100_0000] (FCLASSD x)))
+	for {
+		if v_0.Op != OpRISCV64FMOVDconst || auxIntToFloat64(v_0.AuxInt) != +0x1p-1022 {
+			break
+		}
+		x := v_1
+		v.reset(OpRISCV64SNEZ)
+		v0 := b.NewValue0(v.Pos, OpRISCV64ANDI, typ.Int64)
+		v0.AuxInt = int64ToAuxInt(0b00_1100_0000)
+		v1 := b.NewValue0(v.Pos, OpRISCV64FCLASSD, typ.Int64)
+		v1.AddArg(x)
+		v0.AddArg(v1)
+		v.AddArg(v0)
+		return true
+	}
+	// match: (FLED x (FMOVDconst [-0x1p-1022]))
+	// result: (SNEZ (ANDI <typ.Int64> [0b00_0000_0011] (FCLASSD x)))
+	for {
+		x := v_0
+		if v_1.Op != OpRISCV64FMOVDconst || auxIntToFloat64(v_1.AuxInt) != -0x1p-1022 {
+			break
+		}
+		v.reset(OpRISCV64SNEZ)
+		v0 := b.NewValue0(v.Pos, OpRISCV64ANDI, typ.Int64)
+		v0.AuxInt = int64ToAuxInt(0b00_0000_0011)
+		v1 := b.NewValue0(v.Pos, OpRISCV64FCLASSD, typ.Int64)
+		v1.AddArg(x)
+		v0.AddArg(v1)
+		v.AddArg(v0)
+		return true
+	}
 	return false
 }
 func rewriteValueRISCV64_OpRISCV64FLTD(v *Value) bool {
@@ -3724,6 +3797,38 @@
 		v.AddArg(v0)
 		return true
 	}
+	// match: (FLTD x (FMOVDconst [+0x1p-1022]))
+	// result: (SNEZ (ANDI <typ.Int64> [0b00_0011_1111] (FCLASSD x)))
+	for {
+		x := v_0
+		if v_1.Op != OpRISCV64FMOVDconst || auxIntToFloat64(v_1.AuxInt) != +0x1p-1022 {
+			break
+		}
+		v.reset(OpRISCV64SNEZ)
+		v0 := b.NewValue0(v.Pos, OpRISCV64ANDI, typ.Int64)
+		v0.AuxInt = int64ToAuxInt(0b00_0011_1111)
+		v1 := b.NewValue0(v.Pos, OpRISCV64FCLASSD, typ.Int64)
+		v1.AddArg(x)
+		v0.AddArg(v1)
+		v.AddArg(v0)
+		return true
+	}
+	// match: (FLTD (FMOVDconst [-0x1p-1022]) x)
+	// result: (SNEZ (ANDI <typ.Int64> [0b00_1111_1100] (FCLASSD x)))
+	for {
+		if v_0.Op != OpRISCV64FMOVDconst || auxIntToFloat64(v_0.AuxInt) != -0x1p-1022 {
+			break
+		}
+		x := v_1
+		v.reset(OpRISCV64SNEZ)
+		v0 := b.NewValue0(v.Pos, OpRISCV64ANDI, typ.Int64)
+		v0.AuxInt = int64ToAuxInt(0b00_1111_1100)
+		v1 := b.NewValue0(v.Pos, OpRISCV64FCLASSD, typ.Int64)
+		v1.AddArg(x)
+		v0.AddArg(v1)
+		v.AddArg(v0)
+		return true
+	}
 	return false
 }
 func rewriteValueRISCV64_OpRISCV64FMADDD(v *Value) bool {
diff --git a/src/cmd/compile/internal/test/float_test.go b/src/cmd/compile/internal/test/float_test.go
index 7a5e278..00735e3 100644
--- a/src/cmd/compile/internal/test/float_test.go
+++ b/src/cmd/compile/internal/test/float_test.go
@@ -727,6 +727,65 @@
 	}
 }
 
+// minNormal64 is the smallest float64 value that is not subnormal.
+const minNormal64 = 2.2250738585072014e-308
+
+//go:noinline
+func isAbsLessThanMinNormal64(x float64) bool {
+	return math.Abs(x) < minNormal64
+}
+
+//go:noinline
+func isLessThanMinNormal64(x float64) bool {
+	return x < minNormal64
+}
+
+//go:noinline
+func isGreaterThanNegMinNormal64(x float64) bool {
+	return x > -minNormal64
+}
+
+//go:noinline
+func isGreaterThanOrEqualToMinNormal64(x float64) bool {
+	return math.Abs(x) >= minNormal64
+}
+
+func TestSubnormalComparisons(t *testing.T) {
+	tests := []struct {
+		value                  float64
+		isAbsLessThanMinNormal bool
+		isPositive             bool
+		isNegative             bool
+		isNaN                  bool
+	}{
+		{value: math.Inf(1), isPositive: true},
+		{value: math.MaxFloat64, isPositive: true},
+		{value: math.Inf(-1), isNegative: true},
+		{value: -math.MaxFloat64, isNegative: true},
+		{value: math.NaN(), isNaN: true},
+		{value: minNormal64, isPositive: true},
+		{value: minNormal64 / 2, isAbsLessThanMinNormal: true, isPositive: true},
+		{value: -minNormal64, isNegative: true},
+		{value: -minNormal64 / 2, isAbsLessThanMinNormal: true, isNegative: true},
+		{value: 0, isAbsLessThanMinNormal: true, isPositive: true},
+		{value: math.Copysign(0, -1), isAbsLessThanMinNormal: true, isNegative: true},
+	}
+
+	check := func(name string, f func(x float64) bool, value float64, want bool) {
+		got := f(value)
+		if got != want {
+			t.Errorf("%v(%g): want %v, got %v", name, value, want, got)
+		}
+	}
+
+	for _, test := range tests {
+		check("isAbsLessThanMinNormal64", isAbsLessThanMinNormal64, test.value, test.isAbsLessThanMinNormal)
+		check("isLessThanMinNormal64", isLessThanMinNormal64, test.value, test.isAbsLessThanMinNormal || test.isNegative)
+		check("isGreaterThanNegMinNormal64", isGreaterThanNegMinNormal64, test.value, test.isAbsLessThanMinNormal || test.isPositive)
+		check("isGreaterThanOrEqualToMinNormal64", isGreaterThanOrEqualToMinNormal64, test.value, !test.isAbsLessThanMinNormal && !test.isNaN)
+	}
+}
+
 var sinkFloat float64
 
 func BenchmarkMul2(b *testing.B) {
diff --git a/test/codegen/floats.go b/test/codegen/floats.go
index 3942cb5..343f8fa 100644
--- a/test/codegen/floats.go
+++ b/test/codegen/floats.go
@@ -6,6 +6,8 @@
 
 package codegen
 
+import "math"
+
 // This file contains codegen tests related to arithmetic
 // simplifications and optimizations on float types.
 // For codegen tests on integer types, see arithmetic.go.
@@ -277,3 +279,37 @@
 	// riscv64: "MOVD [$]f64.4015ba5e353f7cee"
 	*p = 5.432
 }
+
+// ------------------------ //
+//  Subnormal tests         //
+// ------------------------ //
+
+func isSubnormal(x float64) bool {
+	// riscv64:"FCLASSD" -"FABSD"
+	return math.Abs(x) < 2.2250738585072014e-308
+}
+
+func isNormal(x float64) bool {
+	// riscv64:"FCLASSD" -"FABSD"
+	return math.Abs(x) >= 0x1p-1022
+}
+
+func isPosSubnormal(x float64) bool {
+	// riscv64:"FCLASSD"
+	return x > 0 && x < 2.2250738585072014e-308
+}
+
+func isNegSubnormal(x float64) bool {
+	// riscv64:"FCLASSD"
+	return x < 0 && x > -0x1p-1022
+}
+
+func isPosNormal(x float64) bool {
+	// riscv64:"FCLASSD"
+	return x >= 2.2250738585072014e-308
+}
+
+func isNegNormal(x float64) bool {
+	// riscv64:"FCLASSD"
+	return x <= -2.2250738585072014e-308
+}
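
The two "absorb" rules in RISCV64.rules rely on the negative classes (bits 0 to 3) mirroring the positive classes (bits 7 down to 4), so that reversing the low eight bits of a mask corresponds to flipping the sign of the operand. The following is a rough standalone check of that reasoning, not code from the change: `negMask`, `absMask`, `classNeg`, `classAbs` and `masksEquivalent` are hypothetical names, and the mask expressions are copied from the rules.

```go
package main

import (
	"fmt"
	"math/bits"
)

// negMask is the mask to apply to FCLASSD(x) in place of applying c
// to FCLASSD(-x). The expression is copied from the FNEGD rule.
func negMask(c int64) int64 {
	return (c & 0b11_0000_0000) | int64(bits.Reverse8(uint8(c))&0b1111_1111)
}

// absMask is the mask to apply to FCLASSD(x) in place of applying c
// to FCLASSD(|x|). The expression is copied from the FABSD rule.
func absMask(c int64) int64 {
	return (c & 0b11_1111_0000) | int64(bits.Reverse8(uint8(c))&0b0000_1111)
}

// classNeg maps the class bit index of x to the class bit index of -x.
// NaN classes (bits 8 and 9) are unaffected by the sign.
func classNeg(i int) int {
	if i < 8 {
		return 7 - i
	}
	return i
}

// classAbs maps the class bit index of x to the class bit index of |x|.
func classAbs(i int) int {
	if i < 4 {
		return 7 - i
	}
	return i
}

// masksEquivalent checks every 10-bit mask against every input class.
func masksEquivalent() bool {
	for c := int64(0); c < 1<<10; c++ {
		for i := 0; i < 10; i++ {
			if (c>>classNeg(i))&1 != (negMask(c)>>i)&1 {
				return false
			}
			if (c>>classAbs(i))&1 != (absMask(c)>>i)&1 {
				return false
			}
		}
	}
	return true
}

func main() {
	// Mask for "x < 0x1p-1022" applied to a negated and an absolute operand.
	fmt.Printf("%#b\n", negMask(0b00_0011_1111)) // prints: 0b11111100
	fmt.Printf("%#b\n", absMask(0b00_0011_1111)) // prints: 0b111100
	fmt.Println(masksEquivalent())               // prints: true
}
```

The second line of output is the mask selecting both subnormal classes and both zeros, which is the set of inputs for which `math.Abs(x) < 0x1p-1022` holds; that is why the codegen test for `isSubnormal` can expect FCLASSD with no FABSD.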

Change information

Files:

M src/cmd/compile/internal/ssa/_gen/RISCV64.rules
M src/cmd/compile/internal/ssa/rewriteRISCV64.go
M src/cmd/compile/internal/test/float_test.go
M test/codegen/floats.go

Change size: M

Delta: 4 files changed, 210 insertions(+), 0 deletions(-)

Related details

Attention set is empty

Submit Requirements:

Code-Review
No-Unresolved-Comments
Review-Enforcement
TryBots-Pass

Michael Munday (Gerrit)

Nov 3, 2025, 5:55:24 PM

to goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Michael Munday voted Commit-Queue+1

Commit-Queue