cmd/compile: treat function parameters as probably constant in string comparison
When comparing strings in a loop like strs[i] == str vs str == strs[i],
the compiler needs to pick which string's length to use as the third argument
to memequal. If one is constant, memequal optimizations can inline the
comparison instead of calling runtime.memequal.
Previously, probablyConstant only recognized local variables (PAUTO) that
were initialized with constant values. This meant function parameters were
not considered probably constant, causing suboptimal code generation
when the parameter operand order differed.
This change extends probablyConstant to also recognize function parameters
(PPARAM) as constant within the function body. This allows the compiler to
generate optimal code regardless of operand order in string comparisons.
Benchmark test code:
func containsA(strs []string, str string) bool {
for i := range strs {
if strs[i] == str { return true }
}
return false
}
func containsB(strs []string, str string) bool {
for i := range strs {
if str == strs[i] { return true }
}
return false
}
Before this change, containsA generated code that used strs[i].len
as the memequal length argument, while containsB used str.len.
This caused approximately 4x performance difference in some benchmarks.
After this change, both functions use str.len (the parameter length)
as the memequal length argument. Benchmark results with various data sizes
(searching for the last element in arrays of 10, 100, 1000, 10000 strings):
name time/op
Small_A_Last-10 20.39ns
Small_B_Last-10 20.51ns
Medium_A_Last-10 235.0ns
Medium_B_Last-10 216.4ns
Large_A_Last-10 2089ns
Large_B_Last-10 2075ns
Huge_A_Last-10 20812ns
Huge_B_Last-10 20862ns
Large_A_NotFound-10 469.2ns
Large_B_NotFound-10 492.3ns
Both functions now generate the same size code (192 bytes) and show
consistent performance across all data sizes, with differences under 5 percent.
Fixes #74471
diff --git a/src/cmd/compile/internal/compare/compare.go b/src/cmd/compile/internal/compare/compare.go
index 11793fc..fb2db67 100644
--- a/src/cmd/compile/internal/compare/compare.go
+++ b/src/cmd/compile/internal/compare/compare.go
@@ -276,15 +276,20 @@
return false
}
name := n.(*ir.Name)
- if name.Class != ir.PAUTO {
- return false
- }
- if def := name.Defn; def == nil {
- // n starts out as the empty string
+ switch name.Class {
+ case ir.PPARAM:
+ // Function parameters are constant within the function body.
+ // Using the parameter's length allows better optimization when
+ // comparing against varying expressions like slice elements.
return true
- } else if def.Op() == ir.OAS && (def.(*ir.AssignStmt).Y == nil || def.(*ir.AssignStmt).Y.Op() == ir.OLITERAL) {
- // n starts out as a constant string
- return true
+ case ir.PAUTO:
+ if def := name.Defn; def == nil {
+ // n starts out as the empty string
+ return true
+ } else if def.Op() == ir.OAS && (def.(*ir.AssignStmt).Y == nil || def.(*ir.AssignStmt).Y.Op() == ir.OLITERAL) {
+ // n starts out as a constant string
+ return true
+ }
}
return false
}
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
// Function parameters are constant within the function body.I don't think this is right. Parameters can be assigned to.
I think a better avenue for this optimization is to pull these string conversions out of the walk pass, emit a string equality instruction when we generate SSA and then lower it to actual calls/instructions depending on what we results we get in the prove pass. This is a much larger job however.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
This doesn't make any sense to me. The length of an argument string is *never* constant. (Unless it is possibly reassigned.)
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
if probablyConstant(t) && !probablyConstant(s) {Note that I think this (original) code is largely irrelevant after CL 403735. (That CL will make the argument to memeq a constant if either string length is a constant, and the memeq call is inside the `if len(s)==len(t)` test.)
So maybe we could repurpose this code to be "prefer to use a value from an outer loop" or something like that.
That said, I'm not sure I buy the benchmarks. The CL description claims 4x improvement but then the benchmarks show almost no improvement and some are negative. I can't run them myself, as they are just in the comments, not as real benchmarks in the CL.