Daniel Martí has uploaded this change for review.
flag: use sorted slices rather than maps in FlagSet
FlagSet needs to keep track of all defined and set flags.
Currently, this is done via two map[string]*Flag fields.
This makes looking up flags particularly fast, which FlagSet.Parse needs.
This patch changes the two fields to use a sorted slice of *Flag.
Flags are now inserted or looked up via binary search instead.
This has multiple benefits:
1) Declaring flags in Var is noticeably cheaper across the board:
the old map-based mechanism needed a lookup before each insertion
to detect duplicates, whereas the binary search reports a duplicate
as part of the insertion itself.
2) Visit and VisitAll, which must loop over flags in sorted order,
no longer need to copy and sort the flags, making both methods
trivially cheap.
3) Maps are only faster than binary search at large sizes, since at
small sizes a slice is more cache-friendly and uses fewer instructions.
In the case of FlagSets, it is far more common to declare tens of
flags than to declare thousands of them; slices are the better choice.
Note that we should care more about the performance of Var than Parse.
A typical Go program declares all of its flags every time it runs,
but usually parses only a minority of them.
For example, "go build -tags=foo" declares dozens of flags but parses one.
name                     old time/op    new time/op    delta
FlagSet/Var/5-16         1.37µs ± 6%    1.07µs ± 7%    -21.97%  (p=0.000 n=8+8)
FlagSet/VisitAll/5-16     542ns ± 8%       5ns ± 4%    -99.02%  (p=0.000 n=8+8)
FlagSet/Parse/5-16        726ns ± 5%     659ns ± 4%     -9.24%  (p=0.000 n=8+8)
FlagSet/Var/500-16        212µs ± 4%     182µs ± 3%    -14.26%  (p=0.000 n=8+8)
FlagSet/VisitAll/500-16   124µs ± 2%       2µs ± 0%    -98.71%  (p=0.001 n=7+7)
FlagSet/Parse/500-16      151µs ± 3%     159µs ± 3%     +5.20%  (p=0.001 n=7+8)

name                     old alloc/op   new alloc/op   delta
FlagSet/Var/5-16           656B ± 0%      656B ± 0%        ~    (all equal)
FlagSet/VisitAll/5-16      104B ± 0%        0B         -100.00%  (p=0.000 n=8+8)
FlagSet/Parse/5-16         256B ± 0%      256B ± 0%        ~    (all equal)
FlagSet/Var/500-16        101kB ± 0%      48kB ± 0%     -52.70%  (p=0.000 n=8+8)
FlagSet/VisitAll/500-16  4.15kB ± 0%    0.00kB         -100.00%  (p=0.000 n=8+8)
FlagSet/Parse/500-16     61.3kB ± 0%     7.9kB ± 0%     -87.06%  (p=0.000 n=7+8)

name                     old allocs/op  new allocs/op  delta
FlagSet/Var/5-16           12.0 ± 0%      11.0 ± 0%      -8.33%  (p=0.000 n=8+8)
FlagSet/VisitAll/5-16      3.00 ± 0%      0.00         -100.00%  (p=0.000 n=8+8)
FlagSet/Parse/5-16         2.00 ± 0%      1.00 ± 0%     -50.00%  (p=0.000 n=8+8)
FlagSet/Var/500-16        1.02k ± 0%     1.00k ± 0%     -1.66%  (p=0.000 n=8+8)
FlagSet/VisitAll/500-16    3.00 ± 0%      0.00         -100.00%  (p=0.000 n=8+8)
FlagSet/Parse/500-16       22.0 ± 0%       5.0 ± 0%     -77.27%  (p=0.000 n=8+8)
This also helps cmd/go in a noticeable way, as its dozen or so
sub-commands each declare their own set of flags, and many of them
declare the same shared flags, such as the build flags.
As a result, cmd/go's init spends as much as 11% of its time in
flag.FlagSet.Var, and a ~13% faster Var yields a ~1.7% faster
cmd/go init:
name          old time/op       new time/op       delta
ExecGoEnv-16      516µs ± 1%        507µs ± 1%    -1.68%  (p=0.000 n=8+8)

name          old sys-time/op   new sys-time/op   delta
ExecGoEnv-16     4.72ms ± 1%       4.66ms ± 2%    -1.28%  (p=0.040 n=7+8)

name          old user-time/op  new user-time/op  delta
ExecGoEnv-16     2.25ms ± 3%       2.19ms ± 2%    -2.53%  (p=0.038 n=8+8)
Change-Id: If3af4d012d4a5e0139e0434c204580aad31ad1ec
---
M src/cmd/go/internal/test/testflag.go
M src/cmd/go/internal/vet/vetflag.go
M src/flag/flag.go
3 files changed, 147 insertions(+), 26 deletions(-)
diff --git a/src/cmd/go/internal/test/testflag.go b/src/cmd/go/internal/test/testflag.go
index 55f6ebf..3215816 100644
--- a/src/cmd/go/internal/test/testflag.go
+++ b/src/cmd/go/internal/test/testflag.go
@@ -28,7 +28,7 @@
func init() {
work.AddBuildFlags(CmdTest, work.OmitVFlag)
- cf := CmdTest.Flag
+ cf := &CmdTest.Flag
cf.BoolVar(&testC, "c", false, "")
cf.StringVar(&testO, "o", "", "")
work.AddCoverFlags(CmdTest, &testCoverProfile)
diff --git a/src/cmd/go/internal/vet/vetflag.go b/src/cmd/go/internal/vet/vetflag.go
index eb7af65..c9b461d 100644
--- a/src/cmd/go/internal/vet/vetflag.go
+++ b/src/cmd/go/internal/vet/vetflag.go
@@ -102,7 +102,7 @@
// also defined as build flags. This works fine, so we omit duplicates here.
// However some, like -x, are known to the build but not to vet.
isVetFlag := make(map[string]bool, len(analysisFlags))
- cf := CmdVet.Flag
+ cf := &CmdVet.Flag
for _, f := range analysisFlags {
isVetFlag[f.Name] = true
if cf.Lookup(f.Name) == nil {
diff --git a/src/flag/flag.go b/src/flag/flag.go
index f6b3890..0b19d17 100644
--- a/src/flag/flag.go
+++ b/src/flag/flag.go
@@ -386,13 +386,69 @@
name string
parsed bool
- actual map[string]*Flag
- formal map[string]*Flag
+ actual sortedFlags
+ formal sortedFlags
args []string // arguments after flags
errorHandling ErrorHandling
output io.Writer // nil means stderr; use Output() accessor
}
+type sortedFlags []*Flag
+
+func (s sortedFlags) find(name string) int {
+ i, j := 0, len(s)
+ if i >= j {
+ return 0
+ }
+loop:
+ {
+ h := int(uint(i+j) >> 1)
+ if s[h].Name < name {
+ i = h + 1
+ } else {
+ j = h
+ }
+ }
+ if i < j {
+ goto loop
+ }
+ return i
+}
+
+func (s sortedFlags) get(name string) *Flag {
+ if i := s.find(name); i < len(s) {
+ if s[i].Name == name {
+ return s[i]
+ }
+ }
+ return nil
+}
+
+func (s *sortedFlags) put(name string, flag *Flag) (duplicate bool) {
+ i := s.find(name)
+ if i < len(*s) {
+ if f := (*s)[i]; f.Name == name {
+ return true // found a duplicate
+ }
+ }
+ // Extend the length by one to make space.
+ // To reduce the number of calls to append,
+ // we start the capacity at 32 and reslice directly where possible.
+ if len(*s) >= cap(*s) {
+ if *s == nil {
+ *s = make([]*Flag, 1, 32)
+ } else {
+ *s = append(*s, nil)
+ }
+ } else {
+ *s = (*s)[:len(*s)+1]
+ }
+ // Shift the flags to the right by one and insert our flag.
+ copy((*s)[i+1:], (*s)[i:])
+ (*s)[i] = flag
+ return false
+}
+
// A Flag represents the state of a flag.
type Flag struct {
Name string // name as it appears on command line
@@ -443,7 +499,7 @@
// VisitAll visits the flags in lexicographical order, calling fn for each.
// It visits all flags, even those not set.
func (f *FlagSet) VisitAll(fn func(*Flag)) {
- for _, flag := range sortFlags(f.formal) {
+ for _, flag := range f.formal {
fn(flag)
}
}
@@ -457,7 +513,7 @@
// Visit visits the flags in lexicographical order, calling fn for each.
// It visits only those flags that have been set.
func (f *FlagSet) Visit(fn func(*Flag)) {
- for _, flag := range sortFlags(f.actual) {
+ for _, flag := range f.actual {
fn(flag)
}
}
@@ -470,29 +526,26 @@
// Lookup returns the Flag structure of the named flag, returning nil if none exists.
func (f *FlagSet) Lookup(name string) *Flag {
- return f.formal[name]
+ return f.formal.get(name)
}
// Lookup returns the Flag structure of the named command-line flag,
// returning nil if none exists.
func Lookup(name string) *Flag {
- return CommandLine.formal[name]
+ return CommandLine.formal.get(name)
}
// Set sets the value of the named flag.
func (f *FlagSet) Set(name, value string) error {
- flag, ok := f.formal[name]
- if !ok {
+ flag := f.formal.get(name)
+ if flag == nil {
return fmt.Errorf("no such flag -%v", name)
}
err := flag.Value.Set(value)
if err != nil {
return err
}
- if f.actual == nil {
- f.actual = make(map[string]*Flag)
- }
- f.actual[name] = flag
+ f.actual.put(name, flag)
return nil
}
@@ -971,7 +1024,7 @@
// Remember the default value as a string; it won't change.
flag := &Flag{name, usage, value, value.String()}
- _, alreadythere := f.formal[name]
+ alreadythere := f.formal.put(name, flag)
if alreadythere {
var msg string
if f.name == "" {
@@ -981,10 +1034,6 @@
}
panic(msg) // Happens only if flags are declared with identical names
}
- if f.formal == nil {
- f.formal = make(map[string]*Flag)
- }
- f.formal[name] = flag
}
// Var defines a flag with the specified name and usage string. The type and
@@ -1056,9 +1105,8 @@
break
}
}
- m := f.formal
- flag, alreadythere := m[name] // BUG
- if !alreadythere {
+ flag := f.formal.get(name)
+ if flag == nil {
if name == "help" || name == "h" { // special case for nice help message.
f.usage()
return false, ErrHelp
@@ -1090,10 +1138,7 @@
return false, f.failf("invalid value %q for flag -%s: %v", value, name, err)
}
}
- if f.actual == nil {
- f.actual = make(map[string]*Flag)
- }
- f.actual[name] = flag
+ f.actual.put(name, flag)
return true, nil
}
To view, visit change 437556. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Bryan Mills, Daniel Martí, Michael Matloob.
1 comment:
Patchset:
Although the code is reasonable, I don't see the motivation, as flag parsing cannot be the bottleneck in any reasonable program. The flag package is simple and straightforward. Does it really need to be as fast as possible? I can't see when it would matter.
It's a philosophical objection: bloat comes in changes like this, changes that are in isolation reasonable but complicate and enlarge the code in increments that over time lead to ever larger binaries. The cost is small for any one change, like this, but they add up, and moreover each such change establishes a precedent and encourages more like it.
I wish to push back on this kind of change.
Attention is currently required from: Bryan Mills, Michael Matloob, Rob Pike.
1 comment:
Patchset:
Although the code is reasonable, I don't see the motivation, as flag parsing cannot be the bottlenec […]
I would agree with you if this simply gave a slight speed-up to flag parsing, as we are ending up with about forty more lines than before. However, that is not the case. It mostly helps with declaring flags and visiting all declared flags, which are costs paid at init time no matter how many flags are parsed later. See the notes in the commit message about cmd/go, for example, which spends as much as half a millisecond just declaring flags.
As for the code bloat itself - part of it is because I inline sort.Search, just like go/token does at https://cs.opensource.google/go/go/+/refs/tags/go1.19.1:src/go/token/position.go;l=511-532. That can of course be undone when the compiler gets better, and it would save about twenty of the forty lines we're adding. I can certainly add a TODO as a reminder just like go/token does.
I've also found a couple of other cases where a map was used and the requirements were:
Hash maps aren't a good fit for those criteria. I would like for this container to be available as a generic type at some point in the future, perhaps as a "small map" or somesuch. Then this change would be a net gain in terms of lines removed. But I didn't want to open up that debate right now.
Attention is currently required from: Daniel Martí, Michael Matloob, Rob Pike.
1 comment:
Commit Message:
3) Maps are only faster than binary search at large sizes, since at
small sizes a slice is more cache-friendly and uses fewer instructions.
In the case of FlagSets, it is far more common to declare tens of
flags than to declare thousands of them; slices are the better choice.
I agree with your observation that N is usually quite small, but for the handful of cases where it's not this could be a significant regression, compared to a constant-factor speedup for the common case: it changes a sequence of `N` calls to `Var` and/or `Set` from O(N) to O(N²).
For a savings of what appears to be on the order of 100µs, I don't think this is worth the risk of regressing outlier programs.
(I'm not so worried about the change from O(1) to O(log N) for `Lookup`, though.)
Attention is currently required from: Bryan Mills, Michael Matloob, Rob Pike.
2 comments:
Commit Message:
3) Maps are only faster than binary search at large sizes, since at
small sizes a slice is more cache-friendly and uses fewer instructions.
In the case of FlagSets, it is far more common to declare tens of
flags than to declare thousands of them; slices are the better choice.
I agree with your observation that N is usually quite small, but for the handful of cases where it's […]
I was thinking about the worst case scenario for Var the day after sending this, too. In the best case scenario, one could prepare the calls to Var to have them be in order, then the cost is just a binary search and an append. However, in the worst case scenario, we are indeed copying large parts of the slice over and over again.
I thought about ways to avoid the looped copying, perhaps by only sorting once VisitAll or Parse are called. Alas, Var is documented to reject duplicates, so we need to do a lookup before we insert.
I _think_ that we can't make any easy change to this patch to improve Var's worst case performance. If we think that's a deal breaker, then I'll just have to call this CL a failed experiment :)
Regarding a small N - my thinking with this patch was that the flag package is already geared towards reasonable amounts of flags. For example, the default usage function lists all flags, which will be insane if one has thousands of declared flags. I see the point about the potential regression, though. I had a bit of a mental dilemma about whether or not to mail this CL - and in fact wrote it over a year ago and had temporarily given up on it.
Patchset:
Bryan, would you still be up for reviewing the benchmark itself? Given that cmd/go spends over 10% of its init CPU time in flag.FlagSet.Var, I think it's worth at least measuring why it's (relatively) slow.
Attention is currently required from: Bryan Mills, Michael Matloob, Rob Pike.
1 comment:
Commit Message:
3) Maps are only faster than binary search at large sizes, since at
small sizes a slice is more cache-friendly and uses fewer instructions.
In the case of FlagSets, it is far more common to declare tens of
flags than to declare thousands of them; slices are the better choice.
I was thinking about the worst case scenario for Var the day after sending this, too. […]
Writing this comment made me realise that perhaps I could remove the double map use in FlagSet.Var without such a large change. How about https://go-review.googlesource.com/c/go/+/435259? The improvement is now just ~10%, but it's a small change and the code clearly does less in the normal path.
Attention is currently required from: Bryan Mills, Michael Matloob, Rob Pike, Russ Cox.
2 comments:
Patchset:
Closing per the O(N^2) behavior on Var. A second attempt, which doesn't remove the use of maps, is at https://go-review.googlesource.com/c/go/+/435259.
This change makes N flag registration operations potentially O(N^2). That seems like a mistake.
You're the second to bring this up, and I'm starting to see that. I'll abandon the CL given this flaw.
You're right that we could cache the list for VisitAll. However, I doubt it would really help in practice. Most flag users declare their flags once, and at most visit all of their flags just once or twice (e.g. to print the usage). So I don't think optimizing for multiple calls to VisitAll helps much, at least as far as the common init cost goes.
Daniel Martí abandoned this change.