Compiler performance with a huge project

Dave Cheney

Mar 7, 2016, 5:22:36 AM
to golang-dev
Hello,

By a quirk of fate, the juju codebase has had to maintain compatibility with go 1.2 since that release. This afternoon I did an experiment compiling and linking our biggest command, 512 packages in total, with the latest release compiler from each minor revision.

The raw results are here, along with a graph on the second tab. The test machine was a thinkpad x220, Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 8gb of ram, everything running out of a ramdisk on /tmp.





I have heard rumours that others are also collecting benchmark data, but they may not have access to such a pathological data set as juju :) I intend to keep this data set current for as long as I have access to this machine. 

Thanks

Dave

Aram Hăvărneanu

Mar 7, 2016, 5:26:14 AM
to Dave Cheney, golang-dev
I bought a new computer to work on Go, because compile times became
unmanageable.

I'm willing to help bring the compile performance back as much as I can.

--
Aram Hăvărneanu

Konstantin Shaposhnikov

Mar 7, 2016, 5:41:37 AM
to Aram Hăvărneanu, Dave Cheney, golang-dev

It would be interesting to see the size of the created binary on the same graph.


Dave Cheney

Mar 7, 2016, 5:44:33 AM
to Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
It's around 80mb. The improvements David Crawshaw made last week have reduced the link times by 1/3rd.

In truth, I don't care about the binary size; there is no way you can compile 512 packages and have a small binary. We've just got far too much code for that.

Rob Pike

Mar 7, 2016, 6:03:49 AM
to Dave Cheney, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
I wish the compilation were much faster too, and have been on the record about that. Still, if two minutes is an unmanageable build time*, you have lived a blessed existence. I'd also be surprised if a new machine made that much of a difference.

Hardware has stopped getting faster. Software has not stopped getting slower.

-rob

* I realize the time and the unmanageable are from different messages. The point still stands.

Dave Cheney

Mar 7, 2016, 6:46:34 AM
to Pierpaolo Bernardi, golang-dev
I hope so, what are your concerns?

On 7 Mar 2016, at 22:43, Pierpaolo Bernardi <olop...@gmail.com> wrote:

Is the graph labeled correctly?



David Crawshaw

Mar 7, 2016, 9:54:12 AM
to Dave Cheney, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
There's another 15% linker improvement available for anyone willing to
import Joe's SAIS package into the linker:
https://github.com/golang/go/issues/14648#issuecomment-192550869

Pierpaolo Bernardi

Mar 7, 2016, 10:35:46 AM
to Dave Cheney, golang-dev
Is the graph labeled correctly?

Pierpaolo Bernardi

Mar 7, 2016, 10:35:47 AM
to Dave Cheney, golang-dev
On Mon, Mar 7, 2016 at 12:43 PM, Pierpaolo Bernardi <olop...@gmail.com> wrote:
Is the graph labeled correctly?

Sorry. Please ignore this. I misunderstood the graph.

Dave Cheney

Mar 7, 2016, 3:39:45 PM
to Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
Just to follow up, a version of jujud compiled with 1.3.3 

-rwxr-xr-x  1 dfc  dfc  88174297 Mar  7 20:43 jujud

and comparing that to one just compiled with tip

-rwxrwxr-x 1 dfc dfc 79450868 Mar  7 12:44 /home/dfc/bin/jujud

Which looks like a 10% reduction in binary size!



Rob Pike

Mar 7, 2016, 4:27:46 PM
to Dave Cheney, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
Dave C., do the times in that graph include the SSA compiler's internal consistency checks? If so, it is an unfair chart. If not, it is a sad chart.

-rob

Dave Cheney

Mar 7, 2016, 4:32:50 PM
to Rob Pike, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev

It's just what is in tip at the time. The only revisions that I believe have the ssa checks enabled are the ones post 1.6.

I intend to keep the chart updated over time.

Rob Pike

Mar 7, 2016, 4:36:56 PM
to Dave Cheney, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
OK, but that means the peak is overly pessimistic, and I believe by a substantial amount.

-rob

Dave Cheney

Mar 7, 2016, 4:38:56 PM
to Rob Pike, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev

The peak was go 1.5; things look to be getting better since then.

Brad Fitzpatrick

Mar 7, 2016, 4:39:40 PM
to Rob Pike, Dave Cheney, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
The peak is at Go 1.5.3, not post-Go1.6 SSA.

Rob Pike

Mar 7, 2016, 4:44:55 PM
to Brad Fitzpatrick, Dave Cheney, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
Ah, right. Well, the incipient drop in time should continue then.

-rob

Rob Pike

Mar 7, 2016, 4:46:42 PM
to Brad Fitzpatrick, Dave Cheney, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
Whatever the reason, it's almost a 10X slowdown, much worse than is expected or is usually seen. I wonder if juju tickles some pathological issue.

-rob

Dave Cheney

Mar 7, 2016, 4:50:22 PM
to Rob Pike, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev

The line I'm tracking is real time, which grew by 2.5x after go 1.5.

I'll drop reporting the sys and user times on the graph, it is confusing.

Rob Pike

Mar 7, 2016, 4:58:04 PM
to Dave Cheney, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
User time got almost 10X worse. I want to know why.

-rob

Ian Lance Taylor

Mar 7, 2016, 5:15:00 PM
to Rob Pike, Dave Cheney, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
On Mon, Mar 7, 2016 at 1:57 PM, Rob Pike <r...@golang.org> wrote:
> User time got almost 10X worse. I want to know why.

Some of it has to be the concurrent GC.

Ian

Rob Pike

Mar 7, 2016, 5:18:32 PM
to Ian Lance Taylor, Dave Cheney, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
Some of it, yes, but how much? With other tests it appears to cost about 2X.

-rob

hartwig...@gmail.com

Mar 7, 2016, 5:47:46 PM
to golang-dev
If the SSA enabled compiler produces (smaller and) faster binaries, then I think the SSA enabled version from tip should be (re-)compiled by itself to see the best results...? (In addition to disabling consistency checks)

Dave Cheney

Mar 7, 2016, 5:51:06 PM
to hartwig...@gmail.com, golang-dev
On Tue, Mar 8, 2016 at 9:18 AM, <hartwig...@gmail.com> wrote:
If the SSA enabled compiler produces (smaller and) faster binaries, then I think the SSA enabled version from tip should be (re-)compiled by itself to see the best results...?

It already is. 
 
(In addition to disabling consistency checks)

Dave Cheney

Mar 7, 2016, 6:10:57 PM
to Rob Pike, Ian Lance Taylor, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu, golang-dev
I've pushed the code into a standalone repo so that others can verify my results or run their own benchmarks.


Linking github.com/juju/juju/cmd/jujud is also a great benchmark for the linker.

Thanks

Dave

andrewc...@gmail.com

Mar 7, 2016, 6:28:15 PM
to golang-dev, r...@golang.org, ia...@golang.org, brad...@golang.org, k.shapo...@gmail.com, ara...@mgk.ro
IIRC there were reports from people where the build time went up disproportionately around 1.5. Here is one I remember: https://groups.google.com/forum/#!searchin/golang-nuts/build$20time$201.5/golang-nuts/39jJ8sW-qVY/m-5lI_f_VssJ . Interestingly, the factor of slowdown is similar, though it may be totally unrelated.

Anyone other than Dave actively looking into a cause?

Rob Pike

Mar 7, 2016, 6:53:06 PM
to Andrew Chambers, golang-dev, Ian Lance Taylor, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu
I just did a simple check using a completely stable, unedited program of a few tens of thousands of lines. Here are the results (go build is running at today's tip):

% time go build -gcflags=-ssa=0 

real 0m4.585s
user 0m5.570s
sys 0m1.027s
% time go build -gcflags=-ssa=1

real 0m5.778s
user 0m7.201s
sys 0m1.102s
% time go build -gcflags='-ssa=1 -d=ssa/check/off'

real 0m5.371s
user 0m6.707s
sys 0m1.137s
% time go1.4 build

real 0m2.631s
user 0m2.286s
sys 0m0.670s

As you can see, the 2X number sounds about right. Cheney's 10X seems pathological.

SSA costs about 20% relative to the old compiler.

All those factors ignore whatever setup and linking overhead there might be and assume it's all in the compiler.


-rob

Dave Cheney

Mar 7, 2016, 6:58:18 PM
to Rob Pike, Andrew Chambers, golang-dev, Ian Lance Taylor, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu
I don't know where the 10x increase in user time you quote is coming from.

version 1.4.3    real 00:00:43   user 00:01:39   sys 00:00:17
version 1.5.3    real 00:02:19   user 00:07:05   sys 00:00:24
version 1.6      real 00:02:09   user 00:06:35   sys 00:00:20
rev 6ed1038      real 00:02:11   user 00:06:39   sys 00:00:19

I count that as a 4.6x increase in user time going from go 1.4.3 to 1.5.3.

Rob Pike

Mar 7, 2016, 7:00:23 PM
to Dave Cheney, Andrew Chambers, golang-dev, Ian Lance Taylor, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu
I am looking at real time, because that is what we wait for.

-rob

Dave Cheney

Mar 7, 2016, 7:02:20 PM
to Rob Pike, Andrew Chambers, golang-dev, Ian Lance Taylor, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu
I'm sorry, I still don't see a 10x increase in real time. From 43 seconds (go 1.4.3) to 139 seconds (go 1.5.3) is just over 3x.

Devon H. O'Dell

Mar 7, 2016, 7:05:54 PM
to Rob Pike, Dave Cheney, Andrew Chambers, golang-dev, Ian Lance Taylor, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu
But the real time quotient is even less: 43 seconds in 1.4.3 to a max of 139 seconds (2:19) in 1.5.3, which is just 3.2x (and 1.6 and tip are less than that). Or what am I missing?

Rob Pike

Mar 7, 2016, 8:01:05 PM
to Dave Cheney, Andrew Chambers, golang-dev, Ian Lance Taylor, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu
However I may be misinterpreting the graph, there are numbers on it that are worse than I can reproduce in my own code base, which means juju is causing worse performance.

And yes, it's user time I'm upset by on the graph, but real time in real life. I have been unclear.

However you slice it, things are too slow.

-rob

Dave Cheney

Mar 7, 2016, 8:02:54 PM
to Rob Pike, Andrew Chambers, golang-dev, Ian Lance Taylor, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu
On Tue, Mar 8, 2016 at 12:00 PM, Rob Pike <r...@golang.org> wrote:
However I am misinterpreting the graph, there are numbers on it that are worse than I can reproduce in my own code base, which means juju is causing worse performance.

How many packages are there in your example set? How deeply are they nested? How wide does the dependency graph fan out? Juju does all three of these with aplomb.

Rob Pike

Mar 7, 2016, 8:27:33 PM
to Dave Cheney, Andrew Chambers, golang-dev, Ian Lance Taylor, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu
107 packages from multiple places, nothing too dramatic, but we should be talking about juju, not my test, which isn't so bad.

-rob

Rob Pike

Mar 7, 2016, 9:26:48 PM
to Dave Cheney, Andrew Chambers, golang-dev, Ian Lance Taylor, Brad Fitzpatrick, Konstantin Shaposhnikov, Aram Hăvărneanu
The large spike in user time over real time is perhaps explained by large memory use, which would take lots of GC time in Go (and none in 1.4, in C), and that is compensated for by more CPUs being used by default (done partly to make up the difference).

If that hypothesis is correct, we need to find the few files whose compilations are causing so much memory to be consumed.

-rob

Konstantin Shaposhnikov

Mar 7, 2016, 10:20:53 PM
to Rob Pike, Aram Hăvărneanu, Dave Cheney, Andrew Chambers, Brad Fitzpatrick, Ian Lance Taylor, golang-dev

FYI turning off GC can actually decrease compilation times (especially user time):

$ go version

go version go1.6 linux/amd64

$ time go build -a std

real 0m6.569s
user 0m27.592s
sys 0m2.719s

$ time GOGC=off go build -a std

real 0m5.583s
user 0m12.872s
sys 0m3.597s

Rob Pike

Mar 7, 2016, 10:44:24 PM
to Konstantin Shaposhnikov, Aram Hăvărneanu, Dave Cheney, Andrew Chambers, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
This is well known, and is believed to be most of the cause of the advertised 2X slowdown when the compiler was moved from C to Go. But juju is seeing a larger factor.

-rob

Dave Cheney

Mar 7, 2016, 10:51:03 PM
to Rob Pike, Konstantin Shaposhnikov, Aram Hăvărneanu, Andrew Chambers, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
The user time goes from 99 seconds in 1.4.3 to 429 seconds, roughly a 4.5x increase. Given that there are now 4 Go compiler processes, each running its own GC, each of which thinks it has exclusive access to all four logical processors, this does not seem surprising to me.

David Chase

Mar 7, 2016, 11:19:43 PM
to Dave Cheney, Rob Pike, Konstantin Shaposhnikov, Aram Hăvărneanu, Andrew Chambers, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
Only 4 cores seems kinda wimpy :-).

Rob Pike

Mar 8, 2016, 2:01:25 AM
to Dave Cheney, Konstantin Shaposhnikov, Aram Hăvărneanu, Andrew Chambers, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
That seems surprising to me, since other significant programs do not see a 4.5x increase.

I'll stop belaboring the obvious but for one last time: juju is pushing harder than other programs and it may be possible both to understand how and to do something about it. Isn't that why you posted the graph?

Where is the CPU time going? Measure it somehow, profile it, run top while building, use time -a, whatever it takes. Try not linking. Try only linking. Try building individual packages. I don't know what to try but I know there's lots that has not been reported being tried.

-rob

Dave Cheney

Mar 8, 2016, 2:39:20 AM
to Rob Pike, Konstantin Shaposhnikov, Aram Hăvărneanu, Andrew Chambers, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
I don't believe Juju is atypical. Its only notable quality is the
girth of the dependency graph that goes into a single binary, which
makes it useful because it brings compile and link times into human
scales.

As an experiment I built a test set from another large open source
project and ran the same tests for a command with 304 dependent
packages.

1.4.3

real 0m29.428s
user 1m19.825s
sys 0m10.983s


real 0m31.618s
user 1m26.436s
sys 0m11.854s


real 0m31.474s
user 1m26.892s
sys 0m12.298s

tip (beabd872f43)

real 1m36.218s
user 5m21.163s
sys 0m12.090s


real 1m38.975s
user 5m31.432s
sys 0m12.091s


real 1m39.403s
user 5m33.396s
sys 0m12.276s

This shows the same 3x increase in real time, and a smaller increase
in user time: 3.7x, vs the 4.5x for benchjuju.

You can find the dataset here:
https://github.com/davecheney/benchkube. I have no preference as to
which data set to use; I think they are both good data sets for
benchmarking the compiler.

Thanks

Dave

andrewc...@gmail.com

Mar 8, 2016, 3:11:20 AM
to golang-dev, r...@golang.org, k.shapo...@gmail.com, ara...@mgk.ro, andrewc...@gmail.com, brad...@golang.org, ia...@golang.org
Some more numbers with GC on and off for me. On tip I tried playing with the ssa check flags, but it didn't make any difference for me; maybe I was misusing the flags, though.

1.6:

ac@black:~/src/benchjuju$ ./benchjuju.bash
go version go1.6 linux/amd64

real    1m0.099s
user    2m45.420s
sys    0m6.476s


GOGC=off
ac@black:~/src/benchjuju$ ./benchjuju.bash
go version go1.6 linux/amd64

real    0m38.019s
user    1m22.640s
sys    0m9.468s

tip:
ac@black:~/src/benchjuju$ ./benchjuju.bash
go version devel +beabd87 Tue Mar 8 06:02:15 2016 +0000 linux/amd64

real    1m2.182s
user    2m48.584s
sys    0m6.092s


GOGC=off
go version devel +beabd87 Tue Mar 8 06:02:15 2016 +0000 linux/amd64

real    0m43.500s
user    1m39.920s
sys    0m9.196s

Rob Pike

Mar 8, 2016, 3:33:39 AM
to Andrew Chambers, golang-dev, Konstantin Shaposhnikov, Aram Hăvărneanu, Brad Fitzpatrick, Ian Lance Taylor
Yes, we have established that a full build takes longer, but does any component of the build add disproportionately? Compiling? Linking? One big package? Many packages? These broad numbers (which I admit I contributed to) only confirm the existence of an issue; they do nothing to narrow down what it might be.

Since juju shows the worst ratio, I think that's the one to look at. We expect everything to be linear, but something is clearly not.

-rob

David Crawshaw

Mar 8, 2016, 7:31:39 AM
to Rob Pike, Andrew Chambers, golang-dev, Konstantin Shaposhnikov, Aram Hăvărneanu, Brad Fitzpatrick, Ian Lance Taylor
Linking juju is now ~5 seconds, so I think it is safe to focus on the compiler.

For the compiler, it turns out pprof can merge profiles, which means
that adding:

gcargs = append(gcargs, "-cpuprofile", ofile+".prof")

to func (gcToolchain) gc in cmd/go, then building juju with -work
gives you a set of profiles you can merge with:

go tool pprof -proto -output=/tmp/juju.prof
../pkg/tool/darwin_amd64/compile $(find $WORK -name \*.prof)

It looks something like the attached SVG. Definitely GC-dominated,
but the graph gives up several interesting compiler functions.
juju.svg

David Chase

Mar 8, 2016, 8:18:54 AM
to David Crawshaw, Rob Pike, Andrew Chambers, golang-dev, Konstantin Shaposhnikov, Aram Hăvărneanu, Brad Fitzpatrick, Ian Lance Taylor
checkFunc is almost 10% of compile, and that goes away eventually.

Might be interesting to see if we could shave a hair off visitBottomUp if we reserved a field for its marker instead of expecting a map, or if we (like SSA) came up with a node-numberer so that we could use a slice instead of a map.

We could go another round of optimization on CSE -- right now it uses comparison to form its partitions, perhaps a hash table would let it go a little faster.
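For illustration, here is a minimal sketch of the slice-versus-map idea (the node type and numberer below are made up for this example; this is not the actual compiler code): give every node a dense integer ID at construction time, so per-pass state can live in a slice instead of a map.

package main

import "fmt"

// node stands in for a compiler IR node; ID is assigned once, at
// construction, so later passes can index slices by it.
type node struct {
	ID   int
	name string
}

// numberer hands out dense IDs, in the spirit of an SSA-style node numberer.
type numberer struct{ next int }

func (n *numberer) newNode(name string) *node {
	nd := &node{ID: n.next, name: name}
	n.next++
	return nd
}

func main() {
	var num numberer
	nodes := []*node{num.newNode("a"), num.newNode("b"), num.newNode("c")}

	// Map-based per-pass state: every access hashes a pointer and may allocate.
	seen := map[*node]bool{}
	for _, nd := range nodes {
		seen[nd] = true
	}

	// Slice-based per-pass state: one allocation up front, O(1) indexed access.
	visited := make([]bool, num.next)
	for _, nd := range nodes {
		visited[nd.ID] = true
	}

	fmt.Println(len(seen), visited)
}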

to...@tneal.org

Mar 8, 2016, 8:26:28 AM
to golang-dev, r...@golang.org, andrewc...@gmail.com, k.shapo...@gmail.com, ara...@mgk.ro, brad...@golang.org, ia...@golang.org

On Tuesday, March 8, 2016 at 6:31:39 AM UTC-6, David Crawshaw wrote:

It looks something like the the attached SVG. Definitely GC-dominated,
but the graph gives up several interesting compiler functions.


It looks like this SVG represents compilation with the SSA consistency checking turned on; notice ssa.checkFunc is called out on the bottom left. checkFunc runs after every SSA pass and is both fairly expensive and allocates a good amount. You might get some very different numbers with it disabled.

- Todd
 

Egon Elbre

Mar 8, 2016, 8:30:27 AM
to golang-dev, r...@golang.org, andrewc...@gmail.com, k.shapo...@gmail.com, ara...@mgk.ro, brad...@golang.org, ia...@golang.org
Here's my five cents. I ran Dave's benchmarks on Windows with Process Monitor; it gives a nice output on how much time something takes. Unfortunately I didn't find a way to export it nicely (there should be some program that allows getting something similar on Linux). Anyway:

Here's what I saw about compile.exe

benchkube:


benchjuju:



juju anomalies:

github.com\juju\juju\state compile took 11s

kubernetes anomalies:

github.com\mesos\mesos-go 12s
k8s.io\kubernetes\pkg\api\v1.a 11s
k8s.io\kubernetes\pkg\api 8s

+ Egon

go version devel +beabd87 Tue Mar 8 06:02:15 2016 +0000 windows/amd64

Egon Elbre

Mar 8, 2016, 9:58:12 AM
to golang-dev, r...@golang.org, andrewc...@gmail.com, k.shapo...@gmail.com, ara...@mgk.ro, brad...@golang.org, ia...@golang.org

Russ Cox

Mar 8, 2016, 10:00:01 AM
to Dave Cheney, Rob Pike, Konstantin Shaposhnikov, Aram Hăvărneanu, Andrew Chambers, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
This is a great discussion. Thanks very much for kicking it off Dave, but even more so for preparing benchjuju and benchkube. Those will be really helpful test cases. (I already used benchjuju last night for CL 20334.)

Russ

Austin Clements

Mar 8, 2016, 10:58:12 AM
to David Crawshaw, golang-dev, Rick Hudson
Interesting. This profile is GC-dominated as you said, but not in the usual way. 40% of the time is spent in mcentral.grow, which does basically nothing except construct a free list in a span. I would think that time is coming from page faults demand allocating as the heap grows (mcentral.grow is the first hit on fresh pages), but that disagrees with the relatively low system time everyone has been reporting. Plus, unless the heap is significantly larger than it was in 1.4, I would expect the total page fault cost to be roughly the same. It's also possible this is a NUMA thing (https://golang.org/issue/14406), but again I would expect high system time in that case.

If this is in fact just the cost of initializing the free lists, Rick's work on dev.garbage should completely eliminate that hot spot. (+Rick)


Konstantin Shaposhnikov

Mar 8, 2016, 10:58:53 AM
to Egon Elbre, golang-dev, Rob Pike, Andrew Chambers, Aram Hăvărneanu, Brad Fitzpatrick, Ian Lance Taylor
github.com/juju/govmomi/vim25/types is ~1.5MB of source code; the biggest file, types.go, is 1.2MB, 45,000 lines.

Here is the profile that I got with GC and the SSA check disabled:

GOGC=off go build -gcflags "-cpuprofile go.prof -d=ssa/check/off" github.com/juju/govmomi/vim25/types 

go version devel +7529701 Tue Mar 8 15:30:26 2016 +0000 linux/amd64

(pprof) top 20
3690ms of 8190ms total (45.05%)
Dropped 329 nodes (cum <= 40.95ms)
Showing top 20 nodes out of 246 (cum >= 270ms)
      flat  flat%   sum%        cum   cum%
     740ms  9.04%  9.04%      790ms  9.65%  runtime.(*mcentral).grow
     560ms  6.84% 15.87%     1940ms 23.69%  runtime.mallocgc
     430ms  5.25% 21.12%      430ms  5.25%  cmd/compile/internal/gc.testdclstack
     330ms  4.03% 25.15%      330ms  4.03%  runtime.heapBitsSetType
     210ms  2.56% 27.72%      210ms  2.56%  runtime.aeshashbody
     190ms  2.32% 30.04%      190ms  2.32%  runtime.memmove
     130ms  1.59% 31.62%      590ms  7.20%  runtime.mapassign1
     120ms  1.47% 33.09%      590ms  7.20%  cmd/compile/internal/ssa.(*regAllocState).regalloc
     120ms  1.47% 34.55%      150ms  1.83%  runtime.mapaccess1_faststr
     100ms  1.22% 35.78%      290ms  3.54%  runtime.evacuate
     100ms  1.22% 37.00%     1390ms 16.97%  runtime.newobject
      90ms  1.10% 38.10%       90ms  1.10%  runtime.duffzero
      90ms  1.10% 39.19%       90ms  1.10%  runtime.memclr
      80ms  0.98% 40.17%       80ms  0.98%  cmd/compile/internal/ssa.copyelimValue
      70ms  0.85% 41.03%      240ms  2.93%  cmd/internal/obj/x86.span6
      70ms  0.85% 41.88%      480ms  5.86%  fmt.(*pp).doPrintf
      70ms  0.85% 42.74%      130ms  1.59%  runtime.mapaccess1
      70ms  0.85% 43.59%      690ms  8.42%  runtime.newarray
      60ms  0.73% 44.32%       60ms  0.73%  cmd/compile/internal/gc.Prog
      60ms  0.73% 45.05%      270ms  3.30%  cmd/compile/internal/gc.typecheck1

Memory allocation seems to be quite expensive.

pprof001.svg

Josh Bleecher Snyder

Mar 8, 2016, 3:26:03 PM
to Austin Clements, David Crawshaw, golang-dev, Rick Hudson
> Interesting. This profile is GC-dominated as you said, but not in the usual
> way. 40% of the time is spent in mcentral.grow, which does basically nothing
> except construct a free list in a span. I would think that time is coming
> from page faults demand allocating as the heap grows (mcentral.grow is the
> first hit on fresh pages), but that disagrees with the relatively low system
> time everyone has been reporting. Plus, unless the heap is significantly
> larger than it was in 1.4, I would expect the total page fault cost to be
> roughly the same. It's also possible this is a NUMA thing
> (https://golang.org/issue/14406), but again I would expect high system time
> in that case.
>
> If this is in fact just the cost of initializing the free lists, Rick's work
> on dev.garbage should completely eliminate that hot spot. (+Rick)

When this was last discussed
(https://groups.google.com/forum/#!topic/golang-dev/ORLpF3EFNAk), I
did the experiment that Keith suggested in that thread and found that
the actual freelist construction is about 5% of CPU, not 40-50%.

However, it also isn't just the cost of allocation and page faults.
Near the end of the 1.5 cycle, Russ dramatically shrunk gc.Node and
thus total allocation, but the compiler didn't get much faster. To
confirm this (the other direction), I just tried adding a `waste
[1024]byte` field to the beginning of gc.Node. See below for raw
output. Max rss went up 3.84x, but real time only went up 1.3x. The
compilebench numbers tell an even stranger story--allocation size up
~60%, but compile times up only ~10%.

Something doesn't line up here--or at least, I don't get it. It's not
freelist construction, but it's also not just the allocation/page
faults. So...where is that time going?

-josh


BEFORE

$ /usr/bin/time -l go build -a cmd/compile
21.16 real 42.91 user 2.13 sys
293007360 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
562976 page reclaims
0 page faults
0 swaps
0 block input operations
38 block output operations
0 messages sent
0 messages received
92 signals received
24379 voluntary context switches
21262 involuntary context switches
$ compilebench -alloc
BenchmarkTemplate 1 497372696 ns/op 90749704 B/op 1192427 allocs/op
BenchmarkGoTypes 1 1654796540 ns/op 321811744 B/op 3677910 allocs/op
BenchmarkCompiler 1 6103388695 ns/op 1137140632 B/op 14453247 allocs/op

AFTER

$ /usr/bin/time -l go build -a cmd/compile
27.65 real 44.58 user 4.02 sys
1126416384 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
1477640 page reclaims
0 page faults
0 swaps
0 block input operations
62 block output operations
0 messages sent
0 messages received
92 signals received
36955 voluntary context switches
23861 involuntary context switches
$ compilebench -alloc
BenchmarkTemplate 1 556708260 ns/op 171546368 B/op 1194963 allocs/op
BenchmarkGoTypes 1 1770277297 ns/op 529007048 B/op 3685483 allocs/op
BenchmarkCompiler 1 6631047305 ns/op 1847756392 B/op 14478548 allocs/op

Dave Cheney

Mar 8, 2016, 3:27:47 PM
to golang-dev
Thanks Egon,

On my list today was to modify my local cmd/go to output build times per package so I could sort them. I'll report back when I have done that.


Russ Cox

Mar 8, 2016, 3:35:02 PM
to Dave Cheney, golang-dev
On Tue, Mar 8, 2016 at 3:27 PM, Dave Cheney <da...@cheney.net> wrote:
Thanks Egon,
On my list today was to modify my local cmd/go to output build times per package so I could sort them. I'll report back when I have done that.

go build -toolexec time, right? :-)

Russ

Dave Cheney

Mar 8, 2016, 3:42:13 PM
to Russ Cox, golang-dev
/me slaps forehead

go build -toolexec="/usr/bin/time -f '%e %U %S %C'"
github.com/juju/juju/cmd/jujud

with some post-processing looks promising; I'll report back with the results.
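On systems without GNU time, a small Go wrapper can play the same role. The following is a minimal sketch (the toolexec-time name and its output format are assumptions, not something from this thread): it runs the tool it is handed and prints the wall-clock time plus the command line to stderr, one line per tool invocation.

// toolexec-time: run the tool passed on the command line and report
// how long it took.
// Usage: go build -toolexec=/path/to/toolexec-time github.com/juju/juju/cmd/jujud
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

func main() {
	if len(os.Args) < 2 {
		os.Exit(2)
	}
	start := time.Now()
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	err := cmd.Run()
	// One line per tool invocation: elapsed seconds, then the command that ran.
	fmt.Fprintf(os.Stderr, "%.2fs %s\n", time.Since(start).Seconds(), strings.Join(os.Args[1:], " "))
	if err != nil {
		os.Exit(1)
	}
}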

Dave Cheney

Mar 8, 2016, 4:21:40 PM
to Russ Cox, golang-dev
https://gist.github.com/davecheney/401ee7514927887683cb

^ tip (0c7ccbf6) fresh this morning

go1.4 does not support -toolexec, so I'll bodge something in and do
the same test there.

andrewc...@gmail.com

Mar 8, 2016, 6:37:10 PM
to golang-dev, r...@golang.org
Thanks, that is interesting. Lines of code per second for github.com/juju/juju/state and github.com/juju/govmomi/vim25/types are around 5k here, where the total average for me is 16k lines per second. types is really a single huge file, whereas state is just lots of smaller files; they total ~50k lines each.

There are 1 million lines of non-test code taking 1 minute of real time, and these packages are 100k lines total. Combined they take 20 seconds, or 1/3 of the build time for 1/10 of the code. It would be great to see some profiles of those two packages if someone has time.

Dave Cheney

Mar 8, 2016, 6:41:10 PM
to Andrew Chambers, golang-dev
The compiler, go tool compile (or via -gcflags), has support for generating
a profile directly. Have a look at that if you have time.

Konstantin Shaposhnikov

Mar 8, 2016, 7:03:06 PM
to Andrew Chambers, Russ Cox, golang-dev


> There is 1 million lines of non test code taking 1 minute of real time, and these packages are 100k total. Combined they take 20 seconds, or 1/3 of the build time for 1/10 the code. It would be great to see some profiles of those two packages if someone has time.

Not sure if you have seen it, but I sent some profiling results from compiling the types package earlier in this thread.


Dave Cheney

Mar 9, 2016, 3:25:16 AM
to Konstantin Shaposhnikov, Andrew Chambers, Russ Cox, golang-dev

The vim25/types package is odd: it contains 5200+ func init()s, which is part of the autogenerated nature of the code (I'm thinking back to what bwk said about generated code being the meanest to compile).

Dave Cheney

Mar 9, 2016, 4:23:33 AM
to Konstantin Shaposhnikov, Andrew Chambers, Russ Cox, golang-dev
I wrote a quick package to generate mock data that resembles the
strange form of the vim25/types package: many types, each with its own
init function.

https://godoc.org/github.com/davecheney/genpkg
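For reference, a rough sketch of what such a generator can look like (this is not the actual genpkg code, and the -noinitfn variant is not shown): it emits n types, each registered by its own init function in a package-level map.

// gen.go: emit a Go package with n types, each registered by its own
// init function, to mimic the shape of vim25/types.
// Usage: go run gen.go -n 5000 > x/types.go
package main

import (
	"flag"
	"fmt"
)

func main() {
	n := flag.Int("n", 5000, "number of types to generate")
	flag.Parse()

	fmt.Println("package x")
	fmt.Println()
	fmt.Println("var registry = map[string]interface{}{}")
	fmt.Println()
	for i := 0; i < *n; i++ {
		// One type and one init function per iteration.
		fmt.Printf("type T%d struct{ A, B int }\n", i)
		fmt.Printf("func init() { registry[\"T%d\"] = T%d{} }\n\n", i, i)
	}
}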

The results are pretty comparable; n=5000 creates a package that compiles thusly:

lucky(~/src/genpkg) % time go1.4 build ./x
real 0m2.165s
user 0m1.986s
sys 0m0.180s

lucky(~/src/genpkg) % time go build ./x
real 0m6.724s
user 0m9.556s
sys 0m0.161s

Thinking that it was the thousands of init functions, I added a
-noinitfn flag to the generator to have it emit one giant map
initialiser, and the result was that the compiler blew out to 4GB before I
killed it. Running with a smaller value revealed this:

lucky(~/src/genpkg) % go run gen.go -n 1000 > x/types.go
lucky(~/src/genpkg) % time go1.4 build ./x
real 0m0.316s
user 0m0.273s
sys 0m0.044s

lucky(~/src/genpkg) % time go build ./x
real 0m1.456s
user 0m1.816s
sys 0m0.122s

lucky(~/src/genpkg) % go run gen.go -n 1000 -noinitfn > x/types.go
lucky(~/src/genpkg) % time go1.4 build ./x
real 0m0.618s
user 0m0.575s
sys 0m0.047s

lucky(~/src/genpkg) % time go build ./x
real 0m8.491s
user 0m10.595s
sys 0m0.338s

Good hunting!

Thanks

Dave

Bjørn Erik Pedersen

Mar 9, 2016, 5:38:44 AM
to golang-dev
This is from Travis for Hugo (https://github.com/spf13/hugo):


            linux/amd64   darwin/amd64
1.4.3       1.162         2.111
1.5.3       2.417         3.846
1.6         2.521         3.758
aa3650f     1.711         2.346

All real times.

https://travis-ci.org/spf13/hugo/builds/114741729

Not sure about the spec of the hardware, but happy to see the result of the latest Go.

bep


Dave Cheney

Mar 9, 2016, 5:42:54 AM
to Bjørn Erik Pedersen, golang-dev
Thank you for posting your results. I think for accuracy you should delete $GOPATH/pkg before issuing time go build, and also add -v so you can verify that all the packages you expect are being compiled.

Bjørn Erik Pedersen

Mar 9, 2016, 5:57:53 AM
to golang-dev, bjorn.eri...@gmail.com
You are right, the numbers below seem to be for the hugo binary only. Will have a look at how to clear the cache; rm -rf $GOPATH/pkg didn't do the trick.

bep

David Chase

Mar 9, 2016, 12:21:46 PM
to Bjørn Erik Pedersen, golang-dev
Playing with genpkg on tip:

go run gen.go -n 3000  -noinitfn > x/types.go
time go build  -gcflags='-d=ssa/check/off -memprofile=x.mprof' ./x
go tool pprof -alloc_space /Users/drchase/GoogleDrive/work/go-ssa/pkg/tool/darwin_amd64/compile x/x.mprof

So it happens that check is a pig and I haven't looked into that yet, but...

Showing top 30 nodes out of 40 (cum >= 1810.24MB)
      flat  flat%   sum%        cum   cum%
 1819.24MB 20.43% 20.43%  1819.24MB 20.43%  cmd/compile/internal/ssa.(*Func).newValue
  947.75MB 10.64% 31.08%  2753.49MB 30.93%  cmd/compile/internal/gc.(*state).lookupVarOutgoing
  947.44MB 10.64% 41.72%   947.44MB 10.64%  cmd/compile/internal/ssa.(*stackAllocState).buildInterferenceGraph
  638.83MB  7.18% 48.89%   642.34MB  7.21%  cmd/compile/internal/ssa.(*stackAllocState).stackalloc
  625.34MB  7.02% 55.92%  1041.10MB 11.69%  cmd/compile/internal/ssa.(*regAllocState).init
  581.15MB  6.53% 62.45%   649.08MB  7.29%  cmd/compile/internal/gc.genssa
  525.89MB  5.91% 68.35%   526.62MB  5.91%  cmd/compile/internal/ssa.cse
  482.10MB  5.41% 73.77%   482.10MB  5.41%  cmd/compile/internal/ssa.schedule
  460.90MB  5.18% 78.94%  2667.34MB 29.96%  cmd/compile/internal/ssa.(*regAllocState).regalloc
  416.07MB  4.67% 83.62%  1476.37MB 16.58%  cmd/compile/internal/ssa.(*stackAllocState).init
  415.25MB  4.66% 88.28%   415.77MB  4.67%  cmd/compile/internal/ssa.(*regAllocState).computeLive
  312.07MB  3.51% 91.79%   312.07MB  3.51%  cmd/compile/internal/ssa.(*Func).newSparseSet
  169.05MB  1.90% 93.68%   169.05MB  1.90%  cmd/compile/internal/ssa.tighten
  111.31MB  1.25% 94.93%   112.86MB  1.27%  cmd/compile/internal/ssa.(*stackAllocState).computeLive
   85.67MB  0.96% 95.90%    85.67MB  0.96%  cmd/compile/internal/ssa.(*Func).setHome
   66.58MB  0.75% 96.64%    66.58MB  0.75%  cmd/compile/internal/ssa.liveValues
   59.52MB  0.67% 97.31%    59.52MB  0.67%  cmd/compile/internal/gc.Prog
   45.01MB  0.51% 97.82%    45.01MB  0.51%  cmd/compile/internal/gc.Nod

and

Showing top 30 nodes out of 40 (cum >= 169.05MB)
      flat  flat%   sum%        cum   cum%
         0     0%     0%  8902.90MB   100%  cmd/compile/internal/amd64.Main
         0     0%     0%  8902.90MB   100%  cmd/compile/internal/gc.Main
         0     0%     0%  8902.90MB   100%  main.main
         0     0%     0%  8902.90MB   100%  runtime.main
         0     0%     0%  8902.40MB   100%  runtime.goexit
         0     0%     0%  8815.58MB 99.01%  cmd/compile/internal/gc.funccompile
         0     0%     0%  8812.58MB 98.98%  cmd/compile/internal/gc.fninit
    0.50MB 0.0056% 0.0056%  8806.29MB 98.91%  cmd/compile/internal/gc.compile
         0     0% 0.0056%  8096.44MB 90.94%  cmd/compile/internal/gc.buildssa
         0     0% 0.0056%  5310.90MB 59.65%  cmd/compile/internal/ssa.Compile
         0     0% 0.0056%  3708.45MB 41.65%  cmd/compile/internal/ssa.regalloc
         0     0% 0.0056%  2753.49MB 30.93%  cmd/compile/internal/gc.(*state).linkForwardReferences
  947.75MB 10.64% 10.65%  2753.49MB 30.93%  cmd/compile/internal/gc.(*state).lookupVarOutgoing
         0     0% 10.65%  2753.49MB 30.93%  cmd/compile/internal/gc.(*state).resolveFwdRef
  460.90MB  5.18% 15.83%  2667.34MB 29.96%  cmd/compile/internal/ssa.(*regAllocState).regalloc
         0     0% 15.83%  2118.71MB 23.80%  cmd/compile/internal/ssa.stackalloc
 1819.24MB 20.43% 36.26%  1819.24MB 20.43%  cmd/compile/internal/ssa.(*Func).newValue
         0     0% 36.26%  1810.24MB 20.33%  cmd/compile/internal/ssa.(*Block).NewValue0A
  416.07MB  4.67% 40.93%  1476.37MB 16.58%  cmd/compile/internal/ssa.(*stackAllocState).init
  625.34MB  7.02% 47.96%  1041.10MB 11.69%  cmd/compile/internal/ssa.(*regAllocState).init
  947.44MB 10.64% 58.60%   947.44MB 10.64%  cmd/compile/internal/ssa.(*stackAllocState).buildInterferenceGraph
  581.15MB  6.53% 65.13%   649.08MB  7.29%  cmd/compile/internal/gc.genssa
  638.83MB  7.18% 72.30%   642.34MB  7.21%  cmd/compile/internal/ssa.(*stackAllocState).stackalloc
  525.89MB  5.91% 78.21%   526.62MB  5.91%  cmd/compile/internal/ssa.cse
  482.10MB  5.41% 83.62%   482.10MB  5.41%  cmd/compile/internal/ssa.schedule
  415.25MB  4.66% 88.29%   415.77MB  4.67%  cmd/compile/internal/ssa.(*regAllocState).computeLive
  312.07MB  3.51% 91.79%   312.07MB  3.51%  cmd/compile/internal/ssa.(*Func).newSparseSet

i.e., (sorry for the crap formatting) it's reference linking in ssa.go, and regalloc, by a lot.

newValue is almost 100% linkForwardReferences -> resolveFwdRef -> lookupVarOutgoing -> (*Block)NewValue0A -> (*Func)NewValue
lookupVarOutgoing (flat cost) is all at m[name] = v (i.e., forward references).
buildInterferenceGraph is just the interference graph.

I think that's good for a start.

Damian Gryski

Mar 11, 2016, 8:35:18 AM
to golang-dev, bjorn.eri...@gmail.com


On Wednesday, March 9, 2016 at 6:21:46 PM UTC+1, David Chase wrote:
Playing with genpkg on tip:

go run gen.go -n 3000  -noinitfn > x/types.go
time go build  -gcflags='-d=ssa/check/off -memprofile=x.mprof' ./x
go tool pprof -alloc_space /Users/drchase/GoogleDrive/work/go-ssa/pkg/tool/darwin_amd64/compile x/x.mprof

So it happens that check is a pig and I haven't looked into that yet, but...

<snipped>
 
i.e., (sorry for the crap formatting) it's reference linking in ssa.go, and regalloc, by a lot.

newValue is almost 100% linkForwardReferences -> resolveFwdRef -> lookupVarOutgoing -> (*Block)NewValue0A -> (*Func)NewValue
lookupVarOutgoing (flat cost) is all at m[name] = v (i.e., forward references).
buildInterferenceGraph is just the interference graph.

I think that's good for a start.
 
Was there an issue filed for this?

Damian

David Chase

Mar 11, 2016, 9:05:22 AM
to Damian Gryski, golang-dev, Bjørn Erik Pedersen
Now there is.  I filed two, since the memory hogs are two very different pieces of SSA.


Dave Cheney

Mar 13, 2016, 9:39:25 PM
to golang-dev
Thanks for filing those bugs David. Taking vim25/types out of the picture for the moment, the next worst package is github.com/juju/juju/state (hereinafter state).

The compilation times for state are in excess of 14 seconds on this machine; however, I noticed that if I reduce the -p setting from the default of 4 (for this machine) to 2, the timings for state drop significantly, yet the overall times for the whole build are comparable (they are not exact, but given we're halving the effective concurrency, I feel they are close enough to say they are similar).

lucky(~/devel/benchjuju) % grep ^real /tmp/benchjuju.bb48b86-p*.txt
/tmp/benchjuju.bb48b86-p2.txt:real      1m56.889s
/tmp/benchjuju.bb48b86-p2.txt:real      2m7.545s
/tmp/benchjuju.bb48b86-p2.txt:real      2m12.262s
/tmp/benchjuju.bb48b86-p2.txt:real      2m13.698s
/tmp/benchjuju.bb48b86-p2.txt:real      2m14.841s
/tmp/benchjuju.bb48b86-p4.txt:real      1m50.869s
/tmp/benchjuju.bb48b86-p4.txt:real      1m58.680s
/tmp/benchjuju.bb48b86-p4.txt:real      2m0.544s
/tmp/benchjuju.bb48b86-p4.txt:real      2m1.168s
/tmp/benchjuju.bb48b86-p4.txt:real      2m2.037s

lucky(~/devel/benchjuju) % grep juju/state.a\  /tmp/benchjuju.bb48b86-p*.txt | cut -f -6 -d ' '
/tmp/benchjuju.bb48b86-p2.txt:7.77 9.82 0.15 /home/dfc/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/juju/juju/state.a

/tmp/benchjuju.bb48b86-p2.txt:8.61 10.78 0.24 /home/dfc/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/juju/juju/state.a

/tmp/benchjuju.bb48b86-p2.txt:9.20 11.58 0.19 /home/dfc/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/juju/juju/state.a

/tmp/benchjuju.bb48b86-p2.txt:9.08 11.37 0.20 /home/dfc/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/juju/juju/state.a

/tmp/benchjuju.bb48b86-p2.txt:9.36 11.82 0.21 /home/dfc/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/juju/juju/state.a

/tmp/benchjuju.bb48b86-p4.txt:11.74 12.20 0.24 /home/dfc/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/juju/juju/state.a

/tmp/benchjuju.bb48b86-p4.txt:14.10 14.56 0.26 /home/dfc/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/juju/juju/state.a

/tmp/benchjuju.bb48b86-p4.txt:14.70 14.83 0.31 /home/dfc/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/juju/juju/state.a

/tmp/benchjuju.bb48b86-p4.txt:14.81 15.10 0.27 /home/dfc/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/juju/juju/state.a

/tmp/benchjuju.bb48b86-p4.txt:14.94 15.21 0.31 /home/dfc/go/pkg/tool/linux_amd64/compile -o $WORK/github.com/juju/juju/state.a


Which is a very unexpected result

Compiler version was bb48b86


Dave Cheney

Mar 13, 2016, 9:59:00 PM
to golang-dev
For another data point, I modified benchjuju to only build up to the state package, and the compile time for that package is lower again.

My simplistic explanation of this is that where 1.4 and earlier could say that one compile job == one CPU, depending on the size of the package being compiled, 1.5 and later's tool/*/compile is more than a CPU's worth of work, and that is causing contention.

Dave Cheney

Mar 14, 2016, 8:33:36 PM
to golang-dev
Hello,

I've been keeping the benchjuju graph up to date and the good news is the work that Matthew, Josh, Ian and others have been doing is having an effect.


(I have no idea why Google Sheets is determined to export a postage-stamp-sized image.)


Thanks

Dave

Dave Cheney

Mar 22, 2016, 7:12:34 PM
to golang-dev
Good news everyone! After a month's work the results are really starting to pay off. The following is a graph of real and user time relative to 1.4.3. Real time is just above 200% relative to 1.4.3, and both real and user time are trending downwards at the same rate.



Adam Strzelecki

Mar 24, 2016, 6:49:38 AM
to golang-dev
Dave Cheney wrote:
Good news everyone! After a month's work the results are really starting to pay off. The following is a graph of real and user time relative to 1.4.3. Both real and user time is trending downwards. Real time is just above 200% relative to 1.4.3, and both real and user time are trending downwards at the same rate.

These are really impressive results. Just a small question: what platform are you testing on? Because I see somewhat less optimistic results here on OS X. Maybe due to that known Mach-O linker inefficiency?

-- Adam

Dave Cheney

Mar 24, 2016, 6:53:24 AM
to Adam Strzelecki, golang-dev
I'm using linux. I know that macho generation in the darwin linker is
a performance pain point.

Adam Strzelecki

Mar 24, 2016, 10:17:42 AM
to golang-dev, adam.st...@java.pl
Dave Cheney wrote:
I'm using linux. I know that macho generation in the darwin linker is
a performance pain point.

Can you please point me to some OS X Go linker benchmarks? I have reproduced the problem described in #12259 and updated the issue; it seems it is the external dsymutil call to extract DWARF which stalls the linker, not the Go linker itself. But the issue there is really specific to linking v8 via cgo, and that is a pretty huge library with a 100MB DWARF file, so I can expect it to be slow. I am willing to try to better understand the problem, especially how it affects regular (non-cgo) programs' build time on OS X.

Dave Cheney

Mar 24, 2016, 10:19:34 AM
to Adam Strzelecki, golang-dev
I'm sorry, other than /usr/bin/time I have no advice to give you specific to your v8 cgo issue.


Adam Strzelecki

Mar 24, 2016, 4:07:39 PM
to golang-dev, adam.st...@java.pl
I did a build performance comparison for https://github.com/gogits/gogs and my numbers are a little bit different. Still, the improvement in the latest tip is great. Also, after all I don't see a problem with OS X times, at least not in the case of gogs.

OS X 10.11.4 i5-4258U CPU @ 2.40GHz 2 core

   version        real   user   sys  real/1.4.3
go 1.4.3         10.93  19.88  4.08  --
go 1.5.3         22.32  54.68  4.87  2.04x
go 1.6           21.87  52.19  4.88  2.00x
go 1.7 d8ee180   16.98  40.72  4.08  1.55x

Linux 3.16 E5-1650v2 @ 3.50GHz 6 core

   version        real   user   sys  real/1.4.3
go 1.4.3          4.78  12.87  2.06  --
go 1.5.3         10.83  44.82  2.51  2.27x
go 1.6           10.76  42.56  2.30  2.25x
go 1.7 d8ee180    8.77  33.64  2.02  1.83x

Cheers,
-- Adam

Dave Cheney

Mar 24, 2016, 5:53:22 PM
to Adam Strzelecki, golang-dev
Thanks for posting your gogs numbers; I'll try to work them up into a benchjuju-style dataset so they can be used as a reference.


Dave Cheney

Mar 28, 2016, 7:23:19 PM
to Adam Strzelecki, golang-dev
Hello,

gogs has many build options. Could you please post the complete list
of steps you used to build gogs so I can turn that into a benchmark
suite.

Thanks

Adam Strzelecki

Mar 30, 2016, 4:51:12 PM
to golang-dev, adam.st...@java.pl
gogs has many build options. Could you please post the complete list 
of steps you used to build gogs so I can turn that into a benchmark
suite.

Well, regardless of the maybe slightly overcomplicated build instructions on the Gogs site, gogs is just a plain Go app: you "go get -u -d github.com/gogits/gogs", then "go get -u -d ./...", and finally "go build". I am not installing its dependencies (the -d flag), so they are built as well on every "go build". Optionally you may just remove $GOPATH/pkg to ensure everything is rebuilt.

Then I was just timing with "time -p $GOROOT/bin/go build" for each go version.

What's nice about gogs in terms of compiler performance testing is that it is quite a large project with pretty many deps, and, most importantly, it still builds on 1.4.

-- Adam

anca.e...@gmail.com

Mar 30, 2016, 7:34:17 PM
to golang-dev
Hi.


I think this benchmark should be executed automatically and included in CI.
And enforce a no-regression policy, with exceptions only for crash and security bugs.

Andrew Gerrand

Mar 30, 2016, 7:37:09 PM
to Anca Emanuel, golang-dev

On 31 March 2016 at 06:32, <anca.e...@gmail.com> wrote:
I think this benchmark must be executed automatically, and included in CI.

We're working on it

David Chase

Mar 30, 2016, 11:06:02 PM
to anca.e...@gmail.com, golang-dev
It might not hurt to think about how compiler developers might react to a "no regression policy" on compile-time performance before promoting such a draconian policy, and what effect that would have had if instituted back in Go 1.4.


Dave Cheney

Mar 31, 2016, 7:48:02 PM
to Adam Strzelecki, golang-dev
Hello,

I put together

https://github.com/davecheney/benchgogs

and I see on my linux/amd64 machine roughly

go version go1.4.2 linux/amd64
real 0m8.259s
user 0m17.618s
sys 0m2.517s


go version devel +4b209db Thu Mar 31 02:54:10 2016 +0000 linux/amd64
real 0m16.313s
user 0m44.641s
sys 0m2.305s

Which is consistent with the numbers I posted on twitter yesterday.

Please try this benchmark out and send any corrections you wish via PR.

frank.s...@gmail.com

Apr 2, 2016, 10:30:15 PM
to golang-dev
FWIW, I ran a test on our internal codebase of 280k lines, 349 pkgs, 57 binaries with rm $GOPATH/{bin,pkg}/*; go install ./...

darwin/amd64

OSX 10.11.4, MBP 2014, 2,8 GHz Intel Core i7

go1.4: 14 sec, 487MB
go1.5.3: 35sec, 559MB
go1.5.3 GOGC=off: 24.4 sec
go1.6: 37 sec, 687MB
go1.6 GOGC=off: 26 sec
devel +ac8d97b: 18 sec, 471MB
devel +ac8d97b GOGC=off: 14.3 sec

Awesome job...
Frank

Dave Cheney

Apr 9, 2016, 1:41:54 AM
to golang-dev
Hello,

As things are getting colder in Australia it has become more
complicated to get my test machine up to a temperature that matches the
values taken previously. To counter this I've changed my benchmark
methodology to run 1.4.3, 1.6 and tip in a loop until all three values
stabilise. Here is a graph of the current state of play.
image (3).png

andrewc...@gmail.com

Apr 9, 2016, 2:03:35 AM
to golang-dev
From a pure PR point of view, the difference between "New backend that slows the compiler down but produces better code" and "New backend that speeds up the compiler AND produces better code" is really big. The number of complaints this prevents makes it very worthwhile.

Good work.

Ugorji

Apr 9, 2016, 11:41:47 AM
to golang-dev
Hi Dave,

Thanks so much for running this regularly for the group and sharing with us.

Did you run with GOGC=off or a high number (>400)? It would be nice to see a column bar for the development version with GC less aggressive or off, to see clearly how much of an impact GC has.

Thanks.

Dave Cheney

Apr 9, 2016, 6:20:40 PM
to Ugorji, golang-dev
The last time I measured it, building jujud from scratch with the GC
off was worth about 20 seconds of real time. Disabling the GC is not a
practical strategy in general.


anca.e...@gmail.com

Apr 21, 2016, 4:40:14 AM
to golang-dev
There are some opinions on the gonum-dev list about how feasible https://github.com/gonum/plot is for this job.

Seb Binet

Apr 21, 2016, 7:25:00 AM
to anca.e...@gmail.com, golang-dev
On Thu, Apr 21, 2016 at 10:40 AM, <anca.e...@gmail.com> wrote:
There are some opinions about how feasible is https://github.com/gonum/plot to this job, on gonum-dev list.

apologies if this is obvious, but what are you replying to or referring to?
(google-groups doesn't show any obvious thread structure...)

-s

Dave Cheney

Apr 28, 2016, 7:52:18 PM
to Seb Binet, anca.e...@gmail.com, golang-dev
Good news everyone: since gri's switch to making binary
export/import the default, and a few followups from khr, the time to
build jujud compared to 1.4.3 is now solidly below 2x.
Screen Shot 2016-04-29 at 09.49.07.png

Manlio Perillo

Jan 11, 2018, 11:17:20 AM
to golang-dev

Are you still updating this data set?
The spreadsheet requires access permission.


Thanks
Manlio

Dave Cheney

Jan 11, 2018, 3:03:06 PM
to Manlio Perillo, golang-dev
No. 