troubleshooting 100% cpu peg - runtime._ExternalCode ?


Jason E. Aten

Feb 7, 2017, 12:06:09 AM
to golang-nuts

On linux/amd64 using go1.8rc3, I'm seeing one Go-built process peg a CPU at 100% after running for a few minutes.


Tasks: 499 total,   1 running, 497 sleeping,   0 stopped,   1 zombie                                                                                                                                     
%Cpu(s):  2.8 us,  3.5 sy,  0.0 ni, 93.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st                                                                                                                          
KiB Mem : 12356040+total, 65169064 free,  5018704 used, 53372628 buff/cache                                                                                                                              
KiB Swap:        0 total,        0 free,        0 used. 11790566+avail Mem                                                                                                                               
                                                                                                                                                                                                         
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                              
  84359 jaten     20   0 1156916  52152   4632 S 100.7  0.0   9:05.82 vhaline -info -web -addr loc+                                                                                                      
  84486 jaten     20   0  941964  19884   4268 S   0.7  0.0   0:05.58 vhaline -info -addr localhos+                                                                                                      
  84501 jaten     20   0 1020484  14712   4216 S   0.7  0.0   0:04.95 vhaline -info -addr localhos+  


When I profile, I get a lot of runtime._ExternalCode. What is that, and what does it mean?

$ go tool pprof --nodefraction=0.1 vhaline ./profile.cpu.runaway.rootnode                                  
Entering interactive mode (type "help" for commands)                                                                         
(pprof) top10                                                                                                                
21.16s of 29.97s total (70.60%)                                                                                              
Dropped 55 nodes (cum <= 3s)                                                                                                 
Showing top 10 nodes out of 16 (cum >= 10.61s)                                                                               
      flat  flat%   sum%        cum   cum%                                                                                   
    13.55s 45.21% 45.21%     13.55s 45.21%  runtime._ExternalCode                                                            
     6.23s 20.79% 66.00%      7.85s 26.19%  syscall.Syscall                                                                  
     0.27s   0.9% 66.90%     16.42s 54.79%  github.com/glycerine/vhaline/vhaline.(*srvReader).start.func1                    
     0.25s  0.83% 67.73%     10.20s 34.03%  net.(*netFD).Read                                                                
     0.23s  0.77% 68.50%      3.96s 13.21%  net.setDeadlineImpl                                                              
     0.20s  0.67% 69.17%     10.96s 36.57%  bufio.(*Reader).Peek                                                             
     0.12s   0.4% 69.57%     10.76s 35.90%  bufio.(*Reader).fill                                                             
     0.11s  0.37% 69.94%     11.09s 37.00%  github.com/glycerine/tmframe2.(*FrameReader).NextFrame                           
     0.11s  0.37% 70.30%      7.96s 26.56%  syscall.read                                                                     
     0.09s   0.3% 70.60%     10.61s 35.40%  net.(*conn).Read                                                                 
(pprof) 

The source code is here; the processes can be built from https://github.com/glycerine/vhaline at commit ca32a98799b87627798e05bb8e6cc29d9cac7ec3 or using the tag "runaway".

pprof profiling was turned on; could that account for the peg?

I was running three vhaline processes in a chain; the CPU peg happened to the first process, but I've also seen it happen to the 2nd.

They were started like this, in three separate shells:

$ vhaline -info -web -addr localhost:9000                          # process 84359
$ vhaline -info -addr localhost:9001 -parent localhost:9000 -cpu   # process 84486
$ vhaline -info -addr localhost:9002 -parent localhost:9001 -cpu   # process 84501

Since there is nothing that changes in the processes over time, the fact that it kicks in after a few minutes makes me think it may be a garbage collection issue.           
                                                                                        

    

Dave Cheney

Feb 7, 2017, 12:07:55 AM
to golang-nuts
> Since there is nothing that changes in the processes over time, the fact that it kicks in after a few minutes makes me think it may be a garbage collection issue.

Running with GODEBUG=gctrace=1 will confirm / deny this hypothesis.
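
For reference, each gc line has roughly this shape (per the runtime package docs for Go 1.8):

gc # @#s #%: #+#+# ms clock, #+#/#/#+# ms cpu, #->#-># MB, # MB goal, # P

gc #        the GC number, incremented at each collection
@#s         time in seconds since program start
#%          percentage of time spent in GC since program start
#+#+# ms    wall-clock/CPU times for sweep termination, mark/scan, and mark termination
#->#-># MB  heap size at GC start, at GC end, and live heap
# MB goal   goal heap size
# P         number of processors used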

Dave Cheney

Feb 7, 2017, 12:16:02 AM
to golang-nuts
It looks like there are some data races in your code


It's not possible to debug runtime problems until there are no data races.
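
If it helps, both your tests and the binary itself can be run under the race detector; e.g. (nothing project-specific here):

go test -race ./...
go build -race     # then exercise the resulting binary as usual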

Jason E. Aten

Feb 7, 2017, 12:42:28 AM
to golang-nuts
The race is fixed in the latest push, a572f570c52f70c9518bc1b3e3319ff9e2424885; it was an artifact of adding logging levels and would have affected only the tests.

I did manage to catch the process running away again, this time at 300%, and I got some output with gctrace=1 as you suggested. I'm not sure how to read the lines though; could you advise, Dave?


Runaway CPU behavior was happening from at least Tue Feb 7 05:26:24 UTC 2017 through 2017/02/07 05:37:05.587710.

[jaten@host]$ GODEBUG=gctrace=1 go/bin/vhaline.go1.8rc3 -info -addr localhost:9001 -parent localhost:9000 -cpu
2017/02/07 05:14:07 mid node 'a7fb658c' started on localhost:9001. my parent is 'localhost:9000'
2017/02/07 05:14:07 mid node 'a7fb658c' using ttl=3s and beat=1s.
2017/02/07 05:14:07.136012 client.go:85: [pid 86063] replica (a7fb658c) client dialing parent at 'localhost:9000'
2017/02/07 05:14:07.136266 client.go:97: [pid 86063] replica (a7fb658c) client reached parent at 'localhost:9000'
gc 1 @0.007s 1%: 0.18+0.39+0.093 ms clock, 0.36+0.23/0.56/0.23+0.18 ms cpu, 17->17->16 MB, 18 MB goal, 16 P
gc 2 @0.008s 2%: 0.063+0.43+0.078 ms clock, 0.44+0.033/0.69/0.28+0.55 ms cpu, 32->32->32 MB, 33 MB goal, 16 P
2017/02/07 05:14:07.139570 replica.go:511: [pid 86063] replica (a7fb658c) sees from parent: ToChildConnectAck.
2017/02/07 05:14:07 pprof profiler providing web server on 'localhost:6061'
2017/02/07 05:14:07 cmd/vhaline/main.go: contacted parent event.
2017/02/07 05:14:09.139731 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(/).
2017/02/07 05:14:09.139759 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000176418s/never
gc 3 @52.176s 0%: 0.065+0.24+0.082 ms clock, 0.58+0.15/0.62/0.47+0.74 ms cpu, 49->49->48 MB, 65 MB goal, 16 P
gc 4 @52.177s 0%: 0.041+0.55+0.094 ms clock, 0.50+0.054/0.54/0.40+1.1 ms cpu, 64->64->64 MB, 97 MB goal, 16 P
2017/02/07 05:14:59.307973 replica.go:742: [pid 86063] replica (a7fb658c) NewChildConnect from child 4040ba06' at '127.0.0.1:46194'
2017/02/07 05:14:59 cmd/vhaline/main.go: contacted child event.
2017/02/07 05:15:16.160638 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(4040ba06/127.0.0.1:46194).
2017/02/07 05:15:16.160673 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000166804s/1.000144038s
2017/02/07 05:16:24.182488 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(4040ba06/127.0.0.1:46194).
2017/02/07 05:16:24.182518 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000148515s/1.000133515s
scvg0: inuse: 68, idle: 0, sys: 69, released: 0, consumed: 69 (MB)
GC forced
gc 5 @172.183s 0%: 0.089+0.30+0.086 ms clock, 1.2+0/0.74/1.0+1.2 ms cpu, 67->67->64 MB, 129 MB goal, 16 P
2017/02/07 05:17:24.201369 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(4040ba06/127.0.0.1:46194).
2017/02/07 05:17:24.201404 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000158566s/1.000172823s
2017/02/07 05:18:29.222380 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(4040ba06/127.0.0.1:46194).
2017/02/07 05:18:29.222413 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000154109s/1.000142059s
GC forced
gc 6 @292.191s 0%: 0.18+0.31+0.10 ms clock, 2.7+0/0.72/0.89+1.6 ms cpu, 67->67->64 MB, 129 MB goal, 16 P
scvg1: inuse: 67, idle: 2, sys: 69, released: 0, consumed: 69 (MB)
2017/02/07 05:19:35.243248 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(4040ba06/127.0.0.1:46194).
2017/02/07 05:19:35.243274 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000142593s/1.000160351s
2017/02/07 05:20:45.265426 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(4040ba06/127.0.0.1:46194).
2017/02/07 05:20:45.265455 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000176146s/1.000179458s
GC forced
gc 7 @412.200s 0%: 0.027+0.28+0.097 ms clock, 0.43+0/0.78/0.69+1.5 ms cpu, 67->67->64 MB, 129 MB goal, 16 P
scvg2: inuse: 67, idle: 1, sys: 69, released: 0, consumed: 69 (MB)
2017/02/07 05:21:54.287241 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(4040ba06/127.0.0.1:46194).
2017/02/07 05:21:54.287275 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000167178s/2.000483842s
2017/02/07 05:21:54.287396 replica.go:278: [pid 86063] replica (a7fb658c) child connection lost
2017/02/07 05:21:55.287620 replica.go:613: [pid 86063] replica (a7fb658c) it's been 3.000864314s (> ttl == 3s) since last child contact, declaring child '4040ba06' at '127.0.0.1:46194' to have failed.
2017/02/07 05:21:55 cmd/vhaline/main.go:child failed event.
2017/02/07 05:21:55.821703 replica.go:742: [pid 86063] replica (a7fb658c) NewChildConnect from child 96bfad81' at '127.0.0.1:46534'
2017/02/07 05:22:01.289558 replica.go:278: [pid 86063] replica (a7fb658c) child connection lost
2017/02/07 05:22:01.865167 replica.go:374: [pid 86063] replica (a7fb658c) rejecting new child 'b305026f/127.0.0.1:46542' b/c already have '96bfad81/127.0.0.1:46534'
gc 8 @474.734s 0%: 0.031+0.51+0.11 ms clock, 0.50+0.093/0.99/0.020+1.9 ms cpu, 130->130->128 MB, 131 MB goal, 16 P
2017/02/07 05:22:03.290073 replica.go:613: [pid 86063] replica (a7fb658c) it's been 4.001142473s (> ttl == 3s) since last child contact, declaring child '96bfad81' at '127.0.0.1:46534' to have failed.
2017/02/07 05:22:03 cmd/vhaline/main.go:child failed event.
2017/02/07 05:22:05.038889 replica.go:742: [pid 86063] replica (a7fb658c) NewChildConnect from child 753dd7fe' at '127.0.0.1:46550'
2017/02/07 05:23:04.309740 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:23:04.309767 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000190244s/1.000179191s
GC forced
gc 9 @594.743s 0%: 0.025+0.48+0.12 ms clock, 0.40+0/1.0/0.29+2.0 ms cpu, 163->163->161 MB, 257 MB goal, 16 P
scvg3: inuse: 163, idle: 2, sys: 165, released: 0, consumed: 165 (MB)
2017/02/07 05:24:12.331876 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:24:12.331911 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000161445s/1.000345842s
2017/02/07 05:25:16.352877 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:25:16.352912 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000113785s/1.000117716s
GC forced
gc 10 @714.754s 0%: 0.019+0.34+0.12 ms clock, 0.31+0/1.0/0.52+2.0 ms cpu, 164->164->161 MB, 322 MB goal, 16 P
2017/02/07 05:26:20.373956 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:26:20.373982 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000188576s/1.000199715s
scvg4: inuse: 164, idle: 2, sys: 166, released: 0, consumed: 166 (MB)

### observing the 300% cpu spike from here on...

2017/02/07 05:27:26.395974 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:27:26.396009 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000216294s/1.000187739s
GC forced
gc 11 @834.754s 0%: 0.047+0.35+0.11 ms clock, 0.76+0/1.0/0.55+1.8 ms cpu, 164->164->161 MB, 323 MB goal, 16 P
2017/02/07 05:28:36.418341 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:28:36.418369 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000121649s/1.000160242s
scvg5: inuse: 164, idle: 1, sys: 166, released: 0, consumed: 166 (MB)
2017/02/07 05:29:38.438866 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:29:38.438893 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000151323s/1.000168023s
GC forced
gc 12 @954.763s 0%: 0.025+0.36+0.11 ms clock, 0.40+0/1.0/0.54+1.8 ms cpu, 166->166->162 MB, 323 MB goal, 16 P
2017/02/07 05:30:42.459870 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:30:42.459896 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000071789s/1.000163893s
scvg6: inuse: 166, idle: 1, sys: 168, released: 0, consumed: 168 (MB)
2017/02/07 05:31:46.480874 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:31:46.480901 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000216787s/1.000181407s
GC forced
gc 13 @1074.772s 0%: 0.034+0.39+0.12 ms clock, 0.54+0/1.0/0.37+2.0 ms cpu, 165->165->162 MB, 325 MB goal, 16 P
2017/02/07 05:32:52.502406 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:32:52.502432 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000112204s/1.00011439s
2017/02/07 05:33:54.523284 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:33:54.523312 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000156022s/1.000152209s
GC forced
gc 14 @1194.775s 0%: 0.024+0.32+0.12 ms clock, 0.39+0/0.98/0.50+1.9 ms cpu, 169->169->164 MB, 325 MB goal, 16 P
scvg7: inuse: 167, idle: 4, sys: 171, released: 0, consumed: 171 (MB)
2017/02/07 05:34:57.544262 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:34:57.544288 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000198885s/1.000148081s
GC forced
gc 15 @1314.779s 0%: 0.031+0.29+0.11 ms clock, 0.49+0/0.99/0.47+1.7 ms cpu, 167->167->164 MB, 329 MB goal, 16 P
2017/02/07 05:36:02.566141 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:36:02.566168 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.00034139s/1.000175237s
scvg8: 0 MB released
scvg8: inuse: 167, idle: 4, sys: 171, released: 0, consumed: 171 (MB)
2017/02/07 05:37:05.587678 replica.go:587: [pid 86063] replica (a7fb658c) line status: parent(bbda6890/localhost:9000) -> me(a7fb658c/localhost:9001) -> child(753dd7fe/127.0.0.1:46550).
2017/02/07 05:37:05.587710 replica.go:598: [pid 86063] replica (a7fb658c) last-ping (parent/child): 1.000171768s/1.000188289s

### the 300% cpu peg is still in progress at this point.

None of the go1.7.4 processes are showing the CPU peg.




Dave Cheney

Feb 7, 2017, 12:49:42 AM
to golang-nuts
> I did manage to catch the process running away again, this time at 300%, and I got some output with gctrace=1 as you suggested. I'm not sure how to read the lines though; could you advise, Dave?

The giveaway is the frequency of the gc lines. gc 15 (the 15th GC event) happened 1314 seconds into your program's execution, i.e. about one collection every 90 seconds, which tells me that GC is likely not your problem. If it were, your terminal would be swamped by gc log lines indicating the GC was running constantly.

Jason E. Aten

Feb 7, 2017, 12:53:45 AM
to golang-nuts

On Monday, February 6, 2017 at 11:49:42 PM UTC-6, Dave Cheney wrote:
> The giveaway is the frequency of the gc lines. gc 15 (the 15th GC event) happened 1314 seconds into your program's execution, i.e. about one collection every 90 seconds, which tells me that GC is likely not your problem. If it were, your terminal would be swamped by gc log lines indicating the GC was running constantly.

Ok. Any idea what runtime._ExternalCode means?

Dave Cheney

Feb 7, 2017, 12:55:21 AM
to golang-nuts
I think there are more data races in your program.

Dave Cheney

Feb 7, 2017, 12:55:48 AM
to golang-nuts
None whatsoever, I'm afraid.

Jason E. Aten

Feb 7, 2017, 1:08:47 AM
to golang-nuts
On Monday, February 6, 2017 at 11:55:21 PM UTC-6, Dave Cheney wrote:
> I think there are more data races in your program.

Thanks for pointing that out, Dave. Fixed in latest. How were you running it (what is in stress.bash?) such that it caught more issues than "go test -race" alone?

Dave Cheney

Feb 7, 2017, 1:31:05 AM
to golang-nuts
https://dave.cheney.net/2013/06/19/stress-test-your-go-packages

It's just a tiny script to randomly set GOMAXPROCS.
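
For the curious, a minimal sketch in that spirit (illustrative, not the exact script from the post):

#!/usr/bin/env bash
# Build the package's test binary once, then loop it with a
# randomized GOMAXPROCS to shake out scheduling-dependent bugs.
set -e
go test -c
PKG=$(basename "$(pwd)")
while true; do
    export GOMAXPROCS=$((1 + RANDOM % 128))
    ./"${PKG}.test" "$@" 2>&1
done

Building the test binary with go test -c -race lets it catch races that a single fixed-GOMAXPROCS run misses.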

Jason E. Aten

Feb 7, 2017, 11:13:28 AM
to Dave Cheney, golang-nuts
Nice, stress.bash is indeed super simple, but finds issues that -race alone does not. Thanks for that.

I'm still having the 100% CPU peg problem. I was able to observe it under go1.7.4 as well, so it's not specifically a go1.8rc3 issue.

If anyone can supply information on the mystery runtime._ExternalCode that would be super helpful.

I'm not linking in any cgo libraries (beyond what the runtime pulls in automatically for DNS). Err... so actually there are some C libs linked in. What is the magic phrase for not linking any C .so?

[jaten@biggie bin]$ ldd vhaline
        linux-vdso.so.1 =>  (0x00007ffecfd3c000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2a6b5dc000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f2a6b21b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2a6b808000)
[jaten@biggie bin]$


On Tue, Feb 7, 2017 at 12:31 AM, Dave Cheney <da...@cheney.net> wrote:
> https://dave.cheney.net/2013/06/19/stress-test-your-go-packages
>
> It's just a tiny script to randomly set GOMAXPROCS.


Jason E. Aten

Feb 7, 2017, 11:38:55 AM
to golang-nuts
On Tue, Feb 7, 2017 at 10:12 AM, Jason E. Aten <j.e....@gmail.com> wrote:
> If anyone can supply information on the mystery runtime._ExternalCode that would be super helpful.
>
> I'm not linking in any cgo libraries (beyond what the runtime pulls in automatically for DNS). Err... so actually there are some C libs linked in. What is the magic phrase for not linking any C .so?
>
> [jaten@biggie bin]$ ldd vhaline
>         linux-vdso.so.1 =>  (0x00007ffecfd3c000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2a6b5dc000)
>         libc.so.6 => /lib64/libc.so.6 (0x00007f2a6b21b000)
>         /lib64/ld-linux-x86-64.so.2 (0x00007f2a6b808000)
> [jaten@biggie bin]$

I'm not having any luck building with CGO_ENABLED=0 under go1.7.4 on linux/amd64. I'm getting "go install net: open /usr/local/go/pkg/linux_amd64/net.a: permission denied". Context:

[jaten@biggie vhaline]$ make
cd vhaline && go install && cd ../cmd/vhaline && go install
go install net: open /usr/local/go/pkg/linux_amd64/net.a: permission denied
make: *** [build] Error 1
[jaten@biggie vhaline]$ ls -al /usr/local/go/pkg/linux_amd64/net.a
-rw-r-xr-x 1 root root 1007434 Dec  1 21:17 /usr/local/go/pkg/linux_amd64/net.a
[jaten@biggie vhaline]$ file /usr/local/go/pkg/linux_amd64/net.a
/usr/local/go/pkg/linux_amd64/net.a: current ar archive
[jaten@biggie vhaline]$ ar x /usr/local/go/pkg/linux_amd64/net.a
[jaten@biggie vhaline]$ ls -alFtrh|tail
                                                           
-rw-rw-r--  1 jaten jaten  121 Feb  7 03:30 Makefile
drwxrwxr-x  2 jaten jaten    6 Feb  7 04:35 go1.8rc3.prof/
-rw-rw-r--  1 jaten jaten    0 Feb  7 04:36 cpu-profile.85518
-rw-rw-r--  1 jaten jaten    0 Feb  7 04:36 cpu-profile.85524
drwxrwxr-x  8 jaten jaten 4.0K Feb  7 15:09 .git/
drwxrwxr-x  5 jaten jaten 4.0K Feb  7 15:25 vhaline/
-rw-r--r--  1 jaten jaten  19K Feb  7 16:27 __.PKGDEF
-rw-r--r--  1 jaten jaten 947K Feb  7 16:27 _go_.o
-rw-r--r--  1 jaten jaten  19K Feb  7 16:27 _all.o
drwxrwxr-x  6 jaten jaten 4.0K Feb  7 16:27 ./
[jaten@biggie vhaline]$ env|grep CGO
CGO_ENABLED=0
[jaten@biggie vhaline]$  
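
(Update: the workaround usually suggested for this, so that go build does not try to reinstall the cgo-free net package into the root-owned GOROOT, is to force a rebuild with a separate install suffix, roughly:

CGO_ENABLED=0 go build -a -installsuffix cgo ./cmd/vhaline

ldd on the result should then report "not a dynamic executable".)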


Jason E. Aten

Feb 7, 2017, 11:51:47 AM
to golang-nuts


On Tue, Feb 7, 2017 at 10:38 AM, Jason E. Aten <> wrote:

hmm... the profiling instructions from the blog post

https://blog.golang.org/profiling-go-programs

don't seem to work for go1.7.4/amd64:

[jaten@biggie 1.7.4]$ go tool pprof http://localhost:6067/debug/pprof/profile                                                
Fetching profile from http://localhost:6067/debug/pprof/profile                                                              
Please wait... (30s)                                                                                                         
server response: 500 Internal Server Error                                                                                   
[jaten@biggie 1.7.4]$  

 

Kale B

Feb 7, 2017, 11:58:23 AM
to golang-nuts
Go to http://localhost:6067/debug/pprof/profile with a browser/curl/etc.; it should display a more useful error message.
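
For example, something like:

curl -v 'http://localhost:6067/debug/pprof/profile?seconds=5' -o cpu.prof

The seconds parameter keeps the wait short; the handler's default is a 30 second profile.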

Jason E. Aten

Feb 7, 2017, 12:20:17 PM
to golang-nuts
On Tue, Feb 7, 2017 at 10:58 AM, Kale B <ka...@lemnisys.com> wrote:
> Go to http://localhost:6067/debug/pprof/profile with a browser/curl/etc.; it should display a more useful error message.

Thanks, Kale. I got the much more useful:

Could not enable CPU profiling: cpu profiling already in use

So the web endpoint didn't work, but it did seem to force the internal profiling to actually output something to the cpu-profile file that I set up.
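In hindsight that makes sense: runtime/pprof allows only one CPU profile at a time, so the profile already started by the -cpu flag blocks the HTTP handler. A minimal sketch of the conflict (hypothetical code, not vhaline's actual flag handling):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
	"os"
	"runtime/pprof"
)

func main() {
	// Start a CPU profile ourselves, the way a -cpu style flag would.
	f, err := os.Create("cpu-profile")
	if err != nil {
		log.Fatal(err)
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	// While that profile is active, fetching /debug/pprof/profile fails:
	// the handler's own StartCPUProfile call returns
	// "cpu profiling already in use", which it reports as a 500.
	log.Fatal(http.ListenAndServe("localhost:6067", nil))
}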

So I got the following profile to show up on the 100% CPU-pegged process. Somehow the svg and the top10 don't exactly tell the same story...

[jaten@biggie 1.7.4]$ go tool pprof vhaline.174 cpu-profile.85790.cp                                                                           
Entering interactive mode (type "help" for commands)                                                                                           
(pprof) top10                                                                                                                                  
294.19s of 785.18s total (37.47%)                                                                                                              
Dropped 106 nodes (cum <= 3.93s)                                                                                                               
Showing top 10 nodes out of 69 (cum >= 382.14s)                                                                                                
      flat  flat%   sum%        cum   cum%                                                                                                     
    42.63s  5.43%  5.43%     54.84s  6.98%  runtime.additab                                                                                    
    35.12s  4.47%  9.90%    112.79s 14.36%  os.(*File).Chdir                                                                                   
    34.45s  4.39% 14.29%    199.63s 25.42%  reflect.Value.Method                                                                               
    33.11s  4.22% 18.51%     54.20s  6.90%  runtime.gfput                                                                                      
    30.16s  3.84% 22.35%     52.32s  6.66%  runtime.sigprof                                                                                    
    25.49s  3.25% 25.59%     25.49s  3.25%  github.com/glycerine/zebrapack/msgp.(*NilBitsStack).ReadFloat64Bytes                               
    24.51s  3.12% 28.72%     96.87s 12.34%  runtime.newdefer.func1                                                                             
    23.11s  2.94% 31.66%     23.11s  2.94%  runtime.dopanic_m                                                                                  
    23.10s  2.94% 34.60%     40.13s  5.11%  runtime.netpoll                                                                                    
    22.51s  2.87% 37.47%    382.14s 48.67%  crypto/tls.(*certificateRequestMsg).unmarshal                                                      
(pprof) top10 -cum                                                                                                                             
2.22mins of 13.09mins total (16.97%)                                                                                                           
Dropped 106 nodes (cum <= 0.07mins)                                                                                                            
Showing top 10 nodes out of 69 (cum >= 3.11mins)                                                                                               
      flat  flat%   sum%        cum   cum%                                                                                                     
         0     0%     0%  12.89mins 98.47%  type..hash.runtime.traceStackTable                                                                 
  0.12mins  0.96%  0.96%   6.85mins 52.33%  os.(*File).ReadAt                                                                                  
  0.35mins  2.71%  3.66%   6.72mins 51.38%  encoding/gob.init                                                                                  
  0.38mins  2.87%  6.53%   6.37mins 48.67%  crypto/tls.(*certificateRequestMsg).unmarshal                                                      
  0.18mins  1.39%  7.92%   5.99mins 45.80%  crypto/tls.(*certificateRequestMsg).marshal                                                        
  0.14mins  1.04%  8.96%   5.57mins 42.60%  fmt.Fprintln                                                                                       
  0.19mins  1.45% 10.41%   4.15mins 31.72%  os.(*File).Read                                                                                    
  0.24mins  1.85% 12.26%   3.61mins 27.61%  fmt.(*pp).badVerb                                                                                  
  0.57mins  4.39% 16.64%   3.33mins 25.42%  reflect.Value.Method                                                                               
  0.04mins  0.32% 16.97%   3.11mins 23.79%  reflect.Value.SetComplex                                                                           
(pprof) svg                                                                                                                                    
Generating report in profile001.svg                                                                                                            
(pprof)    


Ian Lance Taylor

Feb 7, 2017, 12:55:00 PM
to Jason E. Aten, golang-nuts
It means that a profiling signal was received while executing code
that was not written in Go. It should only happen in a program that
uses cgo.

Ian

Jason E. Aten

Feb 7, 2017, 6:09:23 PM
to Ian Lance Taylor, golang-nuts
On Tue, Feb 7, 2017 at 11:54 AM, Ian Lance Taylor <ia...@golang.org> wrote:
>
> > Ok. Any idea what runtime._ExternalCode means?

> It means that a profiling signal was received while executing code
> that was not written in Go. It should only happen in a program that
> uses cgo.

Surprisingly, that does not seem to be the case. After figuring out how to compile without cgo (c.f. https://github.com/golang/go/issues/18981), and how to reliably create profiles, my profile still shows 50% of the program's time being taken up by this mystery bucket of _ExternalCode:

(pprof) top10 -cum                                                                        
134.12s of 266.39s total (50.35%)                                                         
Dropped 40 nodes (cum <= 1.33s)                                                           
Showing top 10 nodes out of 46 (cum >= 89.93s)                                            
      flat  flat%   sum%        cum   cum%                                                
         0     0%     0%    143.12s 53.73%  runtime.goexit                                
     2.97s  1.11%  1.11%    143.09s 53.71%  github.com/glycerine/vhaline/vhaline.(*srvReader).start.func1                                                                           
   123.26s 46.27% 47.39%    123.26s 46.27%  runtime._ExternalCode                         
         0     0% 47.39%    123.26s 46.27%  runtime._System                               
     0.59s  0.22% 47.61%     98.03s 36.80%  github.com/glycerine/tmframe2.(*FrameReader).NextFrame                                                                                  
     0.75s  0.28% 47.89%     97.44s 36.58%  github.com/glycerine/tmframe2.(*FrameReader).PeekNextFrameBytes                                                                         
     1.97s  0.74% 48.63%     96.69s 36.30%  bufio.(*Reader).Peek                          
     1.05s  0.39% 49.02%     94.72s 35.56%  bufio.(*Reader).fill                          
     1.36s  0.51% 49.53%     93.45s 35.08%  net.(*conn).Read                              
     2.17s  0.81% 50.35%     89.93s 33.76%  net.(*netFD).Read                             
(pprof)    

 

(attached: profile001.svg)

Ian Lance Taylor

Feb 7, 2017, 6:48:39 PM
to Jason E. Aten, golang-nuts
On Tue, Feb 7, 2017 at 3:08 PM, Jason E. Aten <j.e....@gmail.com> wrote:
> On Tue, Feb 7, 2017 at 11:54 AM, Ian Lance Taylor <ia...@golang.org> wrote
>>
>> >
>> > Ok. Any idea what runtime._ExternalCode means?
>>
>> It means that a profiling signal was received while executing code
>> that was not written in Go. It should only happen in a program that
>> uses cgo.
>
>
> Surprisingly, that does not seem to be the case. After figuring out how to
> compile without cgo (c.f. https://github.com/golang/go/issues/18981), and
> how to reliably create profiles, my profile still shows 50% of the program's
> time being taken up by this mystery bucket of _ExternalCode:

I have no explanation for that.

What does `ldd PROGRAM` print for your program?

Ian

Jason E. Aten

Feb 7, 2017, 8:53:35 PM
to Ian Lance Taylor, golang-nuts
Yes, ldd is how I confirmed I didn't have cgo:

[jaten@biggie no.cgo]$ ldd ./vhaline-5min                                                 
    not a dynamic executable                                                              
[jaten@biggie no.cgo]$    

Dave Cheney

Feb 7, 2017, 8:57:17 PM
to golang-nuts
You mentioned that this was reproducible with 1.7.x. It might be worth sticking to that version to avoid having to concurrently debug this external code issue.

Jason E. Aten

Feb 7, 2017, 10:28:26 PM
to Dave Cheney, golang-nuts
> ... reproducible with 1.7.x. It might be worth sticking to that version

Good suggestion!

Oddly enough, go1.7.4 shows the same thing...

[jaten@biggie no.cgo]$ ldd ./vhaline.174.nocgo                                            
    not a dynamic executable                                                              
[jaten@biggie no.cgo]$ go tool pprof ./vhaline.174.nocgo cpu-profile.120387

Entering interactive mode (type "help" for commands)
(pprof) top10 -cum                                                                        
103.82s of 224.70s total (46.20%)                                                         
Dropped 25 nodes (cum <= 1.12s)                                                           
Showing top 10 nodes out of 57 (cum >= 80.28s)                                            
      flat  flat%   sum%        cum   cum%                                                
         0     0%     0%    128.94s 57.38%  runtime.goexit                                
     3.18s  1.42%  1.42%    128.93s 57.38%  github.com/glycerine/vhaline/vhaline.(*srvReader).start.func1
         0     0%  1.42%     95.76s 42.62%  runtime._System
    94.03s 41.85% 43.26%     94.03s 41.85%  runtime._ExternalCode
     0.76s  0.34% 43.60%     87.29s 38.85%  github.com/glycerine/tmframe2.(*FrameReader).NextFrame
     0.49s  0.22% 43.82%     86.53s 38.51%  github.com/glycerine/tmframe2.(*FrameReader).PeekNextFrameBytes
     1.88s  0.84% 44.66%     86.04s 38.29%  bufio.(*Reader).Peek                          
     0.88s  0.39% 45.05%     84.16s 37.45%  bufio.(*Reader).fill                          
     0.92s  0.41% 45.46%     83.14s 37.00%  net.(*conn).Read                              
     1.68s  0.75% 46.20%     80.28s 35.73%  net.(*netFD).Read                             
(pprof)



On Tue, Feb 7, 2017 at 7:57 PM, Dave Cheney <da...@cheney.net> wrote:
> You mentioned that this was reproducible with 1.7.x. It might be worth sticking to that version to avoid having to concurrently debug this external code issue.
(attached: vhaline-profile-no-cfg-go174.svg)