Observing Large GC STW time during mark setup phase


Neetish Pathak

Jan 24, 2023, 8:34:31 AM
to golang-nuts

Hi All,

Golang version 1.18.2

We have a key-value store implemented in Go. A proxy layer communicates with the storage layer, and the storage layer uses cgo to access the storage engine. Both the proxy and the storage are implemented in Go. I observe occasional spikes in latency (fewer than 0.01% of client requests time out because of this).

This is affecting tail latency. Traces on both the proxy and the storage show high latency in the STW (mark setup) phase; it is sometimes higher than 100 ms, as highlighted in the shared trace. I need advice on how to debug this further and understand the potential reasons.
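
As a side note, one way to quantify these pauses in-process is the /gc/pauses:seconds histogram from runtime/metrics (a distribution of individual stop-the-world pause latencies, available in Go 1.18). A minimal sketch; the 10-second sampling interval and the 10 ms reporting threshold are arbitrary choices, not part of our service:

package main

import (
	"fmt"
	"runtime/metrics"
	"time"
)

func main() {
	// /gc/pauses:seconds is a histogram of individual stop-the-world
	// pause latencies, accumulated since process start. Sample it
	// periodically and report the buckets at or above 10 ms.
	samples := []metrics.Sample{{Name: "/gc/pauses:seconds"}}
	for range time.Tick(10 * time.Second) {
		metrics.Read(samples)
		h := samples[0].Value.Float64Histogram()
		for i, n := range h.Counts {
			if n > 0 && h.Buckets[i] >= 0.010 {
				fmt.Printf("STW pauses in [%gs, %gs): %d\n",
					h.Buckets[i], h.Buckets[i+1], n)
			}
		}
	}
}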

In my test setup:

GCP VM: 64 vCPUs, 500 GB RAM

OS: Ubuntu 16.04 (Linux)

Go version: 1.18.2

With high-payload requests, the high STW times happen often.

With low payloads, this behavior is rare.
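
In case it helps reproduce, here is a minimal sketch of how a short execution-trace window can be captured in-process for go tool trace. This is only an illustration, not necessarily how our traces were collected; the file name and the 30-second window are arbitrary:

package main

import (
	"log"
	"os"
	"runtime/trace"
	"time"
)

func main() {
	// Capture a 30-second execution trace during a high-payload run,
	// then inspect it with `go tool trace trace.out` to see what the
	// goroutines and Ps are doing when a long stop-the-world begins.
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	time.Sleep(30 * time.Second)
	trace.Stop()
}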


I do not have much experience with Go internals and am curious to learn. I appreciate your help.


Sharing gctrace output for the high-payload tests. The first of the three "ms clock" numbers in each line is the initial stop-the-world (sweep termination / mark setup) phase, so the lines where that number is large (e.g., 147 ms in gc 32914, 141 ms in gc 35140, 124 ms in gc 34760, 54 ms in gc 35136) show the pauses in question:

gc 34757 @5898.241s 1%: 11+2.0+0.20 ms clock, 747+1.2/13/0.008+12 ms cpu, 42->44->... [rest of the line interleaved with the concurrently written gc 35135 entry; the intact lines are in the full trace linked below]

gc 35136 @5898.365s 1%: 54+2.1+0.16 ms clock, 3507+0.13/13/0+10 ms cpu, 44->45->22 MB, 48 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 32914 @5898.272s 1%: 147+2.3+0.19 ms clock, 9465+0.27/15/0.008+12 ms cpu, 44->47->24 MB, 47 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 34758 @5898.411s 1%: 9.4+2.6+0.20 ms clock, 606+0.26/11/0+12 ms cpu, 41->43->21 MB, 43 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35137 @5898.488s 1%: 0.22+1.5+0.13 ms clock, 14+0.22/12/0.17+8.8 ms cpu, 39->41->23 MB, 46 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35079 @5898.554s 1%: 0.11+1.4+0.18 ms clock, 7.2+0/11/0.22+11 ms cpu, 47->48->22 MB, 48 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 34759 @5898.574s 1%: 0.19+1.6+0.084 ms clock, 12+0.27/12/0.49+5.3 ms cpu, 44->44->21 MB, 44 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35138 @5898.580s 1%: 0.24+1.6+0.15 ms clock, 15+0.81/12/0.36+9.6 ms cpu, 43->44->23 MB, 47 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 32915 @5898.680s 1%: 0.12+1.7+0.099 ms clock, 8.2+0.35/12/0.093+6.3 ms cpu, 47->49->23 MB, 50 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35139 @5898.688s 1%: 0.30+1.6+0.21 ms clock, 19+0.15/11/0.10+14 ms cpu, 39->41->23 MB, 47 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35080 @5898.780s 1%: 0.10+1.8+0.14 ms clock, 7.0+0.64/13/0.60+9.2 ms cpu, 45->46->22 MB, 46 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 34760 @5898.838s 1%: 124+1.9+0.14 ms clock, 7955+0.59/12/0+9.5 ms cpu, 42->43->21 MB, 43 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35081 @5898.894s ... [line interleaved with a concurrently written pacer: trace entry; the intact lines are in the full trace linked below]

gc 35140 @5898.821s 1%: 141+2.6+0.25 ms clock, 9049+0.061/14/0+16 ms cpu, 43->46->24 MB, 46 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35082 @5899.017s 1%: 0.18+1.9+0.10 ms clock, 11+1.9/12/0.29+6.9 ms cpu, 47->49->23 MB, 48 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 32917 @5899.058s 1%: 0.21+1.5+0.096 ms clock, 13+0.59/12/0.33+6.1 ms cpu, 45->46->23 MB, 47 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35141 @5899.062s 1%: 0.20+1.5+0.11 ms clock, 12+0.26/14/0.31+7.4 ms cpu, 47->49->23 MB, 48 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35083 @5899.138s 1%: 0.14+1.3+0.048 ms clock, 9.1+0.51/11/0.22+3.0 ms cpu, 45->46->22 MB, 47 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 34761 @5899.160s 1%: 0.18+1.7+0.053 ms clock, 11+0.32/9.4/2.1+3.4 ms cpu, 42->43->21 MB, 43 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 32918 @5899.187s 1%: 0.26+1.6+0.087 ms clock, 16+0.55/11/0.19+5.6 ms cpu, 46->47->23 MB, 47 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35142 @5899.206s 1%: 0.21+1.3+0.063 ms clock, 13+0.80/11/0.14+4.0 ms cpu, 45->46->22 MB, 48 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 32919 @5899.322s 1%: 0.11+1.3+0.080 ms clock, 7.4+0.39/11/0.34+5.1 ms cpu, 45->46->23 MB, 47 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35143 @5899.352s 1%: 0.20+1.8+0.11 ms clock, 13+0.30/12/0.63+7.6 ms cpu, 45->47->23 MB, 46 MB goal, 0 MB stacks, 0 MB globals, 64 P

gc 35084 @5899.368s 1%: 0.21+1.5+0.094 ms clock, 13+1.7/11/0.20+6.0 ms cpu, 45->47->23 MB, 46 MB goal, 0 MB stacks, 0 MB globals, 64 P


Full_Trace_high_payload: https://raw.githubusercontent.com/NeetishPathak/GolangDebug/main/0110_proxy_gc_high_payload

Full_Trace_low_payload: https://raw.githubusercontent.com/NeetishPathak/GolangDebug/main/0122_proxy_gc_low_payload

I cannot share the code as it is proprietary.

Thanks

Neetish Pathak

Jan 30, 2023, 2:50:40 PM
to golang-nuts
This seems to be related to https://github.com/golang/go/issues/31222.
Does runtime.mallocgc block preemption in Go 1.18? We see a long sweep-termination phase. Allocation sizes are 32 KB - 64 KB.
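
If it is the same issue, and assuming the 32 KB - 64 KB allocations are per-request buffers, one mitigation I am considering is pooling and reusing those buffers so the hot path allocates less. A minimal sketch; bufPool, handleRequest, and the 64 KB size are placeholders, not our actual code:

package main

import "sync"

// bufPool holds reusable 64 KB buffers. Reusing them keeps most requests
// out of runtime.mallocgc entirely, which reduces GC frequency and the
// chance of a goroutine sitting in a non-preemptible allocation when the
// collector wants to stop the world. All names and sizes are placeholders.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 64*1024)
		return &b
	},
}

func handleRequest(payload []byte) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)

	buf := append((*bp)[:0], payload...) // stand-in for filling the buffer
	process(buf)
}

func process(buf []byte) {
	_ = buf // placeholder for the real request handling
}

func main() {
	handleRequest([]byte("example payload"))
}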