Hi Dmitry,

Thanks! I tested the scenario and the difference is marginal: at lower concurrency (which translated to up to 24,000 goroutine creations per second), creating goroutines on the fly beats a goroutine pool, while above that point the pool wins. In both cases the difference in rps is under 10%.

While testing this I noticed two things:

1. For each persistent TCP connection, Go maintains two goroutines internally. Won't this become a bottleneck at high scale? On a server with more than 50k connections the goroutine count will exceed 100k, and I think that will affect scheduler performance.
2. Garbage collection: in my application, http://play.golang.org/p/a5GathVxt3 (it connects to a backend that responds with 1 KB of data after a random delay of 40 to 100 ms), garbage collection at scale is very heavy. When I stress-test this machine with 400 concurrent requests, my GC pause times are:
PauseNs = [193362 139849 215886 354955 1139166 869210 885827 749825 717999 739718 796862 758218 830196 875997 813210 833146 753679 666689 2244061 3024867 6223584 12751824 15244126 14108606 15190071 15242359 15481447 14515872 14899972 16844133 20233153 20703863 19931616 19677763 19687679 20194490 18716629 20782510 21587603 20450057 19975239 21727241 20537230 19641046 19978267 21632859 19754527 19206874 19117572 18173713 18552263 19050671 19651854 19073441 19009287 19082801 20596580 20736944 21126457 19498226 19614106 19492911 19338896 18720463 20489838 18724089 19249127 19367583 19632704 21272779 19576836 19733512 19743549 19087672 19130331 20012132 19663709 18826675 20014375 19818349 19444131 19237984 20459478 23593077 20263337 19909322 19316240 19804505 19464156 21222421 18852788 19259735 19153750 18839483 21134444 19009301 18434221 19060135 19368925 20276724 19168876 19362019 19468423 19158909 20239024 18832774 20567718 20443718 19463311 18945032 19266531 20384705 18977398 18895428 19766300 19667944 19742453 20155593 19345293 24220071 19466362 19949797 18932857 21698370 26644424 19660098 20092514 19815533 21387015 20460447 19776343 19915475 19707494 21028323 20919843 20171274 20258974 19711202 19818581 19488355 21913258 20883622 22296698 20558774 21289753 20600677 22462309 19938073 19596276 21299690 19598157 20036925 22244357 20471555 21173985 20008180 20245849 20237666 21295721 21206428 22670985 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# NumGC = 161
Of these, the last 132 pauses were triggered during 100 s of stress testing. In other words, over those 100 s the program stopped for ~20 ms 132 times (roughly 2.6 s of total pause), which does not really work for a realtime application.

Can you please share your thoughts?
On Tuesday, 9 September 2014 08:45:52 UTC+10, Suraj Narkhede wrote:

> 1. I found that for each persistent TCP connection Go maintains two goroutines internally. Won't this become a bottleneck at high scale (consider a server with >50k connections)? The goroutine count will exceed 100k, and I think that will affect scheduler performance.

Nope. It is unlikely that more than a handful of goroutines will be runnable at any given time, and goroutines that are not runnable don't impact the scheduler.
> 2. Garbage collection - in my application - http://play.golang.org/p/a5GathVxt3 - garbage collection at scale is very high. When I do stress testing with 400 concurrent requests, my garbage collection times are:

Thanks for providing sample code. It looks like your benchmark contains both the client and the server. Can you split those apart into two pieces, so that the overhead of the client is not reported against the server and vice versa?
> [PauseNs dump, NumGC = 161, and the pause commentary quoted above snipped]
Can you run your benchmark with GODEBUG=gctrace=1? Those numbers are usually easier to understand.
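For reference, GODEBUG is an environment variable the Go runtime reads at startup, and prefixing the assignment to a command sets it for that one process only. A minimal illustration of the invocation (using `sh -c` as a stand-in for the actual server binary, whose name is not given here):

```shell
# gctrace=1 makes the runtime print a one-line summary of each GC to stderr.
# In the real run you would replace the sh -c stand-in with the server, e.g.
#   GODEBUG=gctrace=1 ./server
GODEBUG=gctrace=1 sh -c 'echo "child sees GODEBUG=$GODEBUG"'
# prints: child sees GODEBUG=gctrace=1
```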
(pprof output fragment:)
10.5 100.0% 100.0% 10.5 100.0% runtime.gettype
On 17 Sep 2014 04:28, "Suraj Narkhede" <suraj...@gmail.com> wrote:
>
> Thanks Dmitry. I completed heap profiling as you suggested, and I am looking into net/http to see whether allocations can be reduced.
>
> My primary concern is GC time when the net/http library is used heavily.
> For a simple hello-world server with a 40 to 100 ms response time - http://play.golang.org/p/w0AqIP6a8w - tested with wrk --latency -d 10 -c 2000 -t 24 (which drives 28k rps at 11% CPU, with network I/O of 24 Mbps in / 54 Mbps out), I get GC pauses such as:
> gc30(8): 10+23+3360+160 us, 22 -> 44 MB, 240321 (3126041-2885720) objects, 6585/4441/0 sweeps,
3.36 ms to sweep a heap this small sounds wrong. Dmitry, is my estimate off?