Which is better: a high rate of goroutine creation per second for network IO, or a goroutine pool with channel communication?


Suraj Narkhede

Sep 3, 2014, 3:23:01 PM
to golan...@googlegroups.com
Please share your thoughts on which approach scales better for a very high number of network calls:

func goRoutine() {
    for {
        req := <-signal
        resp, err := doCall(req) // doCall stands in for the outbound network call - redis, HTTP via http.Client, anything
        if err != nil {
            req.resp <- nil // side note: is this allowed? The program crashes when this case is triggered; should I send a dummy value instead of nil?
            continue
        }
        req.resp <- resp // req.resp is a buffered channel, created by the caller
    }
}

func caller() {
    resp := make(chan T, size)
    for i := 1; i < n; i++ {
        signal <- req{REQBODY[i], resp}
    }
    // select loop here to wait for the responses
}

Or is the normal approach better - creating a goroutine per call, passing it a channel, and waiting on the channel with a timeout?
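For illustration, that per-call approach might look like the following sketch (doCall, reqBody, and result are hypothetical names, not from this thread):

func callWithTimeout(r reqBody) (*result, error) {
    ch := make(chan *result, 1) // buffered, so the goroutine can finish even after a timeout
    go func() {
        res, _ := doCall(r) // hypothetical outbound call; res is nil on error
        ch <- res
    }()
    select {
    case res := <-ch:
        if res == nil {
            return nil, errors.New("call failed")
        }
        return res, nil
    case <-time.After(100 * time.Millisecond):
        return nil, errors.New("timed out")
    }
}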

I am basically interested in the cost of goroutine creation versus the cost of the extra channel communication plus maintaining a goroutine pool (is there any scheduling overhead?).

I did a few tests but could not conclude anything; I will do more later this week.

Klaus Post

Sep 3, 2014, 3:59:17 PM
to golan...@googlegroups.com
Hi!

I am by no means an expert, and you should of course test things out if they are critical to your application.

From my experience, go with the goroutine calls (sorry for the pun). A pool would require synchronization on the signal channel, which isn't really needed. It is still fairly lightweight, but it will scale worse than independent goroutines on multicore machines. Also, if all the workers in your pool are waiting for IO, nothing will get done - with goroutines you don't run that "risk".

Don't be too afraid of goroutines - they are far more lightweight than traditional OS threads.

That being said, in most cases the performance difference will likely be quite small.

Dmitry Vyukov

Sep 4, 2014, 3:56:41 AM
to Suraj Narkhede, golang-nuts
The fastest is just directly making the HTTP request (setting a request timeout if necessary).
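For illustration, the direct approach with a client-side timeout might look like this sketch (Client.Timeout exists since Go 1.3; the names are illustrative):

var client = &http.Client{Timeout: 100 * time.Millisecond}

func fetch(url string) ([]byte, error) {
    resp, err := client.Get(url) // err covers timeouts as well
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    return ioutil.ReadAll(resp.Body)
}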

Skip Tavakkolian

Sep 4, 2014, 4:56:48 AM
to Suraj Narkhede, golang-nuts
If you want to manage resources (e.g. the number of concurrent TCP connections) from the program, then a dispatcher/worker-pool model might be OK.

But it is better to create goroutines as needed, because the runtime limits the number of simultaneous OS threads (running threads, not ones blocked on I/O) according to GOMAXPROCS, and network I/O is at least several factors slower than channel communication.

A hybrid model could put an upper limit on the number of outstanding goroutines; something like this:
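One common shape for such a cap is a buffered channel used as a counting semaphore; the sketch below is illustrative, and maxOutstanding, request, and handle are hypothetical names:

var sem = make(chan struct{}, maxOutstanding) // maxOutstanding: hypothetical cap

func dispatch(r request) {
    sem <- struct{}{} // blocks once maxOutstanding goroutines are in flight
    go func() {
        defer func() { <-sem }()
        handle(r) // hypothetical per-request work
    }()
}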




Suraj Narkhede

Sep 5, 2014, 3:09:40 AM
to golan...@googlegroups.com, suraj...@gmail.com
Hi Dmitry - thanks for the reply!

One incoming HTTP request spawns multiple outgoing HTTP requests (>10), which need to be done in parallel, so I need to use goroutines. Now, to achieve a throughput of, say, 10k rps, I will still be making >100k outgoing HTTP calls per second. Each call has a timeout of 100ms. In this case, can you please advise what would be a more scalable approach?
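For illustration, the fan-out being described might be sketched like this (fetchOne and result are hypothetical names):

func fanOut(urls []string) []*result {
    ch := make(chan *result, len(urls)) // buffered: stragglers never block
    for _, u := range urls {
        go func(u string) { ch <- fetchOne(u) }(u)
    }
    out := make([]*result, 0, len(urls))
    deadline := time.After(100 * time.Millisecond)
    for i := 0; i < len(urls); i++ {
        select {
        case r := <-ch:
            out = append(out, r)
        case <-deadline:
            return out // give up on whatever hasn't answered yet
        }
    }
    return out
}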

Dmitry Vyukov

Sep 5, 2014, 3:24:35 AM
to Suraj Narkhede, golang-nuts


I think that spawning 10 goroutines will be faster than a goroutine pool in a highly loaded, highly parallel server.

Suraj Narkhede

Sep 8, 2014, 6:45:52 PM
to golan...@googlegroups.com, suraj...@gmail.com
Hi Dmitry,

Thanks! I tested the scenario and the difference is marginal: at lower concurrency (which translated to up to 24,000 goroutine creations per second), creating goroutines on the fly beats the goroutine pool; above that concurrency, the pool wins. The difference in rps is < 10% in both cases.

During this, I noticed two things:
1. For each persistent TCP connection, Go maintains two goroutines internally. Won't this become a bottleneck at high scale (consider a server holding >50k connections)? The number of goroutines will exceed 100k, and I think that will affect scheduler performance.
2. Garbage collection: in my application (http://play.golang.org/p/a5GathVxt3; the application it connects to responds with 1k of data after a random delay of 40ms to 100ms), garbage collection at scale is very heavy. When I stress-test this machine with 400 concurrent requests, my garbage collection times are:

PauseNs = [193362 139849 215886 354955 1139166 869210 885827 749825 717999 739718 796862 758218 830196 875997 813210 833146 753679 666689 2244061 3024867 6223584 12751824 15244126 14108606 15190071 15242359 15481447 14515872 14899972 16844133 20233153 20703863 19931616 19677763 19687679 20194490 18716629 20782510 21587603 20450057 19975239 21727241 20537230 19641046 19978267 21632859 19754527 19206874 19117572 18173713 18552263 19050671 19651854 19073441 19009287 19082801 20596580 20736944 21126457 19498226 19614106 19492911 19338896 18720463 20489838 18724089 19249127 19367583 19632704 21272779 19576836 19733512 19743549 19087672 19130331 20012132 19663709 18826675 20014375 19818349 19444131 19237984 20459478 23593077 20263337 19909322 19316240 19804505 19464156 21222421 18852788 19259735 19153750 18839483 21134444 19009301 18434221 19060135 19368925 20276724 19168876 19362019 19468423 19158909 20239024 18832774 20567718 20443718 19463311 18945032 19266531 20384705 18977398 18895428 19766300 19667944 19742453 20155593 19345293 24220071 19466362 19949797 18932857 21698370 26644424 19660098 20092514 19815533 21387015 20460447 19776343 19915475 19707494 21028323 20919843 20171274 20258974 19711202 19818581 19488355 21913258 20883622 22296698 20558774 21289753 20600677 22462309 19938073 19596276 21299690 19598157 20036925 22244357 20471555 21173985 20008180 20245849 20237666 21295721 21206428 22670985 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

# NumGC = 161

Of these, the last 132 instances were triggered during 100s of stress testing. That means that over those 100s the program stopped for ~20ms 132 times, which does not really work for a realtime application.
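For reference, numbers like these come from runtime.MemStats (PauseNs is a 256-entry ring buffer, which is why the dump ends in zeros); a minimal sketch of reading them:

var ms runtime.MemStats
runtime.ReadMemStats(&ms)
fmt.Println("NumGC =", ms.NumGC)
fmt.Println("most recent pause (ns):", ms.PauseNs[(ms.NumGC+255)%256])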

Can you please share your thoughts?


Dave Cheney

Sep 8, 2014, 8:46:13 PM
to golan...@googlegroups.com, suraj...@gmail.com


On Tuesday, 9 September 2014 08:45:52 UTC+10, Suraj Narkhede wrote:
Hi Dmitry,

Thanks! I tested the scenario and the difference is marginal: at lower concurrency (which translated to up to 24,000 goroutine creations per second), creating goroutines on the fly beats the goroutine pool; above that concurrency, the pool wins. The difference in rps is < 10% in both cases.

During this, I noticed two things:
1. For each persistent TCP connection, Go maintains two goroutines internally. Won't this become a bottleneck at high scale (consider a server holding >50k connections)? The number of goroutines will exceed 100k, and I think that will affect scheduler performance.

Nope. It is unlikely more than a handful of goroutines will be runnable at any given time; goroutines that are not runnable don't impact the scheduler.
 
2. Garbage collection: in my application (http://play.golang.org/p/a5GathVxt3; the application it connects to responds with 1k of data after a random delay of 40ms to 100ms), garbage collection at scale is very heavy. When I stress-test this machine with 400 concurrent requests, my garbage collection times are:

Thanks for providing sample code. It looks like your benchmark contains both the client and the server; can you split those into two pieces, so that the overhead of the client is not reported against the server and vice versa?
 

PauseNs = [...]

# NumGC = 161

Of these, the last 132 instances were triggered during 100s of stress testing. That means that over those 100s the program stopped for ~20ms 132 times, which does not really work for a realtime application.

Can you run your benchmark with 

GODEBUG=gctrace=1

Those numbers are usually easier to understand.
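(For example, an invocation might look like GODEBUG=gctrace=1 ./server 2>gc.log - the trace lines are written to stderr.)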

Suraj Narkhede

Sep 9, 2014, 3:31:03 AM
to golan...@googlegroups.com, suraj...@gmail.com


On Monday, September 8, 2014 5:46:13 PM UTC-7, Dave Cheney wrote:


On Tuesday, 9 September 2014 08:45:52 UTC+10, Suraj Narkhede wrote:
Hi Dmitry,

Thanks! I tested the scenario and the difference is marginal: at lower concurrency (which translated to up to 24,000 goroutine creations per second), creating goroutines on the fly beats the goroutine pool; above that concurrency, the pool wins. The difference in rps is < 10% in both cases.

During this, I noticed two things:
1. For each persistent TCP connection, Go maintains two goroutines internally. Won't this become a bottleneck at high scale (consider a server holding >50k connections)? The number of goroutines will exceed 100k, and I think that will affect scheduler performance.

Nope. It is unlikely more than a handful of goroutines will be runnable at any given time; goroutines that are not runnable don't impact the scheduler.

Thanks. I will try to run a test with 40k idle connections and share the results here.

 
2. Garbage collection: in my application (http://play.golang.org/p/a5GathVxt3; the application it connects to responds with 1k of data after a random delay of 40ms to 100ms), garbage collection at scale is very heavy. When I stress-test this machine with 400 concurrent requests, my garbage collection times are:

Thanks for providing sample code. It looks like your benchmark contains both the client and the server; can you split those into two pieces, so that the overhead of the client is not reported against the server and vice versa?

Actually, my use case is like this: the application acts as both a server and a client. Each incoming request spawns 10 outgoing requests, which have a timeout of 120ms and generally take 40 to 100ms to respond.
 

PauseNs = [...]

# NumGC = 161

Of these, the last 132 instances were triggered during 100s of stress testing. That means that over those 100s the program stopped for ~20ms 132 times, which does not really work for a realtime application.

Can you run your benchmark with 

GODEBUG=gctrace=1

Those numbers are usually easier to understand.

Please find it attached. This ~20ms pause time really makes it unsuitable for our use case now, because during that 20ms pause I see my fetches from redis/aerospike timing out, since I have only a 1ms timeout there.
gnuts.out

Dave Cheney

Sep 9, 2014, 3:59:01 AM
to Suraj Narkhede, golang-nuts
So during that time you are carrying the allocation cost of servicing those incoming connections on your heap. What mitigations are you putting in place to limit the amount of incoming work?

Suraj Narkhede

Sep 9, 2014, 1:29:22 PM
to Dave Cheney, golang-nuts
None currently.
- The aim of this exercise is to evaluate this application on Go and on other platforms, so rps and response time matter.
- Currently CPU utilization is < 50%; I don't really want to limit incoming requests at this point.
- I really like Go and would like to find ways to increase throughput and get consistent performance (no timeouts during GC) in my application.

Do you think GC will always be an issue in network-bound applications when I am dealing with 1ms timeouts for some of the IO? Even if the GC pause drops to 5ms after limiting connections (without significantly affecting throughput), I feel I may not be able to use it, because every second it will just wipe (time) out all my redis calls and other calls that have 1 to 2ms timeouts.

Dmitry Vyukov

Sep 11, 2014, 9:35:25 PM
to Suraj Narkhede, Dave Cheney, golang-nuts

We are working on a concurrent GC which will significantly reduce pauses, but that will land in 1.5 or 1.6 (no exact dates yet).
Today your best bet is reducing the heap size, and in particular the number of small objects with pointers.
Also, do you set GOMAXPROCS? The Go GC is parallel, so setting GOMAXPROCS to the number of cores can significantly reduce pause time.
And if you set GOGC to a higher value, that will reduce the frequency of pauses (but not their duration).
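For illustration, those two knobs might be set as follows (the values are examples, not recommendations from the thread; in the Go 1.3 era the default GOMAXPROCS was 1):

// in main(), before starting the server
runtime.GOMAXPROCS(runtime.NumCPU())

and, in the environment, GOGC=400 ./server makes the collector run only after 400% heap growth over the live set, instead of the default 100%.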

Suraj Narkhede

Sep 15, 2014, 3:23:34 AM
to golan...@googlegroups.com, suraj...@gmail.com, da...@cheney.net
Thanks Dmitry.
I am setting GOMAXPROCS to 24.

I have changed the code so that no memory is allocated explicitly in my application - http://play.golang.org/p/A8rPW_ECJf.
Now, I think, all remaining allocations happen inside the Go packages.

While testing with wrk -t 24 --latency "http://ip:8081/" -c 120 -d 1000, I get 3-5ms pauses, 2-3 times per second.
At full capacity on the test setup, with c=1200, I get 50 to 65ms of pauses per second.

I am not sure I profiled it correctly; the heap profile output is:
Total: 10.5 MB

10.5 100.0% 100.0%     10.5 100.0% runtime.gettype

and list gettype gives:  10.5   10.5  802: t = data[8*sizeof(uintptr) + ofs/s->elemsize];

1. If the profiling above is not correct, is there a way to profile the Go packages? I could not locate where allocations happen inside the Go packages through the heap profile from "net/http/pprof".
2. Are there any plans or ongoing work to reduce garbage collection overhead in net/http?

Thanks for http://go-talks.appspot.com/code.google.com/p/go-conc/trunk/slides/goruntime.slide#27 - it was really helpful for understanding ways to decrease GC overhead in Go.

Dmitry Vyukov

Sep 15, 2014, 3:35:41 AM
to Suraj Narkhede, golang-nuts, Dave Cheney, Brad Fitzpatrick
On Mon, Sep 15, 2014 at 12:23 AM, Suraj Narkhede <suraj...@gmail.com> wrote:
> I am not sure I profiled it correctly; the heap profile output is:
> Total: 10.5 MB
> 10.5 100.0% 100.0%     10.5 100.0% runtime.gettype
>
> 1. If the profiling above is not correct, is there a way to profile the Go
> packages? I could not locate where allocations happen inside the Go packages
> through the heap profile from "net/http/pprof".

There must be something wrong in the way you profile; gettype does not allocate. Probably you ran pprof against the wrong binary. What if you query the profile directly from the server with:
http://myserver:6060/debug/pprof/heap?debug=1
See the following article for details:
https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs
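For illustration, exposing those endpoints is a one-import change (6060 is just the conventional port):

import (
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
    go http.ListenAndServe("localhost:6060", nil)
    // ... rest of the server ...
}

With the tooling of that era, pprof was typically given the matching binary, e.g. go tool pprof ./server http://localhost:6060/debug/pprof/heap; an assumption here is that a mismatched binary misattributes symbols, which would match the odd runtime.gettype entry above.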


> 2. Are there any plans or work going on to reduce garbage collection
> overhead in net/http?

+Brad for this

All the low-hanging fruit has already been collected.
Do you use https?


> Thanks for -
> http://go-talks.appspot.com/code.google.com/p/go-conc/trunk/slides/goruntime.slide#27.
> This was really helpful to understand ways to decrease gc overhead in go.

You are welcome!

Suraj Narkhede

Sep 16, 2014, 2:28:28 PM
to golan...@googlegroups.com, suraj...@gmail.com, da...@cheney.net, brad...@golang.org
Thanks Dmitry. I completed the heap profiling as you suggested, and I am looking into net/http to see whether allocations can be reduced.

My primary concern is GC time when the net/http library is used heavily.
For a simple hello-world server (with a 40 to 100ms response time) - http://play.golang.org/p/w0AqIP6a8w - tested with wrk --latency -d 10 -c 2000 -t 24 (which gives 28k rps; CPU 11%, network IO 24Mbps in / 54Mbps out), I get GC pauses like these:
gc30(8): 10+23+3360+160 us, 22 -> 44 MB, 240321 (3126041-2885720) objects, 6585/4441/0 sweeps, 29(2606) handoff, 66(1182) steal, 388/132/21 yields
gc31(8): 17+40+3050+109 us, 25 -> 50 MB, 297711 (3367514-3069803) objects, 6585/3905/0 sweeps, 34(2014) handoff, 51(860) steal, 372/149/19 yields
gc32(8): 21+20+3003+9 us, 25 -> 51 MB, 300292 (3610686-3310394) objects, 6634/4184/0 sweeps, 27(2218) handoff, 44(730) steal, 433/180/20 yields
gc33(8): 23+19+3124+59 us, 25 -> 51 MB, 300567 (3854098-3553531) objects, 6667/4192/0 sweeps, 42(2245) handoff, 59(887) steal, 530/183/20 yields
gc34(8): 32+18+3508+14 us, 25 -> 51 MB, 300467 (4097368-3796901) objects, 6667/3942/0 sweeps, 38(3442) handoff, 46(950) steal, 426/147/23 yields
gc35(8): 25+17+3002+61 us, 25 -> 51 MB, 300356 (4340519-4040163) objects, 6667/3978/0 sweeps, 43(3262) handoff, 42(734) steal, 557/186/19 yields
gc36(8): 8+23+3062+9 us, 25 -> 51 MB, 300559 (4583889-4283330) objects, 6667/3871/0 sweeps, 42(3180) handoff, 28(663) steal, 449/156/14 yields
gc37(8): 30+19+2864+53 us, 25 -> 51 MB, 300292 (4827015-4526723) objects, 6669/3755/0 sweeps, 42(3408) handoff, 47(772) steal, 529/195/20 yields
gc38(8): 22+16+3082+19 us, 25 -> 51 MB, 300074 (5069931-4769857) objects, 6669/3946/0 sweeps, 28(1996) handoff, 32(780) steal, 342/115/22 yields
gc39(8): 26+20+3158+30 us, 25 -> 51 MB, 299869 (5312666-5012797) objects, 6669/3940/0 sweeps, 48(2924) handoff, 32(583) steal, 491/213/17 yields

Now, if I have to do a cache lookup, e.g. redis with a 2ms timeout, that call will fail during GC. At this point this looks like a blocker for moving ahead.
Do you think the concurrent GC will help here? More importantly, will it still have pauses like we see today?
Is there any approach (other than blocking) I can use so that cache calls with timeouts lower than the GC pause time are not wiped out?

Dmitry Vyukov

Sep 16, 2014, 2:38:36 PM
to Suraj Narkhede, golang-nuts, Dave Cheney, Brad Fitzpatrick
I don't completely understand the redis timeout. Your server is a redis client, right? It makes a request. I can understand how a pause inside the redis server could trigger a timeout in your client, but I don't understand how a pause in your own client affects this: your client is free to read the response from the network after any delay.
So what exactly happens, and how does the delay affect it?

James Bardin

Sep 16, 2014, 2:44:55 PM
to golan...@googlegroups.com, suraj...@gmail.com, da...@cheney.net, brad...@golang.org
I've pondered the idea of making a high-performance http.RoundTripper (and maybe http.Client) by trading off some of the convenience of http.Transport.

I wonder if we could make a transport that does zero allocations?

Suraj Narkhede

Sep 16, 2014, 2:46:40 PM
to golan...@googlegroups.com, suraj...@gmail.com, da...@cheney.net, brad...@golang.org
Yes, my server is a redis client. I do a redis call with a timeout of 2ms; if garbage collection is triggered during that window, the call times out.
Secondly, consider this hypothetical case -

go doDB1(chOut)
go doDB2(chOut)
timeout := time.After(2 * time.Millisecond)
forloop:
for i := 0; i < 2; i++ {
    select {
    case <-timeout:
        break forloop
    case obj[i] = <-chOut:
    }
}

Now, select does not guarantee which ready case is chosen, so the timeout case can fire even when a response is available, and that response is never read from the channel.
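One common mitigation for that particular race (a sketch, not something proposed in the thread) is a final non-blocking receive when the timeout fires:

case <-timeout:
    select {
    case obj[i] = <-chOut: // a response may have landed just as the timer fired
    default:
    }
    break forloop

This does not help with the underlying GC pause, only with losing a response that was already sitting in the buffered channel.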

Brad Fitzpatrick

Sep 16, 2014, 3:27:14 PM
to James Bardin, golang-nuts, suraj...@gmail.com, Dave Cheney
If that's a trade-off you need, go for it.

I've spent more time (actually a lot of time) working on the Server's allocations, but I've barely touched optimizing *http.Transport's allocations. I addressed some of the easier ones, but you could probably knock out a bunch more fairly easily.

Dmitry Vyukov

Sep 16, 2014, 3:50:07 PM
to Suraj Narkhede, golang-nuts, Dave Cheney, Brad Fitzpatrick


I can't offer anything magical here.

Can you just increase that timeout? 2ms looks like a pretty small value for a full round-trip to a different server. Doesn't the timeout occasionally fire even without GC?

You can try Go tip: there are some GC-related changes, and the limit on GC threads has been bumped from 8 to 32, so if you have more cores that can help.

You can also bump the GOGC env var to, say, 400. That won't reduce GC pauses, but it will make GCs less frequent.

Another common technique is to split a server into several processes, so that the GC in each process is shorter.

And yes, the concurrent GC is aimed at solving the pause problem, but we are still far from the actual implementation.

Dave Cheney

Sep 16, 2014, 6:30:15 PM
to Suraj Narkhede, Brad Fitzpatrick, golang-nuts


On 17 Sep 2014 04:28, "Suraj Narkhede" <suraj...@gmail.com> wrote:
>
> My primary concern is GC time when the net/http library is used heavily.
> gc30(8): 10+23+3360+160 us, 22 -> 44 MB, 240321 (3126041-2885720) objects, 6585/4441/0 sweeps,

3.36ms to sweep a small heap like this sounds wrong. Dmitry, is my estimation off?

Dmitry Vyukov

Sep 16, 2014, 9:00:06 PM
to Dave Cheney, Suraj Narkhede, Brad Fitzpatrick, golang-nuts
On Tue, Sep 16, 2014 at 3:28 PM, Dave Cheney <da...@cheney.net> wrote:
> 3.36ms to sweep a small heap like this sounds wrong. Dmitry, is my
> estimation off?


Difficult to say.
Goroutine tracebacks are slow, so it could be caused by a large number of goroutines.
I've mailed
https://codereview.appspot.com/136310044
to output the number of live goroutines during GC.

Suraj Narkhede

Sep 17, 2014, 1:40:38 PM
to golan...@googlegroups.com, da...@cheney.net, suraj...@gmail.com, brad...@golang.org
If it's helpful: the output of runtime.NumGoroutine() is 1998 during this test.