golang multiple go routines reading from a channel and performance implications

Sankar

Nov 21, 2019, 1:30:15 PM
to golang-nuts
We have a setup where a producer goroutine pumps a few thousand objects into a channel (approximately 2k requests per second). A configurable number of goroutines act as consumers, all consuming from this single channel. If no consumer can receive a message (the channel buffer is full), the message is discarded. Each consumer goroutine takes about 2 seconds to complete a unit of work, after which it comes back to read the next item from the channel. The channel is sized to hold up to 10,000 messages.

The code is roughly something like:

producer.go:
func produce() {
	ch <- item
}

func consumer() {
	for i := 0; i < NumberOfWorkers; i++ {
		go func() {
			for item := range ch { // ranging over a channel yields one value per receive
				// process item (~2 secs of work)
			}
		}()
	}
}
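
For concreteness, the discard-when-full behavior described above is usually written as a non-blocking send. A minimal sketch (the Item type and the drop counter are added here for illustration):

package main

import "sync/atomic"

// Item stands in for whatever we actually send on the channel.
type Item struct{}

var (
	ch      = make(chan Item, 10000) // buffered to hold up to 10k messages
	dropped atomic.Int64             // count of discarded messages
)

// produce performs a non-blocking send: if the buffer is full,
// the message is dropped instead of blocking the producer.
func produce(item Item) {
	select {
	case ch <- item:
	default:
		dropped.Add(1) // buffer full: message discarded
	}
}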

With the above setup, we are seeing about 40% of our messages getting dropped.

So my questions are:

1) With such high-velocity incoming data, will the above design (a producer plus consumer worker goroutines) work?
2) We did not go with an external middleware for saving the messages and processing the data later, as we are concerned about latency for now.
3) Are channels a bad fit for such an approach? Is there another, more performant mechanism to achieve this in the Go way?
4) Are there any sample FOSS projects that we can refer to for such performant code? Any books, tutorials, videos, or similar guidelines for developing high-performance Go applications?

I am planning to profile our system to see where the drops are happening, but before that I wanted to ask here, in case there are any best-known methods that I am missing. Thanks.

burak serdar

Nov 21, 2019, 1:45:11 PM
to Sankar, golang-nuts
On Thu, Nov 21, 2019 at 11:30 AM Sankar <sankar.c...@gmail.com> wrote:
>
> We have a setup where a producer goroutine pumps a few thousand objects into a channel (approximately 2k requests per second). A configurable number of goroutines act as consumers, all consuming from this single channel. If no consumer can receive a message (the channel buffer is full), the message is discarded. Each consumer goroutine takes about 2 seconds to complete a unit of work, after which it comes back to read the next item from the channel. The channel is sized to hold up to 10,000 messages.

If each goroutine takes 2 secs per item, you'd need > 4k worker goroutines
(2k/sec arriving, 2 secs each) to keep up with the inflow. How many do you have?

Have you tried with a separate channel per goroutine, with a smaller
channel size?
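
For illustration, one way the channel-per-goroutine variant could look. The names, buffer sizes, and the round-robin drop policy are assumptions for this sketch, not from the thread:

package worker

// Item stands in for whatever is sent on the channel.
type Item struct{}

// StartWorkers launches n workers, each reading from its own small
// buffered channel, and a dispatcher that round-robins incoming items
// across them.
func StartWorkers(n int, process func(Item)) chan<- Item {
	in := make(chan Item, 256)
	chans := make([]chan Item, n)
	for i := range chans {
		chans[i] = make(chan Item, 4) // small per-worker buffer
		go func(c <-chan Item) {
			for item := range c {
				process(item)
			}
		}(chans[i])
	}
	go func() {
		i := 0
		for item := range in {
			select {
			case chans[i%n] <- item:
			default:
				// this worker is saturated; a real dispatcher might try
				// the next worker instead of dropping the item here
			}
			i++
		}
	}()
	return in
}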



Robert Engels

Nov 21, 2019, 2:25:35 PM
to Sankar, golang-nuts
You need to determine how well the requests parallelize and what the resource consumption of a request is. For example, suppose every request can run concurrently at 100% (not actually possible, because of switching overhead), and each request takes 0.5 secs of CPU and 1.5 secs of IO, for a total wall time of 2 secs. At 2k requests per sec, that is 1000 CPU-seconds of work arriving every second, so you need a machine with 1000 CPUs. IO can run concurrently on most modern setups, so you can essentially factor it out; less so if most of the operations are writes.

Your local CPU requirements may be lower if the request is then handled by a cluster (over the network, database, etc.), but you will still need 1000 CPUs in the cluster (probably a lot more, due to the network overhead).
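
A quick back-of-the-envelope check of these numbers (a sketch; the 0.5s/1.5s split is the example figure above, not a measurement):

package main

import "fmt"

func main() {
	const (
		reqPerSec     = 2000.0 // incoming rate from the original post
		cpuSecPerReq  = 0.5    // example CPU time per request
		wallSecPerReq = 2.0    // observed end-to-end time per request
	)
	// CPU-seconds of work arriving per second = CPUs needed to keep up.
	fmt.Printf("CPUs needed:        %.0f\n", reqPerSec*cpuSecPerReq) // 1000
	// Little's law: concurrent requests = arrival rate x time in system.
	fmt.Printf("concurrent workers: %.0f\n", reqPerSec*wallSecPerReq) // 4000
}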

You can look at github.com/robaho/go-trader for an example of very CPU-intensive processing using Go and channels (and other concurrency constructs).




Michael Jones

Nov 21, 2019, 3:07:41 PM
to Robert Engels, Sankar, golang-nuts
In my (past) benchmarking, I got ~3M channel send/receive operations per second on my MacBook Pro; it is faster on faster computers. 2k requests/sec is much less than 3M, clearly, and the roughly 1/1000 duty cycle suggests that 99.9% of your time is left for the actual processing rather than channel overhead. This is back-of-the-envelope thinking, but it is what I go through for every path explored. What should it be? How is it in fact? What explains the difference? ... that kind of thing.
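
A benchmark along these lines, for reproducing the send/receive throughput figure on your own machine (a minimal sketch using the standard testing package; ops/sec is the inverse of the reported ns/op):

package bench

import "testing"

// BenchmarkChannelSendReceive measures raw channel send/receive throughput.
// Run with: go test -bench=ChannelSendReceive
func BenchmarkChannelSendReceive(b *testing.B) {
	ch := make(chan int, 1024)
	go func() {
		for range ch { // drain everything the benchmark sends
		}
	}()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ch <- i
	}
	close(ch)
}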



--
Michael T. Jones
michae...@gmail.com

Robert Engels

Nov 21, 2019, 3:12:54 PM
to Michael Jones, Sankar, golang-nuts
He stated "each request takes 2 secs to process" - what's involved in that is the important aspect imo.

Michael Jones

Nov 21, 2019, 3:24:26 PM
to Robert Engels, Sankar, golang-nuts
Agree. Essentially I'm saying the "channel aspect" is not an issue.

Ivan Bertona

Nov 21, 2019, 6:59:08 PM
to golang-nuts
1) Yes if you set NumberOfWorkers high enough (> 4k / num CPUs), and your machine is actually capable of handling this workload. Based on experience I'd say you shouldn't expect significant overhead for job scheduling.
2) Not sure this is a question
3) No
4) What you are doing is totally fine at 2k/s

I'll add that you shouldn't just trust me: you can easily measure the overhead yourself by making the consumer work a 2s sleep, setting NumberOfWorkers to 4k / num CPUs, pushing 2k/s jobs, and looking at what the system load looks like while it's running. As for whether this would work with your actual workload, again the only way is to try and measure.
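
A minimal harness for this experiment might look like the following (a sketch: workers just sleep 2s, the producer pushes ~2k items/sec, and drops are counted):

package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

func main() {
	const (
		workers  = 4000 // tune this; the follow-up messages discuss the right value
		rate     = 2000 // items/sec, from the original post
		duration = 30 * time.Second
	)
	ch := make(chan int, 10000)
	var done atomic.Int64
	dropped, sent := 0, 0

	for i := 0; i < workers; i++ {
		go func() {
			for range ch {
				time.Sleep(2 * time.Second) // stand-in for the real work
				done.Add(1)
			}
		}()
	}

	tick := time.NewTicker(time.Second / rate) // one item every 500us
	defer tick.Stop()
	deadline := time.After(duration)
	for {
		select {
		case <-deadline:
			fmt.Printf("sent %d, completed %d, dropped %d\n", sent, done.Load(), dropped)
			return
		case <-tick.C:
			sent++
			select {
			case ch <- sent:
			default:
				dropped++ // buffer full: message discarded
			}
		}
	}
}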

Best,
Ivan

burak serdar

Nov 21, 2019, 7:14:33 PM
to Ivan Bertona, golang-nuts
On Thu, Nov 21, 2019 at 4:59 PM Ivan Bertona <iv...@ibrt.me> wrote:
>
> 1) Yes if you set NumberOfWorkers high enough (> 4k / num CPUs), and your machine is actually capable of handling this workload. Based on experience I'd say you shouldn't expect significant overhead for job scheduling.

Not divided by nCPUs though, right?

If each goroutine takes 2 secs per item, and you're getting work at a rate
of 2k/sec, you need at least 4k goroutines to keep up, regardless of the
CPU count. After the 2nd second, you'll have all 4k goroutines busy. Am I
missing something?

The important thing is: does it still take 2 secs for each goroutine to
complete a unit of work when the number of workers is > 4k?

Ivan Bertona

Nov 21, 2019, 7:18:24 PM
to burak serdar, golang-nuts
You are totally right on that, sorry. It's just > 4k.