Goroutines vs OS threads


martin....@gmail.com

unread,
Aug 11, 2013, 5:35:46 PM8/11/13
to golan...@googlegroups.com
How is it possible Go supports 100k+ concurrent goroutines on a single CPU, doing context switching between all of them without large overhead?

I'm trying to understand how the Go scheduler works and in particular how it compares to a traditional OS scheduler.

We all know it is a bad idea for a (C, Java) program to create thousands of OS threads (e.g. one thread per request). This is mainly because:

1. Each thread uses a fixed amount of memory for its stack
    (while in Go, the stack grows and shrinks as needed)

2. With many threads, the overhead of context switching becomes significant
    (isn't this true in Go as well?)

Even if Go context-switched only on IO and channel access, doesn't it still have to save and restore the state of variables for each goroutine? How is it possible it is so efficient compared to an OS scheduler?

There's a design doc for the Go scheduler
(and an earlier discussion on the topic).

Thanks!
Martin

Ian Lance Taylor

unread,
Aug 12, 2013, 12:11:28 AM8/12/13
to martin....@gmail.com, golang-nuts
On Sun, Aug 11, 2013 at 2:35 PM, <martin....@gmail.com> wrote:
>
> How is it possible Go supports 100k+ concurrent goroutines on a single CPU,
> doing context switching between all of them without large overhead?

If all 100k goroutines are actively doing things in parallel, the Go
code will tend to have significant overhead. In a normal Go program,
though, each goroutine will be waiting for network input.


> We all know it is a bad idea for a (C, Java) program to create thousands of
> OS threads (e.g. one thread per request). This is mainly because:
>
> 1. Each thread uses fixed amount of memory for its stack
> (while in Go, the stack grows and shrinks as needed)
>
> 2. With many threads, the overhead of context switching becomes significant
> (isn't this true in Go as well?)

The context of a goroutine is much smaller and easier to change than
the context of an OS thread. If nothing else an OS thread has a
signal mask. At least with the gc toolchain, a goroutine really just
has two values: a stack pointer and a pointer to the g structure.
Switching to a new goroutine is just a matter of a few instructions.

Also, goroutines do cooperative scheduling, while threads do not.


> Even if Go context-switched only on IO and channel access, doesn't it still
> have to save and restore the state of variables for each goroutine? How is
> it possible it is so efficient compared to an OS scheduler?

It does not have to save and restore each variable. All goroutines
see the same global variables, and local variables are always accessed
via the stack pointer.

Ian

Rodrigo Kochenburger

unread,
Aug 12, 2013, 2:55:22 AM8/12/13
to Ian Lance Taylor, martin....@gmail.com, golang-nuts

- RK



--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



Dmitry Vyukov

unread,
Aug 12, 2013, 4:23:09 AM8/12/13
to martin....@gmail.com, golang-nuts
On Mon, Aug 12, 2013 at 1:35 AM, <martin....@gmail.com> wrote:
How is it possible Go supports 100k+ concurrent goroutines on a single CPU, doing context switching between all of them without large overhead?

I'm trying to understand how the Go scheduler works and in particular how it compares to a traditional OS scheduler.

We all know it is a bad idea for a (C, Java) program to create thousands of OS threads (e.g. one thread per request). This is mainly because:

1. Each thread uses fixed amount of memory for its stack 
    (while in Go, the stack grows and shrinks as needed)

2. With many threads, the overhead of context switching becomes significant
    (isn't this true in Go as well?)


The Go scheduler, just like thread schedulers in modern OSes, is O(1): the overhead of making a goroutine runnable and scheduling it for execution does not depend on the number of goroutines.

Martin Koníček

unread,
Aug 15, 2013, 5:42:58 PM8/15/13
to Dmitry Vyukov, golang-nuts
Thank you very much Ian, Rodrigo and Dmitry for helping me understand how goroutines work.

Let me just ask a clarifying follow-up question:

When writing a server in Go, is it OK to spawn one goroutine per connection and do some blocking I/O inside each goroutine?

I've seen a presentation by a member of the Go team which said that "Go APIs are blocking, and blocking in Go is fine."

Is this really the case?


Ian, it's interesting to know that local variables are always accessed via the stack pointer - thank you! I assumed Go would use registers for local variables and would have to save and restore all registers when switching a goroutine.

I can see how the smaller context helps keep goroutines more "lightweight" than OS threads. I can also imagine that doing the scheduling in user mode helps.
I still don't see why cooperative scheduling helps, as opposed to, say, rescheduling every 10ms.

Thanks for your patience!

Best regards,
Martin

Martin Koníček

unread,
Aug 15, 2013, 6:00:41 PM8/15/13
to Dmitry Vyukov, golang-nuts
I think I found a satisfactory answer:

The following operations do not cause the goroutine to use a thread when they block:
  • channel operations
  • network operations
  • sleeping
This means, for example, that if you have many goroutines that open /dev/ttyxx and block on read, you'll be using a thread for each one.


Thanks again! Note that Go is my first language with non-(thread)-blocking APIs that don't rely on callbacks (apart from C#, which uses a compiler trick to hide the callbacks from the user).

Dmitry Vyukov

unread,
Aug 15, 2013, 6:03:39 PM8/15/13
to Martin Koníček, golang-nuts
On Fri, Aug 16, 2013 at 1:42 AM, Martin Koníček <martin....@gmail.com> wrote:
Thank you very much Ian, Rodrigo and Dmitry for helping me understand how goroutines work.

Let me just ask a clarifying follow-up question:

When writing a server in Go, is it OK to spawn one goroutine per connection and do some blocking I/O inside each goroutine?


Yes, this is the way to go.
At the "physical" (OS) level, such a program behaves the same way as an event-driven C program that uses epoll to handle a large number of connections on a few threads.


 
I've seen a presentation by a member of the Go team which said that "Go APIs are blocking, and blocking in Go is fine."

Is this really the case?


Ian, it's interesting to know that local variables are always accessed via the stack pointer - thank you! I assumed Go would use registers for local variables and would have to save and restore all registers when switching a goroutine.

I can see how the smaller context helps to keep goroutines more "lightweight" than OS threads. I can also imagine that doing scheduling in user mode helps.
Still don't see why cooperative scheduling helps as opposed to say rescheduling every 10ms.


If you do asynchronous preemption you need to save/restore ALL registers, that is, 16 general purpose registers, PC, SP, segment registers, 16 XMM registers, FP coprocessor state, X AVX registers, all MSRs and so on.
If you cooperatively preempt at known points, then you need to save/restore only what is known to be live at these points. In Go that's just 3 registers -- PC, SP and DX.
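The cooperative switch Dmitry describes can also be triggered explicitly from user code with runtime.Gosched. The sketch below is invented for illustration (the `order` helper and the event labels are not from the thread); it pins the program to one scheduling thread so the hand-off order is deterministic:

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// order records how two goroutines interleave on a single scheduling thread.
func order() string {
	runtime.GOMAXPROCS(1) // one OS thread executes Go code at a time
	events := make(chan string, 3)
	done := make(chan struct{})
	go func() {
		events <- "B"
		close(done)
	}()
	events <- "A1"
	runtime.Gosched() // cooperative yield at a known safe point
	<-done
	events <- "A2"
	close(events)
	var parts []string
	for e := range events {
		parts = append(parts, e)
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(order()) // A1,B,A2
}
```

Because the switch happens at a known point, the runtime only needs to save the handful of values that are live there, which is exactly the cheapness described above.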

Martin Koníček

unread,
Aug 15, 2013, 6:05:24 PM8/15/13
to Dmitry Vyukov, golang-nuts
Wow, thank you for the in-depth explanation Dmitry!

Dmitry Vyukov

unread,
Aug 15, 2013, 6:09:04 PM8/15/13
to Martin Koníček, golang-nuts
On Fri, Aug 16, 2013 at 2:00 AM, Martin Koníček <martin....@gmail.com> wrote:
I think I found a satisfactory answer:

The following operations do not cause the goroutine to use a thread when they block:
  • channel operations
  • network operations
  • sleeping
This also includes blocking on all the primitives in the sync package.

 
This means, for example, that if you have many goroutines that open /dev/ttyxx and block on read, you'll be using a thread for each one.

Yes, that's correct.
There is no reason why we can't do "non-blocking under the hood" handling of files/devices/etc. It's just not done yet. On Windows it's easy with IOCP, where non-blocking network and file operations are the same. I am not sure what the state of AIO on Unixes is today.

Alexei Sholik

unread,
Aug 15, 2013, 7:18:25 PM8/15/13
to Dmitry Vyukov, Martin Koníček, golang-nuts
If you cooperatively preempt at known points, then you need to save/restore only what is known to be live at these points. In Go that's just 3 registers -- PC, SP and DX.

You can still have values stored in other registers. Does this mean that the compiler is aware of those scheduling points, so that it emits spilling code before, say, reading from a channel?





--
Best regards
Alexei Sholik

Dmitry Vyukov

unread,
Aug 16, 2013, 4:56:09 AM8/16/13
to Alexei Sholik, Martin Koníček, golang-nuts
On Fri, Aug 16, 2013 at 3:18 AM, Alexei Sholik <alcos...@gmail.com> wrote:
If you cooperatively preempt at known points, then you need to save/restore only what is known to be live at these points. In Go that's just 3 registers -- PC, SP and DX.

You can still have values stored in other registers. Does this mean that the compiler is aware of those scheduling points so that it emit spilling code before, say, reading from a channel?


Yes, the calling convention is that the callee can use all registers, so the spilling needs to be done anyway. So if we preempt at function entry, we get only-3-live-registers for free.

Alexei Sholik

unread,
Aug 17, 2013, 1:11:25 PM8/17/13
to Dmitry Vyukov, Martin Koníček, golang-nuts
Thanks for the clarification, Dmitry. Is Go's calling convention documented anywhere?

Dmitry Vyukov

unread,
Aug 17, 2013, 1:12:47 PM8/17/13
to Alexei Sholik, Martin Koníček, golang-nuts
I have not seen any description.

adriansc...@gmail.com

unread,
Dec 11, 2015, 8:17:01 AM12/11/15
to golang-nuts, martin....@gmail.com


On Monday, August 12, 2013 at 11:11:28 AM UTC+7, Ian Lance Taylor wrote:
On Sun, Aug 11, 2013 at 2:35 PM,  <martin....@gmail.com> wrote:
>
> How is it possible Go supports 100k+ concurrent goroutines on a single CPU,
> doing context switching between all of them without large overhead?

If all 100k goroutines are actively doing things in parallel, the Go
code will tend to have significant overhead.  In a normal Go program,
though, each goroutine will be waiting for network input.

Isn't the net/http package essentially doing that by spawning a goroutine for every connection?
Maybe I misunderstand it, but it seems that ServeHTTP is called inside these goroutines. That would mean a lot of computation happens inside each one, like dispatching the request based on the URL and computing the response, and therefore a lot of scheduling overhead in web applications built with this package.

Ian Lance Taylor

unread,
Dec 11, 2015, 9:03:55 AM12/11/15
to adriansc...@gmail.com, golang-nuts, martin....@gmail.com
That CPU processing time is normally less than the time it takes for a new
network packet to arrive, so it remains true that most goroutines are
waiting for network input. That might be false if the HTTP server is
being hit at high speed over a very high-speed network connection
(that is, not the general Internet), but that is not the normal case.

Ian

Tamás Gulácsi

unread,
Dec 11, 2015, 9:05:07 AM12/11/15
to golang-nuts
Those computations must be done somewhere. Goroutine switching is very cheap and is done at specific points in the code (especially at function calls), so it won't cut your important loop into pieces.

Adrian Scheffler

unread,
Dec 11, 2015, 10:08:22 AM12/11/15
to golang-nuts, adriansc...@gmail.com, martin....@gmail.com

>> If all 100k goroutines are actively doing things in parallel, the Go
>> code will tend to have significant overhead.  In a normal Go program,
>> though, each goroutine will be waiting for network input.
>>
> Isn't the net/http package essentially doing that by spawning a goroutine
> for every connection?
> Maybe I misunderstand it, but it seems that ServeHTTP is called inside
> these goroutines. That would mean a lot of computation happens inside
> each one, like dispatching the request based on the URL and computing
> the response, and therefore a lot of scheduling overhead in web
> applications built with this package.

That CPU processing time is normally less than the time it takes for a new
network packet to arrive, so it remains true that most goroutines are
waiting for network input. That might be false if the HTTP server is
being hit at high speed over a very high-speed network connection
(that is, not the general Internet), but that is not the normal case.
Okay, I just thought it would be most interesting to look at the behaviour of web servers when they are under load. If there is a big number of incoming requests, there will be a lot of blocked goroutines, but also a lot of active ones. You said that if a lot of goroutines actively do things in parallel, there will be significant overhead. So I thought it might be better to have a model with a fixed number of goroutines with specialized professions: accepting incoming requests, dispatching URLs, processing requests, sending responses back. I would be interested to know whether it is worth the effort to think more about this. Can you elaborate a bit more on this overhead when a big number of goroutines does things in parallel, and how it compares to a small number of goroutines doing so? Is there any good material on the goroutine scheduler in its current state?

Dave Cheney

unread,
Dec 11, 2015, 10:14:30 AM12/11/15
to golang-nuts, adriansc...@gmail.com, martin....@gmail.com
Okay, I just thought it would be most interesting to look at the behaviour of web servers when they are under load. If there is a big number of incoming requests, there will be a lot of blocked goroutines, but also a lot of active ones. You said that if a lot of goroutines actively do things in parallel, there will be significant overhead. So I thought it might be better to have a model with a fixed number of goroutines with specialized professions: accepting incoming requests, dispatching URLs, processing requests, sending responses back. I would be interested to know whether it is worth the effort to think more about this. Can you elaborate a bit more on this overhead when a big number of goroutines does things in parallel, and how it compares to a small number of goroutines doing so? Is there any good material on the goroutine scheduler in its current state?

There are never more than GOMAXPROCS goroutines active in a Go program. All the other goroutines are either runnable, or waiting for some condition which will make them runnable.

Adrian Scheffler

unread,
Dec 11, 2015, 10:46:31 AM12/11/15
to golang-nuts, adriansc...@gmail.com, martin....@gmail.com


On Friday, December 11, 2015 at 10:14:30 PM UTC+7, Dave Cheney wrote:
Okay, I just thought it would be most interesting to look at the behaviour of web servers when they are under load. If there is a big number of incoming requests, there will be a lot of blocked goroutines, but also a lot of active ones. You said that if a lot of goroutines actively do things in parallel, there will be significant overhead. So I thought it might be better to have a model with a fixed number of goroutines with specialized professions: accepting incoming requests, dispatching URLs, processing requests, sending responses back. I would be interested to know whether it is worth the effort to think more about this. Can you elaborate a bit more on this overhead when a big number of goroutines does things in parallel, and how it compares to a small number of goroutines doing so? Is there any good material on the goroutine scheduler in its current state?

There are never more than GOMAXPROCS goroutines active in a Go program. All the other goroutines are either runnable, or waiting for some condition which will make them runnable.
Okay, sorry, I am new to all of this and messed up the terminology. If we insist on a distinction between active and runnable goroutines, I would interpret the sentence "If all 100k goroutines are actively doing things in parallel, the Go code will tend to have significant overhead." from Ian Lance Taylor as: "If there are a lot of runnable goroutines around (with CPU-bound behaviour), the Go code will tend to have significant overhead." Is that right? I assume runnable means not blocked on I/O and wanting to run.

In the net/http package, the Serve() function accepts new connections and creates a new goroutine for each. So I would think that under load there would be a lot of runnable goroutines, because they can't finish as quickly as new connections/requests come in. It would be nice to understand why there is significant overhead from a big number of goroutines that are not blocked, and whether this has to do with the scheduler (I just assumed that because it seemed natural).

Ian Lance Taylor

unread,
Dec 11, 2015, 12:05:32 PM12/11/15
to Adrian Scheffler, golang-nuts, martin....@gmail.com
That is more or less right, though it depends on what you mean by
significant. If you have 100k CPU-bound goroutines, there will be
scheduling overhead.


> In the net/http package, the Serve() function accepts new
> connections and creates a new goroutine for each. So I would think, that
> under load, there would be a lot of runnable goroutines, because they cant
> finish, as quickly as new connections/requests come in. it would be nice to
> understand, why there is a significant overhead from a big number of
> goroutines that are not blocked and if this has to do with the scheduler(I
> just assumed that, because it seemed natural).

If your server gets more connections than it can process, then it is
overloaded. That is true no matter how you partition your program.

From your earlier message, above, I think you are suggesting that it
would be better to have a fixed number of goroutines handling
different parts of each request. It would not be better. You would
have the same amount of work, and therefore the same scheduling
overhead, as if you had a separate goroutine per request.

That is, when I talked earlier about "actively doing things in
parallel," I didn't mean that the scheduling overhead increases based
on the number of goroutines. I meant that the scheduling overhead
increases based on the amount of work there is to do.

Ian

Luna Duclos

unread,
Dec 11, 2015, 12:14:42 PM12/11/15
to golang-nuts
Switching goroutines, unlike switching threads, does not incur a huge overhead; it's fairly cheap. I'm uncertain where the "significant overhead" would come from under high load as a result.


Konstantin Khomoutov

unread,
Dec 11, 2015, 12:43:36 PM12/11/15
to Adrian Scheffler
On Fri, 11 Dec 2015 07:46:31 -0800 (PST)
Adrian Scheffler <adriansc...@gmail.com> wrote:

[...]
> So I would think, that under load, there
> would be a lot of runnable goroutines, because they cant finish, as
> quickly as new connections/requests come in. it would be nice to
> understand, why there is a significant overhead from a big number of
> goroutines that are not blocked and if this has to do with the
> scheduler(I just assumed that, because it seemed natural).

The problem is that you are trying to solicit abstract discussion about
pretty abstract things. The first (some believe the sole) rule of
optimization is that you never know where exactly it will be required
until you test your program under production conditions. From what I
observe, having lurked on this list for quite some time, people in the
high-load segment tend to experience problems related to garbage
collection, not to the imaginary scheduling overhead you're trying to
discuss. And that is with servers handling several thousand requests
per second.

The Go runtime comes with quite strong support for profiling various
kinds of resources. So if you're really interested, search the list
for those profiling techniques, write a sample server, put a load on it
using `wrk' or another similar tool, and study the profiling data.

Dave Cheney

unread,
Dec 11, 2015, 5:23:13 PM12/11/15
to golang-nuts, adriansc...@gmail.com, martin....@gmail.com
You can investigate this yourself: write a small hello-world style Go program, run it with the env var GODEBUG=schedtrace=1000, and hit it with some kind of load-testing software. The runtime will then output a summary of the scheduler run queues once per second. Dmitry has a good blog post describing this output in more detail: https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs.

Thanks

Dave

Adrian Scheffler

unread,
Dec 12, 2015, 6:39:32 AM12/12/15
to golang-nuts, adriansc...@gmail.com, martin....@gmail.com
On Saturday, December 12, 2015 at 5:23:13 AM UTC+7, Dave Cheney wrote:
You can investigate this yourself: write a small hello-world style Go program, run it with the env var GODEBUG=schedtrace=1000, and hit it with some kind of load-testing software. The runtime will then output a summary of the scheduler run queues once per second. Dmitry has a good blog post describing this output in more detail: https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs.
Okay, so I am a few days into learning Go and I am trying to understand goroutines. The two things I have heard about them so far (in a redundant manner) are that they are very cheap and that it is easy to reason about architectures built with them. I would like to form an opinion on different ways of composing these goroutines, because I don't want to implement and measure every possibility for every design decision, even though I believe Go comes with good tools for this. :-)
 I didn't mean that the scheduling overhead increases based 
on the number of goroutines.  I meant that the scheduling overhead 
increases based on the amount of work there is to do. 
Okay, but there is other overhead than scheduling overhead attached to a goroutine, right? Like the blocked OS thread when the goroutine does a syscall. And there is a maximum number of threads (set by default to 10000). So if there is a peak in requests, either from an increased number of users or because an adversary floods the server with requests, the server could crash pretty quickly. I somehow have the suspicion that it is not a good idea to create goroutines in direct relation to the input of a program, especially when the input comes from the network, because it could lead to potential denial-of-service vulnerabilities. This can't really be mitigated by a higher maxthreads limit, because eventually the operating system would crash. I would be glad to hear whether this train of thought is right.
During my digging I found this paper: http://www1.cs.columbia.edu/~aho/cs6998/reports/12-12-11_DeshpandeSponslerWeiss_GO.pdf but I couldn't find a discussion about it, or whether its considerations were accepted and implemented. If that is the case, the kind of architecture I thought of would be better, because the grouping of goroutines proposed in this paper works on the assumption that the goroutines communicate with each other over channels. The current net/http package has a lot of goroutines that don't interact with each other, so this grouping would yield no performance benefit. It would be nice if someone could point me to the discussion.
Thanks,
Adrian

Dave Cheney

unread,
Dec 12, 2015, 6:52:37 AM12/12/15
to golang-nuts
The number of concurrent connections per process is limited by the number of file descriptors available to the process. It is correct that some syscalls consume an OS thread while they are in progress. If there is no free OS thread to replace one that is blocked in a syscall, a new one will be created to service goroutines, up to GOMAXPROCS. But the net package uses the polling mechanisms provided by the operating system to coalesce all outstanding read and write operations onto a single blocking syscall.

I gave a talk about this at OSCON earlier in the year; maybe this blog post will give some useful background: http://dave.cheney.net/2015/08/08/performance-without-the-event-loop

Thanks

Dave

Adrian Scheffler

unread,
Dec 12, 2015, 7:16:01 AM12/12/15
to golang-nuts
I know about the netpoller. To be more specific, I mean the OS-thread overhead from a goroutine that is blocked in a syscall that is not handled by the netpoller. For example, a John Doe like me might have a website with a page that does some kind of syscall that the netpoller does not intercept. I don't think this is so far-fetched. So if 10000 connections for this page come in within a short amount of time, the server should crash because of the thread limit. It would be cool if someone could falsify that concern.

If there is no free OS thread to replace the one that is blocked on a syscall a new one will be created to service goroutines, up to GOMAXPROCS. 
This is very confusing to me. I thought GOMAXPROCS referred to the maximum number of threads executing in parallel, which is now set by default to the number of CPUs instead of 1.
 

Dave Cheney

unread,
Dec 12, 2015, 7:24:22 AM12/12/15
to golang-nuts
GOMAXPROCS is the maximum number of operating system threads assigned to service goroutines. There are other threads in use in a Go program: threads blocked in syscalls are one source, cgo calls are another.

The scenario you describe, with a poorly written program consuming lots of OS threads blocked in syscalls, is possible, but it rarely happens, because those would be file system operations, which would quickly overwhelm the I/O subsystem of the host. Also, you'll still run out of file descriptors. Yes, it can happen, possibly easily, but it is also easily debugged.

There is an open issue to use the net package's polling code for local file system operations; it would reduce the number of blocking syscalls.

Adrian Scheffler

unread,
Dec 13, 2015, 2:56:09 AM12/13/15
to golang-nuts
I am sorry, I did not look closely enough at the sentence you wrote; it makes total sense to me now. I don't know ... if you look at the canonical "Writing Web Applications" tutorial, https://golang.org/doc/articles/wiki/, it seems like there will be some filesystem I/O in these request-serving goroutines. So in theory, if there is a traffic spike, the webserver will crash. But maybe this has no relevance in practice, because the threads unblock faster than new requests (connections) come in, so the traffic spike would need to be so big that the server would be overloaded anyway. I will just go and play around with it. Thanks for your help!
Peace Out! :-)

Jesper Louis Andersen

unread,
Dec 13, 2015, 9:25:57 AM12/13/15
to Adrian Scheffler, golang-nuts, martin....@gmail.com

On Sat, Dec 12, 2015 at 12:39 PM, Adrian Scheffler <adriansc...@gmail.com> wrote:
During my digging I found this paper: http://www1.cs.columbia.edu/~aho/cs6998/reports/12-12-11_DeshpandeSponslerWeiss_GO.pdf but I couldn't find a discussion about it and if the considerations were accepted and implemented.

The proposal is, rather simplified, to record which goroutines interact on which channels and then keep those goroutines on the same CPU core. The observation stems from Occam-Pi and an implementation by Ritson, in which one counts "process was blocked on contention" events to dynamically create batches of processes that are likely to be blocked on the same contention. The hope is to speed up computation through data locality.

My critique is that the paper doesn't experiment with this proposal. You have to demonstrate that the recording cost is outweighed by faster execution, and the Go code exists as open source, which leaves you with no excuse. Barring a concrete design and algorithm, you can only speculate about its efficiency, and you have no chance of pointing out pathological behavior in the implementation. I think the Occam-Pi design is nice and could work as a good batcher, but I'd like to see it demonstrated on some real-world examples first.




--
J.

Jesper Louis Andersen

unread,
Dec 13, 2015, 9:38:23 AM12/13/15
to Adrian Scheffler, golang-nuts

On Sat, Dec 12, 2015 at 1:16 PM, Adrian Scheffler <adriansc...@gmail.com> wrote:
I know about the netpoller. To be more specific: I mean the OS-thread overhead, by a goroutine, that is blocked on a syscall, that does not get handled by the netpoller. For example a John Doe like me might have a website with a page, that does some kind of syscall, that is not intercepted by the netpoller. I don't think this is so far fetched. So if 10000 connections for this page come in in a short amount of time, the server should crash, because of the thread limit. Would be cool, if someone could falsify that concern.

Any system will have a capacity limit somewhere. For the most part it is not 10k syscalls, since database connection pools are far more likely to queue you long before you run out of syscalls. In a real system, once you know your engineered capacity limit, you can just start rejecting work as you approach it.

An observation from the Erlang community is that the best design choice is often to limit at the border of your application as early as possible and then not worry about internal resource usage down the line. That is, most internal queues/channels are effectively unbounded in size, whereas you bound incoming requests, perhaps also with a short queue to smooth out spikes and make the processing a bit more stable. Erlang systems also picked a different solution, in which you run a pool of threads to do the blocking I/O calls. The default is 5 threads, and I have never seen more than 300 in an application. We also have monitors in apps for when you hit the limit, and that usually happens when you end up contended on disk, at which point you've lost in the speed department anyway...


--
J.

Adrian Scheffler

unread,
Dec 13, 2015, 2:55:07 PM12/13/15
to golang-nuts, adriansc...@gmail.com
My critique is that the paper doesn't experiment with this proposal. You have to demonstrate that the recording cost is outweighed by faster execution, and the Go code exists as open source, which leaves you with no excuse. Barring a concrete design and algorithm, you can only speculate about its efficiency, and you have no chance of pointing out pathological behavior in the implementation. I think the Occam-Pi design is nice and could work as a good batcher, but I'd like to see it demonstrated on some real-world examples first.
But nevertheless, someone should think about this. I looked at proc.go, which seems to implement the scheduler, and it links to a design document dated May 2012
( https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit# ). So either the comments are not up to date, or there has been no progress on this in the last 3 years.
 
An observation from the Erlang community is that the best design choice is often to limit at the border of your application as early as possible and then not worry about internal resource usage down the line. That is, most internal queues/channels are effectively unbounded in size, whereas you bound incoming requests, perhaps also with a short queue to smooth out spikes and make the processing a bit more stable. Erlang systems also picked a different solution, in which you run a pool of threads to do the blocking I/O calls. The default is 5 threads, and I have never seen more than 300 in an application. We also have monitors in apps for when you hit the limit, and that usually happens when you end up contended on disk, at which point you've lost in the speed department anyway...
I like this kind of thinking. Thats why I got interested in go. Maybe I should look into Erlang instead. What would you recommend? The net/http package seems to care about other stuff than that. And its part of the standard library. So it reflects the culture of go somehow. 

tu....@zalora.com

unread,
Jun 22, 2016, 8:32:53 AM6/22/16
to golang-nuts, martin....@gmail.com
I think that the idea of goroutines comes from coroutines [https://en.wikipedia.org/wiki/Coroutine], and that goroutines are mapped onto OS threads. In my opinion, Go maintains a thread pool and a task queue to dispatch jobs.

This pattern has been implemented in some modern JEE containers.
Hope that the answer helps.

Konstantin Khomoutov

unread,
Jun 22, 2016, 9:54:02 AM6/22/16
to tu....@zalora.com, golang-nuts, martin....@gmail.com
On Wed, 22 Jun 2016 00:34:56 -0700 (PDT)
tu....@zalora.com wrote:

> I think that the idea of Goroutine come from
> [https://en.wikipedia.org/wiki/Coroutine],

This is hardly true: coroutines imply cooperative scheduling, where
each coroutine explicitly relinquishes control to some other coroutine
(typically it is said to "yield" some intermediate result) without
actually returning. In contrast, goroutines behave much more like
OS-level threads in that the Go runtime scheduler is free to preempt
any goroutine at certain key points of its execution (presently these
include the moment a goroutine is about to block on a syscall or calls
another Go function). Hence, while this is not the "full" preemptive
scheduling that commodity operating systems apply to their OS-level
threads, it is still much closer to it than the cooperative ("yield at
will") scheduling used by coroutines.

One more point: coroutines are usually executed by a single thread of
execution, which is not true for goroutines -- and for this reason
goroutines cannot safely access shared data without locking; they must
synchronize all such access or use a dedicated communication
primitive -- channels.

Go draws its approach to goroutines from a number of languages which
predate it, but ultimately the idea comes from CSP [1].

> the number of Goroutines are mapped to OS Threads.

It's more accurate to say that an arbitrary number of goroutines can be
mapped onto an arbitrary number of OS threads, which is usually referred
to as "N x M scheduling", with N -- the number of goroutines --
typically being far larger than M -- the number of OS threads underlying
them.

> In my own opinion, Go maintain the Thread pool and task queue to
> dispatch the job.

At a very abstract level, this is mostly true.

[...]

1. https://en.wikipedia.org/wiki/Communicating_sequential_processes

Tu Pham Anh

unread,
Jun 22, 2016, 10:29:11 AM6/22/16
to Konstantin Khomoutov, golang-nuts, martin....@gmail.com
Thank you very much for your response; I think that at the abstract level it is true.

When we go into more detail, we will see that Go keeps an optimized per-goroutine context (something like a thread context); I think that at the conceptual level there is a kind of yield involved in starting a goroutine.

But I read somewhere that goroutine creation is optimized at the register level: they said only three registers need to be handled to create a new goroutine.

You can see that although you may create a large number of goroutines, almost all of them are blocked, and only a small number are actually running (on OS threads). As for sharing data between goroutines, I think that while running, the goroutine context keeps this sharing working smoothly.

In my opinion, it is one of the best ways to schedule jobs. Imagine that each time you started a goroutine, a new OS thread were created and started -- that would not be the cheap or lightweight routine everyone talks about.