--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
How is it possible that Go supports 100k+ concurrent goroutines on a single CPU, doing context switching between all of them without large overhead? I'm trying to understand how the Go scheduler works and in particular how it compares to a traditional OS scheduler.

We all know it is a bad idea for a (C, Java) program to create thousands of OS threads (e.g. one thread per request). This is mainly because:

1. Each thread uses a fixed amount of memory for its stack (while in Go, the stack grows and shrinks as needed).
2. With many threads, the overhead of context switching becomes significant (isn't this true in Go as well?)
Thank you very much Ian, Rodrigo and Dmitry for helping me understand how goroutines work. Let me just ask a clarifying follow-up question: when writing a server in Go, is it OK to spawn one goroutine per connection and do some blocking I/O inside each goroutine? I've seen a presentation by a member of the Go team which said that "Go APIs are blocking, and blocking in Go is fine." Is this really the case?

Ian, it's interesting to know that local variables are always accessed via the stack pointer - thank you! I assumed Go would use registers for local variables and would have to save and restore all registers when switching a goroutine. I can see how the smaller context helps to keep goroutines more "lightweight" than OS threads. I can also imagine that doing scheduling in user mode helps. Still, I don't see why cooperative scheduling helps as opposed to, say, rescheduling every 10 ms.
I think I found a satisfactory answer. The following operations do not cause the goroutine to use a thread when they block:
- channel operations
- network operations
- sleeping
This means, for example, that if you have many goroutines that open /dev/ttyxx and block on read, you'll be using a thread for each one.
> If you cooperatively preempt at known points, then you need to save/restore only what is known to be alive at these points. In Go that's just 3 registers -- PC, SP and DX.

You can still have values stored in other registers. Does this mean that the compiler is aware of those scheduling points, so that it emits spilling code before, say, reading from a channel?
On Sun, Aug 11, 2013 at 2:35 PM, <martin....@gmail.com> wrote:
>
> How is it possible Go supports 100k+ concurrent goroutines on a single CPU,
> doing context switching between all of them without large overhead?
If all 100k goroutines are actively doing things in parallel, the Go
code will tend to have significant overhead. In a normal Go program,
though, each goroutine will be waiting for network input.
>> If all 100k goroutines are actively doing things in parallel, the Go
>> code will tend to have significant overhead. In a normal Go program,
>> though, each goroutine will be waiting for network input.
>>
> Isn't the net/http package essentially doing that by calling a goroutine
> for every connection?
> Maybe I misunderstand it, but it seems like ServeHTTP is called inside
> these goroutines. Which would mean that a lot of computation happens inside
> that goroutine, like dispatching the request based on the URL and computing
> the response. Which would mean a lot of scheduling overhead in the
> web applications built with this package.
That CPU processing time is normally less than the time it takes for a new
network packet to arrive, so it remains true that most goroutines are
waiting for network input. That might be false if the HTTP server is
being hit at high speed over a very high-speed network connection
(that is, not the general Internet), but that is not the normal case.
Okay, I just thought it would be most interesting to look at the behaviour of web servers when they are under load. If there is a big number of incoming requests, there will be a lot of blocked goroutines, but also a lot of active ones. You said that if a lot of goroutines do a lot of active work in parallel, there will be significant overhead. So I thought maybe it would be better to have a model with a fixed number of goroutines with specialized roles, like accepting incoming requests, dispatching URLs, processing requests, and sending responses back. I would be interested in whether it's worth the effort to think more about this. Can you elaborate a bit more on this overhead under the condition of big numbers of goroutines working in parallel, and how it compares to a small number of goroutines working in parallel? Is there any good material on the goroutine scheduler in its current state?
There are never more than GOMAXPROCS goroutines active in a Go program. All the other goroutines are either runnable or waiting for some condition that will make them runnable.
You can investigate this yourself: write a small hello-world-style Go program, run it with the env var GODEBUG=schedtrace=1000, and hit it with some kind of load testing software. The runtime will output a summary of the run queue once per second. Dmitry has a good blog post describing this output in more detail: https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs.
I didn't mean that the scheduling overhead increases based
on the number of goroutines. I meant that the scheduling overhead
increases based on the amount of work there is to do.
I gave a talk about this at OSCON earlier in the year; maybe this blog post will give some useful background: http://dave.cheney.net/2015/08/08/performance-without-the-event-loop
Thanks
Dave
If there is no free OS thread to replace the one that is blocked in a syscall, a new one will be created to keep servicing goroutines, so that up to GOMAXPROCS goroutines can still be running Go code.
The scenario you describe, with a poorly written program consuming lots of OS threads blocked in syscalls, is possible, but it rarely happens, because those would be file system operations which would quickly overwhelm the I/O subsystem of the host. Also, you'll still run out of file descriptors. Yes, it can happen, possibly easily, but it is also easily debugged.
There is an open issue to use the net package's poller code for local file system operations, which would reduce the number of blocking syscalls.
During my digging I found this paper: http://www1.cs.columbia.edu/~aho/cs6998/reports/12-12-11_DeshpandeSponslerWeiss_GO.pdf but I couldn't find a discussion about it, or whether its considerations were accepted and implemented.
I know about the netpoller. To be more specific: I mean the OS-thread overhead from a goroutine that is blocked on a syscall that does not get handled by the netpoller. For example, a John Doe like me might have a website with a page that does some kind of syscall that is not intercepted by the netpoller. I don't think this is so far-fetched. So if 10000 connections for this page come in within a short amount of time, the server should crash because of the thread limit. It would be cool if someone could falsify that concern.