Questions on the 'g', 'p', and 'm' structures, and how they relate to the Garbage Collector.


Kyle Stanly

Jun 17, 2016, 12:26:12 PM
to golang-dev
I see the terms 'g' and 'p' used a lot, and 'm' rarely, but I may as well ask about all three now. These structures are defined in runtime2.go. The main reason I'm asking about them specifically is that they are used almost everywhere in the runtime, and I need to know exactly how they work before I can proceed.

What is the 'g' structure?

From what I've heard before, the 'g' is a structure used to keep track of a Goroutine's stack (and stack guard, to prevent overflows) and state, and is kept in TLS. So I'm assuming this structure is, pretty much, how a Goroutine is implemented in Go.

However, I can tell that whatever this structure does is massive, and I want to know precisely how it works. It's not as if there's an article describing how the runtime works at a low level (... or is there?), hence I have to ask here.

1) How precisely does the 'g' interact with the garbage collector? From what I have read (and it's been a lot, so bear with me if I miss a few details), it has stack barriers in place to help during the _GCmark and _GCmarktermination phases. These stack barriers overwrite the original return address with the address of a "stack barrier trampoline"; after researching what a trampoline is and reading further documentation, I gather it jumps to a function, or 'thunk' (the actual details of which I am very iffy on), that keeps track of which stack barriers have been hit. This is so the GC knows precisely which areas of the stack to re-scan, for efficiency reasons of course.

Now, all of this revolves around "GC safe points", which are pretty much where the GC deems it safe to analyze the "true roots" before allowing the mutators (i.e. Goroutines) to continue. However, these safe points apparently, as described in the synchronization section of runtime/mstkbar.go, are visited only when installing stack barriers(?). Am I correct that these safe points are visited only during the transition from phase _GCoff to _GCscan? It seems that "stop-the-world" is done this way? (Making all Goroutines stop at safe points until the GC is ready to proceed past the _GCscan phase.)

2) Am I right in assuming that the 'g' is quite literally the only thread structure available to Go? The introduction to the GC states "The GC runs concurrently with mutator threads", and the 'g' structure holds data relevant to the GC mutator assist algorithm.

3) How exactly does the 'g' help in race detection? How can you detect race conditions based on the 'g', or with its help? It has a field called 'raceignore' which, if true, causes race detection events to be ignored. Am I correct in assuming that if a race condition is detected, a signal will be thrown to the specific thread (the 'g')?

4) How are program counters used in general? They are everywhere, but I've no idea how they work. In my limited understanding, a PC can be used as an offset into the PC-value table to retrieve metadata that makes stack unwinding easier/possible. However, this seems to apply only to a base pointer (or frame pointer), so I assume the other PCs are used as return addresses, or for debugging purposes? E.g., the 'gopc' field is the PC of the go statement that created the goroutine, hence I'm assuming it is just used to debug where the goroutine was created; the 'startpc' field is the PC of the actual function to call, which can be looked up in the PC-value table.

What is the 'p' structure?

Now it's beginning to get a bit more difficult to understand. 'g' could stand for goroutine, but 'p'? Does 'p' stand for 'process'? It contains a lot of information a process should have, so maybe. It even has a 'p.id' => 'pid', so it starts to click a bit more.

Oh, what's this? The 'p' has a run queue for its goroutines, so maybe it is a process scheduling its threads... except on some systems, e.g. Linux, there is no difference between a thread and a process. So would it be a thread scheduling other threads, then? Should I look further at the scheduler Go uses?

1) Is 'p' a process with its own scheduler for the threads/Goroutines it spawns?

2) How does this work on systems where the only differences between a process and a thread are whether memory is shared and how "lightweight" it is?

3) What precisely does it cache with mcache? Which leads to my next question...

What is the 'm' structure?

Does the 'm' stand for... memory? That would kind of make sense, since it consists of a lot of buffers, but that's not right... it has 'procid'... hm, maybe this is the actual process? It can block, has direct access to TLS, and... what's this? A 'thread handle'? The 'thread' field is a pointer to a thread... so 'g' can't be a thread in and of itself, right? Otherwise it would use a guintptr...

1) Is the 'm' an actual process/thread? What relation does it have to everything else?

Brendan Tracey

Jun 17, 2016, 12:30:12 PM
to Kyle Stanly, golang-dev
A good place to start is the 1.1 design document by Dmitry https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit

With this accompanying blog post by Daniel Morsing

I’ll leave the rest of the questions to someone qualified.


Kyle Stanly

Jun 17, 2016, 12:34:26 PM
to golang-dev, thei...@gmail.com
Whoa, I didn't expect there to be a design document. Is there a hub for all of these articles and documents that I can search? Thanks for the link; I'll read it now.

Kyle Stanly

Jun 17, 2016, 1:08:04 PM
to golang-dev, thei...@gmail.com
Okay, I read it. Now I understand...

P -> Processor (Context)
G -> Goroutine (Function)
M -> OS/Machine Thread (Worker thread)

I also found myself asking "Why not just use pthreads or native threads like many other languages do?", and now I see why there can be so many Goroutines at once. An M has a context, P, which contains the many Goroutines to run, and it schedules from this run queue to effectively run each Goroutine as its own thread. This illusion of having "tens of thousands" of threads is just like the illusion of multitasking on a single processor. I'm also assuming that because there's one M per P (one OS thread per processor context), the Ms are mapped directly to the processors themselves, hence get a much longer time slice and distribute it among their many Goroutines?


Keith Randall

Jun 17, 2016, 1:31:45 PM
to Kyle Stanly, golang-dev
On Fri, Jun 17, 2016 at 10:08 AM, Kyle Stanly <thei...@gmail.com> wrote:
Okay, I read it. Now I understand...

P -> Processor (Context)
G -> Goroutine (Function)
M -> OS/Machine Thread (Worker thread)


Right.
 
I also found myself asking "Why not just use pthreads or native threads like many other languages do?", and now I see why there can be so many Goroutines at once. An M has a context, P, which contains the many Goroutines to run, and it schedules from this run queue to effectively run each Goroutine as its own thread. This illusion of having "tens of thousands" of threads is just like the illusion of multitasking on a single processor. I'm also assuming that because there's one M per P (one OS thread per processor context), the Ms are mapped directly to the processors themselves, hence get a much longer time slice and distribute it among their many Goroutines?


There is at most one *running* M per P.  There can be additional Ms blocked in system calls which don't count against this limit.
Yes, Ms grab a G, run that G until it reaches a blocking event (it finishes, does a channel receive on an empty channel, ...), then grab another G and repeat.
Ms need to acquire a P to get permission to run.  When an M goes into a system call, it gives up its P.  On return from a system call, it must reacquire a P to continue running.  This is how we enforce GOMAXPROCS, by allocating GOMAXPROCS Ps and thus allowing only that many Ms to run (and thus only that many Gs running at once).

Austin Clements

Jun 17, 2016, 1:42:35 PM
to Keith Randall, Kyle Stanly, golang-dev
Just to add to that, I find it most useful to think of a P as a "resource": holding a P gives you the right to run user Go code. There's a fixed-size pool of them equal in size to GOMAXPROCS. If an M wants to start running user Go code, it must first acquire ownership of a P from this pool. If an M isn't running user Go code (e.g., it's going into a syscall), it can return its P to this pool.
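(The "P as a resource" picture can be modeled as a fixed pool of tokens sized GOMAXPROCS. This is an analogy only, with invented names like pPool; real Ps also carry per-P run queues, an mcache, and more.)

```go
package main

import "fmt"

// pPool is a pool of "permission to run user Go code" tokens,
// one per P, identified here by a small integer id.
type pPool chan int

func newPPool(gomaxprocs int) pPool {
	pool := make(pPool, gomaxprocs)
	for i := 0; i < gomaxprocs; i++ {
		pool <- i // all Ps start out idle
	}
	return pool
}

// An M takes a P before running user Go code...
func (pool pPool) acquire() int { return <-pool }

// ...and returns it when entering a syscall or going idle.
func (pool pPool) release(p int) { pool <- p }

func main() {
	pool := newPPool(4)
	p := pool.acquire()
	fmt.Println("acquired P", p, "; idle Ps:", len(pool))
	pool.release(p)
	fmt.Println("released; idle Ps:", len(pool))
}
```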

Egon Elbre

Jun 17, 2016, 1:49:49 PM
to golang-dev, thei...@gmail.com
On Friday, 17 June 2016 19:34:26 UTC+3, Kyle Stanly wrote:
Whoa, didn't expect there to be a design document. Is there a Hub on all of these articles and documents I can search? Thanks for the link, I'll read it now.

Kyle Stanly

Jun 17, 2016, 2:19:38 PM
to golang-dev, thei...@gmail.com
That's actually perfect, although I notice it is missing the original document Brendan sent. Are they not all uploaded there?

Kyle Stanly

Jun 17, 2016, 3:43:06 PM
to golang-dev
Oh yeah, I have one more question, not about the structures mentioned above, but about some terminology. What is a slot? I hear about pointer slots and stack slots a lot, but what are they, conceptually? How are they represented?

Austin Clements

Jun 17, 2016, 4:00:07 PM
to Kyle Stanly, golang-dev
A "slot" just means a pointer-sized, pointer-aligned word in memory. E.g., the garbage collector cares about "stack slots" when it's scanning the stack because a given pointer-sized, pointer-aligned word is either a pointer or a non-pointer. You could have a bunch of byte-typed values packed into a single "stack slot", but the garbage collector would just see that as one non-pointer slot it doesn't have to look at. Likewise, the write barrier talks about the "slot" that you're writing a pointer to, which is just the memory you're writing the pointer to.

Kyle Stanly

Jun 22, 2016, 9:57:25 AM
to golang-dev
Hey guys, once again I appreciate all of the replies, but my next question kind of pertains to the 'g'.

What is the 'sudog'?

I understand it's a wait list, kind of like a turnstile in FreeBSD (although I am not 100% familiar with those either... they don't teach you this stuff in school, I swear), but what I want to know is...

1) When does a 'g' get added to and/or removed from a 'sudog'? I'm assuming whenever there is a lock to be acquired or released, but I do need a bit of clarification. For example, if I wanted to do a little spin-lock gesture, should I acquire a sudog, or increment m.locks (which, IIRC, just avoids being preempted)? If I did not acquire and use a sudog, what consequences could arise? Pretty much, what I need to know is when and why I need it, and how it should be used.

Internal structure of 'g'

1) Does the compiler or runtime make any assumptions about the 'g'? Would it be safe to add new fields to it?

2) Is the pointer returned from 'getg' always constant? Is the 'goid' always unique? Can these be used to identify one 'g' from another?


Austin Clements

Jun 22, 2016, 12:54:26 PM
to Kyle Stanly, golang-dev
On Wed, Jun 22, 2016 at 9:57 AM, Kyle Stanly <thei...@gmail.com> wrote:
Hey guys, once again I appreciate all of the replies, but my next question kind of pertains to the 'g'.

What is the 'sudog'?

I understand it's a wait list, kind of like a turnstile in FreeBSD (although I am not 100% familiar with those either... they don't teach you this stuff in school, I swear), but what I want to know is...

1) When does a 'g' get added to and/or removed from a 'sudog'? I'm assuming whenever there is a lock to be acquired or released, but I do need a bit of clarification. For example, if I wanted to do a little spin-lock gesture, should I acquire a sudog, or increment m.locks (which, IIRC, just avoids being preempted)? If I did not acquire and use a sudog, what consequences could arise? Pretty much, what I need to know is when and why I need it, and how it should be used.

A sudog is just a linked list node in the list of Gs blocked on a channel or user lock. You don't need to acquire a sudog for a spin lock (or a runtime-internal lock), since those don't involve the Goroutine scheduler, and there's no list you would put that sudog on.

You're right that incrementing m.locks prevents preemption. You probably shouldn't do this directly, though; use acquirem and releasem.
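(The linked-list structure Austin describes can be sketched outside the runtime. This is a heavily simplified toy with invented field layout; the real runtime.sudog also records the channel element being sent or received, a ticket number, and more, and the real waitq lives on each channel as recvq/sendq.)

```go
package main

import "fmt"

type g struct{ id int }

// sudog is a wait-list node: one blocked g plus a link to the next.
type sudog struct {
	g    *g
	next *sudog
}

// waitq is a FIFO list of sudogs, like a channel's recvq or sendq.
type waitq struct{ first, last *sudog }

func (q *waitq) enqueue(s *sudog) {
	s.next = nil
	if q.last == nil {
		q.first = s
	} else {
		q.last.next = s
	}
	q.last = s
}

func (q *waitq) dequeue() *sudog {
	s := q.first
	if s == nil {
		return nil
	}
	q.first = s.next
	if q.first == nil {
		q.last = nil
	}
	s.next = nil
	return s
}

func main() {
	var q waitq
	for i := 1; i <= 3; i++ {
		q.enqueue(&sudog{g: &g{id: i}})
	}
	for s := q.dequeue(); s != nil; s = q.dequeue() {
		fmt.Println("woke g", s.g.id) // FIFO: 1, 2, 3
	}
}
```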

Internal structure of 'g'

1) Does the compiler or runtime make any assumptions to the 'g'? Would it be safe to add new fields to it?

Various things make assumptions about the first few fields in type g (see the "offset known" comments), but after these it's fine to add new fields.

2) Is the pointer returned from 'getg' always constant? Is the 'goid' always unique? Can these be used to identify one 'g' from another?

What do you mean by "constant"? A given g object doesn't move, so if you're running on that goroutine, getg() will always return the same thing. Though do note that if you use systemstack, that's implemented as a sort of goroutine switch, and getg will return the g0 (the system goroutine) of the current m while you're on the system stack. If you always want the current "user" goroutine whether you're running on the user stack or the system stack, use getg().m.curg. You'll see that all over the runtime.

The goid is unique at any instant. I don't think it's guaranteed to be unique for all time.
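(Goids aren't exposed by the public API, but for debugging one can parse the header of runtime.Stack output, which begins "goroutine N [status]:". This is a well-known hack, shown here with an invented helper name; don't use goids for program logic.)

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"strconv"
)

// goid extracts the current goroutine's id from its stack-trace header.
func goid() int64 {
	buf := make([]byte, 64)
	buf = buf[:runtime.Stack(buf, false)]
	// buf looks like: "goroutine 18 [running]:\n..."
	buf = bytes.TrimPrefix(buf, []byte("goroutine "))
	i := bytes.IndexByte(buf, ' ')
	id, _ := strconv.ParseInt(string(buf[:i]), 10, 64)
	return id
}

func main() {
	fmt.Println("main goroutine id:", goid()) // the main goroutine has goid 1
	ch := make(chan int64)
	go func() { ch <- goid() }()
	fmt.Println("child goroutine id:", <-ch) // a different id
}
```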



Kyle Stanly

Jun 22, 2016, 1:20:50 PM
to golang-dev, thei...@gmail.com
You're right that incrementing m.locks prevents preemption. You probably shouldn't do this directly, though; use acquirem and releasem.

Now this is a big one; you probably saved me quite a bit of time debugging undefined behavior later down the line. What about the "Must be NOSPLIT, must only call NOSPLIT functions, and must not block" comment above it? Does this mean my spinning function should use the '//go:nosplit' annotation?
 
Various things make assumptions about the first few fields in type g (see the "offset known" comments), but after these it's fine to add new fields.

Another big one. I was almost terrified to modify anything. This will allow me to add another field, preferably a uintptr for generality, that I can use for some Goroutine-local storage (since TLS naturally isn't safe, as a 'g' can be moved across machine threads).

 If you always want the current "user" goroutine whether you're running on the user stack or the system stack, use getg().m.curg. You'll see that all over the runtime.

Golden, right here. I did see it all over the runtime but didn't know precisely what it was for. I knew it was used to check whether gp == gp.m.curg, but could never figure out what that implied.

Ian Lance Taylor

Jun 22, 2016, 5:00:05 PM
to Kyle Stanly, golang-dev
On Wed, Jun 22, 2016 at 10:20 AM, Kyle Stanly <thei...@gmail.com> wrote:
>> You're right that incrementing m.locks prevents preemption. You probably
>> shouldn't do this directly, though; use acquirem and releasem.
>
>
> Now this is a big one here, you probably saved me quite a bit of time
> debugging some undefined behavior later on down the line. What about the
> whole "Must be NOSPLIT, must only call NOSPLIT functions, and must not
> block" comment above it? Does this mean that my function for spinning should
> use the '//go:nosplit' annotation?

That comment means that acquirem and the other functions in that block
must be nosplit and must only call nosplit functions. If you are not
modifying those functions themselves, it's not something you need to
worry about.

You should use //go:nosplit on your function if it should not check
the stack guard. That is true if it is called directly by a nosplit
function (with some exceptions, namely those functions that set up the
stack guard) or if it is called at program startup or from a signal
handler.

Ian