LockOSThread, switching (Linux kernel) namespaces: what happens to the main thread...?

TheDiveO

May 26, 2022, 1:34:27 PM

to golang-nuts

Even after some research I couldn't find (or stupidly missed) what happens in this case. I suspect it to "somehow" be the underlying cause of some of my applications ending up with their main OS-level thread sitting in the wrong network namespace, although I made sure to ditch all OS-level threads after they were "tainted" by switching namespaces. The problem seems to go away when I immediately lock the main goroutine's OS-level thread before doing anything else, but I'm unsure whether that is just by chance or a correct cure.

So that's what I'm doing right now:

a goroutine G (which isn't the main goroutine, the one with ID 1) is scheduled by some arcane Gopher scheduling deity of bad luck onto the main OS-level thread T0, which (at least on Linux) also represents the process as such. If I remember correctly, this is also termed the "task leader", but I digress.
now that goroutine G on thread T0 calls runtime.LockOSThread() and, say, switches one of its Linux kernel namespaces, maybe its network namespace.
after doing some work, G, still locked to T0, simply terminates.

What happens to T0, the initial/leader thread representing the whole program process?

My (albeit limited) understanding so far, for an OS-level thread T that isn't T0, is that when a G terminates while still runtime.LockOSThread'ed, the runtime correctly throws away T so as to avoid reusing it for a different, unsuspecting G' with the wrong namespaces set.

But what about T0? Wouldn't throwing away T0 terminate the whole program ... which doesn't seem to be the case? Or am I mistaken here?

Ian Lance Taylor

May 26, 2022, 1:39:10 PM

to TheDiveO, golang-nuts

Once you've created multiple threads in a process, the original thread
(what you are calling T0) can exit without affecting the other
threads.

Ian

TheDiveO

May 26, 2022, 2:17:38 PM

to golang-nuts

On Thursday, May 26, 2022 at 7:39:10 PM UTC+2 Ian Lance Taylor wrote:

>> But what about T0? Wouldn't throwing away T0 terminate the whole program ... which doesn't seem to be the case? Or am I mistaken here?
>
> Once you've created multiple threads in a process, the original thread
> (what you are calling T0) can exit without affecting the other
> threads.

Okay, thank you! I've now found something in the Linux procfs man page that seems to support this: it says that /proc/PID/task becomes unavailable when the task leader (T0) exits.

But this raises the question: if the task leader T0 has exited and the Go scheduler needs to start a new task, in which task does the scheduler run when it does the clone? Can it happen to be an OS-level locked task? I'm asking because my code explicitly locks new goroutines to their OS tasks/threads before switching namespaces, never switches back, and instead terminates the goroutine, turning its locked thread into a throw-away task. Yet I see the T0 task leader suddenly ending up executing non-locked goroutines, and in the wrong network namespace. This is on Go 1.18.2, but I might have seen this before on 1.16 or 1.17.

Ian Lance Taylor

May 26, 2022, 2:51:49 PM

to TheDiveO, golang-nuts

If the scheduler is running on a goroutine locked using
runtime.LockOSThread and needs to start a new thread, it does so by
asking a template thread to create it. The template thread doesn't do
anything other than create new threads, so it should be in a clean
state.

Of course, there may be bugs. If you have a reproducible bug please
report it at https://go.dev/issue. Thanks.

Ian

TheDiveO

May 26, 2022, 3:08:03 PM

to golang-nuts

On Thursday, May 26, 2022 at 8:51:49 PM UTC+2 Ian Lance Taylor wrote:

> If the scheduler is running on a goroutine locked using
> runtime.LockOSThread and needs to start a new thread, it does so by
> asking a template thread to create it. The template thread doesn't do
> anything other than create new threads, so it should be in a clean
> state.

Do you perchance have a link to where I can find the template thread handling in the runtime scheduler?

Ian Lance Taylor

May 26, 2022, 3:46:26 PM

to TheDiveO, golang-nuts

Search for [tT]emplateThread in $GOROOT/src/runtime/proc.go.

Ian

TheDiveO

May 26, 2022, 3:54:52 PM

to golang-nuts

Ian, thank you very much! Found it and am now trying to somehow get my head around it!

TheDiveO

Jun 1, 2022, 1:44:56 PM

to golang-nuts

While exploring more of the proc.go code I noticed that my original question somehow didn't fully reflect what I'm wondering about: what happens in the following situation...? Are all tasks/threads really equal?

a non-main goroutine G42 gets scheduled onto the main thread/leader task, which in Linux represents the whole process. This is what I also called T0 in the discussion above.
G42 calls runtime.LockOSThread.
G42 terminates/ends.

What now? Terminating T0 doesn't look like a great idea on second look: for instance, as I mentioned above, it causes some problems further down the road, such as things in the procfs for this process becoming inaccessible.

Is there a way to trick(?) a non-main goroutine onto T0 as an experiment?

Ian Lance Taylor

Jun 1, 2022, 9:03:17 PM

to TheDiveO, golang-nuts

On Wed, Jun 1, 2022 at 10:45 AM TheDiveO <harald....@gmx.net> wrote:
>
> While exploring more of the proc.go code I noticed that my original question somehow didn't fully reflect what I'm wondering about: what happens in the following situation...? Are all tasks/threads really equal?
>
> a non-main goroutine G42 gets scheduled onto the main thread/leader task, which in Linux represents the whole process. This is what I also called T0 in the discussion above.
> G42 calls runtime.LockOSThread.
> G42 terminates/ends.
>
> What now? Terminating T0 doesn't look like a great idea on second look: for instance, as I mentioned above, it causes some problems further down the road, such as things in the procfs for this process becoming inaccessible.

Can you point to some documentation about this problem, or show a
program where it causes problems?

> Is there a way to trick(?) a non-main goroutine onto T0 as an experiment?

I'm not sure what a "non-main goroutine" is. All goroutines are
basically equivalent. Assuming you mean the initial goroutine, you
can lock that to the initial thread by calling runtime.LockOSThread in
an init function. That should wind up calling the main function with
a goroutine locked to the initial thread. Then you could, for
example, start a new goroutine and then let the initial goroutine
exit.

Ian

TheDiveO

Jun 2, 2022, 9:19:47 AM

to golang-nuts

Hi Ian,

On Thursday, June 2, 2022 at 3:03:17 AM UTC+2 Ian Lance Taylor wrote:

> On Wed, Jun 1, 2022 at 10:45 AM TheDiveO wrote:
>> What now? Terminating T0 doesn't look like a great idea on second look: for instance, as I mentioned above, it causes some problems further down the road, such as things in the procfs for this process becoming inaccessible.
>
> Can you point to some documentation about this problem, or show a
> program where it causes problems?

This happens, unfortunately, in a closed-source project. However, as I reuse existing open-source parts, I can at least link to the basic elements the closed project sits on top of. The service handlers involved fall into one of two types with respect to switching between different network namespaces.

One type calls runtime.LockOSThread, then switches the thread's network namespace, does some things, then switches back to its "saved" original network namespace, and finally calls runtime.UnlockOSThread, so that the underlying thread/task can be freely reused because it is untainted (again). The generic namespace-switching functionality can be found here: https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L172

The other type uses new goroutines for its tasks, which in turn call runtime.LockOSThread, then switch their threads' network namespaces, but never call runtime.UnlockOSThread; they exit immediately after handing over their results via channels to the waiting service handler. This lock-and-never-unlock method can be seen here: https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L144 and here: https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L92.

These basic namespace-switching building blocks are covered by test cases that additionally check that all threads/tasks have been properly restored at the end of the tests, using https://github.com/thediveo/namspill. So far, these tests could not trigger the situation I see in the closed project. I've additionally instrumented the tests for the closed product, also using the namspill checker in unit tests; unfortunately, the existing tests never trigger the behavior that gets triggered in production. While I can reproducibly trigger the invalid situation in the product, I've so far not managed to come up with a corresponding simplified test.

I don't suspect a Go runtime bug, but I want to understand what is happening so I can take appropriate measures to avoid the situation where the Go service's initial thread/task T0 ends up in a switched network namespace, which should never happen (famous last words).

>> Is there a way to trick(?) a non-main goroutine onto T0 as an experiment?
>
> I'm not sure what a "non-main goroutine" is. All goroutines are
> basically equivalent. Assuming you mean the initial goroutine, you
> can lock that to the initial thread by calling runtime.LockOSThread in
> an init function. That should wind up calling the main function with
> a goroutine locked to the initial thread. Then you could, for
> example, start a new goroutine and then let the initial goroutine
> exit.

You're correct: I was thinking about the initial goroutine.

The initial goroutine also calls main, but this can happen on any thread/task ... is that correct so far?

From the perspective of Linux, the leader task represents the process and thus the process-related (as opposed to task-related) OS-managed resources. So I would assume that terminating the leader task should be avoided: what is the process exit code when the leader task terminates while other tasks keep running? The exit code of the last task standing?

So I'm wondering how Go's scheduler can keep all threads/tasks fully symmetrical when the Linux kernel enforces a certain asymmetry upon tasks. Might there be a rule that ultimately keeps T0 from being terminated when it is locked to a non-initial goroutine and that goroutine terminates?

Ian Lance Taylor

Jun 2, 2022, 11:48:18 AM

to TheDiveO, golang-nuts

On Thu, Jun 2, 2022 at 6:20 AM TheDiveO <harald....@gmx.net> wrote:
>
> On Thursday, June 2, 2022 at 3:03:17 AM UTC+2 Ian Lance Taylor wrote:
>>
>> On Wed, Jun 1, 2022 at 10:45 AM TheDiveO wrote:
>> > What now? Terminating T0 doesn't look like a great idea on second look: for instance, as I mentioned above, it causes some problems further down the road, such as things in the procfs for this process becoming inaccessible.
>>
>> Can you point to some documentation about this problem, or show a
>> program where it causes problems?
>
>
> This happens, unfortunately, in a closed-source project. However, as I reuse existing open-source parts, I can at least link to the basic elements the closed project sits on top of. The service handlers involved fall into one of two types with respect to switching between different network namespaces.
>
> One type calls runtime.LockOSThread, then switches the thread's network namespace, does some things, then switches back to its "saved" original network namespace, and finally calls runtime.UnlockOSThread, so that the underlying thread/task can be freely reused because it is untainted (again). The generic namespace-switching functionality can be found here: https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L172
>
> The other type uses new goroutines for its tasks, which in turn call runtime.LockOSThread, then switch their threads' network namespaces, but never call runtime.UnlockOSThread; they exit immediately after handing over their results via channels to the waiting service handler. This lock-and-never-unlock method can be seen here: https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L144 and here: https://github.com/thediveo/lxkns/blob/develop/ops/switchns.go#L92.
>
> These basic namespace-switching building blocks are covered by test cases that additionally check that all threads/tasks have been properly restored at the end of the tests, using https://github.com/thediveo/namspill. So far, these tests could not trigger the situation I see in the closed project. I've additionally instrumented the tests for the closed product, also using the namspill checker in unit tests; unfortunately, the existing tests never trigger the behavior that gets triggered in production. While I can reproducibly trigger the invalid situation in the product, I've so far not managed to come up with a corresponding simplified test.
>
> I don't suspect a Go runtime bug, but I want to understand what is happening so I can take appropriate measures to avoid the situation where the Go service's initial thread/task T0 ends up in a switched network namespace, which should never happen (famous last words).

Thanks.

Can you point to any documentation about the problem? Earlier you
mentioned that there was something in the procfs man page. I didn't
see it, but as the procfs man page is very large I'm sure I just
missed it.

>> > Is there a way to trick(?) a non-main goroutine onto T0 as an experiment?
>>
>> I'm not sure what a "non-main goroutine" is. All goroutines are
>> basically equivalent. Assuming you mean the initial goroutine, you
>> can lock that to the initial thread by calling runtime.LockOSThread in
>> an init function. That should wind up calling the main function with
>> a goroutine locked to the initial thread. Then you could, for
>> example, start a new goroutine and then let the initial goroutine
>> exit.
>
>
> You're correct: I was thinking about the initial goroutine.
>
> The initial goroutine also calls main, but this can happen on any thread/task ... is that correct so far?

If you call runtime.LockOSThread in an init function, as I mentioned
above, then the initial goroutine will be the one that calls the main
function.

> From the perspective of Linux, the leader task represents the process and thus the process-related (as opposed to task-related) OS-managed resources. So I would assume that terminating the leader task should be avoided: what is the process exit code when the leader task terminates while other tasks keep running? The exit code of the last task standing?
>
> So I'm wondering how Go's scheduler can keep all threads/tasks fully symmetrical when the Linux kernel enforces a certain asymmetry upon tasks. Might there be a rule that ultimately keeps T0 from being terminated when it is locked to a non-initial goroutine and that goroutine terminates?

I am still trying to understand whether there really is an asymmetry.
It's true that the Go scheduler assumes that there is no such
asymmetry. If that is incorrect, then we need to fix it. But first
we need a test case or at least some documentation.

Ian

TheDiveO

Jun 2, 2022, 12:21:11 PM

to golang-nuts

Hi Ian,

On Thursday, June 2, 2022 at 5:48:18 PM UTC+2 Ian Lance Taylor wrote:

> Can you point to any documentation about the problem? Earlier you
> mentioned that there was something in the procfs man page. I didn't
> see it, but as the procfs man page is very large I'm sure I just
> missed it.

You're not the only one: it took me years to finally grok the ramifications of /proc/$PID/root, but then it significantly simplified some things regarding accessing other mount namespaces, and it gives security people nightmares.

Alas, I can only give quoted examples, as Michael Kerrisk's well-written man pages unfortunately don't support deep linking. On a quick search I found these six examples:

/proc/[pid]/cwd: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))."
/proc/[pid]/exe: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))." (Michael is copy and pasting here)
/proc/[pid]/fd/: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))." (me envisioning Michael starting to wear out Ctrl-V)
(I think he forgot /proc/[pid]/fdinfo/ as one is the other's evil twin; need to poke him in Munich some day)
/proc/[pid]/root: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))."
/proc/[pid]/task: guess what ... yes ... "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))."

While the vast majority of Go programs on Linux will probably never need to worry about this, system-related tools will to some extent be affected when they need to access the above parts of the /proc filesystem while using goroutines and LockOSThread without unlocking in order to drop tainted threads/tasks.

> If you call runtime.LockOSThread in an init function, as I mentioned
> above, then the initial goroutine will be the one that calls the main
> function.

Will this be guaranteed to be on my mythical "T0", the leader thread (also termed thread group leader)? Or could this be another thread?

> I am still trying to understand whether there really is an asymmetry.
> It's true that the Go scheduler assumes that there is no such
> asymmetry. If that is incorrect, then we need to fix it. But first
> we need a test case or at least some documentation.

There is some asymmetry... however, I reckon the scheduler won't need to consider it, as the application developer can ensure proper operation by locking the initial goroutine to the leader task/initial thread, provided that the initial goroutine is indeed executed on the leader task/initial thread.

This looks like a good assumption to me unless proven otherwise. Unfortunately, it still doesn't explain why my app in production ends up with the initial goroutine/leader task/thread staying in a wrong network namespace instead of the one it started in. Of course, there's a really good chance that the mistake is on my side. I don't even think a third-party module is involved, because I bypass the namespace-switching mechanism of the one well-known rtnetlink Go module and instead wrap it in my own implementation, which is unit-tested against such hiccups. Also, I have rigid error reporting and logging in place (famous last words of inflated developer self-esteem), as I considered switching namespaces back a critical operation right from the beginning. Nevertheless, I might well have totally goofed up.

Ian Lance Taylor

Jun 2, 2022, 12:28:34 PM

to TheDiveO, golang-nuts

On Thu, Jun 2, 2022 at 9:21 AM TheDiveO <harald....@gmx.net> wrote:
>
> On Thursday, June 2, 2022 at 5:48:18 PM UTC+2 Ian Lance Taylor wrote:
>>
>> Can you point to any documentation about the problem. Earlier you
>> mentioned that there was something in the procfs man page. I didn't
>> see it, but as the procfs man page is very large I'm sure I just
>> missed it.
>
>
> You're not the only one: it took me years to finally grok the ramifications of /proc/$PID/root, but then it significantly simplified some things regarding accessing other mount namespaces, and it gives security people nightmares.
>
> Alas, I can only give quoted examples, as Michael Kerrisk's well-written man pages unfortunately don't support deep linking. On a quick search I found these six examples:
>
> /proc/[pid]/cwd: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))."
> /proc/[pid]/exe: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))." (Michael is copy and pasting here)
> /proc/[pid]/fd/: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))." (me envisioning Michael starting to wear out Ctrl-V)
> (I think he forgot /proc/[pid]/fdinfo/ as one is the other's evil twin; need to poke him in Munich some day)
> /proc/[pid]/root: "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))."
> /proc/[pid]/task: guess what ... yes ... "In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3))."
>
> While the vast majority of Go programs on Linux will probably never need to worry about this, system-related tools will to some extent be affected when they need to access the above parts of the /proc filesystem while using goroutines and LockOSThread without unlocking in order to drop tainted threads/tasks.

Thanks for the pointers.

Would you be able to open an issue about this at https://go.dev/issue? Thanks.

>> If you call runtime.LockOSThread in an init function, as I mentioned
>> above, then the initial goroutine will be the one that calls the main
>> function.
>
>
> Will this be guaranteed to be on my mythical "T0", the leader thread (also termed thread group leader)? Or could this be another thread?

Yes, this goroutine will be running on the initial process thread.

Ian

TheDiveO

Jun 2, 2022, 5:17:10 PM

to golang-nuts

I've created https://github.com/golang/go/issues/53210 and the answer is already in! The initial thread m0 is actually handled differently: it doesn't get terminated but, to use the runtime terminology, "wedged". This explains what I'm seeing in production: by some chance, one of the transient goroutines that switch namespaces, never unlock, and just terminate got scheduled onto m0. What I couldn't see was that the task group leader got "wedged". It follows that locking the initial goroutine to m0 in an init function prevents any of the transient namespace-switching goroutines from ever being scheduled onto m0.
