[PATCH] sched: enforce tls register refresh after thread switch

Waldemar Kozaczuk

unread,

Feb 8, 2021, 12:55:41 AM2/8/21

to osv...@googlegroups.com, Waldemar Kozaczuk

On RISC architecture like aarch64 the thread register tpidr_el0 that
holds an address of thread local storage memory for current thread,
is never accessed directly in one instruction like on x86_64. Instead
compiler generates machone code that copies value of the tpidr_el0
to another generic register when reading and does the opposite
when writing.

As the issue #1121 explains, the reschedule_from_interrupt() method
uses thread local variable 'current_cpu' to detect any pending thread to
de destroyed for current cpu after switching from old thread
to a new one. It must use TLS variable instead of any stack variables
because those would be invalid at this point as they would be on the
stack of the old thread switched from.

This all works fine on x86_64, where every read or write of
thread register fs is performed in one instruction like so:

mov %fs:0xfffffffffffff948,%rax

On aarch64, the value of the thread register tpidr_el0 is copied
once to another register x21 before access of first TLS variable in this
method and never refreshed after that. When reschedule_from_interrupt()
calls switch_to() to switch to new thread, the update of the tpidr_el0
register is done in assembly and compiler has no idea that we are
changing TLS register. So after return to reschedule_from_interrupt()
before it calls cpu::current(), the register x21 still holds old
value of tpidr_el0 register.

In most cases this does not cause any harm, as typically the old
and new thread holds the same current_cpu address. But it leads to very
ugly crashes when sched::pin() is used which calls
reschedule_from_interrupt() after changing its current_cpu to a new one
it is pinning the current thread to.

So in order to fix this problem, this patch delegates the code to
destroy current cpu terminating thread to a new class method
destroy_current_cpu_thread() to enforce fresh reading of the TLS
register.

Fixes #1121

Signed-off-by: Waldemar Kozaczuk <jwkoz...@gmail.com>
---
core/sched.cc | 19 ++++++++++++++-----
include/osv/sched.hh | 1 +
2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/core/sched.cc b/core/sched.cc
index 06f849d1..45427cac 100644
--- a/core/sched.cc
+++ b/core/sched.cc
@@ -229,6 +229,13 @@ void cpu::schedule()
}
}

+void __attribute__ ((noinline)) cpu::destroy_current_cpu_thread() {
+ if (cpu::current()->terminating_thread) {
+ cpu::current()->terminating_thread->destroy();
+ cpu::current()->terminating_thread = nullptr;
+ }
+}
+
void cpu::reschedule_from_interrupt(bool called_from_yield,
thread_runtime::duration preempt_after)
{
@@ -338,15 +345,17 @@ void cpu::reschedule_from_interrupt(bool called_from_yield,
}
n->switch_to();

- // Note: after the call to n->switch_to(), we should no longer use any of
+ // Note 1: after the call to n->switch_to(), we should no longer use any of
// the local variables, nor "this" object, because we just switched to n's
// stack and the values we can access now are those that existed in the
// reschedule call which scheduled n out, and will now be returning.
// So to get the current cpu, we must use cpu::current(), not "this".
- if (cpu::current()->terminating_thread) {
- cpu::current()->terminating_thread->destroy();
- cpu::current()->terminating_thread = nullptr;
- }
+ // Note 2: in addition, in order to destroy any pending thread to be destroyed
+ // for current CPU, we delegate to a class static method destroy_current_cpu_thread
+ // to enforce that TLS register is refreshed before we reference current cpu
+ // using new thread TLS variable we switched to. This is necessary for architectures
+ // like aarch64 where tpidr_el0 register is copied to other one before use.
+ destroy_current_cpu_thread();
}

void cpu::timer_fired()
diff --git a/include/osv/sched.hh b/include/osv/sched.hh
index 0fb44f77..78fed622 100644
--- a/include/osv/sched.hh
+++ b/include/osv/sched.hh
@@ -886,6 +886,7 @@ struct cpu : private timer_base::client {
*/
void reschedule_from_interrupt(bool called_from_yield = false,
thread_runtime::duration preempt_after = thyst);
+ static void destroy_current_cpu_thread();
void enqueue(thread& t);
void init_idle_thread();
virtual void timer_fired() override;
--
2.29.2

Nadav Har'El

unread,

Feb 8, 2021, 9:24:25 AM2/8/21

to Waldemar Kozaczuk, Osv Dev

Nice catch, but it's really sad we need to call another function, which slows down the context switch (even if a bit) also on x86 where this isn't needed at all.

I think there must be a better solution:

You are saying that the problem arises because switch_to() set one register, but didn't set its copy to the same value.

So, can't we change switch_to() - on aarch64 only - to also make this additional copy?

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/20210208055529.1339473-1-jwkozaczuk%40gmail.com.

Waldek Kozaczuk

unread,

Feb 8, 2021, 11:58:56 AM2/8/21

to OSv Development

The problem is that we do not know which generic register compiler is going to use. And it decides it in reschedule_from_interrupt() before this line:

thread* p = thread::current();

which uses TLS access. So switch_to() (which is not even inlined function) would have no way of knowing which register to copy to.

Another alternative is to make this code ARCH dependant with #ifdef so we can keep the x86_64 code and change aarch64 only.

Ideally, if there was a compiler directive to enforce re-read of the tpidr_el0 register.

Reply all

Reply to author

Forward