v2: Make this patch less likely to be really bad on a non-benchmark
workload: limit the spinning to 1,000 iterations instead of 10,000, and stop
spinning if the lock holder is sleeping.
When a thread locks an already-held mutex, it goes to sleep, which incurs
the penalty of a context switch, and on SMP, likely also a slow cross-CPU
wakeup. In many cases, the lock holder will be holding the lock for a very
short time, and it will be beneficial to spin a bit waiting for the lock
to be released, before going to sleep.
This patch implements such spinning, using technique 2 described in
issue #25: we spin after incrementing the mutex's count, but before
adding ourselves to the queue, so unlock() may wake us using the handoff
mechanism.
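To make the handoff interplay concrete, here is a deliberately simplified,
hypothetical sketch (a toy class I call toy_mutex, not code from the OSv
tree; it leaves out the wait queue, owner/depth bookkeeping and the unique
handoff tokens the real lfmutex uses to avoid ABA problems) of how a locker
that has already incremented count can take the lock straight from
unlock(), without ever queueing or sleeping:

    #include <atomic>

    struct toy_mutex {
        std::atomic<int> count{0};         // holder + pending lockers
        std::atomic<unsigned> handoff{0};  // nonzero: lock handed off, up for grabs

        void lock() {
            if (count.fetch_add(1, std::memory_order_acquire) == 0) {
                return;  // uncontended fast path: we own the lock
            }
            // Spin before queueing: if the holder ran unlock() meanwhile and
            // saw our increment, it published a handoff token we can claim.
            for (int c = 0; c < 1000; c++) {
                auto h = handoff.load(std::memory_order_acquire);
                if (h && handoff.compare_exchange_strong(h, 0u)) {
                    return;  // we claimed the handed-off lock
                }
                // (the real patch also pauses, and gives up early if a waiter
                // queued or the lock holder is no longer running)
            }
            // ... give up and fall back to queueing ourselves and sleeping ...
        }

        void unlock() {
            if (count.fetch_add(-1, std::memory_order_release) == 1) {
                return;  // nobody else tried to lock; nothing to do
            }
            // Someone incremented count but isn't on the queue yet (e.g. a
            // spinner): publish a handoff token instead of waking anyone.
            handoff.store(1, std::memory_order_release);
        }
    };

(In the real code the spinner and unlock() also have to coordinate with
threads that did enter the wait queue; the sketch only shows why a handoff,
rather than a wakeup, is what the spinner waits for.)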
For uncontended locks, the added spinning does nothing (we only spin when
we find the lock already taken). For heavily-contended locks it also
usually does nothing, because spinning stops as soon as the mutex's wait
queue becomes non-empty, so if the wait queue is never empty (on a very
heavily contended lock) we never spin. So I believe this patch will mostly
be beneficial for moderately-contended mutexes.
In the test/misc-mutex.so benchmarks, this patch didn't change the
uncontended measurements and only sporadically improved the heavily
contended case (20 threads constantly trying to obtain the mutex).
Performance is significantly improved in the less heavily contended
benchmarks, where context switches aren't necessary, with 2 or 4
lock-unlock loop threads on 4 CPUs: from around 800ns and 1800ns for the
2- and 4-thread cases respectively, with this patch we are down to 33ns
and 153ns, respectively, more or less the same results we obtain with
a spinlock.
Please do not commit this patch, but feel free to try it on your favorite
benchmark. I'm continuing to investigate different ways of doing spinning
and their tradeoffs.
Signed-off-by: Nadav Har'El <n...@cloudius-systems.com>
---
 core/lfmutex.cc      | 25 +++++++++++++++++++++++++
 include/osv/sched.hh |  4 ++++
 2 files changed, 29 insertions(+)
--- .before/include/osv/sched.hh 2014-07-28 22:54:45.958268928 +0300
+++ .after/include/osv/sched.hh 2014-07-28 22:54:45.959268934 +0300
@@ -615,6 +615,10 @@ public:
      * lot of stack
      */
     static void register_exit_notifier(std::function<void (thread *)> &&n);
+    bool running() {
+        return _detached_state.get()->st.load(std::memory_order_relaxed) ==
+               status::running;
+    }
 private:
     class reaper;
     friend class reaper;
--- .before/core/lfmutex.cc 2014-07-28 22:54:46.042269433 +0300
+++ .after/core/lfmutex.cc 2014-07-28 22:54:46.042269433 +0300
@@ -44,6 +44,31 @@ void mutex::lock()
         ++depth;
         return;
     }
+    // If we're still here, the lock is owned by a different thread. Before
+    // going to sleep, let's try to spin for a bit, to avoid the context
+    // switch costs when the critical section is very short. We're spinning
+    // without putting ourselves in the waitqueue first, so when the lock
+    // holder unlock()s, we expect to see a handoff.
+    // FIXME: do not spin on UP.
+    // If someone is already on the waitqueue, there's no point in continuing
+    // to spin, as that waiter will be woken first and no handoff will be done.
+    for (int c = 0; c < 1000 && waitqueue.empty(); c++) {
+        // If the lock holder got preempted, we are better served by going to
+        // sleep and letting it run again than by spinning futilely.
+        // Especially if the lock holder wants to run on this CPU.
+        auto holder = owner.load(std::memory_order_relaxed);
+        if (holder && !holder->running()) {
+            break;
+        }
+        auto old_handoff = handoff.load();
+        if (old_handoff && handoff.compare_exchange_strong(old_handoff, 0U)) {
+            owner.store(current, std::memory_order_relaxed);
+            depth = 1;
+            return;
+        }
+        asm volatile ("pause");
+    }
+
     // If we're here still here the lock is owned by a different thread.
     // Put this thread in a waiting queue, so it will eventually be woken