When a thread tries to lock a mutex that is already held, it goes to sleep,
which incurs the cost of a context switch and, on SMP, usually also a slow
cross-CPU wakeup. In many cases the lock holder holds the lock for only a
very short time, so it is beneficial to spin for a bit, waiting for the
lock to be released, before going to sleep.
This patch implements such spinning, using technique 2 described in
issue #25: we spin after incrementing the mutex's count, but before
adding ourselves to the queue, so unlock() may wake us using the handoff
mechanism.
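To illustrate the idea (not the actual patch below), here is a hypothetical,
stripped-down sketch of the spin-for-handoff step. The names mirror
core/lfmutex.cc, but the struct, the waitqueue_empty flag and the
spin_for_handoff() helper are illustrative only: a spinning locker polls the
handoff token that unlock() publishes, and claims the lock with a
compare-exchange instead of sleeping.

```cpp
#include <atomic>

// Hypothetical sketch of "technique 2" from issue #25; not the real
// lockfree::mutex. We spin after having incremented the mutex's count,
// so unlock() may publish a handoff token for us to grab.
struct handoff_sketch {
    std::atomic<unsigned> handoff{0};
    bool waitqueue_empty = true;     // stand-in for waitqueue.empty()

    // Returns true if we obtained the lock via handoff while spinning.
    bool spin_for_handoff(int max_spins) {
        for (int c = 0; c < max_spins && waitqueue_empty; c++) {
            auto old = handoff.load();
            if (old && handoff.compare_exchange_strong(old, 0U)) {
                return true;         // we now own the mutex
            }
            // the real x86 code executes a "pause" instruction here
        }
        return false;                // give up; go to sleep instead
    }
};
```

If the wait queue is non-empty, the loop exits immediately, since a queued
waiter will be woken first and no handoff will be published.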
For uncontended locks, the added spinning does nothing: we only spin when
we find the lock already taken. For heavily-contended locks it also usually
does nothing, because spinning stops as soon as the mutex's wait queue
becomes non-empty; on a very heavily contended lock the wait queue is never
empty, so we never spin. I therefore believe this patch will mostly benefit
lightly-contended mutexes.
In the test/misc-mutex.so benchmarks, this patch didn't change the
uncontended measurements, and only sporadically improved the heavily
contended case (20 threads constantly trying to obtain the mutex).
Performance is significantly improved in the less heavily contended
benchmarks, where context switches aren't necessary, with 2 or 4
lock-unlock loop threads on 4 CPUs: from around 800ns and 1800ns for the
2- and 4-thread cases respectively, this patch brings us down to 33ns
and 153ns, more or less the same results we obtain with a spinlock.
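For reference, the kind of lock-unlock loop measured above can be sketched
as follows. This is a hypothetical reconstruction, not the misc-mutex test
itself; std::mutex stands in for OSv's lockfree::mutex, and ns_per_op() and
its parameters are made up for illustration.

```cpp
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative lock-unlock loop benchmark: nthreads threads each lock and
// unlock a shared mutex iters times; returns average nanoseconds per
// lock-unlock pair. std::mutex is a stand-in for lockfree::mutex.
double ns_per_op(int nthreads, long iters) {
    std::mutex m;
    auto t0 = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; i++) {
        threads.emplace_back([&] {
            for (long j = 0; j < iters; j++) {
                m.lock();
                m.unlock();
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    auto dt = std::chrono::steady_clock::now() - t0;
    return std::chrono::duration<double, std::nano>(dt).count()
           / (double(nthreads) * iters);
}
```

The absolute numbers naturally depend on CPU count and on whether the
mutex implementation spins, sleeps, or both.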
This patch is relatively simple. I tried further changes, not included in
this patch, which didn't seem to help in my tests but which we might want
in the future. One such heuristic is to stop spinning if the lock-holding
thread is not currently running. A narrower heuristic is to stop spinning
if the lock holder is on the same CPU we're on (and in particular, never
spin on a single processor). For the final patch we'll also need to
determine the best amount of spinning (or make it a parameter, or make it
adaptive).
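One possible shape for the adaptive variant mentioned above is sketched
here. This is purely hypothetical and not part of the patch: the struct,
its method names, and all the constants are invented for illustration. The
idea is to grow the per-mutex spin budget when spinning recently avoided a
sleep, and shrink it when spinning was wasted.

```cpp
#include <algorithm>

// Hypothetical adaptive spin budget (illustrative names and constants).
// Doubling on success and halving on failure keeps the budget near the
// point where spinning actually pays off for this particular mutex.
struct adaptive_spin {
    int limit = 100;                      // current spin budget
    static constexpr int min_limit = 10;
    static constexpr int max_limit = 10000;

    void spin_succeeded() { limit = std::min(limit * 2, max_limit); }
    void spin_failed()    { limit = std::max(limit / 2, min_limit); }
};
```

The lock() path would then spin up to `limit` iterations and call one of
the two feedback methods depending on whether it got the handoff.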
Please do not commit this patch, but feel free to try it on your favorite
benchmark. I'm continuing to investigate different ways of doing spinning
and their tradeoffs.
Signed-off-by: Nadav Har'El <n...@cloudius-systems.com>
---
core/lfmutex.cc | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
--- .before/core/lfmutex.cc 2014-07-28 17:46:46.154625393 +0300
+++ .after/core/lfmutex.cc 2014-07-28 17:46:46.154625393 +0300
@@ -44,6 +44,24 @@ void mutex::lock()
++depth;
return;
}
+ // If we're still here, the lock is owned by a different thread. Before
+ // going to sleep, let's try to spin for a bit, to avoid the context
+ // switch costs when the critical section is very short. We're spinning
+ // without putting ourselves in the waitqueue first, so when the lock
+ // holder unlock()s, we expect to see a handoff.
+ // FIXME: do not spin on UP.
+ // If someone is on the waitqueue, there's no point in continuing to
+ // spin, as it will be woken first, and no handoff will be done.
+ for (int c = 0; c < 10000 && waitqueue.empty(); c++) {
+ auto old_handoff = handoff.load();
+        if (old_handoff && handoff.compare_exchange_strong(old_handoff, 0U)) {
+ owner.store(current, std::memory_order_relaxed);
+ depth = 1;
+ return;
+ }
+ asm volatile ("pause");
+ }
+
// If we're here still here the lock is owned by a different thread.
// Put this thread in a waiting queue, so it will eventually be woken