Revisiting *user* level locking issues.
First, with the standard multi-core scheduling model, contention is very unlikely.
Roughly speaking the probability is:
fCon = nFib * (nThread - 1) * fLock
where
nFib is the total number of fibres running
nThread is the total number of threads running the fibres
fLock is the fraction of time spent by a fibre holding a lock
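As a sanity check, the estimate can be written out directly (a trivial sketch; the function name is mine, not Felix API):

```cpp
// Hypothetical illustration of the contention estimate above:
// fCon = nFib * (nThread - 1) * fLock, with the names from the text.
double contention_estimate(int nFib, int nThread, double fLock) {
    return nFib * (nThread - 1) * fLock;
}
```

For example, 1000 fibres on 4 threads, each holding a lock 0.001% of the time, gives fCon of roughly 0.03.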
Note that spinlocks are seriously BAD on a uniprocessor, because a contender
will spin UNTIL pre-empted. Correct operation on a uniprocessor really requires
the lock holder to mask IRQ interrupts to prevent pre-emption. This would be
nice on a multi-processor too. AFAIK only Windows provides real critical sections.
For a large number of fibres run by a small number of threads, with locks used only
occasionally, and spanning only small amounts of time, the probability of contention is
very small, and the delay, given that the lock is only held briefly, is also small PROVIDED
a spin lock is used. The worst case is that a lock holder is pre-empted and descheduled
by the OS whilst a contender is not. The probability of THAT happening
is approximately zero if the total number of threads is less than the number of CPU cores,
but goes up rapidly as the number of threads exceeds the number of cores. Note that
pre-empting a contender then becomes highly desirable.
System mutexes are better when there are a lot more threads than CPUs, for the
above reason, especially if nFib is small and fLock is large. I would note also that
a spinlock using test-and-set with sequential consistency does impose a significant
burden on the processor cache coherency protocol. Acquire/release is probably
good enough, but there's still an overhead synchronising caches.
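A minimal sketch of such a spinlock in C++, using acquire/release ordering as suggested above (the class and its try_lock are illustrative, not Felix RTL code):

```cpp
#include <atomic>

// Test-and-set spinlock with acquire/release ordering rather than
// the default sequential consistency.
class Spinlock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() noexcept {
        while (flag.test_and_set(std::memory_order_acquire)) {
            // busy wait: each failed test_and_set still bounces the
            // cache line between cores, the overhead noted above
        }
    }
    bool try_lock() noexcept {
        // true iff the flag was previously clear and we now own it
        return !flag.test_and_set(std::memory_order_acquire);
    }
    void unlock() noexcept {
        flag.clear(std::memory_order_release);
    }
};
```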
Second: what can we do in the scope of a spinlock? Basic memory access
should be OK. It is much better if multiple accesses hit the same cache line.
The worst case with distributed access is a pre-emption caused by triggering
a virtual memory paging request. Then we have a spinlock waiting for the disk!
Allocations are problematic. Standard Felix allocation is strictly forbidden because
the world-stop GC could be deadlocked: if a lock holder triggers a world stop,
then a contender will never notice and will spin forever. For this purpose Felix must
provide a GC-aware mutex which uses a timeout and spins around a check on the
world-stop flag (suspending as required if it is set).
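A hedged sketch of what such a GC-aware mutex might look like, assuming a world-stop flag and a cooperative suspend routine (both are stand-ins here, not actual Felix RTL names):

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

std::atomic<bool> world_stop{false};   // stand-in for the GC's flag

void gc_suspend_self() {
    // A real RTL would park the thread until the GC resumes it;
    // here we just wait for the flag to clear.
    while (world_stop.load(std::memory_order_acquire))
        std::this_thread::yield();
}

// Mutex that never spins blindly through a world stop: each timed
// acquisition attempt is followed by a check of the world-stop flag.
class GcAwareMutex {
    std::timed_mutex m;
public:
    void lock() {
        using namespace std::chrono_literals;
        while (!m.try_lock_for(100us)) {
            if (world_stop.load(std::memory_order_acquire))
                gc_suspend_self();   // cooperate with the collector
        }
    }
    void unlock() { m.unlock(); }
};
```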
Felix also provides a flag in Felix heap allocations which prevents a collection
being triggered. The scheduler and most of the RTL code must ensure this flag
is false, inhibiting GC; however, Felix code currently has no way to set this flag.
In any case, a Felix allocation invokes the GC allocator, which is protected
by a system mutex. It is not a good idea to nest locks. This is a serious difficulty
with locking. Systems usually provide a recursive mutex to support this; however,
spinlocks cannot do this directly (because there is no way to test whether the flag
of a test-and-set is already set except by actually setting it, in which case
we'd need to save the result so we can decide whether to clear the lock or not,
which is impossible without also protecting the result .. :-)
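For contrast, the usual way around this is not to probe the test-and-set flag at all, but to record the owner's identity alongside a recursion count. A sketch of such a recursive spinlock (illustrative, not Felix code):

```cpp
#include <atomic>
#include <thread>

class RecursiveSpinlock {
    std::atomic<std::thread::id> owner{};  // default id = "no owner"
    int depth = 0;                         // touched only by the owner
public:
    void lock() {
        auto me = std::this_thread::get_id();
        if (owner.load(std::memory_order_acquire) == me) {
            ++depth;                       // re-entry by the owner
            return;
        }
        std::thread::id none{};
        // spin until we install ourselves as owner
        while (!owner.compare_exchange_weak(
                   none, me, std::memory_order_acquire))
            none = std::thread::id{};      // CAS clobbers 'none'; reset
        depth = 1;
    }
    void unlock() {
        if (--depth == 0)
            owner.store(std::thread::id{}, std::memory_order_release);
    }
};
```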
Even if we're just calling malloc(), then, since malloc() is thread safe,
it may be taking a mutex itself. Modern mallocs should be using thread local
storage and spinlock-protected shared free lists to minimise the need to actually
call the OS for more memory, but we can't really know.
SO: we need to give the user a spinlock, but Felix cannot easily ensure it is
only used in a safe way, nor that it is used in an efficient way.
Third, locks work best if localised, meaning, associated closely with the
memory region accessed in the lock scope. Note this also means the
lock memory itself should be physically close to the memory being
protected if possible, preferably in the same cache line.
The simplest lock is a single global lock. In Felix this would be stored
in the GC object, which is universally accessible. Using it allows cross-thread
coroutine migration and arbitrary memory access serialisation.
The chance of contention, however, is raised when there are many clients.
OTOH that also increases the chance the lock itself stays pinned
in a loaded cache line.
Please recall, however, point one in this article. In a typical heavily concurrent
system, the chance of contention is vanishingly small and the chance of pre-emption
of a lock holder also very small. This is NOT true in ordinary code using
system threads; it is a key advantage of user space scheduling.
However, there’s a good case for a lock on the scheduler queues.
Normally there's only one queue; however, each spawned pthread
creates a new queue. Helper threads share the parent's queue.
Such locks will be more efficient, but only really useful if there
are a LOT of pthreads spawned. They’re also dangerous, because
they can’t protect inter-pthread communication.
It's also appropriate to provide a user lock constructor, so that only
those routines actually sharing some memory share the lock,
minimising the chance of the useless serialisation a global lock
would lead to when serialising unrelated accesses.
It's also worth considering that, with only a small number of lock
users, we can use a heavier duty lock and protect allocations
and other things, making these locks conditionally safer and
more capable, provided they're correctly shared and used.
A good way to share locks would be to actually send them
on a synchronous channel!
Thus, there are quite a few different locking options, depending on:
(a) what the locks protect
(b) how the locks are shared
The locks are: spinlock, hard system mutex, and spinning mutex.
A spinning mutex is a system mutex with a time out, which regularly
checks for the world stop flag in the spin loop.
A more efficient but heavier spinning lock uses a condition variable
with a timeout instead. This kind of lock has to register itself with
the GC, so it can be signalled to bump it out of the wait state.
Note that the lock must be released to suspend on the GC, and
reacquired before spinning back to the conditional wait on the
condition, which is actually a boolean flag representing the actual lock.
[So we have a lock protecting access to a lock .. :]
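A sketch of that shape, assuming the world-stop check goes where the comment sits (the names are mine, and the GC registration and suspension are elided):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// The boolean 'locked' is the actual lock; the mutex/condition pair
// merely protects access to it. Waiters use a timed wait so they can
// periodically break out to check for a pending world stop.
class CondLock {
    std::mutex m;                 // the lock protecting the lock .. :)
    std::condition_variable cv;
    bool locked = false;          // the "actual lock"
public:
    void lock() {
        using namespace std::chrono_literals;
        std::unique_lock<std::mutex> g(m);
        while (locked) {
            cv.wait_for(g, 1ms);  // timeout bumps us out so we can...
            // ...check the world-stop flag here; to suspend on the GC
            // we would release g first and reacquire it afterwards.
        }
        locked = true;
    }
    void unlock() {
        { std::lock_guard<std::mutex> g(m); locked = false; }
        cv.notify_one();
    }
};
```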
In summary: there are some tricky considerations choosing which locks
to give the Felix programmer.
WORSE: we have to pick some syntax. It is tempting to do:
atomic { … };
or even just
atomic statement;
This would be a global spinlock. UNSAFE, because the controlled code MUST
NOT ALLOCATE with the Felix GC, and it's hard for the programmer to know
when that is. Variants, for example, do heap allocations.
However, since the lock is on the GC, we could use the lock state to turn
off GC triggering on allocation. The catch is, C++ doesn't allow the state
of a test-and-set flag to be examined. Only two operations are supported,
test-and-set and clear. The flag CAN be examined by testing it, but that
also sets it. If the lock was clear, we have just set it, so we can now release
it and do the allocation with triggering enabled; otherwise it was already set,
and we prevent the triggering. Unfortunately this is not sound, because
after we decide that it's safe to allow triggering, another thread can
acquire the lock, then a third thread can contend for it, and meanwhile
our allocation triggers a world-stop.
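As an aside, C++20 did add std::atomic_flag::test(), but even a non-mutating probe has the same race. A sketch of the unsound probe-then-decide scheme (hypothetical names; do NOT use this):

```cpp
#include <atomic>

std::atomic_flag global_lock = ATOMIC_FLAG_INIT;

// Probe the global lock with test_and_set: if it was clear we took it
// ourselves, so allocation with GC triggering *looks* safe; if it was
// set, someone holds it, so triggering must be suppressed.
bool may_trigger_gc() {
    if (!global_lock.test_and_set(std::memory_order_acquire)) {
        global_lock.clear(std::memory_order_release);
        // RACE: between this clear and the allocation that follows,
        // another thread may acquire the lock and a third contend for
        // it; our world-stop would then deadlock that contender.
        return true;   // the "safe" conclusion is already stale
    }
    return false;      // lock held by someone: suppress triggering
}
```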
—
John Skaller
ska...@internode.on.net