Felix has some locks already. However there is a special case for coroutines converted to
processes. The system uses a queue-local conditional spinlock: the lock is not applied
if the queue is running single threaded. The lock is used to guard system operations
such as fetching the next coroutine to run, channel I/O, etc.
These system operations also set a flag when allocating store which
prevents that allocation triggering a GC. It's basically done by:
new (gcp, shape, flag) T(args)
An important side note: the system allocator is protected by an OS mutex,
and in any case malloc itself is thread safe, which probably means the
user-space suballocation it does is also protected by a mutex. Malloc is smart;
I expect it also keeps thread-local free lists etc., so no mutex is required for
allocation from the thread-local store. But it's hard to know. In any case
the Felix allocator also has to add the allocated object to the global
GC tracking data structure (a JudyLArray mapping the address
of the store to its shape).
For Felix system objects, we could bypass all of this by keeping our own
thread-local freelists, perhaps up to some bound: these would be allowed
to “leak” at least until the end of the program (stuff in the free lists is not
ordinarily reachable). I'm not sure exactly how to do it.
I am thinking to use a spinlock for allocation instead of an OS mutex.
Anyhow this leads to the following issue: suppose I provide the user
with the specialised coroutine spinlock, by say
colock blah; ++p; end
or something. This uses the current queue's lock, so it's good for wrapping
around assignments to memory shared between coroutines. If the
coroutines are running serially, then the colock only needs to check the
flag to see that, and then does nothing.
The PROBLEM with doing this is you cannot do an allocation which
would trigger a GC. Since the expressions/statements are dynamic,
and can call subroutines, how are we going to set the flag to disable the GC?
If we use a *global* lock instead of a per queue lock it would be easy.
It would also support “cross thread” atomic operations (i.e. between
different coroutine schedulers). For machines like the Core i7 this would
not be a problem, but on a machine with 1024 cores the rate of contention
could be high and cause a bottleneck.
Note that spinlocks have a serious problem. They're ultimately fast;
there's no faster lock. They wait as long as necessary and no longer,
PROVIDED the lock holder isn't pre-empted and descheduled.
With a system mutex, one can expect the lock holder to be
somewhat protected from pre-emption by the OS, and the contender
probably pre-empted (what else could happen? Apart from spinning,
the OS pretty much HAS to deschedule a lock contender).
What we would do in an embedded system is mask interrupts whilst
holding a fast spinlock, to prevent the holder being pre-empted
and leaving the contender pointlessly spinning.
So roughly speaking spinlocks only work if the number
of threads is close to the number of CPUs in which case
pre-emptions will be rare.
—
John Skaller
ska...@internode.on.net