[svn:parrot-pdd] r14903 - in trunk: . docs/pdds/clip

Skip to first unread message


Oct 11, 2006, 4:37:09 PM10/11/06
to perl6-i...@perl.org
Author: allison
Date: Wed Oct 11 13:37:08 2006
New Revision: 14903


Changes in other areas also in this revision:
trunk/ (props changed)

r1884@lilal: allison | 2006-10-11 13:18:42 -0700
Partial update of Threads PDD with collected wisdom from prior discussion.

Modified: trunk/docs/pdds/clip/pdd25_threads.pod
--- trunk/docs/pdds/clip/pdd25_threads.pod (original)
+++ trunk/docs/pdds/clip/pdd25_threads.pod Wed Oct 11 13:37:08 2006
@@ -20,7 +20,748 @@


-Description of the subject.
+So we can all talk about things the same way, the following
+definitons apply. Some of these are drawn from the POSIX thread spec,
+and as such we should have a translation section at the end.
+=over 4
+=item THREAD
+An OS level thread. If that makes no sense, neither will any of the
+rest of this, in which case I recommend picking up "Programming with
+POSIX Threads" by Dave Butenhof, and coming back when you have.
+=item MUTEX
+This is a low level, under the hood, not exposed to users, thing that
+can be locked. They're non-recursive, non-read/write, exclusive
+things. When a thread gets a mutex, any other attempt to get that
+mutex will block until the owning thread releases the mutex. The
+platform-native lock construct will be used for this.
+{{ - Nigel Sandever: Will this be macroised to hide the platform native
+implementation from the main body of the code?
+- Dan: Yes. }}
+=item LOCK
+This is an exposed-to-HLL-code thing that can be locked. Only PMCs can
+be locked, and the lock may or may not be recursive or read/write.
+The "sleep until something pings me" construct. Useful for queue
+construction. Not conceptually associated with a MUTEX. (POSIX
+threads require this, so we're going to hide it there behind macros
+and/or functions)
+{{ - Nigel Sandever: Could you elaborate on the nature of what would constitute
+a "ping"?
+- Dan: POSIX has a function cond_signal (and cond_broadcast, which is
+similar). What happens is a thread grabs the condition variable and
+goes to sleep, and sleeps until another thread does a cond_signal,
+which then wakes it up. If there are multiple threads sleeping on the
+condition variable, only one is woken. (cond_broadcast wakes them all
+up) }}
+A HLL version of a condition variable.
+Those bits of the Parrot_Interp structure that are absolutely required
+to be thread-specific. This includes the current register sets and
+stack pointers, as well as security context information. Basically if
+a continuation captures it, it's the interpreter.
+Those bits of the Parrot_Interp structure that aren't required to be
+thread-specific (though I'm not sure there are any) I<PLUS> anything
+pointed to that doesn't have to be thread-specific.
+The environment includes the global namespaces, pads, stack chunks,
+memory allocation regions, arenas, and whatnots. Just because the
+pointer to the current pad is thread-specific doesn't mean the pad
+I<itself> has to be. It can be shared.
+A thread that has no contact I<AT ALL> with the internal data of any
+other thread in the current process. Independent threads need no
+synchronization for anything other than what few global things we
+have. And the fewer the better, though alas we can't have none at all.
+Note that independent threads may still communicate back and forth by
+passing either atomic things (ints, floats, and pointers) or static
+buffers that can become the property of the destination thread.
+A thread that's part of a group of threads sharing a common
+interpreter environment.
+{{ - Leo: I presume that this "group of threads" is one thread pool or
+interpreter pool. Could you expand the definition to cover "pool".
+- Dan: Ah, damn, I needed to fix that up before sending it out. Should've been
+"SHARED INTERPRETER". A shared interpreter is one that's in an interpreter
+An interpreter pool is a set of interpreters that share common
+resources. They're essentially threads in the OS sense, and they have
+shared access to pretty much everything. The memory pools, arenas,
+and global namespace are shared--pretty much everything except what's
+in the interpreter structure itself. }}
+=head2 Supported Models
+=over 4
+=item POSIX threads
+The threading scheme must be sufficient to support a POSIX
+share-everything style of threading, such as is used in perl 5's
+"pthread" model, as well as the thread models for Ruby and Python.
+=item "Process-type" threads
+The scheme must also support the perl 5 "iThreads" threading
+model. In this model no data is shared implicitly, and all sharing
+must be done on purpose and explicitly. It very much resembles the
+Unix fork-process-with-shared-memory-segment model, not a surprise as
+it was originally developed with emulation of Unix's fork system in
+{{ - Nigel Sandever: Pseudo forks?
+- Dan: Yeah. On Win32, when Perl forks it starts a new thread and clones the
+interpreter and its contents. If you later do an exec in that thread it starts
+a new process and grabs hold of it. Definitely a clever trick. }}
+=head2 Guarantees
+{{ -Dave Whipp: Maybe this isn't strictly a threading thing, but are we going
+to make any guarantees about event orderings? For example, will we guarantee
+that a sequence of events send from one thread to another will always be
+received in the order they are sent? (Obviously no guarantees about
+interleaving of events from other threads). This is usually only important in
+distributed environments where multiple paths exists between a pair of
+endpoints, but it would be nice for user-code to be able to rely on it. }}
+=over 4
+=item No Crashes
+The interpreter guarantees that no user program errors of any sort
+will crash the interpreter. This includes threading problems. As
+such, synchronization issues (where multiple interpreters are
+accessing the same shared data) must not crash the interpreter or
+corrupt its internal state.
+=head2 Assumptions
+=over 4
+=item System memory allocation routines are threadsafe
+We are assuming that the memory allocation system of the base OS is
+threadsafe. While some of the C runtime libraries are notoriously
+thread dangerous, memory allocation code generally is threadsafe, and
+we'll assume that on all platforms. (Though we will, in general,
+manage our own memory)
+{{ - Nigel Sandever: Is there any likelyhood that memory allocation will be
+hidden behind macros at two levels:
+ - ALLOCPOOL() for allocating large chunks of memory (ppols) that are
+ later sub-allocated (and managed) by:
+ - SUBALLOC() for sub allocating within a pool
+- Dan: We'll generally hide behind the memory.c routines, if for no other
+reason than to allow the embedder to override the memory functions. We need to
+define that at some point...
+- Gordon Henriksen: Are you wanting something akin to Apache 2 pools, which are
+hierarchical and designed to reduce path length when freeing blocks of objects?
+For instance, all objects and sub-pools allocated during an HTTP request cycle
+can be deallocated just by free()'ing the top-level request pool.
+I don't think parrot could make use of that model, since it can't very
+well guarantee that the user cannot retain references past the lifetime
+of the pool. Apache trusts modules not to make such errors; parrot can't
+trust the byte-code it's executing any further than it can throw it. A
+generational collector is a more likely means by which parrot might
+reduce memory-related overhead.
+- Nigel Sandever: Nothing to do with Apache memory pools.
+I believe that parrot already has the concept of memory pools in it's
+memory management. The idea is that by allocating similarly sized objects
+from separate (large) allocations, you can reduce the fragmentation of
+chunks and reduce the incidences where the memory need to be GC'd and
+Allocating an 8 byte chunk from a common memory pool is quite likely to
+nip a little off from a previously freed large chunk. When it comes time
+reallocate another chunk the same size as that large, freed chunk, although
+there is enough room in the over all freespace chain to accommodate it, the
+largest available chunk is now 8 bytes or so too small for the requirement.
+That induces either a compaction cycle or the need to extend the memory pool
+by the size of the large request.
+Allocating all small requests from the same pool, and large from another
+pool means that you are less likely to fragment the memory and more likely
+to be able to re-use an existing slot in the free-space chain for any
+given request.
+If the allocation of pools, and the allocation of bit-of-a-pool, are
+macroised, it makes it possible for OS's where there are multiple APIs
+for memory allocation to bypass the CRT memory allocation routines and
+use which ever native APis are best suited for the type of allocation.
+Personally, I would like to see memory allocation for each class type
+be managed by the class constructor itself. This would theoretically allow
+each class that has a fixed instance size to manage it's own pool on OS's
+where that makes sense. The class would allocate a pool for itself when
+loaded and then allocate instances from that pool on new() and deallocate
+upon DESTROY. If it's memory pool was exhausted when new was called,
+it would invoke the GC on *it's pool only*.
+This separation would mean that each run of the GC would have a much smaller
+pool of memory to compact and garbage collect when it was invoked. It would
+also be less likely to be called, as each allocation from a pool of fixed
+sized sub allocations will only ever need to call the GC when it's pool is
+entirely exhausted.
+But that is a radical departure :), so if would just like to see separate calls
+for pool allocation/reallocation and element allocation/reallocation, rather
+than having calls to malloc() scattered through out the codebase.
+- Gordon Henrikson: Don't presume that [allocating similarly sized objects
+reduces fragmentation and GC]. General purpose memory allocators tend to handle
+this scenario... generically. Use of custom allocators is usually a premature
+optimization; the best generic memory allocators are very, very good.
+But the ultimate effect of the overuse of custom allocators or pools is
+to move memory fragmentation to a higher level, preventing the generic
+allocator from addressing it even when the memory allocation patterns
+would otherwise allow it.
+Would much rather see [the allocation of pools] abstracted, as they are now,
+beneath an API. This reduces the number of ways in which compiled parrot
+extensions can be incompatible with the parrot core.
+A PMC class is free to [manage the memory allocation for its
+own class constructor]. Only the PMC header cannot be thus managed, and that's
+already pooled.
+Th[e idea that such a separation means a smaller pool to compact and garbage
+collect] is false. The mark phase will still need to run over the entire
+process, else it cannot detect all references into the pool. (Unless you can
+provide a type-safety guarantee that your type can only be referenced by other
+instances of itself. In which case, your type is necessarily "garbage" and the
+best optimization strategy would be dead-code elimination. :)
+I predict the best overall throughput would be with the sweep or copy
+phase run immediately after the mark phase, across the entire process:
+It would be wasteful to do all that marking and yet leave known garbage
+Statistically, multiple pools will individually exhaust themselves
+*MORE* frequently than a global pool. The ideal case for collection
+frequency is that there is only one pool, or that all pools rise to
+capacity concurrently. In these ideal cases, the segregated pools will
+exhaust themselves precisely *AS* frequently as a global pool. In any
+case, there is no possibility for a decrease in collection events
+through the use of pooled memory. As per above, collection events will
+also not become less expensive. Thus, expect garbage collection to have
+a negative net impact on performance if pooled memory is used.
+- Leo: The problem is not in the fixed sized header pools, its with the
+*memory* pool used e.g for string memory.
+During GC we are walking the *header* pools, and if we find its buffer
+memory being used, we move that buffer memory to a new store, thereby
+compacting it. The old memory store(s) are freed then.
+So string memory can move around beyond your code.
+=head1 Proposal
+The proposal is as follows:
+=over 4
+=item All global state shall be protected by mutexes
+Straightforward enough. This allows for independent threads to
+coexist without threatening the state of the proces.
+=item Multiple independent interpreters will be allowed
+Once again, straightforward enough. With threadsafe protected global
+state, there's no issue here.
+=item Only one OS thread in an interpreter at once
+While there is no requirement that any interpreter be tied to an
+underlying OS thread, under no circumstances may multiple OS threads
+use a single interpreter simultaneously.
+=item A Stop-and-copy communication method will be provided
+Parrot will provide a function to make a call into another interpreter
+and wait for that call to complete. This call may pass in data and
+have data returned to it. The interpreter making the call will block
+until the call is complete. The data passed in as parameters will be
+copied into the called interpreter, and any return values will be
+copied back into the calling interpreter. The called interpreter will
+block while the return data is copied back into the calling interpreter.
+=item Inter-interpreter events will be provided
+Interpreters will be able to post events to other interpreters.
+=item Each interpreter will have a unique id
+This ID will be independent of the process or OS thread, and will be
+constant across the lifetime of the interpreter. Interpreter IDs
+I<may> be reused as interpreters are destroyed and recreated, and as
+such are only guaranteed valid while an interpreter is in use.
+(Note that we may decide to relax this requirement, but doing so
+likely means moving to at least 64-bit integers to mark interpreter IDs)
+{{ - Nigel Sandever: The provision of method(s) to obtain the native
+TIDs/HANDLES would make life for those writing implementation specific
+extensions easier.
+- Dan: TIDs, definitely. It'll be a sysinfo or interpinfo function. There's the
+issue of interpreters binding to different threads, but we'll deal with that.
+I'm not sure what the HANDLE is, but explain it and we can work something out.
+:) }}
+=item Each interpreter show the same process id
+All the interpreters within a process will share a process ID. On
+those systems where each thread has its own unique ID (such as many
+versions of Linux) Parrot will still report a single process ID for
+all interpreters.
+This process ID will be the ID of the process that first instantiated
+{{ - Leo: The term "process id" is really misleading. Again I presume that one
+pool is meant here. I'd vote for keeping a process ID what it is and use "pool
+id" or such here.
+- Dan: Nope. It's the process ID. Nothing misleading there, unless you've done
+threading work under linux, since for a while it gave each thread a separate
+- Leo: $ ps | grep [p]arrot
+17472 pts/0 00:00:00 parrot
+17473 pts/0 00:00:00 parrot
+17474 pts/0 00:00:00 parrot
+So the unless applies ;) This is a single parrot interpreter, with main,
+thread-manager, and event-handler thread. 3 PIDs.
+- Liz Mattijsen: This _all_ depends on which version of Linux you're using.
+Older versions don''t do it that way, and newer versions don't do it either
+(the specific versions escape me at the moment, but I know RH9 does _not_ have
+pids for threads).
+I know, because my Thread::Signal module depends on threads having
+pids. But fewer and fewer Linux systems "support" it (and Linux was
+the only one who worked this way to begin with).
+- Dan: Right. Linux is, as I said, arguably broken here, but I don't want to
+argue that point. That's why I specified that the PID is the process id of
+whatever instantiated parrot -- in this case, no matter which thread you're in,
+when you ask for the pid from parrot you should get 17472. (In this case, at
+Parrot needs to grab the pid at global initialization time and store
+it away for later pid queries. }}
+=item Interpreter pools will share allocation pools
+All the interpreters in an interpreter pool will share header and
+memory allocation pools. This means that when there is more than one
+interpreter in a pool the memory allocation and collection system
+needs to be swapped out, as a copying collector is generally
+untenable in a threaded environment.
+{{ - Dan: For a copying collector to work, all the mutators must be
+blocked, and arguably all readers should be blocked as well. (I think
+they need to be, otherwise you may be accessing reclaimed and reused
+memory) That means if we keep a copying collector we need to have
+everything that accesses strings or pmcs to get a lock on the GC
+before every access of every string or PMC. A touch excessive, I
+- Gordon Henriksen: True of non-moving collectors, too. Or, let me put it this
+way: non- copying *GC* (the sweep or copy phase) can be threadsafe, but the
+mark phase is never threadsafe. The method in which marking is not threadsafe
+is a bit more pathological (i.e., it's not the common case as it is with the
+copying collector), but a standard tracing DOD cannot be correct when
+competeting with mutators. It WILL collect non- garbage (those are MUTATORS,
+remember), and the result WILL be Heizenbugs and crashes.
+Some of what I've written up [AR see next comment] addresses why. It's pretty
+simple to demonstrate a single case to prove the point, but I don't feel like
+re-creating the ASCII art right now. :) I'll send that section when I get out
+of the office.
+parrot will have to be able to suspend all threads in the environment.
+Unfortunate, but really quite unavoidable.
+- Gordon Henriksen: Consider this simple object graph:
+ Root set = { &A, &B }
+ [ A=NULL]
+ [ B=&C ] ---> [ C=....]
+Collection begins in thread 1 and marks A as reachable before being
+preempted by thread 2:
+ [xA=NULL]
+ [ B=&C ] ---> [ C=....]
+Thread 2 sets A to &C, and nullifies B before being preempted again by
+thread 1:
+ [xA=&C ] ---> [ C=....]
+ [ B=NULL]
+Thread 1 marks B as reachable:
+ [xA=&C ] ---> [ C=....]
+ [xB=NULL]
+The mark phase complete, thread 1 sweeps:
+ [ A=&???] ---> ???
+ [ B=NULL]
+C was not marked reachable (although it was) and was thus erroneously
+collected, leaving a dangling pointer. This problem applies equally to
+copying and mark-sweep collectors.
+- Dan: Ah, OK, I see. The problem comes in where we've got an object in the
+transient root set (basically the processor stack and registers) that
+gets anchored into the base root set (stash, pads, or whatever) after
+the DOD has traced where it's going into and falls out of the
+transient root set before the DOD traces over to it.
+Race condition. Dammit.
+Okay, I'd not wrapped my brain around that possibility, which will
+make for some interesting DOD tracing, especially on SMP systems. I
+was *really* hoping a single lock on the arena allocation system that
+the DOD held onto while tracing would be sufficient, but I see that
+it isn't.
+That means we're going to have to have either a really forgiving DOD
+system that takes multiple passes before it collects up a PMC or
+buffer (which still isn't safe) or have some way to force a
+low-overhead rendezvous.
+The obvious rendezvous point is the arena lock, but that's going to
+see a lot of contention anyway, and we'd as soon not have a single
+one for speed reasons. Feh.
+Okay, we're going to mandate that we have read/write locks, each
+interpreter pool has one, and mutating vtable entries must get a read
+lock on the pool read/write lock. Pricey (ick) but isolated to
+mutators. The DOD gets a write lock on it, which'll block all
+read/write access so no mutators can be in process while the pool DOD
+I think that'll work. The .1j thread spec requires r/w locks, and we
+can fake them on platforms that don't implement them. Hopefully
+Nigel's got the Windows scoop so we can see if Win32 has anything
+like this (which I'd expect it does)
+- Leo: That is well knowm in the literature as "Tri-Color Invariant": Black are
+the already marked (live) PMCs, grey the PMCs on the next_for_GC list,
+white the not yet reached PMCs. The strong tri-color invariants states
+that no black object may point to a white object, the weak invariant
+states, that at least one path from the black to a white object must
+contain a grey one.
+This can be handled by either stop the world GCs or by intercepting each
+read or write access that would change the color of an object and update
+the color accordingly. This is e.g. used for incremental GC. As soon as
+we have a thread in the background that runs GC, we have to cope with
+these issues.
+- Dan: Yeah, point. And since we want to be able to have an incremental DOD
+at some point we need to get support for it in now.
+- Leo: Stopping all interpreters seems to be cheaper. The rwlock will sooner or
+later stop all interpreters anyway (on first PMC access), so we can omit the
+price for the rwlock and just hold the world(s).
+- Dan: The rwlock only stops all the interpreters when the DOD runs.
+Anything that mutates a PMC gets a *read* lock so that they don't
+interfere with each other, and only pause if the DOD is running. The
+DOD getting a *write* lock will block any read lock attempts, so when
+the DOD is running no mutation can take place. Since mutation doesn't
+require any global exclusion it doesn't need a write lock -- the read
+lock is sufficient.
+- Leo: Sure. But that would still need to aquire a readers rwlock for each PMC
+access. This is more expensive then a mutex. During DOD any PMC access
+will halt the interpreter, so we can do that explicitely too and save
+the cost for the rwlock overhead. Albeit I can imagine, that aggregates
+will need a rwlock anyway.
+- Leo: An alternative would be real background incremental GC, *when* running
+multiple threads. I estimate the overhead to be in regions of a rwlock (with no
+contention of course).
+- Dan: If we have the facilities to do incremental DOD runs then this is
+definitely a possibility except for finalizers. Finalizers make
+things interesting, though if the background thread doing the DOD is
+a member of the interpreter pool then it'd work out OK.
+- Leo: Finalizers and incremental DOD don't play together. The DOD must run to
+end to be sure, that the objects isn't referenced any more.
+- Leo: What about temporary PMCs (or strings)? Evaluating a non-trivial
+expression can have lot of these. Each new temp would need a lock on the
+memory sub-system.
+- Dan: Those won't get marked as shared unless we're unconditionally marking
+things as shared. (Though we may just give 'em a mutex anyway) [New temps would
+need a lock] only on allocation. We could have a per-thread freelist if we
+wanted. Wouldn't be unreasonable.
+- Leo: This needs either one check per PMC, if its shared or not, or additional
+costs for locking temps. Both are rather expensive, compared to the raw
+working functionality of a vtable.
+That already smells like separate memory sub-systems. The freelist has
+to be filled first. During DOD runs, it has to be refilled. To achieve
+some locality of the temp PMCs, you can't just give these to arbitrary
+intpreters. Separate free lists seem also to imply separate DOD runs to
+cleanup the temps.
+- Leo: Combined with the cost of: "All shared PMCs must have a
+threadsafe vtable - The first thing that any vtable function of a shared PMC
+must do is to aquire the mutex of the PMCs in its parameter list "
+i.e. typically 3 mutexes, it could be that the vtable of a shared PMC
+should just grab one interpreter lock (of the owner of that PMC) and
+that not all is shared (especially not the temps).
+- Dan: Yeah, but since PMCs aren't owned by any one interpreter in the pool
+but are rather pool-wide you run into either the global-lock problem
+(which kills performance on SMP systems) or interpreters potentially
+deadlocking on unrelated data access.
+- Leo: Could be. But the performance is the overall throughput. When a lot of
+fine grained locks (plus memory subsystem locks for temps) take much
+longer then one global lock, then that scheme can nethertheless be
+slower on SMP machines. It would scale better though for more CPUs.
+Deadlocks shouldn't be a problem, if exactly one vtable (like in
+SharedRef) just grabs one and only one lock.
+As the allocation and collection system is a black box to user
+programs and much of the interpreter internals, this isn't a big deal
+outside of needing swappable allocation systems, the potential
+issue of COW'd shared memory leaking, and the need to switch
+allocation schemes mid-execution.
+=item Each interpreter has a separate event queue
+Some events, such as timers, may be interpreter-specific and, as
+such, each interpreter has its own event queue.
+=item Each interpreter pool has a shared event queue
+Some events, such as IO callbacks, may not be interpreter-specific,
+and can be serviced by any interpreter in the interpreter pool. For
+these events, there is a pool-wide event queue.
+=item PMCs are the coordination point for threads
+That is, only PMCs are shared as such between threads. Strings,
+specifically, are I<not> shared between interpreters as such
+=item All PMCs shared amongst interpreters in a pool must be marked shared
+A PMC which is not marked shared may not be handed to another
+interpreter. Parrot will prevent this from happening either by
+marking the PMC as shared, or throwing an exception when the PMC is
+placed in a spot where it may be shared but is not shareable.
+=item All shared PMCs must have a threadsafe vtable
+The first thing that any vtable function of a shared PMC must do is to
+acquire the mutex of the PMCs in its parameter list, in ascending
+address order. When the mutexes are released they are not required to
+be released in any order.
+{{ - Uri: why the ascending address order to grab mutexes? is this to help solve
+- Dan: Yes. The recommendation I've always seen for deadlock avoidance is to
+always have all your code grab its mutexes in some fixed order. With source,
+it's generally recommended that you grab mutex variables in lexical order.
+Since we're all binary we need a different order, and ascending addresses are
+reasonably simple to do.
+- Gordon Henriksen: Note that this locking strategy cannot avoid deadlock if
+the user is allowed to acquire these locks--HLL locks must be altogether
+different beasts from automatic PMC locks. That's okay. Just a design
+consequence worth noting for everyone.
+- Dan: Oh, arguably it can't avoid deadlock at all, what with vtable methods
+having access to the full environment. I can live with deadlocks, only because
+there's no real alternative.
+- Gordon Henriksen: But PMC implementations have to fall inside of the trusted
+environment, so that's not really a failure. Of course uncooperative code can
+break a cooperative algorithm.
+- Sam Vilain: RE: "have all your code grab its mutexes in some fixed order."
+Yes; otherwise, you need to back off and start again, if one lock
+acquisition fails.
+Consider these functions; for the purpose of this example, lexical
+lock ordering is implied;
+ func1($AAAA, $CCCC, $FFFF, $GGGG, $KKKK);
+ func2($BBBB, $DDDD, $FFFF, $IIII, $KKKK);
+ func3($FFFF, $KKKK);
+So long as the locks are ramped up from the lowest to the highest,
+there is no chance of func1 acquiring a lock to be held by func2 (eg,
+$KKKK), if that other function already has one of the shared
+dependancies (eg, $FFFF).
+All well and good. But, what about recursive locks?
+ sub func1($one is locked, $two is locked, $three is locked) {
+ for my $x ($one, $two, $three) {
+ func2($x.property) if $x.property;
+ }
+ }
+ sub func2($eins is locked) {
+ if ($eins.property) {
+ func2($eins.property, { });
+ }
+ }
+ $aaa.property = { };
+ $bbb.property.property = $aaa;
+ $ccc = { };
+ if (thork()) { # fork a thread
+ # thread A
+ func1($aaa, $bbb, $ccc);
+ } else {
+ # thread B
+ func2($bbb.property);
+ }
+OK, the execution order that causes the deadlock is:
+ 1. Thread B - func2() acquires a lock on the $bbb.property PMC.
+ 2. Thread A - func() acquires a lock on $aaa, $bbb, $ccc.
+ 3. Thread A - func2() acquires a lock on $aaa.property, which
+ returns quickly
+ 4. Thread A - func2() blocks waiting for a lock on $bbb.property,
+ held by func2() in thread B
+ 5. Thread B - func2() blocks waiting for a lock on
+ $bbb.property.property, held by func() in thread A.
+So, there are still possibilities for deadlocks, as the non-linear
+nature of subroutine calls screws up your nice lock ramping.
+In summary, acquiring mutexes in a defined order as a means to avoid
+deadlocks only works when you are acquiring them all up front. If you
+are acquiring any later, and you detect a deadlock (ie, a loop of
+threads blocking on locks held by each other), you *must* be able to
+tell one of them to "ramp off" to holding no locks at all. ie,
+The bugger is, winding back execution state automatically to when you
+last started acquiring locks is probably an intractable problem from
+an interpreter's perspective...
+Sounds like a job for an exception to me ;-).
+ for (1..50) {
+ eval {
+ func_that_acquires_first_lock($args);
+ };
+ last unless $@ and $@ =~ m/mutex deadlock/i;
+ }
+=item Automatic PMC sharing will be provided
+When a PMC is placed into a container which is shared (including
+lexical pads and global namespaces) then that PMC will automatically
+be marked as shared. It is acceptable for this to trigger an
+exception if for some reason a PMC should not be shared between
+PMCs are, by default, not shared. This avoids sharing overhead for
+PMCs which are only used as temporaries and not shared. (Note that
+this is dangerous, and may end up not being done, due to the sharing
+of continuations)
+{{ - Luke: I don't know why this is dangerous. A continuation is a data structure
+just like an array or a hash. When you share it, everything "inside" it
+gets shared, too. For a continuation, this could be an awful lot of
+stuff, but it's the safe way.
+- Dan: True, but the danger part is if we don't mark everything grabbed by a
+continuation as shared by default. If we do, we might as well mark everything
+as shared, as there'll be less overhead. }}
+=item All interpreter constructs in a pool are shareable
+This means that a PMC or string may be used by any interpreter in a
+pool. It additionally means that, if full sharing is enabled, that
+any interpreter in a pool may invoke a continuation, assuming the
+continuation is valid. (That is, a continuation taken at parrot's top
+level. Continuations taken within vtable functions, user-defined ops,
+or extension code may not be shareable)
+=item The embedding API will allow posting events to a pool
+Many events are interpreter-specific, often caused by one particular
+interpreter requesting an async event that later completes.
+=item The embedding API will allow posting events to an interpreter
+For events that don't have to go to any particular interpreter, they
+can go into the pool's event loop.
+=item The embedding API will allow calling a parrot sub with a pool
+In those cases where there is an interpreter pool, embedders may call
+a parrot sub using the pool as a whole, rather than an individual
+interpreter, to run the sub. In that case Parrot may either choose a
+dormant interpreter (if there is one) or create a new interpreter in
+the pool to run the subroutine.
+When the sub is done, Parrot may either cache the created
+interpreter or destroy it as it needs to, though in no case will
+Parrot ever leave a pool with no interpreters at all.


Leopold Toetsch

Oct 11, 2006, 4:57:53 PM10/11/06
to perl6-i...@perl.org, all...@cvs.perl.org
Am Mittwoch, 11. Oktober 2006 22:37 schrieb all...@cvs.perl.org:
> Log:
>  r1884@lilal:  allison | 2006-10-11 13:18:42 -0700
>  Partial update of Threads PDD with collected wisdom from prior discussion.

I know the pdd is partial, but don't forget STM, which is
a) implemented and
b) solves a lot of described PMC sharing troubles e.g. hierachical locking


Reply all
Reply to author
0 new messages