> That patch, in short, messes with Python's macros:
> ... and the like
To summarize for people who aren't going to look at the patch:
Py_INCREF(ob) becomes (after stripping away the abstraction)
++*(ob->ob_prefcnt), which adds a single memory load to every incref.
Py_DECREF changes in the same way.
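
For anyone who wants to see the shape of it, here is a minimal sketch of
the indirection. It is not the patch itself: sketch_object, SKETCH_INCREF,
SKETCH_DECREF and sketch_demo are made-up names, and the only thing taken
from the patch is the idea of an ob_prefcnt pointer to a count stored
outside the object.

```c
#include <assert.h>

/* Hypothetical stand-in for PyObject; the real header has more fields. */
typedef struct {
    long *ob_prefcnt;  /* points at a count stored outside the object */
} sketch_object;

/* Stock CPython does roughly ++(ob)->ob_refcnt. The patched form has to
   load the pointer first, which is the extra memory load per operation.
   The real Py_DECREF also deallocates at zero; that is omitted here. */
#define SKETCH_INCREF(ob) (++*((ob)->ob_prefcnt))
#define SKETCH_DECREF(ob) (--*((ob)->ob_prefcnt))

/* Two increfs and one decref on a count that starts at 1. */
static long sketch_demo(void)
{
    long count = 1;                 /* lives outside the object */
    sketch_object o = { &count };
    SKETCH_INCREF(&o);
    SKETCH_INCREF(&o);
    SKETCH_DECREF(&o);
    assert(o.ob_prefcnt == &count); /* the object itself was never written */
    return count;
}
```

The point of the layout is that refcount traffic writes to the separate
count, not to the memory holding the object.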
> Are there other places I'd have to "sync" in unladen?
> Will LLVM pick up those changes in the macros?
It will. We've been manipulating the refcounts by calling
functions in Python/llvm_inline_functions.c, which are marked
always_inline and wrap the underlying macros. They should pick up
your changes to the macros.
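
Roughly, the wrapper pattern looks like the sketch below. The names
(wrap_object, WRAP_INCREF, wrap_incref, wrap_demo) are invented for
illustration and are not the actual contents of
Python/llvm_inline_functions.c; the attribute syntax assumes GCC or Clang.

```c
#include <assert.h>

/* Hypothetical object with the patched, pointer-based refcount. */
typedef struct {
    long *ob_prefcnt;
} wrap_object;

/* Stand-in for the refcount macro; whatever the macro expands to is
   what the wrapper below ends up containing. */
#define WRAP_INCREF(ob) (++*((ob)->ob_prefcnt))

/* A function the code generator can emit a call to. always_inline asks
   the compiler to fold the body into every caller, so the macro's
   current definition is what gets inlined. */
static inline __attribute__((always_inline)) void
wrap_incref(wrap_object *ob)
{
    WRAP_INCREF(ob);
}

/* One incref through the wrapper on a count that starts at 1. */
static long wrap_demo(void)
{
    long count = 1;
    wrap_object o = { &count };
    wrap_incref(&o);
    return count;
}
```

Because the wrapper only expands the macro, changing the macro changes
the inlined code without touching the callers.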
> BTW: here's unladen's perf.py output (it's a somewhat old perf.py, sorry)
> comparing 2.6.1 against the patch. I'm still trying to lower the overhead.
> The benchmarks don't show the BIG improvement in multiprocessing, though -
> the comp.lang.python thread I linked above says a bit about that... I wonder
> if there are enough users of this pattern of sharing data among processes to
> warrant some tests of the sort.
Those are interesting numbers, because they give you an idea of how
much overhead adding a single memory load to every refcount operation
imposes, which is about 16%.
We've recently been discussing the idea of what it would take to move
Python to a more pure GC system to make GIL removal easier, although
we've agreed that any work done in that area should be based on py3k,
not unladen. At first, it seems like this would solve your problem
too, but there are a few problems.
First, GCs usually keep reachable/unreachable bits in the object
header. However, this problem can be solved as you've solved it. We
would pay the penalty for the memory access when doing a collection,
instead of every time the object is passed around.
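
As a rough illustration of keeping the mark bits out of the header, here
is a sketch that uses a side bitmap indexed by heap slot. This is one
possible variant, not what the patch does (the patch uses a pointer in
the header), and every name in it (demo_cell, demo_heap, demo_marks and
the demo_* functions) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_HEAP_SLOTS 64

/* Hypothetical object: no reachable/unreachable bit in the header. */
typedef struct {
    int payload;
} demo_cell;

static demo_cell demo_heap[DEMO_HEAP_SLOTS];

/* Side table: one mark bit per heap slot, stored away from the objects. */
static unsigned char demo_marks[DEMO_HEAP_SLOTS / 8];

/* Reset all mark bits at the start of a collection. */
static void demo_clear_marks(void)
{
    memset(demo_marks, 0, sizeof demo_marks);
}

/* Mark a cell reachable; only the side table is written. */
static void demo_mark(const demo_cell *c)
{
    size_t i = (size_t)(c - demo_heap);
    demo_marks[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Return 1 if the cell was marked during this collection, else 0. */
static int demo_is_marked(const demo_cell *c)
{
    size_t i = (size_t)(c - demo_heap);
    return (demo_marks[i / 8] >> (i % 8)) & 1;
}
```

The lookup into the side table costs an extra memory access, but only
while a collection is running.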
Second, we would still need to support refcounting to allow C
extension modules to hold references to Python objects. To keep the
objects read-only, you would have to use the same trick that you're
using now.
IMO I'd rather go the extra mile of removing the GIL, so that you can
get parallel performance with threads in a single process rather than
doing this complicated dance to keep read-only objects really