Removing the GIL looks like a realistic prospect:
Progress on the Gilectomy
https://lwn.net/Articles/723514/
> More performance
> ...
>
> The benchmark that he always uses is a "really bad recursive Fibonacci". He showed graphs of how various versions of Gilectomy fare versus CPython. Gilectomy is getting better, but is still well shy of CPython speed in terms of CPU time. But that is not what he is shooting for; when looking at wall time, the latest incarnation of Gilectomy is getting quite close to CPython's graph line. The "next breakthrough" may show Gilectomy as faster than CPython, he said.
>
> Next breakthrough
>
> He has some ideas for ways to get that next breakthrough. For one, he could go to a fully per-thread object-allocation scheme. Thomas Wouters suggested looking at Thread-Caching Malloc[1] (TCMalloc), but Hastings was a bit skeptical. The small-block allocator in Python is well tuned for the language, he said. But Wouters said that tests have been done and TCMalloc is no worse than Python's existing allocator, but has better fragmentation performance and is multi-threaded friendly. Hastings concluded that it was "worth considering" TCMalloc going forward.
> He is thinking that storing the reference count separate from the object might be an improvement performance-wise. Changing object locking might also improve things, since most objects never leave the thread they are created in. Objects could be "pre-locked" to the thread they are created in and a mechanism for threads to register their interest in other threads' objects might make sense.
>
> The handbook that he looked in to find buffered reference counts says little about reference counting; it is mostly focused on tracing garbage collection. So one thought he has had is to do a "crazy rewrite" of the Python garbage collector. That would be a major pain and break the C API, but he has ideas on how to fix that as well.
> ...
> But Van Rossum is concerned that all of the C-based Python extensions will be broken in Gilectomy. Hastings thinks that overstates things and has some ideas on how to make things better. Someone had suggested only allowing one thread into a C extension at a time (so, a limited GIL, in effect), which might help.
>
> The adoption of PyPy "has not been swift", Hastings said; he thinks that since CPython is the reference implementation of Python, it will be the winner. He does not know how far he can take Gilectomy, but he is sticking with it; he asked Van Rossum to "let me know if you switch to PyPy". But Van Rossum said that he is happy with CPython as it is. On the other hand, Wouters pointed out one good reason to stick with experimenting with CPython; since the implementation is similar to what the core developers are already knowledgeable about, they will be able to offer thoughts and suggestions.
[1] Thread-Caching Malloc:
http://goog-perftools.sourceforge.net/doc/tcmalloc.html

Gilectomy's performance keeps getting better and is already close to that of GIL-based CPython; the next breakthrough may even make it faster than GIL-based CPython.
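
For anyone who wants to see the CPU-time versus wall-time distinction from the article on their own machine, here is a rough sketch of a "really bad recursive Fibonacci" run across several threads. This is not Hastings's actual benchmark; the function names, the thread count, and the value of `n` are my own assumptions, chosen only for illustration.

```python
import threading
import time


def fib(n):
    """Deliberately naive recursive Fibonacci (exponential time)."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def run_benchmark(n=20, num_threads=4):
    """Run fib(n) once per thread; return (wall_seconds, cpu_seconds, results).

    Hypothetical helper for illustration, not part of Gilectomy or CPython.
    """
    results = [None] * num_threads

    def worker(i):
        results[i] = fib(n)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]

    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time summed over the whole process
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu, results


if __name__ == "__main__":
    wall, cpu, results = run_benchmark()
    print(f"wall time: {wall:.3f}s, CPU time: {cpu:.3f}s")
```

On a GIL-based CPython the two numbers should come out roughly equal, because only one thread executes bytecode at a time. On an interpreter without a GIL, the threads can run in parallel, so wall time can fall below CPU time even if total CPU time goes up, which is the trade-off the article describes.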