_______________________________________________
Python-Dev mailing list -- pytho...@python.org
To unsubscribe send an email to python-d...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Code of Conduct: http://python.org/psf/codeofconduct/
To be clear, Sam’s basic approach is a bit slower for single-threaded code, and he admits that.
To speed-up function calls, the interpreter uses a linear, resizable stack to store function call frames, an idea taken from LuaJIT. The stack stores the interpreter registers (local variables + space for temporaries) plus some extra information per-function call. This avoids the need for allocating PyFrameObjects for each call. For compatibility, the PyFrameObject type still exists, but they are created lazily as-needed (such as for exception handling and for sys._getframe).
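To make the idea concrete, here is a minimal sketch of such a linear, resizable frame stack in C. All names (`CallStack`, `FrameHeader`, `push_frame`) are invented for illustration; this is not nogil's actual code, just the shape of the design: offsets rather than pointers survive reallocation, and returning from a call is a constant-time reset of the top with no per-call malloc/free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of a linear, resizable call-frame stack.  Each
 * call pushes a small header plus space for the function's registers
 * (locals + temporaries); returning pops them off in O(1). */

typedef struct {
    void  *func;   /* the function object for this call */
    size_t nregs;  /* number of register slots (locals + temporaries) */
} FrameHeader;

typedef struct {
    char  *base;
    size_t top;       /* offset of the first free byte */
    size_t capacity;
} CallStack;

static void stack_init(CallStack *s, size_t cap) {
    s->base = malloc(cap);
    s->top = 0;
    s->capacity = cap;
}

/* Bump-allocate n bytes, growing geometrically when needed. */
static size_t stack_alloc(CallStack *s, size_t n) {
    size_t off = s->top;
    if (off + n > s->capacity) {
        while (off + n > s->capacity)
            s->capacity *= 2;
        s->base = realloc(s->base, s->capacity);
    }
    s->top = off + n;
    return off;
}

/* Offsets, not pointers, stay valid across reallocation. */
static FrameHeader *frame_at(CallStack *s, size_t off) {
    return (FrameHeader *)(s->base + off);
}

/* Push one header plus nregs pointer-sized register slots. */
static size_t push_frame(CallStack *s, void *func, size_t nregs) {
    size_t off = stack_alloc(s,
        sizeof(FrameHeader) + nregs * sizeof(void *));
    FrameHeader *f = frame_at(s, off);
    f->func = func;
    f->nregs = nregs;
    return off;
}

/* "Returning" is just resetting the top -- no per-call free(). */
static void pop_frame(CallStack *s, size_t off) {
    s->top = off;
}
```

The lazy PyFrameObject then only needs to be materialized from one of these records when something like sys._getframe actually asks for it.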
The optimized function calls have about an order of magnitude less overhead than the current CPython implementation.
The change also simplifies the use of deferred reference counting with the data that is stored per-call like the function object. The interpreter can usually avoid incrementing the reference count of the function object during a call. Like other objects on the stack, a borrowed reference to the function is indicated by setting the least-significant-bit.
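A tiny illustration of that least-significant-bit trick (my own simplified sketch, not nogil's actual code): object pointers are at least word-aligned, so bit 0 is always free, and setting it marks a stack slot as holding a borrowed reference that must not be decref'd when the frame is popped.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of LSB tagging for borrowed references on the frame stack.
 * Names are invented for illustration. */

#define TAG_BORROWED ((uintptr_t)1)

/* Mark a slot as holding a borrowed (not owned) reference. */
static inline void *tag_borrowed(void *obj) {
    return (void *)((uintptr_t)obj | TAG_BORROWED);
}

/* Does this slot hold a borrowed reference? */
static inline int is_borrowed(void *slot) {
    return ((uintptr_t)slot & TAG_BORROWED) != 0;
}

/* Recover the real, untagged object pointer. */
static inline void *untag(void *slot) {
    return (void *)((uintptr_t)slot & ~TAG_BORROWED);
}
```

On pop, the interpreter would only decref slots whose tag bit is clear, which is how the function object's refcount can usually be left untouched for the duration of the call.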
It gets about the same average performance as the “main” branch of CPython 3.11 as of early September 2021.
I've been working on changes to CPython to allow it to run without the global interpreter lock.
Before anybody asks: Sam contacted me privately some time ago to pick my brain a little. But honestly, Sam didn't need any help--he'd already taken the project further than I'd ever taken the Gilectomy. I have every confidence in Sam and his work, and I'm excited he's revealed it to the world!
Best wishes,
/arry
I'm a novice C programmer, so I'm unsure about the safety of the thread-safe collections as you describe them.
When you say "an order of magnitude less overhead than the current CPython implementation", do you mean compared with the main branch? We have already implemented almost everything listed in this paragraph.
The "nogil" interpreter stays within the same interpreter loop for many Python function calls, while upstream CPython recursively calls into _PyEval_EvalFrameDefault.
Concurrency is *hard*. There's no getting around it, there's no
sugar-coating it. There are concepts that simply have to be learned,
and the failures can be extremely hard to track down. Instantiating an
object on the wrong thread can crash GTK, but maybe not immediately.
Failing to sleep in one thread results in other threads stalling. I
don't think any of this is changed by different modes (with the
exception of process-based parallelism, which fixes a lot of
concurrency at the cost of explicit IPC), and the more work
programmers want their code to do, the more likely that they'll run
into this.
I notice the fb.com address -- is this a personal project or something Facebook is working on? What's the relationship to Cinder, if any?
Regarding the tricky lock-free dict/list reads: I guess the more
straightforward approach would be to use a plain ol' mutex that's
optimized for this kind of fine-grained per-object lock with short
critical sections and minimal contention, like WTF::Lock. Did you try
alternatives like that? If so, I assume they didn't work well -- can
you give more details?
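For concreteness, here is a minimal C11 sketch of the kind of lock the question refers to (my own toy version, not WTF::Lock itself): one word of state, an inlined uncontended fast path, and a slow path that here merely spins where a real WTF::Lock-style design would park the waiting thread in a side table.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy per-object lock: a single atomic flag.  A production lock of
 * this family keeps the same one-word footprint but adds a "parking
 * lot" so contended waiters sleep instead of spinning. */

typedef struct { atomic_bool locked; } TinyLock;

static void tiny_lock(TinyLock *l) {
    bool expected = false;
    /* Fast path: one CAS.  Slow path: spin and retry; a real
     * implementation would fall back to futex/parking here. */
    while (!atomic_compare_exchange_weak_explicit(
            &l->locked, &expected, true,
            memory_order_acquire, memory_order_relaxed)) {
        expected = false;
    }
}

static void tiny_unlock(TinyLock *l) {
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The appeal for fine-grained per-object locking is that the uncontended case is a single compare-and-swap, and the lock itself costs one machine word per object.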
Is it also slower even when running with PYTHONGIL=1? If it could be made the same speed for single-threaded code when running in GIL-enabled mode, that might be an easier intermediate target while still adding value.
On 11 Oct 2021, at 18:58, Thomas Grainger <tag...@gmail.com> wrote:
> Is D1.update(D2) still atomic with this implementation? https://docs.python.org/3.11/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe
I’m unclear what is actually retried. You use this note throughout the document, so I think it would help to clarify exactly what is retried and why that solves the particular problem. I’m confused: is it the refcount increment that’s retried, or the entire sequence of steps (i.e., do you go back and reload the address of the item)? Is there some kind of waiting period before the retry? I would infer that if you’re retrying the refcount increment, it’s because you expect a subsequent retry to transition from zero to non-zero, but is that guaranteed? Are there possibilities of deadlocks or race conditions?
It's crude, but you can take a look at `ccbench` in the Tools directory.
The ccbench results look pretty good: about 18.1x speed-up on "pi calculation" and 19.8x speed-up on "regular expression" with 20 threads (turbo off). The latency and throughput results look good too.
JESUS CHRIST
/arry
Did you try running the same code with stock Python? One reason I ask is that, IIUC, you are using numpy for the individual vector operations, and numpy already releases the GIL in some circumstances.
It would also be fun to see David Beazley’s example from his seminal talk:
1. I use numpy arrays filled with random values, and the output array is also a numpy array. The vector multiplication is done in a simple for loop in my vecmul() function.
As mentioned above, the no-GIL proof-of-concept interpreter is about 10% faster than CPython 3.9 (and 3.10) on the pyperformance benchmark suite.
I think the performance difference is because of different versions of NumPy.
Hello all,
I am very excited about a future multithreaded Python. I managed to postpone some rewrites into Rust/Go at the company I work for, precisely because of the potential to have a Python solution in the medium term.
I was wondering. Is Sam Gross' nogil merge being seriously considered by the core Python team?
On Sat, Apr 23, 2022 at 8:31 AM <brata...@gmail.com> wrote:
> I was wondering. Is Sam Gross' nogil merge being seriously considered by the core Python team?

Yes, although we have no timeline as to when we will make a decision about whether we will accept it or not.
The last update we had on the work was Sam was upstreaming the performance improvements he made that were not nogil-specific. The nogil work was also being updated for the `main` branch. Once that's all done we will probably start a serious discussion as to whether we want to accept it.