
what's the point of rpython?


Brendan Miller

Jan 16, 2009, 7:37:39 PM
to pytho...@python.org
So I kind of wanted to ask this question on the pypy mailing list..
but there's only a pypy-dev list, and I don't want to put noise on the
dev list.

What's the point of RPython? By this, I don't mean "What is RPython"?
I get that. I mean, why?

The goal of the pypy project seems to be to create a fast python
implementation. I may be wrong about this, as the goals seem a little
amorphous if you look at their home page.

So, to do that, this RPython compiler was created? Except that it
doesn't compile python; it compiles a static subset of python that has
type inference like ML.

This RPython thing seems like a totally unnecessary intermediate step.
Instead of writing an implementation of one language, there's an
implementation of two languages.

Actually, it seems like it harms the ultimate goal. For the
interpreted python to be fast, the interpreter has to be written in a
reasonably fast language. ML and C++ compilers have had a lot of work
put into their optimization steps. A lot of the performance boost you
get from a static language is that knowing the types at compile time
lets you inline code like crazy, unroll loops, and even execute code
at compile time.

RPython is a statically typed language because I guess the developers
associate static languages with speed? Except that they use it to
generate C code, which throws away the type information they need to
get the speed increase. Huh? I thought the goal was to write a fast
dynamic language, not a slow static one?

Is this going anywhere or is this just architecture astronautics?

The RPython project seems kind of interesting to me and I'd like to
see more python implementations, but looking at the project I can't
help but think that they haven't really explained *why* they are doing
the things they are doing.

Anyway, I can tell this is the sort of question that some people will
interpret as rude. Asking hard questions is never polite, but it is
always necessary :)

Thanks,
Brendan

Luis M. González

Jan 16, 2009, 7:58:30 PM

Check out this thread:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/1a8c866b4c4d7521/2174f796cd687931?lnk=gst&q=enlighten+pypy#2174f796cd687931

One of pypy's goals is having a more flexible implementation, easier
to experiment with than cpython.
Having it all written in the same language facilitates this task.
Why Rpython? Because by avoiding the dynamism of the full python
language, it is possible to translate this code to a static language
(c, java, whatever).
The translated rpython interpreter aims to be similar in performance
to cpython. The only difference is that cpython was written from
scratch in c, while pypy's interpreter was written in rpython and then
translated automatically to c.

The pypy team intends to achieve greater speed and performance by
adding a JIT compiler to this interpreter.
So pypy-c + JIT = faster python.

I hope this helps...

Luis


Matimus

Jan 16, 2009, 8:16:15 PM
The goals are listed here: http://codespeak.net/pypy/dist/pypy/doc/architecture.html

Speed is mentioned, but as a secondary concern. The main goal seems to
be to create a vehicle for exploring the concept of dynamic languages
themselves. If that seems amorphous, it is because it is a research
project.

Matt

alex23

Jan 16, 2009, 11:07:08 PM
On Jan 17, 10:37 am, "Brendan Miller" <catph...@catphive.net> wrote:
> What's the point of RPython? By this, I don't mean "What is RPython"?
> I get that. I mean, why?

This is more or less covered in the FAQ:

"The restrictions are to ensure that type inference (and so,
ultimately, translation to other languages) of RPython programs is
possible. These restrictions only apply after the full import happens,
so at import time arbitrary Python code can be executed."

http://codespeak.net/pypy/dist/pypy/doc/faq.html#id25

> The goals of the pypy project seems to be to create a fast python
> implementation. I may be wrong about this, as the goals seem a little
> amorphous if you look at their home page.

The home page itself is ambiguous, and does oversell the performance
aspect. The *actual* goal as outlined by their official docs is to
implement Python in Python, at every level. If this means utilising a
less-dynamic subset of Python for the lower levels, it's -still- at
least more-Python-than-C.

"PyPy is both:
* a reimplementation of Python in Python, and
* a framework for implementing interpreters and virtual machines for
programming languages, especially dynamic languages."

http://codespeak.net/pypy/dist/pypy/doc/faq.html#id11

The PyPy devs feel that this will allow them to more easily experiment
with optimising the interpreter for greater speeds, but that isn't one
of the stated goals (just, apparently, their 'secret' one).

> This RPython thing seems like a totally unnecessary intermediate step.
> Instead of writing an implementation of one language, there's an
> implementation of two languages.

My understanding is that the 'higher' level Python language features
are implemented on top of the restricted RPython features, so it's
more of an organic growth of one language than two separate
implementations.

> A lot of the performance boost you
> get from a static language is that knowing the types at compile time
> lets you inline code like crazy, unroll loops, and even execute code
> at compile time.
>
> RPython is a statically typed language because I guess the developers
> associate static languages with speed?

Doesn't the first paragraph above state the same thing that you
question in the next?

> Except that they use it to
> generate C code, which throws away the type information they need to
> get the speed increase. Huh? I thought the goal was to write a fast
> dynamic language, not a slow static one?

They're not just generating C code, they currently also target LLVM,
JVM and CLI.

> Is this going anywhere or is this just architecture astronautics?

Well, they actually seem to have achieved some substantial gains, so I
think it's unfair to imply that the project isn't based on pragmatic
objectives. Their initial JIT prototype failed to work as well as
expected, and they've spent some time re-thinking the approach; I'm
happier to wait for a stably performing JIT that can be improved over
time than a short term gain.

> The RPython project seems kind of interesting to me and I'd like to
> see more python implementations, but looking at the project I can't
> help but think that they haven't really explained *why* they are doing
> the things they are doing.

A lot of this is covered in the FAQ. Whether you agree with their
approach or not, they're the ones actively pushing this effort
forward.

It's been a few months since the last update, but the PyPy status blog
may have more information for you. At the very least, it's a venue to
discuss your concerns directly with the PyPy devs.

http://morepypy.blogspot.com

Brendan Miller

Jan 17, 2009, 6:55:32 PM
to alex23, pytho...@python.org
>> The goals of the pypy project seems to be to create a fast python
>> implementation. I may be wrong about this, as the goals seem a little
>> amorphous if you look at their home page.
>
> The home page itself is ambiguous, and does oversell the performance
> aspect. The *actual* goal as outlined by their official docs is to
> implement Python in Python, at every level.

Ok fair enough. In some ways I see that as more of a purely
intellectual exercise than the practical endeavor that I assumed the
project was originally.

However, one of the links I was sent had one of the devs talking about
using the translation process to make C/Java/LLVM implementations out
of the same interpreter code. I'll say that makes a lot of sense.

Another question I was wondering about is whether they plan on
maintaining good C bindings? Will existing bindings for third party
libraries be able to work?

Also, are they going to do away with the GIL? The python devs seem to
consider the GIL a non-issue, though they may change their mind in 3
years when we all have 32 core desktops, until then getting rid of the
GIL would make pypy pretty attractive in some quarters. I know the
scons project was running into GIL issues.

Finally, I'm pretty unclear on what versions of python pypy is targeting.

alex23

Jan 17, 2009, 10:03:21 PM
On Jan 18, 9:55 am, "Brendan Miller" <catph...@catphive.net> wrote:
> > The *actual* goal as outlined by their official docs is to
> > implement Python in Python, at every level.
> Ok fair enough. In some ways I see that as more of a purely
> intellectual exercise than the practical endeavor that I assumed the
> project was originally.

Well, it opens up Python as a whole for experimentation & modification
by Python developers, which seems pretty practical to me :)

> Another question I was wondering about is whether they plan on
> maintaining good C bindings? Will existing bindings for third party
> libraries be able to work?

You really should run through the whole faq:

http://codespeak.net/pypy/dist/pypy/doc/faq.html#can-i-use-cpython-extension-modules

(Spoiler alert: the answer is no.)

> Also, are they going to do away with the GIL?

One FAQ entry before the above:

http://codespeak.net/pypy/dist/pypy/doc/faq.html#do-threads-work-what-are-the-modules-that-work

> The python devs seem to
> consider the GIL a non-issue, though they may change their mind in 3
> years when we all have 32 core desktops, until then getting rid of the
> GIL would make pypy pretty attractive in some quarters. I know the
> scons project was running into GIL issues.

It's not a case of the Python devs digging their heels in and not
giving the world what it desperately wants. Here's an article by Guido
talking about the last attempt to remove the GIL and the performance
issues that arose:

"I'd welcome a set of patches into Py3k *only if* the performance for
a single-threaded program (and for a multi-threaded but I/O-bound
program) *does not decrease*."

http://www.artima.com/weblogs/viewpost.jsp?thread=214235

> Finally, I'm pretty unclear on what versions of python pypy is targeting.

I wonder where such a frequently asked question could be answered? ;)

http://codespeak.net/pypy/dist/pypy/doc/faq.html#which-python-version-2-x-does-pypy-implement

Paul Rubin

Jan 17, 2009, 10:57:27 PM
alex23 <wuw...@gmail.com> writes:
> Here's an article by Guido talking about the last attempt to remove
> the GIL and the performance issues that arose:
>
> "I'd welcome a set of patches into Py3k *only if* the performance for
> a single-threaded program (and for a multi-threaded but I/O-bound
> program) *does not decrease*."

The performance decrease is an artifact of CPython's rather primitive
storage management (reference counts in every object). This is
pervasive and can't really be removed. But a new implementation
(e.g. PyPy) can and should have a real garbage collector that doesn't
suffer from such effects.

andrew cooke

Jan 18, 2009, 6:56:28 AM
Since this is a PyPy bashing thread, maybe it's an appropriate place
to suggest that the project has got a little bit waylaid by exploring
cool things instead of releasing a useful final result?

I am not questioning rpython directly - the case for something like
that is obvious. But there's a question of balance. It's possible to
go on building ever more complex systems which are theoretically
justified, but which postpone ever finishing the job. At some point
there has to be a "good enough".

To some extent I am playing devil's advocate here, but as an outsider
who looked at PyPy a while back, my uninformed and naive impression
was that the project was suffering from the kind of issues I have
caricatured above....

Andrew

PS I guess you are aware of worse is better etc? I think this may
also be a US/Euro culture issue...

Luis M. González

Jan 18, 2009, 10:46:16 AM

It's worth adding that, regarding the goal of having a faster python,
we have had for a long time a solid and useful proof of concept in
psyco. Pypy builds on this concept and adds many more useful
features, not all of them related to speed but to flexibility.
I have no doubt the project will be successful. The question is how
long we will have to wait...

Luis

Brendan Miller

Jan 18, 2009, 6:30:30 PM
to pytho...@python.org

That's interesting, I hadn't heard the reference counting mechanism
was related to the GIL. Is it just that you need to lock the reference
count before mutating it if there's no GIL? Really, that shouldn't be
the case. Reference counting can be done with a lock free algorithm.

Garbage collection is definitely in vogue right now. However, people
tend to treat it more like a religion than an algorithm. Garbage
collection vs reference counting actually has some trade offs both
ways. GC gets you some amortized performance gains, and some space
gains because you don't need to hang on to a bunch of counts. However,
GC also has the problem of having a very loose memory profile and poor
interactive performance during compaction if the heap is large. In
some cases this discussion becomes complicated with python because
python has both reference counting and GC.

Paul Rubin

Jan 18, 2009, 6:50:07 PM
"Brendan Miller" <catp...@catphive.net> writes:
> That's interesting, I hadn't heard the reference counting mechanism
> was related to the GIL. Is it just that you need to lock the reference
> count before mutating it if there's no GIL?

Yes. Someone tried inserting such a lock but it slowed down the
single-cpu case unacceptably.

> Really, that shouldn't be
> the case. Reference counting can be done with a lock free algorithm.

That's interesting, got a reference? (no pun intended)

sk...@pobox.com

Jan 18, 2009, 7:40:06 PM
to pytho...@python.org
Paul> That's interesting, got a reference? (no pun intended)

http://letmegooglethatforyou.com/?q=lock+free+reference+counting

<wink>

--
Skip Montanaro - sk...@pobox.com - http://smontanaro.dyndns.org/

Paul Rubin

Jan 18, 2009, 8:39:40 PM
sk...@pobox.com writes:
> http://letmegooglethatforyou.com/?q=lock+free+reference+counting

I found a paper by Detlefs et al describing a method which is

a) patented
b) can potentially lock out some threads from ever running, and
c) relies on a hardware instruction (double compare and swap)
that's not available on most processors.

There are well known concurrent and parallel GC techniques that
don't have that problem, see for example Cheng's dissertation:
http://reports-archive.adm.cs.cmu.edu/anon/2001/CMU-CS-01-174.pdf

GHC 6.10 uses a parallel stop-the-world gc that is simpler, I think.

Brendan Miller

Jan 19, 2009, 11:00:15 PM
to pytho...@python.org
Maybe I'm missing something here but a lock free algorithm for
reference counting seems pretty trivial. As long as you can atomically
increment and decrement an integer without locking you are pretty much
done.

For a reference implementation of lock-free reference counting on all
common platforms, check out Boost's implementation of shared_ptr (a
reference-counting smart pointer designed around multithreaded use
cases).

>There are well known concurrent and parallel GC techniques that

Hmm... I didn't really mention poor parallelism as a problem of GC. As
I see it there are two trade-offs that have to be made for GC vs
alternative techniques: the "embarrassing pause" during compaction,
which makes it impossible to use for applications like interactive
video display that can't halt to compact a several-gigabyte heap
without causing stutter, and the loose memory profile.

Maybe the document you sent me addresses those, and I'd be interested
if it did. It's ~150 pages though, so I haven't really had time to read
it yet.

As far as parallelism problems with GC go... the only ones I can
imagine is that if you had a lot of threads going generating lots of
garbage you would need to start to compact more frequently. Since
compaction halts all threads, this could potentially cause very
frequent compactions? Is that what you were getting at? I'd wondered
about that before, but didn't know for a fact whether it came up in
real world scenarios.

Paul Rubin

Jan 19, 2009, 11:19:26 PM
"Brendan Miller" <catp...@catphive.net> writes:
> Maybe I'm missing something here but a lock free algorithm for
> reference counting seems pretty trivial. As long as you can atomically
> increment and decrement an integer without locking you are pretty much
> done.

What cpu's do you know of that can atomically increment and decrement
integers without locking?

Carl Banks

Jan 19, 2009, 11:27:51 PM
On Jan 19, 8:00 pm, "Brendan Miller" <catph...@catphive.net> wrote:
> Maybe I'm missing something here but a lock free algorithm for
> reference counting seems pretty trivial. As long as you can atomically
> increment and decrement an integer without locking you are pretty much
> done.

You're missing that most of the platforms that Python supports can't
actually do this. Keep in mind that it has to be atomic across all
cores.

(For Python to use such a technique, it would have to be doable on
every platform Python supports. They aren't going to get rid of the
GIL for some platforms and not others.)


> For a reference implementation of lock free reference counting on all
> common platforms check out Boost's implementation of shared_ptr (a
> reference counting smart pointer designed around multithreaded use
> cases).

I just looked at the boost documentation, which claims that multiple
asynchronous writes to the same shared_ptr results in undefined
behavior. That will not suffice for Python reference counting.

Carl Banks

Paul Rubin

Jan 19, 2009, 11:31:36 PM
"Brendan Miller" <catp...@catphive.net> writes:
> As long as you can atomically increment and decrement an integer
> without locking you are pretty much done.

Most cpu's can't do that.

> For a reference implementation of lock free reference counting on all
> common platforms check out Boost's implementation of shared_ptr (a
> reference counting smart pointer designed around multithreaded use
> cases).

That sounds very mysterious to me--are you sure it is intended for
multiprocessing and not just multiple threads on a single processor?
Do you have a url for the code?

> I see it the two trade offs that have to be made for GC vs alternative
> techniques. The "embarrassing pause" during compaction which makes it
> impossible to use for applications like interactive video display that
> can't halt to compact a several gigabyte heap without causing stutter,
> and the loose memory profile.

The embarassing pause is characteristic of a so-called "stop the
world" gc. A gc where any pauses are guaranteed bounded in size
(preferably to a small value) is called a "real time" gc. I believe
Cheng's algorithm is a real time alg but it's been a while since I
read that thesis.

Anyway, nobody programs interactive video in Python, and Python's ref
counting scheme is not real time since it may have to do arbitrarily
many decrefs when you release a large structure.

> As far as parallelism problems with GC go... the only ones I can
> imagine is that if you had a lot of threads going generating lots of
> garbage you would need to start to compact more frequently. Since
> compaction halts all threads, this could potentially cause very
> frequent compactions? Is that what you were getting at?

I'm not sure what you're asking. Parallelism is used for making
things go faster, but making things parallel usually complicates them.
A parallel realtime gc is sort of the holy grail.

Cheng's thesis is pretty readable as I remember, and discusses most of
these issues. There is also a book by Appel about gc, but it's
perhaps a little bit dated by now.

Kay Schluehr

Jan 20, 2009, 1:55:33 AM
On 17 Jan., 01:37, "Brendan Miller" <catph...@catphive.net> wrote:

> Is this going anywhere or is this just architecture astronautics?
>
> The RPython project seems kind of interseting to me and I'd like to
> see more python implementations, but looking at the project I can't
> help but think that they haven't really explained *why* they are doing
> the things they are doing.

Remember that the original objective of PyPy was to improve the JIT.
Psyco is limited by the fact that the whole runtime is implemented in
C. The infamous "faster than C" actually refers to work on program
specializers on C code, i.e. treating C as a language that is JIT
compiled at a fine-grained level (block structure; whole-function
JIT compilation wouldn't obviously yield any advantages).

So it is not just the application level Python code that shall run
through the JIT but also the interpreter level code. So why not equate
them and think about interpreter level code as Python as well? This
might be the idea. But then the problem of bootstrapping comes up:
there is no interpreter level code with the required properties. Hence
RPython that can serve as a foundation.

I'm also not sure I like the approach. Rather than designing a whole
new runtime for having a fully reflective system, I'd model C in Python
(call it CiPy) and create a bijection between CiPy code and C code.
Once this has been established, one can study other translations of
CiPy or Python into CiPy (not unlike Pyrex/Cython), doing systematic
refactorings on CiPy code in Python, study properties using runtime
reflection, experiment with macro systems etc. All of this is done in
PyPy as well, but sometimes it seems the team has created more new
problems than they solved.

> Anyway, I can tell this is the sort of question that some people will
> interpret as rude. Asking hard questions is never polite, but it is
> always necessary :)

I think it is a good question.

sk...@pobox.com

Jan 20, 2009, 6:09:41 AM
to Carl Banks, pytho...@python.org

Carl> I just looked at the boost documentation, which claims that
Carl> multiple asynchronous writes to the same shared_ptr results in
Carl> undefined behavior. That will not suffice for Python reference
Carl> counting.

Carl, I'm quite unfamiliar with Boost and am not a C++ person, so may have
read what you saw but not recognized it in the C++ punctuation soup. I
couldn't find what you referred to. Can you provide a URL?

Thanks,

Paul Rubin

Jan 20, 2009, 6:46:10 AM
sk...@pobox.com writes:
> Carl, I'm quite unfamiliar with Boost and am not a C++ person, so may have
> read what you saw but not recognized it in the C++ punctuation soup. I
> couldn't find what you referred to. Can you provide a URL?

http://www.boost.org/doc/libs/1_37_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety

Note there is also something called "interprocess shared pointers"
(link below) that don't have that caveat. I wonder if they use locks.

http://www.boost.org/doc/libs/1_37_0/doc/html/interprocess/interprocess_smart_ptr.html#interprocess.interprocess_smart_ptr.shared_ptr

Brendan Miller

Jan 20, 2009, 3:13:15 PM
to pytho...@python.org
On Tue, Jan 20, 2009 at 3:46 AM, Paul Rubin
<"http://phr.cx"@nospam.invalid> wrote:
> sk...@pobox.com writes:
>> Carl, I'm quite unfamiliar with Boost and am not a C++ person, so may have
>> read what you saw but not recognized it in the C++ punctuation soup. I
>> couldn't find what you referred to. Can you provide a URL?
>
> http://www.boost.org/doc/libs/1_37_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety

I think you are misreading that. It says that multiple assignments to
different copies of a shared_ptr in different threads are fine. This is
the important bit, because copies of the same pointer share the same
reference count, and assignments and resets will decrement that ref
count. They say they don't handle mutations of the *same* pointer in
different threads, which is a different issue.

The programmer is responsible for synchronizing access to the pointer,
and the pointed-to object, but not the ref count. This may not be
obvious if you don't use shared_ptr a lot.

You also mentioned in an earlier post that most processors don't
support atomic increments... I'm hesitant to dispute you here because
this is outside of my field of expertise. However, a quick google
search for "x86 atomic increment" comes up with this:

XADD

http://www.codemaestro.com/reviews/8
http://siyobik.info/index.php?module=x86&id=159

Again, I'm not an assembly guru, but this sounds like exactly what
you'd want. It gains exclusive access to system memory in a
multi-processor environment without leaving user space. Thus XADD is
an atomic increment/decrement. It would be educational if someone more
familiar with x86 than me could speak to the performance merits of this
on modern multicore machines.

Ross Ridge

Jan 20, 2009, 3:19:39 PM
Carl Banks <pavlove...@gmail.com> wrote:
>I just looked at the boost documentation, which claims that multiple
>asynchronous writes to the same shared_ptr results in undefined
>behavior. That will not suffice for Python reference counting.

If you read the Boost documentation you'll see that while multiple
simultaneous writes to a shared_ptr *instance* aren't supported,
multiple simultaneous updates of the reference counter used by the
shared_ptr implementation are supported. Example #1 in the URL that
Paul Rubin gave demonstrates this. The allowed simultaneous assignment
of the two shared_ptr instances "p2" and "p3" from the same shared_ptr
instance "p" requires that a single reference count be incremented
concurrently.

On the other hand, the documentation also notes that the implementation
is only lock-free on x86, IA-64, and PowerPC systems, though I think it
would also be possible on MIPS CPUs.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rri...@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //

Rhamphoryncus

Jan 20, 2009, 3:36:11 PM
On Jan 16, 5:37 pm, "Brendan Miller" <catph...@catphive.net> wrote:
> So I kind of wanted to ask this question on the pypy mailing list..
> but there's only a pypy-dev list, and I don't want to put noise on the
> dev list.
>
> What's the point of RPython? By this, I don't mean "What is RPython"?
> I get that. I mean, why?

There are some distinct benefits of RPython:

* starts up as real python code, letting you define globals that later
get "snapshotted" into static code
* can be executed using a real python interpreter, getting much more
reliable debugging (but broken rpython code might run fine under
python)
* provides a clear avenue for extension, by adding more python
features

You might argue just having a python syntax is also a benefit, but the
semantics are so different as to more than counteract it. Add in the
cost of implementing your own compiler... yeah.

Rhamphoryncus

Jan 20, 2009, 3:52:32 PM
On Jan 19, 9:00 pm, "Brendan Miller" <catph...@catphive.net> wrote:
> Maybe I'm missing something here but a lock free algorithm for
> reference counting seems pretty trivial. As long as you can atomically
> increment and decrement an integer without locking you are pretty much
> done.

"lock free" is largely meaningless. What it really means is "we use
small hardware locks rather than big software locks, thereby reducing
(but not eliminating!) the contention".

Atomic refcounting is easy. If done sparingly to minimize contention
it works great. Python uses refcounting massively with heavily
contended objects (think about your builtin types, constants, globals,
classes, etc.) It does not perform acceptably for python.

The second issue is the objects themselves, like a list, which is
mutable. If you're using it in a single thread or writing from
multiple threads, this is a non-trivial constant cost. If your object
is not modified after creation and is read from many threads, a lock
would be a point of contention, preventing you from scaling freely.
The dicts used by classes and globals are an important example of this,
and a successful implementation needs something non-contending. I
assume Jython and IronPython do this.

Paul Rubin

Jan 20, 2009, 4:30:21 PM
"Brendan Miller" <catp...@catphive.net> writes:
> > http://www.boost.org/doc/libs/1_37_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety
>
> I think you are misreading that. It says that multiple assignments to
> different copies of a share_ptr in different threads are fine.

I'll respond to this later.

> http://www.codemaestro.com/reviews/8
> http://siyobik.info/index.php?module=x86&id=159
>
> Again, I'm not an assembly guru, but this sounds like exactly what
> you'd want. It gains exclusive access to system memory in a
> multi-processor environment without leaving user space. Thus XADD is
> an atomic increment/decrement. It would be educational if someone more
> familiar with x86 than me could speak to the performance merits of this
> on modern multicore machines.

Those links describe using the LOCK prefix, which as the name implies,
asserts a lock, so it is no longer "lockless reference counting". The
LOCK prefix adds about 100 cycles to the instruction.

Paul Rubin

Jan 20, 2009, 4:39:39 PM
Rhamphoryncus <rha...@gmail.com> writes:
> "lock free" is largely meaningless. What it really means is "we use
> small hardware locks rather than big software locks, thereby reducing
> (but not eliminating!) the contention".

At least in the case of Haskell's "software transactional memory",
reads are genuinely lock free in the normal uncontended case. You use
LOCK XCHG to increment a counter in the object when you want to
update, and increment it again when you are done updating. Since the
counter starts at 0, any time it has an odd value, an update is in
process and any other thread can notice that the object is locked and
try again later without asserting any locks itself. If the counter
has an even value, it is unlocked, but there is a chance that a writer
thread can lock and update the object while a reader thread has a
lockless access in progress. The reader detects that this has occurred
when it finishes its transaction, reads the counter a second time, and
notices that the counter has been incremented (maybe more than once)
since the transaction started; it then rolls back its operation and
tries again. I'm not explaining that well, so you can read about it in
SPJ's paper or Keir Fraser's.

> The second issue is the objects themselves, like a list which is
> mutable. If you're using it in a single thread or writing from
> multiple threads this is a non-trivial constant cost. If your object
> is not modified after creation and is read from many threads a lock
> would be a point of contention, preventing you from scaling freely.
> The dicts used by classes and globals are an import example of this,
> and a successful implementation needs something non-contending. I
> assume Jython and IronPython do this.

I'm pretty sure Jython makes no attempt at all to mess with ref
counts--it just relies on the underlying Java gc. I have no idea
about IronPython.

Rhamphoryncus

Jan 20, 2009, 5:02:24 PM
On Jan 20, 2:39 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Rhamphoryncus <rha...@gmail.com> writes:
> > "lock free" is largely meaningless.  What it really means is "we use
> > small hardware locks rather than big software locks, thereby reducing
> > (but not eliminating!) the contention".
>
> At least in the case of Haskell's "software transactional memory",
> reads are genuinely lock free in the normal uncontended case.  You use
> LOCK XCHG to increment a counter in the object when you want to
> update, and increment it again when you are done updating.  Since the
> counter starts at 0, any time it has an odd value, an update is in
> process and any other thread can notice that the object is locked and
> try again later without asserting any locks itself.  If the counter
> has an even value, it is unlocked, but there is a chance that a writer
> thread can lock and update the object while a reader thread has a
> lockless access in progress.  The reader detects this has occurred
> when it finishes its transaction, reads the counter a second time, and
> notices that the counter has been incremented (maybe more than once)
> since the transaction started; it then rolls back its operation and
> tries again.  I'm not explaining that well so you can read about it in
> SPJ's paper or Keir Fraser's.

a) The contended case is the issue, not the uncontended case. An
uncontended lock is just constant overhead, not a barrier to
scalability
b) Locks on linux (since the switch to futexes) are pretty close to
free when uncontended.

Transactions are really for different properties:
a) read patterns can be uncontended (fully scalable)
b) a preempted writer does not block other writers, guaranteeing
forward progress
c) ad-hoc combination of several objects into a single atomic update


> > The second issue is the objects themselves, like a list which is
> > mutable.  If you're using it in a single thread or writing from
> > multiple threads this is a non-trivial constant cost.  If your object
> > is not modified after creation and is read from many threads a lock
> > would be a point of contention, preventing you from scaling freely.
> > The dicts used by classes and globals are an important example of this,
> > and a successful implementation needs something non-contending.  I
> > assume Jython and IronPython do this.
>
> I'm pretty sure Jython makes no attempt at all to mess with ref
> counts--it just relies on the underlying Java gc.  I have no idea
> about IronPython.

The second issue has *nothing* to do with refcounts. It is the
objects themselves. A classic example is trying to do "i += 1", which
breaks down into "i = i + 1", which isn't an atomic operation.

Java's ConcurrentHashMap gets this right.
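A quick way to see why "i += 1" isn't atomic in CPython is to disassemble it: the load, the add, and the store are separate bytecodes, and a thread switch can land between any two of them. The class and function here are made-up illustrations.

```python
import dis

class Counter:
    def __init__(self):
        self.n = 0

def bump(counter):
    counter.n += 1   # loads counter.n, adds 1, stores it back: separate steps

# The disassembly shows distinct load, add, and store instructions, so
# another thread can interleave between the load and the store.
dis.dis(bump)
```

Running this prints the bytecode for `bump`; the exact opcode names vary by CPython version, but the read-modify-write is always several instructions, not one.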

Ross Ridge

unread,
Jan 20, 2009, 5:22:07 PM1/20/09
to
Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>Those links describe using the LOCK prefix, which as the name implies,
>asserts a lock, so it is no longer "lockless reference counting".

No, it doesn't assert a lock in the sense used in this thread. On modern
Intel systems it's normally handled completely within the cache.

>The LOCK prefix adds about 100 cycles to the instruction.

That's unavoidable. It's still faster than a spin lock, which would also
have to use synchronizing instructions.

Rhodri James

unread,
Jan 20, 2009, 8:42:03 PM1/20/09
to pytho...@python.org

x86 (and pretty much any 8080 derivative, come to think of it).

That said, what you actually need is an atomic read-and-increment,
which is a lot harder to find. Even if you could find a platform
supporting it, it doesn't help you on other platforms you may need to
run on. Just do the locking properly and worry about optimisations
later.

--
Rhodri James *-* Wildebeeste Herder to the Masses

sk...@pobox.com

unread,
Jan 20, 2009, 8:48:13 PM1/20/09
to Rhodri James, pytho...@python.org
Rhodri> Just do the locking properly and worry about optimisations
Rhodri> later.

The locking is already done properly (assuming we are discussing CPython's
reference counting). Now is later. People are thinking about lots of
optimizations, this is just one of them.

Paul Rubin

unread,
Jan 20, 2009, 9:29:01 PM1/20/09
to
"Rhodri James" <rho...@wildebst.demon.co.uk> writes:
> > What cpu's do you know of that can atomically increment and decrement
> > integers without locking?
>
> x86 (and pretty much any 8080 derivative, come to think of it).

It would not have occurred to me that "lock inc" increments "without
locking". I understand that's different from a lock value sitting in
the data object but I thought that "lock-free algorithm" meant one
that didn't assert any of these hardware locks either. Maybe I'm
wrong.

> Just do the locking properly and worry about optimisations later.

That has already been tried, and found to be unacceptably slow for the
purpose at hand. Now we're looking for the optimizations.

Rhodri James

unread,
Jan 21, 2009, 12:32:41 AM1/21/09
to pytho...@python.org
On Wed, 21 Jan 2009 02:29:01 -0000, Paul Rubin
<"http://phr.cx"@nospam.invalid> wrote:

> "Rhodri James" <rho...@wildebst.demon.co.uk> writes:
>> > What cpu's do you know of that can atomically increment and decrement
>> > integers without locking?
>>
>> x86 (and pretty much any 8080 derivative, come to think of it).
>
> It would not have occurred to me that "lock inc" increments "without
> locking". I understand that's different from a lock value sitting in
> the data object but I thought that "lock-free algorithm" meant one
> that didn't assert any of these hardware locks either. Maybe I'm
> wrong.

I tend to live in single-core worlds, so "inc" on its lonesome works
just fine.

>> Just do the locking properly and worry about optimisations later.
>
> That has already been tried, and found to be unacceptably slow for the
> purpose at hand. Now we're looking for the optimizations.

In that case I'd second the suggestion of taking a long, hard look
at the Linux core locking and synchronisation primatives.

Brendan Miller

unread,
Jan 21, 2009, 12:35:05 AM1/21/09
to pytho...@python.org
On Tue, Jan 20, 2009 at 6:29 PM, Paul Rubin
<"http://phr.cx"@nospam.invalid> wrote:
> "Rhodri James" <rho...@wildebst.demon.co.uk> writes:
>> > What cpu's do you know of that can atomically increment and decrement
>> > integers without locking?
>>
>> x86 (and pretty much any 8080 derivative, come to think of it).
>
> It would not have occurred to me that "lock inc" increments "without
> locking". I understand that's different from a lock value sitting in
> the data object but I thought that "lock-free algorithm" meant one
> that didn't assert any of these hardware locks either. Maybe I'm
> wrong.

Right... I was wondering about that. Well, any kind of memory access
gets exclusive control of the bus except on NUMA, but I'm wondering
how CMPXCHG
http://en.wikipedia.org/wiki/Compare-and-swap

compares to XADD performance-wise.

It seems to me that both of them must pull the old value across the
bus, hang onto the bus, and move the new value in. Maybe since XADD
needs to perform arithmetic there will be a few cycles lag between
getting the old value and pushing the new value? Maybe CMPXCHG doesn't
go through the ALU?

If the bus isn't just sitting idle and you can immediately push out
the new value then there's no real locking. Actually this article
explicitly mentions CMPXCHG as lock free.

http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms
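The retry loop that CMPXCHG enables looks roughly like this. Python has no compare-and-swap primitive, so a lock simulates the hardware's atomicity guarantee here, and both function names are invented for illustration.

```python
import threading

_cas_guard = threading.Lock()   # stands in for the atomicity of CMPXCHG

def compare_and_swap(cell, expected, new):
    """Simulated CAS: replace cell[0] with new only if it still equals
    expected, returning whether the swap happened."""
    with _cas_guard:
        if cell[0] == expected:
            cell[0] = new
            return True
        return False

def cas_increment(cell):
    """The classic lock-free increment loop: read the old value, compute
    the new one, attempt the swap, and retry if another thread won."""
    while True:
        old = cell[0]
        if compare_and_swap(cell, old, old + 1):
            return old + 1
```

No thread ever holds a lock across the whole read-modify-write; a thread that loses the race simply retries, which is what makes the CAS loop lock free in the algorithmic sense discussed in that article.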

Paul Rubin

unread,
Jan 21, 2009, 12:37:44 AM1/21/09
to
"Rhodri James" <rho...@wildebst.demon.co.uk> writes:
> I tend to live in single-core worlds, so "inc" on its lonesome works
> just fine.

In a single core world we wouldn't be having these endless angsty
conversations about eliminating the GIL.

> > That has already been tried, and found to be unacceptably slow for the
> > purpose at hand. Now we're looking for the optimizations.
>
> In that case I'd second the suggestion of taking a long, hard look
> at the Linux core locking and synchronisation primatives.

Do you understand what the issue is, about CPython's reference counts?
Do you have any idea how often the interpreter updates them? Even
using LOCK INCR (raw machine instruction, not portable C, no operating
system involvement), manipulating the ref counts would be around 100x
slower than it is now.

Rhodri James

unread,
Jan 21, 2009, 12:58:57 AM1/21/09
to Paul Rubin, pytho...@python.org
On Wed, 21 Jan 2009 05:37:44 -0000, Paul Rubin
<"http://phr.cx"@nospam.invalid> wrote:

> "Rhodri James" <rho...@wildebst.demon.co.uk> writes:
>> In that case I'd second the suggestion of taking a long, hard look
>> at the Linux core locking and synchronisation primatives.
>
> Do you understand what the issue is, about CPython's reference counts?
> Do you have any idea how often the interpreter updates them? Even
> using LOCK INCR (raw machine instruction, not portable C, no operating
> system involvement), manipulating the ref counts would be around 100x
> slower than it is now.

You asked a question about CPUs with atomic update, strongly implying
there were none. All I did was supply a counter-example, and observe
that neither this nor the question were in fact helpful.

My inability to spell "primitive", on the other hand, is all my own
fault.

Paul Rubin

unread,
Jan 21, 2009, 1:03:52 AM1/21/09
to
"Rhodri James" <rho...@wildebst.demon.co.uk> writes:
> You asked a question about CPUs with atomic update, strongly implying
> there were none. All I did was supply a counter-example,

Well, more specifically, atomic update without locking, but I counted
the LOCK prefix as locking while other people didn't, and that caused
some confusion. Of course I'm aware of the LOCK prefix but it slows
down the instruction enormously compared with a non-locked instruction.

Brendan Miller

unread,
Jan 21, 2009, 1:19:21 AM1/21/09
to pytho...@python.org

I'm curious about that. I've been looking around for timing
information on the lock signal, but am having some trouble finding
any. Intuitively, given that the processor is much faster than the
bus, and you are just waiting for the processor to complete an addition or
comparison before putting the new memory value on the bus, it seems like
there should be very little additional bus contention vs a normal add
instruction.

Paul Rubin

unread,
Jan 21, 2009, 5:12:11 AM1/21/09
to
Brendan Miller <catp...@catphive.net> writes:
> I'm curious about that. I've been looking around for timing
> information on the lock signal, but am having some trouble finding
> them. Intuitively, given that the processor is much faster than the
> bus, and you are just waiting for processor to complete an addition or
> comparison before put the new memory value on the bus, it seems like
> there should be very little additional bus contention vs a normal add
> instruction.

The bus is slow compared with the L1 cache. I just looked for figures
and couldn't find any either, but I remember seeing some for the Core
2 saying around 100 cycles, and something similar for the Athlon. I
just came across something saying the Core i7 is considerably better
than the Core 2 at this.

The real solution is to not use so much bus locking. Get rid of the
ref counts and use a real gc, if not in CPython then in PyPy.

sk...@pobox.com

unread,
Jan 21, 2009, 6:31:45 AM1/21/09
to Brendan Miller, pytho...@python.org

Brendan> Intuitively, given that the processor is much
Brendan> faster than the bus, and you are just waiting for processor to
Brendan> complete an addition or comparison before put the new memory
Brendan> value on the bus...

I think in Python's case the reference count will often be in the
processor's local cache. (Presumably, if the current thread is going to
Py_DECREF the object, it has been working with the object's state recently.)
The stuff I read suggested that simply locking the local cache would be
significantly faster than having to lock the memory bus.

Scott David Daniels

unread,
Jan 21, 2009, 11:19:04 AM1/21/09
to
Brendan Miller wrote:
> On Tue, Jan 20, 2009 at 10:03 PM, Paul Rubin
> <"http://phr.cx"@nospam.invalid> wrote:
>> .... Of course I'm aware of the LOCK prefix but it slows
>> down the instruction enormously compared with a non-locked instruction.
>
> I'm curious about that. I've been looking around for timing
> information on the lock signal, but am having some trouble finding
> them. Intuitively, given that the processor is much faster than the
> bus, and you are just waiting for processor to complete an addition or
> comparison before put the new memory value on the bus, it seems like
> there should be very little additional bus contention vs a normal add
> instruction.
The opcode cannot simply talk to its cache; it must either go directly
to off-chip memory or communicate to other processors that it (and it
alone) owns the increment target.

--Scott David Daniels
Scott....@Acm.Org

Brendan Miller

unread,
Jan 21, 2009, 1:47:33 PM1/21/09
to Scott David Daniels, pytho...@python.org

Oh, right. *light bulb goes on* I wasn't thinking about cache at all.

MRAB

unread,
Jan 21, 2009, 2:57:21 PM1/21/09
to pytho...@python.org
Brendan Miller wrote:
> On Wed, Jan 21, 2009 at 8:19 AM, Scott David Daniels
> <Scott....@acm.org> wrote:
> Oh, right. *light bulb goes on* I wasn't thinking about cache at all.
>
I'm not sure whether multicore processors share a cache or, if not, have
some other on-chip mechanism. Multiprocessor machines, however, are a
different matter...

Ross Ridge

unread,
Jan 21, 2009, 4:43:43 PM1/21/09
to
Scott David Daniels <Scott....@Acm.Org> wrote:
>The opcode cannot simply talk to its cache, it must either go directly
>to off-chip memory or communicate to other processors that it (and it
>alone) owns the increment target.

In fact all it does is simply talk to its cache. From the "Intel 64 and
IA-32 Architectures Software Developer's Manual, Volume 3A: System
Programming Guide, Part 1":

For the P6 and more recent processor families, if the area of
memory being locked during a LOCK operation is cached in the
processor that is performing the LOCK operation as write-back
memory and is completely contained in a cache line, the processor
may not assert the LOCK# signal on the bus. Instead, it will
modify the memory location internally and allow it's cache
coherency mechanism to insure that the operation is carried
out atomically. This operation is called "cache locking." The
cache coherency mechanism automatically prevents two or more
processors that have cached the same area of memory from
simultaneously modifying data in that area.

The same cache coherency mechanism that prevents ordinary "unlocked"
instructions from simultaneously modifying the same cache line on
two different processors also provides the guarantee with "locked"
instructions. There's no additional hardware locks involved, and no
additional communication required.

Rhamphoryncus

unread,
Jan 21, 2009, 5:25:13 PM1/21/09
to
On Jan 21, 12:57 pm, MRAB <goo...@mrabarnett.plus.com> wrote:
> I'm not sure whether multicore processors share a cache or, if not, have
> some other on-chip mechanism. Multiprocessor machines, however, are a
> different matter...

They share some, but also have some exclusive. How much of which
depends entirely on which CPU you have.

Paul Rubin

unread,
Jan 21, 2009, 10:25:13 PM1/21/09
to
Ross Ridge <rri...@csclub.uwaterloo.ca> writes:
> Scott David Daniels <Scott....@Acm.Org> wrote:
> >The opcode cannot simply talk to its cache, it must either go directly
> >to off-chip memory or communicate to other processors that it (and it
> >alone) owns the increment target.

> The cache coherency mechanism automatically prevents two or
> more processors that have cached the same area of memory from
> simultaneously modifying data in that area.
>
> The same cache coherency mechanism that prevents ordinary "unlocked"
> instructions from simultaneously modifying the same cache line on
> two different processors also provides the guarantee with "locked"
> instructions. There's no additional hardware locks involved, and no
> additional communication required.

The cache coherency mechanism is what Scott described as
"communicat[ing] to other processors that it (and it alone) owns the
increment target". The cache coherency mechanism is not a trivial
thing at all. It introduces its own hazards and delays, and it is
getting more complicated all the time as processors and caches get
faster and larger. Some time ago, cpu's hit their megahertz limits
and that's why we're using multicores now. Some PL researchers think
cache coherency is going to be the next limit, and are advocating
languages like Erlang, which avoid use of shared memory and have
separate heaps per thread; or alternatively, approaches like the MS
Singularity research OS which relies on something like a linear type
system to statically ensure that a given object is accessible to only
one thread at a time. (That approach allows transferring objects
between threads with no locks or copying required).

Paul Rubin

unread,
Jan 21, 2009, 11:35:25 PM1/21/09
to
Brendan Miller <catp...@catphive.net> writes:
> Actually this article explicitly mentions CMPXCHG as lock free.
> http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms

I see, that clears up some mysteries. Thanks.

Paul Rubin

unread,
Jan 21, 2009, 11:46:02 PM1/21/09
to
Rhamphoryncus <rha...@gmail.com> writes:
> a) The contended case is the issue, not the uncontended case. An
> uncontended lock is just constant overhead, not a barrier to
> scalability

a1) Really what matters is the actual mix between contended and
uncontended accesses, and the synchronization strategy affects the
amount of contention. For example, the traditional locking strategy
involves acquiring a lock before reading the object, so two
simultaneous read-only accesses would create lock contention. With
STM, only updates acquire a lock, so multiple read-only threads can
access the object simultaneously with no contention.

a2) Constant factors matter!!! If using a different mechanism makes
Python 2x faster, that's a very powerful reason to favor the other
mechanism.

> b) Locks on linux (since the switch to futexes) are pretty close to
> free when uncontended.

I thought futexes use a traditional locked read-modify-write
instruction which is 50-100 cycles, cheap compared to a system call
but quite expensive compared with a normal 1-cycle non-locked
operation.


> > > The second issue is the objects themselves, like a list which is
> > > mutable.  If you're using it in a single thread or writing from
> > > multiple threads this is a non-trivial constant cost.  ...


> The second issue has *nothing* to do with refcounts. It is the
> objects themselves. A classic example is trying to do "i += 1", which
> breaks down into "i = i + 1", which isn't an atomic operation.

Oh, I see what you mean; yes, you're right, it's best to avoid doing
those sorts of operation in shared objects too frequently.

> Java's ConcurrentHashMap gets this right.

I'll have to look that up, it sounds interesting. I've been staying
away from Java but a situation is coming up where I may have to (ugh)
start using it. Thanks.

Rhamphoryncus

unread,
Jan 22, 2009, 2:14:00 AM1/22/09
to
On Jan 21, 9:46 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Rhamphoryncus <rha...@gmail.com> writes:
> > a) The contended case is the issue, not the uncontended case.  An
> > uncontended lock is just constant overhead, not a barrier to
> > scalability
>
> a1) Really what matters is the actual mix between contended and
> uncontended accesses, and the synchronization strategy affects the
> amount of contention.  For example, the traditional locking strategy
> involves acquiring a lock before reading the object, so two
> simultaneous read-only accesses would create lock contention.  With
> STM, only updates acquire a lock, so multiple read-only threads can
> access the object simultaneously with no contention.  

Aye, but my point is really about the myth of lock-free algorithms
being uncontending — it's simply not true, and CAN'T be true. A write
is inherently a mutually exclusive operation. There's all sorts of
ways to avoid contending for reads, spread out the writes and have a
single thread coalesce them, etc, but fundamentally the write will
involve some mutual exclusion.

Aahz

unread,
Jan 22, 2009, 9:00:50 AM1/22/09
to
In article <7xd4ele...@ruckus.brouhaha.com>,

Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>alex23 <wuw...@gmail.com> writes:
>>
>> Here's an article by Guido talking about the last attempt to remove
>> the GIL and the performance issues that arose:
>>
>> "I'd welcome a set of patches into Py3k *only if* the performance for
>> a single-threaded program (and for a multi-threaded but I/O-bound
>> program) *does not decrease*."
>
>The performance decrease is an artifact of CPython's rather primitive
>storage management (reference counts in every object). This is
>pervasive and can't really be removed. But a new implementation
>(e.g. PyPy) can and should have a real garbage collector that doesn't
>suffer from such effects.

CPython's "primitive" storage management has a lot to do with the
simplicity of interfacing CPython with external libraries. Any solution
that proposes to get rid of the GIL needs to address that.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote
programs, then the first woodpecker that came along would destroy civilization.

Aahz

unread,
Jan 22, 2009, 9:09:02 AM1/22/09
to
In article <gl84uf$ju9$1...@rumours.uwaterloo.ca>,

IIRC, it was Bruce Eckel I heard talking about discovering all kinds of
nasty Java thread bugs because cache coherency wasn't working the way the
Java developers thought it did.... This is apparently something very
difficult to get correct.

MRAB

unread,
Jan 22, 2009, 10:13:30 AM1/22/09
to pytho...@python.org
How much difference would it make if the reference counts weren't in
cached memory? I'm thinking that an object could have a pointer to its
reference count, which would be stored elsewhere in some uncached memory.

Ross Ridge

unread,
Jan 22, 2009, 12:01:47 PM1/22/09
to
Ross Ridge <rri...@csclub.uwaterloo.ca> writes:
> The same cache coherency mechanism that prevents ordinary "unlocked"
> instructions from simultaneously modifying the same cache line on
> two different processors also provides the guarantee with "locked"
> instructions. There's no additional hardware locks involved, and no
> additional communication required.

Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>The cache coherency mechanism is what Scott described as
>"communicat[ing] to other processors that it (and it alone) owns the
>increment target". The cache coherency mechanism is not a trivial
>thing at all. It introduces its own hazards and delays, and it is
>getting more complicated all the time as processors and caches get
>faster and larger.

*sigh* Could you please just read what I wrote above? The LOCK prefix
makes NO DIFFERENCE to anything you mentioned. When the current CPython
implementation increments a reference count, it doesn't use the LOCK
prefix, but the cache line modified by the instruction still has to be
owned by the processor. If it did use the LOCK prefix there would be
no change in how the cache coherency mechanism worked. There'd be
no additional complications, hazards or delays in how the processor
communicates with other processors.

Carl Banks

unread,
Jan 22, 2009, 11:38:21 PM1/22/09
to
On Jan 22, 6:00 am, a...@pythoncraft.com (Aahz) wrote:
> In article <7xd4ele060....@ruckus.brouhaha.com>,

> Paul Rubin  <http://phr...@NOSPAM.invalid> wrote:
>
> >alex23 <wuwe...@gmail.com> writes:
>
> >> Here's an article by Guido talking about the last attempt to remove
> >> the GIL and the performance issues that arose:
>
> >> "I'd welcome a set of patches into Py3k *only if* the performance for
> >> a single-threaded program (and for a multi-threaded but I/O-bound
> >> program) *does not decrease*."
>
> >The performance decrease is an artifact of CPython's rather primitive
> >storage management (reference counts in every object).  This is
> >pervasive and can't really be removed.  But a new implementation
> >(e.g. PyPy) can and should have a real garbage collector that doesn't
> >suffer from such effects.
>
> CPython's "primitive" storage management has a lot to do with the
> simplicity of interfacing CPython with external libraries.  Any solution
> that proposes to get rid of the GIL needs to address that.

I recently was on a long road trip, and was not the driver, and with
nothing better to do I thought quite a bit about this.

I concluded that, aside from one major trap, it wouldn't really be
more difficult to interface Python to external libraries, just
differently difficult. Here is briefly what I came up with:

1. Change the singular Python type into three metatypes:
immutable_type, mutable_type, and mutable_dict_type. (In the latter
case, the object itself is immutable but the dict can be modified.
This, of course, would be the default metaclass in Python.) Only
mutable_types would require a mutex when accessing.

2. API wouldn't have to change much. All regular API would assume
that objects are unlocked (if mutable) and in a consistent state.
It'll lock any mutable objects it needs to access. There would also
be a low-level API that assumes the objects are locked (if mutable)
and does not require objects to be consistent. I imagine most
extensions would call the standard API most of the time.

3. If you are going to use the low-level API on a mutable object, or
are going to access the object structure directly, you need to acquire
the object's mutex. Macros such as Py_LOCK(), Py_LOCK2(), Py_UNLOCK()
would be provided.

4. Objects would have to define a method, to be called by the GC, that
marks every object it references. This would be a lot like the
current tp_visit, except it has to be defined for any object that
references another object, not just objects that can participate in
cycles. (A conservative garbage collector wouldn't suffice for Python
because Python quite often allocates blocks but sets the pointer to an
offset within the block. In fact, that's true of almost any Python-
defined type.) Unfortunately, references on the stack would need to
be registered as well, so "PyObject* p;" might have to be replaced
with something like "Py_DECLARE_REF(PyObject,p);" which magically
registers it. Ugly.

5. Py_INCREF and Py_DECREF are gone.

6. GIL is gone.

So, you gain the complexity of a two-level API, having to lock mutable
objects sometimes, and defining more visitor methods than before, but
you don't have to keep INCREFs and DECREFs straight, which is no small
thing.

The major trap is the possibility of deadlock. To help minimize the
risk there would be macros to lock multiple objects at once. Py_LOCK2(a, b)
would guarantee that if another thread is calling Py_LOCK2(b, a)
at the same time, it won't result in a deadlock. What's
disappointing is that the deadlocking possibility is always with you,
much like the reference counts are.
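One standard way a Py_LOCK2-style macro can rule out that particular deadlock is to always acquire the two mutexes in a single global order. Here is a sketch of the idea in Python; the function names are invented, and id() stands in for whatever canonical ordering (such as object address) a real C implementation would use.

```python
import threading

def lock_two(lock_a, lock_b):
    """Acquire both locks in a globally consistent order (by id() here),
    so lock_two(a, b) in one thread and lock_two(b, a) in another can
    never deadlock against each other. Assumes the two locks are
    distinct objects."""
    first, second = sorted((lock_a, lock_b), key=id)
    first.acquire()
    second.acquire()

def unlock_two(lock_a, lock_b):
    # Release order doesn't matter for deadlock avoidance.
    lock_a.release()
    lock_b.release()
```

Lock ordering only protects pairs acquired through this helper; as the post says, the broader risk of deadlock from ad-hoc lock acquisition never fully goes away.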


Carl Banks

Paul Rubin

unread,
Jan 23, 2009, 12:31:11 AM1/23/09
to
aa...@pythoncraft.com (Aahz) writes:
> CPython's "primitive" storage management has a lot to do with the
> simplicity of interfacing CPython with external libraries. Any solution
> that proposes to get rid of the GIL needs to address that.

This, I don't understand. Other languages like Lisp and Java and
Haskell have foreign function interfaces that are easier to program than
Python's, -and- they don't use reference counts. There's usually some
primitive to protect objects from garbage collection while the foreign
function is using them, etc. The Java Native Interface (JNI) and the
Haskell FFI are pretty well documented. The Emacs Lisp system is not
too hard to figure out from examining the source code, etc.

Rhamphoryncus

unread,
Jan 23, 2009, 12:38:37 AM1/23/09
to

IMO, locking of the object is a secondary problem. Python-safethread
provides one solution, but it's not the only conceivable one. For the
sake of discussion it's easier to assume somebody else is solving it
for you.

Instead, focus on just the garbage collection. What are the practical
issues of modifying CPython to use a tracing GC throughout? It
certainly is possible to write an exact GC in C, but the stack
manipulation would be hideous. It'd also require significant rewrites
of the entire code base. Throw on that the performance is unclear (it
could be far worse for a single-threaded program), with no
straightforward way to make it a compile-time option.

Got any ideas for that?

Kay Schluehr

unread,
Jan 23, 2009, 12:39:46 AM1/23/09
to
Whatever sufficiently sophisticated topic was initially discussed,
it all ends up in a request for removing reference counting and the
GIL.

Chris Rebert

unread,
Jan 23, 2009, 12:43:35 AM1/23/09
to Kay Schluehr, pytho...@python.org

+1 QOTW

- Chris

--
Follow the path of the Iguana...
http://rebertia.com

Carl Banks

unread,
Jan 23, 2009, 1:09:40 AM1/23/09
to

That assumption might be good for the sake of the discussion *you*
want to have, but it's not for the discussion I was having, which was to
address Aahz's claim that the GIL makes extension writing simple by
presenting a vision of what Python might be like if it had a mark-and-
sweep collector. The details of the GC are a small part of that and
wouldn't affect my main point even if they are quite different than I
described. Also, extension writers would have to worry about locking
issues here, so it's not acceptable to assume somebody else will solve
that problem.


> Instead, focus on just the garbage collection.

[snip rest of threadjack]

You can ignore most of what I was talking about and focus on
technicalities of garbage collection if you want to. I will not be
joining you in that discussion, however.


Carl Banks

Paul Rubin

unread,
Jan 23, 2009, 1:15:24 AM1/23/09
to
Carl Banks <pavlove...@gmail.com> writes:
> 3. If you are going to use the low-level API on a mutable object, or
> are going to access the object structure directly, you need to acquire
> the object's mutex. Macros such as Py_LOCK(), Py_LOCK2(), Py_UNLOCK()
> would be provided.

You mean every time you access a list or dictionary or class instance,
you have to acquire a mutex? That sounds like a horrible slowdown.

Carl Banks

unread,
Jan 23, 2009, 1:40:10 AM1/23/09
to
On Jan 22, 10:15 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:

Yes, and it's never going to happen in CPython any other way. It's
considered a bug if Python code can segfault the interpreter; all
runtime errors are supposed to raise exceptions. The only way to
ensure that won't happen is to make sure that only one thread can
access the internals of a mutable object at a time.

BTW, class instances are usually immutable and thus don't require a
mutex in the system I described.


Carl Banks

Philip Semanchuk

unread,
Jan 23, 2009, 2:13:17 AM1/23/09
to python-list (General)

Is this a variant of Godwin's Law for Python?

Kay Schluehr

unread,
Jan 23, 2009, 3:50:48 AM1/23/09
to
On 23 Jan., 08:13, Philip Semanchuk <phi...@semanchuk.com> wrote:
> On Jan 23, 2009, at 12:39 AM, Kay Schluehr wrote:
>
> > Whatever sufficiently sophisticated topic was initially discussed

> > it ends all up in a request for removing reference counting and the
> > GIL.
>
> Is this a variant of Godwin's Law for Python?

Definitely. It's a stable fixed point attractor. No matter how often
it was discussed to death in the past months, the likelihood that
someone mentions the GIL or ref-counting approaches 1. This is
particularly remarkable because it is inversely proportional to the
observable activity in this domain, so there is really no news.

Another similarly strange phenomenon: whenever Xah Lee posts one of his
infamous rants it attracts at least a dozen newsgroup readers who
try to persuade each other not to respond, which will inevitably grow
his thread and keep it alive for a long time.

Steve Holden

unread,
Jan 23, 2009, 7:59:42 AM1/23/09
to pytho...@python.org

Indeed it would, but hey, let's not let that stop us repeating the
thinking that's gone into CPython over the last fifteen years. "Those
who cannot remember the past are condemned to repeat it".

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Tim Rowe

unread,
Jan 23, 2009, 9:50:00 AM1/23/09
to pytho...@python.org
2009/1/23 Kay Schluehr <kay.sc...@gmx.net>:

> Whatever sufficiently sophisticated topic was initially discussed,
> it all ends up in a request for removing reference counting and the
> GIL.

Well, maybe, but it seems to me that the real issue here is that we
need to remove reference counting and the GIL.

--
Tim Rowe

sk...@pobox.com

unread,
Jan 23, 2009, 10:33:08 AM1/23/09
to Steve Holden, pytho...@python.org

>> You mean every time you access a list or dictionary or class
>> instance, you have to acquire a mutex? That sounds like a horrible
>> slowdown.

Steve> Indeed it would, but hey, let's not let that stop us repeating
Steve> the thinking that's gone into CPython over the last fifteen
Steve> years. "Those who cannot remember the past are condemned to
Steve> repeat it".

Also, every object is mutable at some level. Tuples, ints and floats are
definitely mutable at creation time. You need to hold a mutex then, so
Carl's notion of three types of objects breaks down at that point.

Skip

J. Cliff Dyer

unread,
Jan 23, 2009, 11:30:00 AM1/23/09
to Kay Schluehr, pytho...@python.org
I dub it Schluehr's law.

> --
> http://mail.python.org/mailman/listinfo/python-list
>

Tim Rowe

unread,
Jan 23, 2009, 12:47:18 PM1/23/09
to pytho...@python.org
2009/1/23 Benjamin Kaplan <benjami...@case.edu>:

> BTW, he said sufficiently sophisticated topic. Since there hasn't been an
> extremely long post here yet, I don't know if this counts.

Had I waited until the thread became long enough, somebody else would
have already raised the issue -- you must surely know about Schluehr's
Law? ;-)


--
Tim Rowe

Paul Rubin

unread,
Jan 23, 2009, 12:53:02 PM1/23/09
to
sk...@pobox.com writes:
> Also, every object is mutable at some level. Tuples, ints and floats are
> definitely mutable at creation time. You need to hold a mutex then, so
> Carl's notion of three types of objects breaks down at that point.

Hopefully, at creation time, they will usually be in a scope where
other threads can't see them.

Rhamphoryncus

unread,
Jan 23, 2009, 2:01:24 PM1/23/09
to

I'm sorry, you're right, I misunderstood your context.

Hrvoje Niksic

unread,
Jan 23, 2009, 2:19:35 PM1/23/09
to
Carl Banks <pavlove...@gmail.com> writes:

> Unfortunately, references on the stack would need to be registered
> as well, so "PyObject* p;" might have to be replaced with something
> like "Py_DECLARE_REF(PyObject,p);" which magically registers it.
> Ugly.

Not only registered at the beginning of the function, but also (since
CPython uses C, not C++) explicitly unregistered at every point of
exit from the function. Emacs implements these as macros called GCPRO
and UNGCPRO, and they're very easy to get wrong. In a way, they are
even worse than the current Python INCREF/DECREF.

See description at, for example,
http://www.xemacs.org/Documentation/beta/html/internals_19.html#SEC78

Carl Banks

unread,
Jan 23, 2009, 2:45:50 PM1/23/09
to

immutable_type objects wouldn't exist at all until their
PyWhatever_New or their tp_new member is called. After that, the
reference exists only on the local stack, which is accessible only to
one thread. As long as you finish initializing the object while it's
still only on the stack, there is no possibility of a conflict.

What about tp_init, then, you ask? Well it's simple: immutable_type
doesn't call it. In fact, it requires that tp_init, tp_setattro,
tp_mapping->mp_setitem, etc., are all null.

immutable_objects have no instance dict, so if you want to create
attributes in Python you have to use slots. immutable_object.__new__
accepts keyword arguments and initializes the slots with the values.

class Record(immutable_object, slots=['name', 'number']):
    def __new__(cls, name):
        number = db.lookup_number(name)
        return immutable_object.__new__(cls, name=name, number=number)


Carl Banks

Steve Holden

unread,
Jan 23, 2009, 3:01:39 PM1/23/09
to pytho...@python.org
Rhamphoryncus wrote:
[... eighty-eight quoted lines ...]

>
> I'm sorry, you're right, I misunderstood your context.

Perhaps you could trim your posts to quote only the relevant context?
Thanks.

Martin P. Hellwig

unread,
Jan 23, 2009, 3:46:57 PM1/23/09
to
Perhaps, but you could also say that if it was that widely discussed and
still not changed that it was for a *good* reason, perhaps the reason
may change over time but at this moment it is still a good reason.

Or you can argue that even when an argument is repeated indefinitely it
doesn't make it suddenly right.

--
mph

Terry Reedy

unread,
Jan 23, 2009, 4:06:00 PM1/23/09
to pytho...@python.org
Tim Rowe wrote:
> 2009/1/23 Kay Schluehr <kay.sc...@gmx.net>:
>> Whatever sufficiently sophisticated topic was initially discussed
>> it all ends up in a request for removing reference counting and the
>> GIL.
>
> Well, maybe, but it seems to me that the real issue here is that we
> need to remove reference counting and the GIL.

Guido will give up reference counting and the consequent immediate gc
when, but only when, *someone* presents an implementation that does so
and runs faster. Given the current ferment in dynamic language
implementation, he just today wrote on the pydev list "CPython's
implementation strategy *will* evolve". Whether that means R.C. goes
away or not remains to be seen.


Tim Rowe

unread,
Jan 23, 2009, 7:04:09 PM1/23/09
to Martin P. Hellwig, pytho...@python.org
2009/1/23 Martin P. Hellwig <martin....@dcuktec.org>:

> Or you can argue that even when an argument is repeated indefinitely it
> doesn't make it suddenly right.

No, but it makes for a confirmation of Schluehr's law :-)

--
Tim Rowe

Bryan Olson

unread,
Jan 23, 2009, 8:48:44 PM1/23/09
to
Carl Banks wrote:
[...]

> BTW, class instances are usually immutable and thus don't require a
> mutex in the system I described.

Then you are describing a language radically different from Python.


--
--Bryan

Carl Banks

unread,
Jan 23, 2009, 9:54:18 PM1/23/09
to

Bzzt.

Hint: aside from the reference count, most class instances are
immutable in Python *today*.


Carl Banks

Paul Rubin

unread,
Jan 23, 2009, 10:19:42 PM1/23/09
to
Bryan Olson <fakea...@nowhere.org> writes:
> > BTW, class instances are usually immutable and thus don't require a
> > mutex in the system I described.
> Then you are describing a language radically different from Python.

That one threw me for a minute too, but I think the idea is that the
class instance itself is immutable, while its slots (specifically the
attribute dictionary) point to mutable objects.

Steven D'Aprano

unread,
Jan 23, 2009, 10:35:06 PM1/23/09
to


That seems so utterly wrong that either you're an idiot or you're talking
at cross purposes to what Bryan and I think you're saying. Since I know
you're not an idiot, I can only imagine you have a different
understanding of what it means to be immutable than I do.

For example... is this instance immutable?

class Foo:
    bar = None

f = Foo()
f.baz = True

If so, what do you mean by immutable?

--
Steven

Paul Rubin

unread,
Jan 23, 2009, 10:42:20 PM1/23/09
to
Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> writes:
> For example... is this instance immutable?
>
> class Foo:
>     bar = None
>
> f = Foo()
> f.baz = True
> If so, what do you mean by immutable?

If I understand Carl, yes, f is immutable. When you set f.bar, the
contents of f.__dict__ changes but f itself does not change. It still
points to the same dictionary, etc.
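A quick interactive check of that reading: the identity of the
instance's attribute dict survives attribute assignment, so the
instance "object" itself never changes, only the dict does.

```python
class Foo:
    bar = None

f = Foo()
d = f.__dict__            # grab the attribute dict up front

f.baz = True              # assignment mutates the dict...
assert f.__dict__ is d    # ...but f still holds the very same dict
assert d["baz"] is True
```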

Bryan Olson

unread,
Jan 23, 2009, 11:22:31 PM1/23/09
to
Paul Rubin wrote:

The meaning of 'immutable' is well-established in the Python literature.
Python's immutable types include tuple, frozenset, and various kinds of
numbers and strings. Class instances, not so much.

What's more, this matters when considering a GIL-less implementation.
Typical method calls can traverse lots of mutable stuff just to find the
function to invoke.


--
--Bryan

Carl Banks

unread,
Jan 23, 2009, 11:28:01 PM1/23/09
to
On Jan 23, 7:19 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:

Correct, and, getting back to the point, an instance itself would not
require a mutex. The dict would need it, of course.

It's customary to gloss over this technicality for convenience's sake
in most discussions, but it matters in this case.


Carl Banks

Paul Rubin

unread,
Jan 23, 2009, 11:32:22 PM1/23/09
to
Bryan Olson <fakea...@nowhere.org> writes:
> The meaning of 'immutable' is well-established in the Python
> literature. Python's immutable types include tuple, frozenset, and
> various kinds of numbers and strings. Class instances, not so much.

But we are talking about objects as they live in the C implementation,
not at the level where Python code deals with them.

Carl Banks

unread,
Jan 24, 2009, 12:36:25 AM1/24/09
to
On Jan 23, 8:22 pm, Bryan Olson <fakeaddr...@nowhere.org> wrote:
> Paul Rubin wrote:
> > Bryan Olson writes:
> >>> BTW, class instances are usually immutable and thus don't require a
> >>> mutex in the system I described.
> >> Then you are describing a language radically different from Python.
>
> > That one threw me for a minute too, but I think the idea is that the
> > class instance itself is immutable, while its slots (specifically the
> > attribute dictionary) point to mutable objects.
>
> The meaning of 'immutable' is well-established in the Python literature.
> Python's immutable types include tuple, frozenset, and various kinds of
> numbers and strings. Class instances, not so much.

Of course class instances aren't immutable types: they're not even
types. Let me suggest that there is a distinction between an
immutable type and an immutable object.

Immutable types are what you are talking about: it means that the type
provides no usable mutator methods. (Whether they mutate the object
itself or some associated object doesn't matter.) Immutable objects
are a different thing: it means the object cannot change in memory.

Classes in Python are mutable types, usually. Class instances are
(except for the refcount) immutable objects, usually.

We usually talk about mutability of types, but mutability of objects
is appropriate for discussion as well. So I can't really agree with
your assessment that I wrong to call class instances immutable objects
aside from refcounts.

BTW, here's a minor brain bender: immutable types are mutable objects.


> What's more, this matters when considering a GIL-less implementation.
> Typical method calls can traverse lots of mutable stuff just to find the
> function to invoke.

Now that doesn't make sense at all. What is all this mutable stuff
you have to go through, and what does it have to do with the GIL-less
implementation? Can you explain further? Or are you just saying
it'll be slow?


Carl Banks

Paul Rubin

unread,
Jan 24, 2009, 12:40:42 AM1/24/09
to
Carl Banks <pavlove...@gmail.com> writes:
> > What's more, this matters when considering a GIL-less implementation.
> > Typical method calls can traverse lots of mutable stuff just to find the
> > function to invoke.
>
> Now that doesn't make sense at all. What is all this mutable stuff
> you have to go through, and what does it have to do with the GIL-less
> implementation?

foo.bar() has to look up bar in foo's attribute dictionary.
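Roughly the traversal in question, sketched in pure Python (ignoring
descriptors, `__getattribute__` overrides, and other details): every
dict touched here is mutable, which is why a lock-everything scheme
hurts on every method call.

```python
# Simplified model of attribute lookup: instance dict first, then the
# class dicts along the MRO. All of these dicts can change at runtime.
def lookup(obj, name):
    if name in obj.__dict__:            # instance dict
        return obj.__dict__[name]
    for cls in type(obj).__mro__:       # class dicts, in MRO order
        if name in cls.__dict__:
            return cls.__dict__[name]
    raise AttributeError(name)

class Foo:
    def bar(self):
        return "hi"

foo = Foo()
# lookup returns the plain function, so pass the instance explicitly:
assert lookup(foo, "bar")(foo) == "hi"
```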

Bryan Olson

unread,
Jan 24, 2009, 1:55:32 AM1/24/09
to
Carl Banks wrote:
> Paul Rubin wrote:
>> Bryan Olson writes:
>>>> BTW, class instances are usually immutable and thus don't require a
>>>> mutex in the system I described.
>>> Then you are describing a language radically different from Python.

>> That one threw me for a minute too, but I think the idea is that the
>> class instance itself is immutable, while its slots (specifically the
>> attribute dictionary) point to mutable objects.
>
> Correct, and, getting back to the point, an instance itself would not
> require a mutex. The dict would need it, of course.

The dict is part of the object and some important slots are mutable.
What's more, if your point was to do away with the GIL without changing
Python semantics or requiring heaping masses of locking, I fear you've
not fully grasped the problem.

Languages such as Java, C++, and C# do not require nearly as much
locking as Python because they are not nearly as dynamic. Consider how a
method is invoked. Java / C++ / C# can always resolve the method with no
locking; the data they need is fixed at link time. Python is much more
dynamic. A demo:


from __future__ import print_function

# A simple class hierarchy:

class Foo (object):
    title = "Mr. Foo"

    def identify(self):
        print("I'm called", self.title)

class Bar (Foo):
    title = "Ms. Bar"

class Jafo (Bar):
    title = "Major Jafo"

dude = Jafo()


# Searches 5 dicts to find the function to call:

dude.identify()


# Class dicts are mutable:

def id(self):
    print("I'm still called", self.title)

Jafo.identify = id
dude.identify()


# An object's class can change:

dude.__class__ = Bar
dude.identify()


# A class's base classes can change:

class Fu (object):
    def identify(self):
        print("Call me", self.title)

Bar.__bases__ = (Fu,)
dude.identify()


Result:
>>>
I'm called Major Jafo
I'm still called Major Jafo
I'm called Ms. Bar
Call me Ms. Bar
>>>


In that first simple call of dude.identify(), Python looked up "dude" in
the module's (mutable) dict to find the object. Then it looked in
object's (mutable) dict, and did not find "identify". So it looked at
the object's (mutable) __class__ slot, and in that class's (mutable)
dict. It still did not find "identify", so it looked in the class's
(mutable) __bases__ slot, following Python's depth-first "object
protocol" and thus looking in what other (mutable) class dicts and
(mutable) __bases__ slots were required.

An object's __dict__ slot is *not* mutable; thus we could gain some
efficiency by protecting the object and its dict with the same lock. I
do not see a major win in Mr. Banks' point that we do not need to lock
the object, just its dict.


--
--Bryan

Paul Rubin

unread,
Jan 24, 2009, 2:04:19 AM1/24/09
to
Bryan Olson <fakea...@nowhere.org> writes:
> An object's __dict__ slot is *not* mutable; thus we could gain some
> efficiency by protecting the object and its dict with the same lock. I
> do not see a major win in Mr. Banks' point that we do not need to lock
> the object, just its dict.

If the dict contents don't change often, maybe we could use an
STM-like approach to eliminate locks when reading. That would of
course require rework to just about every C function that accesses
Python objects.
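A toy single-writer sketch of that optimistic-read idea, using a
version counter in the style of a seqlock. This is illustrative only;
the names are invented, and a real CPython variant would need this in
C with proper memory barriers:

```python
import threading

# "Seqlock"-style optimistic read: an even version number means no
# write is in progress; readers retry instead of taking a lock.
class VersionedDict:
    def __init__(self):
        self._data = {}
        self._version = 0
        self._lock = threading.Lock()     # writers still serialize

    def write(self, key, value):
        with self._lock:
            self._version += 1            # odd: write in progress
            self._data[key] = value
            self._version += 1            # even: stable again

    def read(self, key):
        while True:
            v = self._version
            if v % 2 == 0:                # no writer active
                value = self._data.get(key)
                if self._version == v:    # snapshot unchanged
                    return value

d = VersionedDict()
d.write("x", 1)
```

Reads that race with a write simply retry, so the common uncontended
read costs no lock at all.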

Carl Banks

unread,
Jan 24, 2009, 2:44:31 AM1/24/09
to
On Jan 23, 10:55 pm, Bryan Olson <fakeaddr...@nowhere.org> wrote:
> Carl Banks wrote:
> > Paul Rubin wrote:
> >> Bryan Olson writes:
> >>>> BTW, class instances are usually immutable and thus don't require a
> >>>> mutex in the system I described.
> >>> Then you are describing a language radically different from Python.
> >> That one threw me for a minute too, but I think the idea is that the
> >> class instance itself is immutable, while its slots (specifically the
> >> attribute dictionary) point to mutable objects.
>
> > Correct, and, getting back to the point, an instance itself would not
> > require a mutex.  The dict would need it, of course.
>
> The dict is part of the object and some important slots are mutable.
> What's more, if your point was to do away with the GIL without changing
> Python semantics nor requiring heaping masses of locking, I fear you've
> not fully grasped the problem.

If that's what you think I thought, I fear you haven't read anything
I've written.

[snip]


> An object's __dict__ slot is *not* mutable; thus we could gain some
> efficiency by protecting the object and its dict with the same lock. I
> do not see a major win in Mr. Banks' point that we do not need to lock
> the object, just its dict.

I'm not sure where you got the idea that I was claiming this was a
major win. I'm not sure where you got the idea that I claimed that
having to lock all mutable objects wouldn't be slow. For Pete's sake,
you followed up to a post where I *agreed* that it would be slow.


Carl Banks

Bryan Olson

unread,
Jan 24, 2009, 2:45:51 AM1/24/09
to
Carl Banks wrote:

> Bryan Olson wrote:
>> Paul Rubin wrote:
>>> Bryan Olson writes:
>>>>> BTW, class instances are usually immutable and thus don't require a
>>>>> mutex in the system I described.

>>>> Then you are describing a language radically different from Python.

>>> That one threw me for a minute too, but I think the idea is that the
>>> class instance itself is immutable, while its slots (specifically the
>>> attribute dictionary) point to mutable objects.

>> The meaning of 'immutable' is well-established in the Python literature.
>> Python's immutable types include tuple, frozenset, and various kinds of
>> numbers and strings. Class instances, not so much.
>
> Of course class instances aren't immutable types: they're not even
> types.

Class instances may or may not be types, but that has nothing to do with
any point at issue here. I'm saying that class instances are usually
mutable, contrary to your claim, "class instances are usually immutable".

> Let me suggest that there is a distinction between an
> immutable type and an immutable object.

Let me further suggest that Python's documentation is entirely clear:
instances of immutable types are immutable objects. Instances of mutable
types are generally mutable objects. For example, tuple is an immutable
type, and thus tuples are immutable; list is a mutable type, and thus
lists are mutable.

> Immutable types are what you are talking about: it means that the type
> provides usable mutator methods. (Whether they mutate the object
> itself or some associated object doesn't matter.) Immutable objects
> are a different thing: it means the object cannot change in memory.
>
> Classes in Python are mutable types, usually. Class instances are
> (except for the refcount) immutable objects, usually.

There's where we disagree. I assert that class instances are usually
mutable objects.

> We usually talk about mutability of types, but mutability of objects
> is appropriate for discussion as well. So I can't really agree with
> your assessment that I wrong to call class instances immutable objects
> aside from refcounts.

That confusion disappears once one grasps that instances of immutable
types are immutable objects.

> BTW, here's a minor brain bender: immutable types are mutable objects.

Some brains are too easily bent. Python is one of the many
object-oriented languages that reifies types as run-time objects. I see
no point in going through Python's immutable types to examine if there
is any way to mutate the corresponding type objects.


>> What's more, this matters when considering a GIL-less implementation.
>> Typical method calls can traverse lots of mutable stuff just to find the
>> function to invoke.
>
> Now that doesn't make sense at all. What is all this mutable stuff
> you have to go through, and what does it have to do with the GIL-less
> implementation? Can you explain further? Or are you just saying
> it'll be slow.

I elaborated at some length in another strand of this thread.


--
--Bryan

Carl Banks

unread,
Jan 24, 2009, 3:06:02 AM1/24/09
to
On Jan 23, 11:45 pm, Bryan Olson <fakeaddr...@nowhere.org> wrote:

> Carl Banks wrote:
> > Classes in Python are mutable types, usually.  Class instances are
> > (except for the refcount) immutable objects, usually.
>
> There's where we disagree. I assert that class instances are usually
> mutable objects.

Nope, you're dead wrong, nothing more to it. The bits of a class
instance never change. The __dict__ is a mutable object. The class
instance itself isn't. It's not reasonable to call an object whose
bits can't change a mutable object.

Anyway, all you're doing is distracting attention from my claim that
instance objects wouldn't need to be locked. They wouldn't, no matter
how mutable you insist these objects whose bits would never change
are.


> > BTW, here's a minor brain bender: immutable types are mutable objects.
>
> Some brains are too easily bent.

[Snip attempt to take this comment seriously]

And some brains are so stodgy they can't even take a lighthearted
comment lightheartedly.


Carl Banks

Hrvoje Niksic

unread,
Jan 24, 2009, 3:33:58 AM1/24/09
to
Carl Banks <pavlove...@gmail.com> writes:

> On Jan 23, 11:45 pm, Bryan Olson <fakeaddr...@nowhere.org> wrote:
>> Carl Banks wrote:
>> > Classes in Python are mutable types, usually.  Class instances are
>> > (except for the refcount) immutable objects, usually.
>>
>> There's where we disagree. I assert that class instances are usually
>> mutable objects.
>
> Nope, you're dead wrong, nothing more to it. The bits of a class
> instance never change. The __dict__ is a mutable object. The class
> instance itself isn't. It's not reasonable to call an object whose
> bits can't change a mutable object.

The "bits" of class instances can very well change.

>>> class X(object): pass
...
>>> x = X()
>>> d = x.__dict__
>>> x.__dict__ = {}
>>> map(id, [d, x.__dict__])
[170329876, 170330012]

The Python cookbook even describes patterns that depend on this
operation working. A class instance's contents can also change if
__slots__ is in use or when its __class__ is assigned to (admittedly
the latter being a rare operation, but still).

> Anyway, all you're doing is distracting attention from my claim that
> instance objects wouldn't need to be locked. They wouldn't, no
> matter how mutable you insist these objects whose bits would never
> change are.

Only if you're not implementing Python, but another language that
doesn't support __slots__ and assignment to instance.__dict__.

Gabriel Genellina

unread,
Jan 24, 2009, 3:40:10 AM1/24/09
to pytho...@python.org
On Sat, 24 Jan 2009 06:06:02 -0200, Carl Banks <pavlove...@gmail.com>
wrote:

> On Jan 23, 11:45 pm, Bryan Olson <fakeaddr...@nowhere.org> wrote:
>> Carl Banks wrote:
>> > Classes in Python are mutable types, usually.  Class instances are
>> > (except for the refcount) immutable objects, usually.
>>
>> There's where we disagree. I assert that class instances are usually
>> mutable objects.
>
> Nope, you're dead wrong, nothing more to it. The bits of a class
> instance never change. The __dict__ is a mutable object. The class
> instance itself isn't. It's not reasonable to call an object whose
> bits can't change a mutable object.
>
> Anyway, all you're doing is distracting attention from my claim that
> instance objects wouldn't need to be locked. They wouldn't, no matter
> how mutable you insist these objects whose bits would never change
> are.

Me too; I don't get what you mean. Consider a list instance: it contains
a count of allocated elements and a pointer to some memory block. They
change when the list is resized. This counts as "mutable" to me. I really
don't understand your claim.

--
Gabriel Genellina

Paul Rubin

unread,
Jan 24, 2009, 3:43:23 AM1/24/09
to
Hrvoje Niksic <hni...@xemacs.org> writes:
> Not only registered at the beginning of the function, but also (since
> CPython uses C, not C++) explicitly unregistered at every point of
> exit from the function. Emacs implements these as macros called GCPRO
> and UNGCPRO, and they're very easy to get wrong. In a way, they are
> even worse than the current Python INCREF/DECREF.

That's a fairly natural style in Lisp implementation and it is not
that difficult to code in. I've hacked inside Emacs and have written
another interpreter with a similar setup; it's certainly easier than
keeping track of refcounts in my experience. For one thing, you can
raise exceptions anywhere you want, and the stack unwind can clean up
the gc protection, but it can't know nearly as easily which refcounts
to adjust, unless you record all the increfs the same way as the
GCPROs.

Steve Holden

unread,
Jan 24, 2009, 5:23:38 AM1/24/09
to pytho...@python.org
OK, so we have recently discussed whether objects are values, whether
function arguments are passed by reference, whether names are
references, and now we are, I suspect, about to have a huge further
discussion on the meaning of "immutable".

Sometimes I start to find this eternal pedantry a little tedious. I
suspect it's time I once more dropped out of c.l.py for a while.

regards
Steve

Hendrik van Rooyen

unread,
Jan 24, 2009, 3:29:19 AM1/24/09
to pytho...@python.org
"Martin P. Hellwig" <martin....@dcuktec.org> wrote:

> Or you can argue that even when an argument is repeated indefinitely it
> doesn't make it suddenly right.

This is no good.
It's a well known fact that anything I tell you three times is true.

To demonstrate:

Tim Rowe's post earlier in this thread was the funniest one in a long time.
Tim Rowe's post earlier in this thread was the funniest one in a long time.
Tim Rowe's post earlier in this thread was the funniest one in a long time.

see - it's true!

- Hendrik


Carl Banks

unread,
Jan 24, 2009, 3:05:22 PM1/24/09
to
On Jan 24, 12:40 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
On Sat, 24 Jan 2009 06:06:02 -0200, Carl Banks <pavlovevide...@gmail.com>


Yeah, yeah, I know that, and in the bickering that ensued some aspects
of the original context were lost. I should really not have been
pulled into Bryan's strawman over the definition of immutable, since
it's just a label, I oughtn't give a damn what it's called, I only
care what it does. I didn't handle this repartee very well.

Anyway, it goes back to the original vision for a mark-and-sweep
Python language as I presented what seems like a long time ago.

I presented the type system that had three base metatypes instead of
the one base metatype we have now: immutable_type, mutable_type, and
mutable_dict_type. The default metatype for Python classes would be
mutable_dict_type, which is a type wherein the object itself would be
mutable but it would still have all the mutator methods __init__,
__setattr__, etc., but they could only act on the __dict__.
mutable_dict_types would not be allowed to define any slots, and
__dict__ wouldn't be reassignable. (However, it seems reasonable to
allow the base tp_new to accept a dict argument.)
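A rough Python-level sketch of that mutable_dict_type semantics, with
invented names (the real proposal is a C-level metatype, not anything
expressible this cleanly in pure Python):

```python
# Hypothetical sketch: attribute assignment may only touch the
# instance __dict__, and the __dict__ slot itself cannot be rebound,
# so the instance object proper never changes after creation.
class MutableDictInstance:
    def __setattr__(self, name, value):
        if name == "__dict__":
            raise TypeError("__dict__ is not reassignable")
        self.__dict__[name] = value   # mutation confined to the dict

r = MutableDictInstance()
r.name = "spam"
```

Under this scheme only the dict would need a mutex; the instance
itself would not.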

OTOTH, list's metatype would be mutable_type, so the type object
itself would be mutable.

Bryan claimed that that would be a very different language from
Python, apparently because it hadn't occurred to him that by-and-
large, the instance itself doesn't change, only the dict does.
Perhaps Bryan was thinking of __dict__'s reassignability (that
certainly didn't occur to me); if he was I apologize for my snideness.

HAVING SAID THAT, I still still say what I proposed would not be a
radically different language from Python. A little different, of
course. Much slower, almost certainly.


Carl Banks

Carl Banks

unread,
Jan 24, 2009, 3:24:49 PM1/24/09
to
On Jan 24, 12:33 am, Hrvoje Niksic <hnik...@xemacs.org> wrote:

> Carl Banks <pavlovevide...@gmail.com> writes:
> > Anyway, all you're doing is distracting attention from my claim that
> > instance objects wouldn't need to be locked.  They wouldn't, no
> > matter how mutable you insist these objects whose bits would never
> > change are.
>
> Only if you're not implementing Python, but another language that
> doesn't support __slots__ and assignment to instance.__dict__.

I am only going to say all Python types prior to 3.0 support classes
without __slots__, so while I agree that this would be a different
language, it wouldn't necessarily be "not Python".

(Python, of course, is what GvR says Python is, and he isn't going to
say that the language I presented is Python. No worries there! :)
I'm only saying that it is conceivably similar enough to be a
different version of Python. It would be a different language in the
same way that Python 2.6 is a different language from Python 3.0.)

Incidentally, the proposal does allow slots to be defined, but only
for actual mutable types, not for ordinary class instances.


Carl Banks

Carl Banks

unread,
Jan 24, 2009, 3:27:32 PM1/24/09
to
On Jan 24, 12:24 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> On Jan 24, 12:33 am, Hrvoje Niksic <hnik...@xemacs.org> wrote:
>
> > Carl Banks <pavlovevide...@gmail.com> writes:
> > > Anyway, all you're doing is distracting attention from my claim that
> > > instance objects wouldn't need to be locked.  They wouldn't, no
> > > matter how mutable you insist these objects whose bits would never
> > > change are.
>
> > Only if you're not implementing Python, but another language that
> > doesn't support __slots__ and assignment to instance.__dict__.
>
> I am only going to say all Python types prior to 3.0 support classes
> without __slots__,

I made a mistake, and I don't want to risk confusion at this point.

"all Python ***versions** prior to 3.0"

and I am talking about old-style classes, of course. Prior to 2.2 no
classes at all supported slots.


Carl Banks

Carl Banks

unread,
Jan 24, 2009, 3:32:00 PM1/24/09
to
On Jan 24, 12:05 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> The default metatype for Python classes would be
> mutable_dict_type, which is a type wherein the object itself would be
> mutable but it would still have all the mutator methods __init__,
> __setattr__, etc., but they could only act on the __dict__.


Not wanting to risk confusion.

"The default metatype for Python classes would be mutable_dict_type,
which is a type wherein the object itself would be ***immutable*** but
it would still have all the mutator methods __init__, __setattr__,
etc., but they could only act on the __dict__."


Carl Banks

Bryan Olson

unread,
Jan 27, 2009, 10:35:46 AM1/27/09
to

I'm a fan of lock-free data structures and software transactional memory,
but I'm also a realist. Heck, I'm one of this group's outspoken
advocates of threaded architectures. Theoretical breakthroughs will
happen, but in real world of today, threads are great but GIL-less
Python is a loser.

Wherever Python is going, let's recognize that a scripting language that
rocks is better than any other kind of language that sucks.


--
--Bryan
