What's the point of RPython? By this, I don't mean "What is RPython?"
I get that. I mean, why?
The goals of the pypy project seem to be to create a fast python
implementation. I may be wrong about this, as the goals seem a little
amorphous if you look at their home page.
So, to do that, this RPython compiler was created? Except that it
doesn't compile python; it compiles a static subset of python with
ML-style type inference.
This RPython thing seems like a totally unnecessary intermediate step.
Instead of writing an implementation of one language, there's an
implementation of two languages.
Actually, it seems like it harms the ultimate goal. For the
interpreted python to be fast, the interpreter has to be written in a
reasonably fast language. ML and C++ compilers have had a lot of work
put into their optimization steps. A lot of the performance boost you
get from a static language is that knowing the types at compile time
lets you inline code like crazy, unroll loops, and even execute code
at compile time.
RPython is a statically typed language because I guess the developers
associate static languages with speed? Except that they use it to
generate C code, which throws away the type information they need to
get the speed increase. Huh? I thought the goal was to write a fast
dynamic language, not a slow static one?
Is this going anywhere or is this just architecture astronautics?
The RPython project seems kind of interesting to me and I'd like to
see more python implementations, but looking at the project I can't
help but think that they haven't really explained *why* they are doing
the things they are doing.
Anyway, I can tell this is the sort of question that some people will
interpret as rude. Asking hard questions is never polite, but it is
always necessary :)
Thanks,
Brendan
Check out this thread:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/1a8c866b4c4d7521/2174f796cd687931?lnk=gst&q=enlighten+pypy#2174f796cd687931
One of pypy's goals is having a more flexible implementation, easier
to experiment with than cpython.
Having it all written in the same language facilitates this task.
Why RPython? Because by avoiding the dynamism of the full python
language, it is possible to translate this code to a static language
(c, java, whatever).
The translated rpython interpreter aims to be something similar in
performance to cpython. The only difference is that cpython was
written from scratch in c, while pypy's interpreter was written in
rpython and then translated automatically to c.
The pypy team intends to achieve greater speed and performance by
adding a JIT compiler to this interpreter.
So pypy-c + JIT = faster python.
I hope this helps...
Luis
Speed is mentioned, but as a secondary concern. The main goal seems to
be to create a vehicle for exploring the concept of dynamic languages
themselves. If that seems amorphous, it is because it is a
research project.
Matt
This is more or less covered in the FAQ:
"The restrictions are to ensure that type inference (and so,
ultimately, translation to other languages) of RPython programs is
possible. These restrictions only apply after the full import happens,
so at import time arbitrary Python code can be executed."
http://codespeak.net/pypy/dist/pypy/doc/faq.html#id25
> The goals of the pypy project seem to be to create a fast python
> implementation. I may be wrong about this, as the goals seem a little
> amorphous if you look at their home page.
The home page itself is ambiguous, and does oversell the performance
aspect. The *actual* goal as outlined by their official docs is to
implement Python in Python, at every level. If this means utilising a
less-dynamic subset of Python for the lower levels, it's still at
least more-Python-than-C.
"PyPy is both:
* a reimplementation of Python in Python, and
* a framework for implementing interpreters and virtual machines for
programming languages, especially dynamic languages."
http://codespeak.net/pypy/dist/pypy/doc/faq.html#id11
The PyPy devs feel that this will allow them to more easily experiment
with optimising the interpreter for greater speeds, but that isn't one
of the stated goals (just, apparently, their 'secret' one).
> This RPython thing seems like a totally unnecessary intermediate step.
> Instead of writing an implementation of one language, there's an
> implementation of two languages.
My understanding is that the 'higher' level Python language features
are implemented on top of the restricted RPython features, so it's
more of an organic growth of one language than two separate
implementations.
> A lot of the performance boost you
> get from a static language is that knowing the types at compile time
> lets you inline code like crazy, unroll loops, and even execute code
> at compile time.
>
> RPython is a statically typed language because I guess the developers
> associate static languages with speed?
Doesn't the first paragraph above state the same thing that you
question in the next?
> Except that they use it to
> generate C code, which throws away the type information they need to
> get the speed increase. Huh? I thought the goal was to write a fast
> dynamic language, not a slow static one?
They're not just generating C code, they currently also target LLVM,
JVM and CLI.
> Is this going anywhere or is this just architecture astronautics?
Well, they actually seem to have achieved some substantial gains, so I
think it's unfair to imply that the project isn't based on pragmatic
objectives. Their initial JIT prototype failed to work as well as
expected, and they've spent some time re-thinking the approach; I'm
happier to wait for a stably performing JIT that can be improved over
time than to take a short-term gain.
> The RPython project seems kind of interesting to me and I'd like to
> see more python implementations, but looking at the project I can't
> help but think that they haven't really explained *why* they are doing
> the things they are doing.
A lot of this is covered in the FAQ. Whether you agree with their
approach or not, they're the ones actively pushing this effort
forward.
It's been a few months since the last update, but the PyPy status blog
may have more information for you. At the very least, it's a venue to
discuss your concerns directly with the PyPy devs.
Ok, fair enough. In some ways I see that as more of a purely
intellectual exercise than the practical endeavor I originally
assumed the project to be.
However, one of the links I was sent had one of the devs talking about
using the translation process to make C/Java/LLVM implementations out
of the same interpreter code. I'll say that makes a lot of sense.
Another question I was wondering about is whether they plan on
maintaining good C bindings? Will existing bindings for third party
libraries be able to work?
Also, are they going to do away with the GIL? The python devs seem to
consider the GIL a non-issue, though they may change their mind in 3
years when we all have 32-core desktops. Until then, getting rid of
the GIL would make pypy pretty attractive in some quarters. I know the
scons project was running into GIL issues.
Finally, I'm pretty unclear on which versions of python pypy is targeting.
Well, it opens up Python as a whole for experimentation & modification
by Python developers, which seems pretty practical to me :)
> Another question I was wondering about is whether they plan on
> maintaining good C bindings? Will existing bindings for third party
> libraries be able to work?
You really should run through the whole faq:
http://codespeak.net/pypy/dist/pypy/doc/faq.html#can-i-use-cpython-extension-modules
(Spoiler alert: the answer is no.)
> Also, are they going to do away with the GIL?
One FAQ entry before the above:
http://codespeak.net/pypy/dist/pypy/doc/faq.html#do-threads-work-what-are-the-modules-that-work
> The python devs seem to
> consider the GIL a non-issue, though they may change their mind in 3
> years when we all have 32-core desktops. Until then, getting rid of the
> GIL would make pypy pretty attractive in some quarters. I know the
> scons project was running into GIL issues.
It's not a case of the Python devs digging their heels in and not
giving the world what it desperately wants. Here's an article by Guido
talking about the last attempt to remove the GIL and the performance
issues that arose:
"I'd welcome a set of patches into Py3k *only if* the performance for
a single-threaded program (and for a multi-threaded but I/O-bound
program) *does not decrease*."
http://www.artima.com/weblogs/viewpost.jsp?thread=214235
> Finally, I'm pretty unclear on which versions of python pypy is targeting.
I wonder where such a frequently asked question could be answered? ;)
http://codespeak.net/pypy/dist/pypy/doc/faq.html#which-python-version-2-x-does-pypy-implement
The performance decrease is an artifact of CPython's rather primitive
storage management (reference counts in every object). This is
pervasive and can't really be removed. But a new implementation
(e.g. PyPy) can and should have a real garbage collector that doesn't
suffer from such effects.
I am not questioning rpython directly - the case for something like
that is obvious. But there's a question of balance. It's possible to
go on building ever more complex systems which are theoretically
justified, but which postpone ever finishing the job. At some point
there has to be a "good enough".
To some extent I am playing devil's advocate here, but as an outsider
who looked at PyPy a while back, my uninformed and naive impression
was that the project was suffering from the kind of issues I have
caricatured above.
Andrew
PS I guess you are aware of worse is better etc? I think this may
also be a US/Euro culture issue...
It's worth adding that, regarding the goal of having a faster python,
we have had for a long time a solid and useful proof-of-concept in
psyco. Pypy builds on this concept and adds many more useful
features, not all of them related to speed but to flexibility.
I have no doubt the project will be successful. The question is how
long we will have to wait...
Luis
That's interesting, I hadn't heard the reference counting mechanism
was related to the GIL. Is it just that you need to lock the reference
count before mutating it if there's no GIL? Really, that shouldn't be
the case. Reference counting can be done with a lock free algorithm.
Garbage collection is definitely in vogue right now. However, people
tend to treat it more like a religion than an algorithm. Garbage
collection vs reference counting actually has some trade offs both
ways. GC gets you some amortized performance gains, and some space
gains because you don't need to hang on to a bunch of counts. However,
GC also has the problem of having a very loose memory profile and poor
interactive performance during compaction if the heap is large. In
some cases this discussion becomes complicated with python because
python has both reference counting and GC.
Yes. Someone tried inserting such a lock but it slowed down the
single-cpu case unacceptably.
> Really, that shouldn't be
> the case. Reference counting can be done with a lock free algorithm.
That's interesting, got a reference? (no pun intended)
http://letmegooglethatforyou.com/?q=lock+free+reference+counting
<wink>
--
Skip Montanaro - sk...@pobox.com - http://smontanaro.dyndns.org/
I found a paper by Detlefs et al describing a method which is
a) patented
b) can potentially lock out some threads from ever running, and
c) relies on a hardware instruction (double compare and swap)
that's not available on most processors.
There are well known concurrent and parallel GC techniques that
don't have that problem, see for example Cheng's dissertation:
http://reports-archive.adm.cs.cmu.edu/anon/2001/CMU-CS-01-174.pdf
GHC 6.10 uses a parallel stop-the-world gc that is simpler, I think.
For a reference implementation of lock-free reference counting on all
common platforms, check out Boost's implementation of shared_ptr (a
reference-counting smart pointer designed around multithreaded use
cases).
> There are well known concurrent and parallel GC techniques that
Hmm... I didn't really mention poor parallelism as a problem of GC. As
I see it there are two trade-offs for GC vs alternative techniques:
the "embarrassing pause" during compaction, which makes it impossible
to use for applications like interactive video display that can't
halt to compact a several-gigabyte heap without causing stutter, and
the loose memory profile.
Maybe the document you sent me addresses those, and I'd be interested
if it did. It's 150~ pages though so I haven't really had time to read
it yet.
As far as parallelism problems with GC go... the only one I can
imagine is that if you had a lot of threads generating lots of
garbage, you would need to start compacting more frequently. Since
compaction halts all threads, this could potentially cause very
frequent compactions? Is that what you were getting at? I'd wondered
about that before, but didn't know for a fact whether it came up in
real world scenarios.
What cpu's do you know of that can atomically increment and decrement
integers without locking?
You're missing that most of the platforms that Python supports can't
actually do this. Keep in mind that it has to be atomic across all
cores.
(For Python to use such a technique, it would have to be doable on
every platform Python supports. They aren't going to get rid of the
GIL for some platforms and not others.)
> For a reference implementation of lock free reference counting on all
> common platforms check out Boost's implementation of shared_ptr (a
> reference counting smart pointer designed around multithreaded use
> cases).
I just looked at the boost documentation, which claims that multiple
asynchronous writes to the same shared_ptr results in undefined
behavior. That will not suffice for Python reference counting.
Carl Banks
Most cpu's can't do that.
> For a reference implementation of lock free reference counting on all
> common platforms check out Boost's implementation of shared_ptr (a
> reference counting smart pointer designed around multithreaded use
> cases).
That sounds very mysterious to me--are you sure it is intended for
multiprocessing and not just multiple threads on a single processor?
Do you have a url for the code?
> I see it the two trade offs that have to be made for GC vs alternative
> techniques. The "embarrassing pause" during compaction which makes it
> impossible to use for applications like interactive video display that
> can't halt to compact a several gigabyte heap without causing stutter,
> and the loose memory profile.
The embarrassing pause is characteristic of a so-called "stop the
world" gc. A gc where any pauses are guaranteed bounded in size
(preferably to a small value) is called a "real time" gc. I believe
Cheng's algorithm is a real time alg but it's been a while since I
read that thesis.
Anyway, nobody programs interactive video in Python, and Python's ref
counting scheme is not real time since it may have to do arbitrarily
many decrefs when you release a large structure.
> As far as parallelism problems with GC go... the only one I can
> imagine is that if you had a lot of threads generating lots of
> garbage, you would need to start compacting more frequently. Since
> compaction halts all threads, this could potentially cause very
> frequent compactions? Is that what you were getting at?
I'm not sure what you're asking. Parallelism is used for making
things go faster, but making things parallel usually complicates them.
A parallel realtime gc is sort of the holy grail.
Cheng's thesis is pretty readable as I remember, and discusses most of
these issues. There is also a book by Appel about gc, but it's
perhaps a little bit dated by now.
> Is this going anywhere or is this just architecture astronautics?
>
> The RPython project seems kind of interesting to me and I'd like to
> see more python implementations, but looking at the project I can't
> help but think that they haven't really explained *why* they are doing
> the things they are doing.
Remember that the original objective of PyPy was to improve the JIT.
Psyco is limited by the fact that the whole runtime is implemented in
C. The infamous "faster than C" actually refers to work on program
specializers on C code, i.e. treating C as a language that is JIT
compiled at a fine-grained level (block structure; whole-function
JIT compilation wouldn't obviously yield any advantages).
So it is not just the application-level Python code that shall run
through the JIT but also the interpreter-level code. So why not equate
them and think about interpreter-level code as Python as well? This
might be the idea. But then the problem of bootstrapping comes up:
there is no interpreter-level code with the required properties. Hence
RPython, which can serve as a foundation.
I'm also not sure I like the approach. Rather than designing a whole
new runtime for having a fully reflective system, I'd model C in
Python (call it CiPy) and create a bijection between CiPy code and C
code. Once this has been established, one can study other translations
of CiPy or Python into CiPy (not unlike Pyrex/Cython), do systematic
refactorings on CiPy code in Python, study properties using runtime
reflection, experiment with macro systems, etc. All of this is done in
PyPy as well, but sometimes it seems the team has created more new
problems than they solved.
> Anyway, I can tell this is the sort of question that some people will
> interpret as rude. Asking hard questions is never polite, but it is
> always necessary :)
I think it is a good question.
Carl, I'm quite unfamiliar with Boost and am not a C++ person, so may have
read what you saw but not recognized it in the C++ punctuation soup. I
couldn't find what you referred to. Can you provide a URL?
Thanks,
http://www.boost.org/doc/libs/1_37_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety
Note there is also something called "interprocess shared pointers"
(link below) that don't have that caveat. I wonder if they use locks.
I think you are misreading that. It says that multiple assignments to
different copies of a shared_ptr in different threads are fine. This
is the important bit, because copies of the same pointer share the
same reference count, and assignments and resets will decrement that
ref count. They say they don't handle mutations of the *same* pointer
in different threads, which is a different issue.
The programmer is responsible for synchronizing access to the pointer,
and the pointed-to object, but not the ref count. This may not be
obvious if you don't use shared_ptr a lot.
You also mentioned in an earlier post that most processors don't
support atomic increments... I'm hesitant to dispute you here because
this is outside my field of expertise. However, a quick google
search for "x86 atomic increment" comes up with this:
XADD
http://www.codemaestro.com/reviews/8
http://siyobik.info/index.php?module=x86&id=159
Again, I'm not an assembly guru, but this sounds like exactly what
you'd want. It gains exclusive access to system memory in a
multi-processor environment without leaving user space. Thus XADD is
an atomic increment/decrement. It would be educational if someone more
familiar with x86 than me could speak to the performance merits of
this on modern multicore machines.
If you read the Boost documentation you'll see that while multiple
simultaneous writes to a shared_ptr *instance* aren't supported,
multiple simultaneous updates of the reference counter used by the
shared_ptr implementation are supported. Example #1 in the URL that
Paul Rubin gave demonstrates this. The allowed simultaneous assignment
of the two shared_ptr instances "p2" and "p3" from the same shared_ptr
instance "p" requires that a single reference count be incremented
simultaneously.
On the other hand, the documentation also notes that the implementation
is only lock free on x86, IA-64, and PowerPC systems, though I think it
would also be possible on MIPS CPUs.
Ross Ridge
--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rri...@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
db //
There are some distinct benefits of RPython:
* starts up as real python code, letting you define globals that later
get "snapshotted" into static code
* can be executed using a real python interpreter, getting much more
reliable debugging (but broken rpython code might run fine under
python)
* provides a clear avenue for extension, by adding more python
features
You might argue just having a python syntax is also a benefit, but the
semantics are so different as to more than counteract it. Add in the
cost of implementing your own compiler... yeah.
"lock free" is largely meaningless. What it really means is "we use
small hardware locks rather than big software locks, thereby reducing
(but not eliminating!) the contention".
Atomic refcounting is easy. If done sparingly to minimize contention
it works great. Python uses refcounting massively with heavily
contended objects (think about your builtin types, constants, globals,
classes, etc.). It does not perform acceptably for python.
The second issue is the objects themselves, like a list which is
mutable. If you're using it in a single thread or writing from
multiple threads this is a non-trivial constant cost. If your object
is not modified after creation and is read from many threads a lock
would be a point of contention, preventing you from scaling freely.
The dicts used by classes and globals are an important example of this,
and a successful implementation needs something non-contending. I
assume Jython and IronPython do this.
I'll respond to this later.
> http://www.codemaestro.com/reviews/8
> http://siyobik.info/index.php?module=x86&id=159
>
> Again, I'm not an assembly guru, but this sounds like exactly what
> you'd want. It gains exclusive access to system memory in a
> multi-processor environment without leaving user space. Thus XADD is
> an atomic increment/decrement. It would be educational if someone more
> familiar with x86 than me could speak to the performance merits of this
> on modern multicore machines.
Those links describe using the LOCK prefix, which as the name implies,
asserts a lock, so it is no longer "lockless reference counting". The
LOCK prefix adds about 100 cycles to the instruction.
At least in the case of Haskell's "software transactional memory",
reads are genuinely lock-free in the normal uncontended case. You use
LOCK XCHG to increment a counter in the object when you want to
update, and increment it again when you are done updating. Since the
counter starts at 0, any time it has an odd value, an update is in
process and any other thread can notice that the object is locked and
try again later without asserting any locks itself. If the counter
has an even value, it is unlocked, but there is a chance that a writer
thread can lock and update the object while a reader thread has a
lockless access in progress. The reader detects that this has occurred
when it finishes its transaction, reads the counter a second time, and
notices that the counter has been incremented (maybe more than once)
since the transaction started; it then rolls back its operation and
tries again. I'm not explaining that well, so you can read about it in
SPJ's paper or Keir Fraser's.
> The second issue is the objects themselves, like a list which is
> mutable. If you're using it in a single thread or writing from
> multiple threads this is a non-trivial constant cost. If your object
> is not modified after creation and is read from many threads a lock
> would be a point of contention, preventing you from scaling freely.
> The dicts used by classes and globals are an important example of this,
> and a successful implementation needs something non-contending. I
> assume Jython and IronPython do this.
I'm pretty sure Jython makes no attempt at all to mess with ref
counts--it just relies on the underlying Java gc. I have no idea
about IronPython.
a) The contended case is the issue, not the uncontended case. An
uncontended lock is just constant overhead, not a barrier to
scalability
b) Locks on linux (since the switch to futexes) are pretty close to
free when uncontended.
Transactions are really for different properties:
a) read patterns can be uncontended (fully scalable)
b) a preempted writer does not block other writers, guaranteeing
forward progress
c) ad-hoc combination of several objects into a single atomic update
> > The second issue is the objects themselves, like a list which is
> > mutable. If you're using it in a single thread or writing from
> > multiple threads this is a non-trivial constant cost. If your object
> > is not modified after creation and is read from many threads a lock
> > would be a point of contention, preventing you from scaling freely.
> > The dicts used by classes and globals are an important example of this,
> > and a successful implementation needs something non-contending. I
> > assume Jython and IronPython do this.
>
> I'm pretty sure Jython makes no attempt at all to mess with ref
> counts--it just relies on the underlying Java gc. I have no idea
> about IronPython.
The second issue has *nothing* to do with refcounts. It is the
objects themselves. A classic example is trying to do "i += 1", which
breaks down into "i = i + 1", which isn't an atomic operation.
Java's ConcurrentHashMap gets this right.
No, it doesn't assert a lock in the sense used in this thread. On modern
Intel systems it's normally handled completely within the cache.
> The LOCK prefix adds about 100 cycles to the instruction.
That's unavoidable. It's still faster than a spin lock, which would
also have to use synchronizing instructions.
x86 (and pretty much any 8080 derivative, come to think of it).
That said, what you actually need is an atomic read-and-increment,
which is a lot harder to find. Even if you could find a platform
supporting it, it doesn't help you on other platforms you may need to
run on. Just do the locking properly and worry about optimisations
later.
--
Rhodri James *-* Wildebeeste Herder to the Masses
The locking is already done properly (assuming we are discussing CPython's
reference counting). Now is later. People are thinking about lots of
optimizations, this is just one of them.
It would not have occurred to me that "lock inc" increments "without
locking". I understand that's different from a lock value sitting in
the data object but I thought that "lock-free algorithm" meant one
that didn't assert any of these hardware locks either. Maybe I'm
wrong.
> Just do the locking properly and worry about optimisations later.
That has already been tried, and found to be unacceptably slow for the
purpose at hand. Now we're looking for the optimizations.
> "Rhodri James" <rho...@wildebst.demon.co.uk> writes:
>> > What cpu's do you know of that can atomically increment and decrement
>> > integers without locking?
>>
>> x86 (and pretty much any 8080 derivative, come to think of it).
>
> It would not have occurred to me that "lock inc" increments "without
> locking". I understand that's different from a lock value sitting in
> the data object but I thought that "lock-free algorithm" meant one
> that didn't assert any of these hardware locks either. Maybe I'm
> wrong.
I tend to live in single-core worlds, so "inc" on its lonesome works
just fine.
>> Just do the locking properly and worry about optimisations later.
>
> That has already been tried, and found to be unacceptably slow for the
> purpose at hand. Now we're looking for the optimizations.
In that case I'd second the suggestion of taking a long, hard look
at the Linux core locking and synchronisation primatives.
Right... I was wondering about that. Well, any kind of memory access
gets exclusive control of the bus except on NUMA, but I'm wondering
how CMPXCHG
http://en.wikipedia.org/wiki/Compare-and-swap
compares to XADD performance wise.
It seems to me that both of them must pull the old value across the
bus, hang onto the bus, and move the new value in. Maybe since XADD
needs to perform arithmetic there will be a few cycles lag between
getting the old value and pushing the new value? Maybe CMPXCHG doesn't
go through the ALU?
If the bus isn't just sitting idle and you can immediately push out
the new value then there's no real locking. Actually this article
explicitly mentions CMPXCHG as lock free.
http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms
In a single core world we wouldn't be having these endless angsty
conversations about eliminating the GIL.
> > That has already been tried, and found to be unacceptably slow for the
> > purpose at hand. Now we're looking for the optimizations.
>
> In that case I'd second the suggestion of taking a long, hard look
> at the Linux core locking and synchronisation primatives.
Do you understand what the issue is, about CPython's reference counts?
Do you have any idea how often the interpreter updates them? Even
using LOCK INC (raw machine instruction, not portable C, no operating
system involvement), manipulating the ref counts would be around 100x
slower than it is now.
> "Rhodri James" <rho...@wildebst.demon.co.uk> writes:
>> In that case I'd second the suggestion of taking a long, hard look
>> at the Linux core locking and synchronisation primatives.
>
> Do you understand what the issue is, about CPython's reference counts?
> Do you have any idea how often the interpreter updates them? Even
> using LOCK INC (raw machine instruction, not portable C, no operating
> system involvement), manipulating the ref counts would be around 100x
> slower than it is now.
You asked a question about CPUs with atomic update, strongly implying
there were none. All I did was supply a counter-example, and observe
that neither this nor the question were in fact helpful.
My inability to spell "primitive", on the other hand, is all my own
fault.
Well, more specifically, atomic update without locking, but I counted
the LOCK prefix as locking while other people didn't, and that caused
some confusion. Of course I'm aware of the LOCK prefix but it slows
down the instruction enormously compared with a non-locked instruction.
I'm curious about that. I've been looking around for timing
information on the lock signal, but am having some trouble finding
it. Intuitively, given that the processor is much faster than the
bus, and you are just waiting for the processor to complete an
addition or comparison before putting the new memory value on the bus,
it seems like there should be very little additional bus contention vs
a normal add instruction.
The bus is slow compared with the L1 cache. I just looked for figures
and couldn't find any either, but I remember seeing some for the Core
2 saying around 100 cycles, and something similar for the Athlon. I
just came across something saying the Core i7 is considerably better
than the Core 2 at this.
The real solution is to not use so much bus locking. Get rid of the
ref counts and use a real gc, if not in CPython then in PyPy.
I think in Python's case the reference count will often be in the
processor's local cache. (Presumably, if the current thread is going to
Py_DECREF the object it's been working with the object's state recently.)
The stuff I read suggested that simply locking the local cache would be
significantly faster than having to lock the memory bus.
--Scott David Daniels
Scott....@Acm.Org
Oh, right. *light bulb goes on* I wasn't thinking about cache at all.
In fact all it does is talk to its cache. From the "Intel 64 and
IA-32 Architectures Software Developer's Manual, Volume 3A: System
Programming Guide, Part 1":
For the P6 and more recent processor families, if the area of
memory being locked during a LOCK operation is cached in the
processor that is performing the LOCK operation as write-back
memory and is completely contained in a cache line, the processor
may not assert the LOCK# signal on the bus. Instead, it will
modify the memory location internally and allow its cache
coherency mechanism to ensure that the operation is carried
out atomically. This operation is called "cache locking." The
cache coherency mechanism automatically prevents two or more
processors that have cached the same area of memory from
simultaneously modifying data in that area.
The same cache coherency mechanism that prevents ordinary "unlocked"
instructions from simultaneously modifying the same cache line on
two different processors also provides the guarantee with "locked"
instructions. There are no additional hardware locks involved, and no
additional communication required.
They share some, but also have some exclusive. How much of which
depends entirely on which CPU you have.
> The cache coherency mechanism automatically prevents two or
> more processors that have cached the same area of memory from
> simultaneously modifying data in that area.
>
> The same cache coherency mechanism that prevents ordinary "unlocked"
> instructions from simultaneously modifying the same cache line on
> two different processors also provides the guarantee with "locked"
> instructions. There are no additional hardware locks involved, and no
> additional communication required.
The cache coherency mechanism is what Scott described as
"communicat[ing] to other processors that it (and it alone) owns the
increment target". The cache coherency mechanism is not a trivial
thing at all. It introduces its own hazards and delays, and it is
getting more complicated all the time as processors and caches get
faster and larger. Some time ago, CPUs hit their megahertz limits,
and that's why we're using multicores now. Some PL researchers think
cache coherency is going to be the next limit, and are advocating
languages like Erlang, which avoid use of shared memory and have
separate heaps per thread; or alternatively, approaches like the MS
Singularity research OS which relies on something like a linear type
system to statically ensure that a given object is accessible to only
one thread at a time. (That approach allows transferring objects
between threads with no locks or copying required.)
I see, that clears up some mysteries. Thanks.
a1) Really what matters is the actual mix between contended and
uncontended accesses, and the synchronization strategy affects the
amount of contention. For example, the traditional locking strategy
involves acquiring a lock before reading the object, so two
simultaneous read-only accesses would create lock contention. With
STM, only updates acquire a lock, so multiple read-only threads can
access the object simultaneously with no contention.
a2) Constant factors matter!!! If using a different mechanism makes
Python 2x faster, that's a very powerful reason to favor the other
mechanism.
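The read-side contrast in (a1) can be sketched with a toy version-counter cell. This is only an illustration of the general STM idea, not any particular implementation, and all names are invented: readers take no lock and simply validate that no writer intervened, while writers lock and bump a version counter.

```python
import threading

class VersionedBox:
    """Toy STM-style cell: lock-free reads validated by a version counter."""

    def __init__(self, value):
        self._value = value
        self._version = 0                   # even = stable, odd = write in progress
        self._write_lock = threading.Lock()

    def read(self):
        # Optimistic read: take no lock; retry if a writer intervened.
        while True:
            v1 = self._version
            value = self._value
            if v1 % 2 == 0 and self._version == v1:
                return value

    def write(self, value):
        with self._write_lock:              # only writers contend
            self._version += 1              # mark write in progress
            self._value = value
            self._version += 1              # stable again
```

Multiple readers proceed with no contention at all; only concurrent writers queue on the lock.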
> b) Locks on linux (since the switch to futexes) are pretty close to
> free when uncontended.
I thought futexes use a traditional locked read-modify-write
instruction which is 50-100 cycles, cheap compared to a system call
but quite expensive compared with a normal 1-cycle non-locked
operation.
> > > The second issue is the objects themselves, like a list which is
> > > mutable. If you're using it in a single thread or writing from
> > > multiple threads this is a non-trivial constant cost. ...
> The second issue has *nothing* to do with refcounts. It is the
> objects themselves. A classic example is trying to do "i += 1", which
> breaks down into "i = i + 1", which isn't an atomic operation.
Oh, I see what you mean; yes, you're right, it's best to avoid doing
those sorts of operation in shared objects too frequently.
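In today's CPython the non-atomicity is easy to see: the `dis` module shows that `i = i + 1` compiles to separate load, add, and store bytecodes, and a thread switch can land between any of them.

```python
import dis

def bump(i):
    i = i + 1
    return i

# The increment is several bytecodes, not one atomic operation.
ops = [ins.opname for ins in dis.get_instructions(bump)]
print(ops)
```

The exact opcode names vary by Python version, but there is always a load, an add, and a store as distinct steps.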
> Java's ConcurrentHash map gets this right.
I'll have to look that up, it sounds interesting. I've been staying
away from Java but a situation is coming up where I may have to (ugh)
start using it. Thanks.
Aye, but my point is really about the myth of lock-free algorithms
being uncontended: it's simply not true, and CAN'T be true. A write
is inherently a mutually exclusive operation. There are all sorts of
ways to avoid contending for reads, spread out the writes and have a
single thread coalesce them, etc, but fundamentally the write will
involve some mutual exclusion.
CPython's "primitive" storage management has a lot to do with the
simplicity of interfacing CPython with external libraries. Any solution
that proposes to get rid of the GIL needs to address that.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
Weinberg's Second Law: If builders built buildings the way programmers wrote
programs, then the first woodpecker that came along would destroy civilization.
IIRC, it was Bruce Eckel I heard talking about discovering all kinds of
nasty Java thread bugs because cache coherency wasn't working the way the
Java developers thought it did.... This is apparently something very
difficult to get correct.
Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
>The cache coherency mechanism is what Scott described as
>"communicat[ing] to other processors that it (and it alone) owns the
>increment target". The cache coherency mechanism is not a trivial
>thing at all. It introduces its own hazards and delays, and it is
>getting more complicated all the time as processors and caches get
>faster and larger.
*sigh* Could you please just read what I wrote above? The LOCK prefix
makes NO DIFFERENCE to anything you mentioned. When the current CPython
implementation increments a reference count, it doesn't use the LOCK
prefix, but the cache line modified by the instruction still has to be
owned by the processor. If it did use the LOCK prefix, there would be
no change in how the cache coherency mechanism worked. There'd be
no additional complications, hazards or delays in how the processor
communicates with other processors.
I was recently on a long road trip, and was not the driver, and with
nothing better to do I thought quite a bit about this.
I concluded that, aside from one major trap, it wouldn't really be
more difficult to interface Python to external libraries, just
differently difficult. Here is briefly what I came up with:
1. Change the singular Python type into three metatypes:
immutable_type, mutable_type, and mutable_dict_type. (In the latter
case, the object itself is immutable but the dict can be modified.
This, of course, would be the default metaclass in Python.) Only
mutable_types would require a mutex when accessing.
2. The API wouldn't have to change much. The regular API would assume
that objects are unlocked (if mutable) and in a consistent state, and
would lock any mutable objects it needs to access. There would also
be a low-level API that assumes the objects are locked (if mutable)
and does not require objects to be consistent. I imagine most
extensions would call the standard API most of the time.
3. If you are going to use the low-level API on a mutable object, or
are going to access the object structure directly, you need to acquire
the object's mutex. Macros such as Py_LOCK(), Py_LOCK2(), Py_UNLOCK()
would be provided.
4. Objects would have to define a method, to be called by the GC, that
marks every object it references. This would be a lot like the
current tp_visit, except it has to be defined for any object that
references another object, not just objects that can participate in
cycles. (A conservative garbage collector wouldn't suffice for Python
because Python quite often allocates blocks but sets the pointer to an
offset within the block. In fact, that's true of almost any Python-
defined type.) Unfortunately, references on the stack would need to
be registered as well, so "PyObject* p;" might have to be replaced
with something like "Py_DECLARE_REF(PyObject,p);" which magically
registers it. Ugly.
5. Py_INCREF and Py_DECREF are gone.
6. GIL is gone.
So, you gain the complexity of a two-level API, having to lock mutable
objects sometimes, and defining more visitor methods than before, but
you don't have to keep INCREFs and DECREFs straight, which is no small
thing.
The major trap is the possibility of deadlock. To help minimize the
risk there would be macros to lock multiple objects at once:
Py_LOCK2(a,b) would guarantee that if another thread is calling
Py_LOCK2(b,a) at the same time, it won't result in a deadlock. What's
disappointing is that the deadlocking possibility is always with you,
much like the reference counts are.
Carl Banks
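A guarantee like Py_LOCK2's is usually implemented by imposing a global order on locks, e.g. by address, and always acquiring in that order. Here is a Python sketch of that idea with invented names; it is not CPython API:

```python
import threading

def lock_both(a, b):
    # Acquire two locks in a canonical (id-based) order, so that
    # lock_both(a, b) and lock_both(b, a) in two threads agree on
    # which lock to take first and therefore cannot deadlock.
    first, second = (a, b) if id(a) <= id(b) else (b, a)
    first.acquire()
    second.acquire()
    return first, second

def unlock_both(first, second):
    second.release()
    first.release()
```

Two threads calling lock_both(x, y) and lock_both(y, x) both contend for the same lock first, so one simply waits rather than the two deadlocking while each holds the lock the other wants.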
This, I don't understand. Other languages like Lisp and Java and
Haskell have foreign function interfaces that are easier to program than
Python's, -and- they don't use reference counts. There's usually some
primitive to protect objects from garbage collection while the foreign
function is using them, etc. The Java Native Interface (JNI) and the
Haskell FFI are pretty well documented. The Emacs Lisp system is not
too hard to figure out from examining the source code, etc.
IMO, locking of the object is a secondary problem. Python-safethread
provides one solution, but it's not the only conceivable one. For the
sake of discussion it's easier to assume somebody else is solving it
for you.
Instead, focus on just the garbage collection. What are the practical
issues of modifying CPython to use a tracing GC throughout? It
certainly is possible to write an exact GC in C, but the stack
manipulation would be hideous. It'd also require significant rewrites
of the entire code base. Throw on that the performance is unclear (it
could be far worse for a single-threaded program), with no
straightforward way to make it a compile-time option.
Got any ideas for that?
That assumption might be good for the sake of the discussion *you*
want to have, but it's not for the discussion I was having, which was to
address Aahz's claim that the GIL makes extension writing simple by
presenting a vision of what Python might be like if it had a mark-and-
sweep collector. The details of the GC are a small part of that and
wouldn't affect my main point even if they are quite different than I
described. Also, extension writers would have to worry about locking
issues here, so it's not acceptable to assume somebody else will solve
that problem.
> Instead, focus on just the garbage collection.
[snip rest of threadjack]
You can ignore most of what I was talking about and focus on
technicalities of garbage collection if you want to. I will not be
joining you in that discussion, however.
Carl Banks
You mean every time you access a list or dictionary or class instance,
you have to acquire a mutex? That sounds like a horrible slowdown.
Yes, and it's never going to happen in CPython any other way. It's
considered a bug if Python code can segfault the interpreter; all
runtime errors are supposed to raise exceptions. The only way to
ensure that won't happen is to make sure that only one thread can
access the internals of a mutable object at a time.
BTW, class instances are usually immutable and thus don't require a
mutex in the system I described.
Carl Banks
Is this a variant of Godwin's Law for Python?
Definitely. It's a stable fixed point attractor. No matter how often
it was discussed to death in the past months, the likelihood that
someone mentions the GIL or ref-counting approaches 1. This is
particularly remarkable because it is inversely proportional to the
observable activity in this domain, so there is really no news.
Other similarly strange phenomena: whenever Xah Lee posts one of his
infamous rants, it attracts at least a dozen newsgroup readers who
try to persuade each other not to respond, which will inevitably grow
his thread and keep it alive for a long time.
Indeed it would, but hey, let's not let that stop us repeating the
thinking that's gone into CPython over the last fifteen years. "Those
who cannot remember the past are condemned to repeat it".
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Well, maybe, but it seems to me that the real issue here is that we
need to remove reference counting and the GIL.
--
Tim Rowe
Steve> Indeed it would, but hey, let's not let that stop us repeating
Steve> the thinking that's gone into CPython over the last fifteen
Steve> years. "Those who cannot remember the past are condemned to
Steve> repeat it".
Also, every object is mutable at some level. Tuples, ints and floats are
definitely mutable at creation time. You need to hold a mutex then, so
Carl's notion of three types of objects breaks down there.
Skip
> BTW, he said sufficiently sophisticated topic. Since there hasn't been an
> extremely long post here yet, I don't know if this counts.
Had I waited until the thread became long enough, somebody else would
have already raised the issue -- you must surely know about Schluehr's
Law? ;-)
--
Tim Rowe
Hopefully, at creation time, they will usually be in a scope where
other threads can't see them.
I'm sorry, you're right, I misunderstood your context.
> Unfortunately, references on the stack would need to be registered
> as well, so "PyObject* p;" might have to be replaced with something
> like "Py_DECLARE_REF(PyObject,p);" which magically registers it.
> Ugly.
Not only registered at the beginning of the function, but also (since
CPython uses C, not C++) explicitly unregistered at every point of
exit from the function. Emacs implements these as macros called GCPRO
and UNGCPRO, and they're very easy to get wrong. In a way, they are
even worse than the current Python INCREF/DECREF.
See description at, for example,
http://www.xemacs.org/Documentation/beta/html/internals_19.html#SEC78
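To see why tying the unregister step to scope helps, here is a toy Python sketch of a GCPRO-style root registry (all names are invented for illustration); the context manager guarantees the "UNGCPRO" half on every exit path, including exceptions:

```python
import contextlib

gc_roots = []  # stand-in for the root set an exact GC would scan

@contextlib.contextmanager
def gcpro(*objs):
    # Register objs so a (hypothetical) collector would treat them as live.
    gc_roots.extend(objs)
    try:
        yield
    finally:
        # Unregister on every exit path -- the step that hand-written
        # GCPRO/UNGCPRO pairs in C so easily get wrong.
        if objs:
            del gc_roots[-len(objs):]
```

In C there is no such scope-tied cleanup, which is why every early return and error path must repeat the unregister code by hand.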
immutable_type objects wouldn't exist at all until their
PyWhatever_New or their tp_new member is called. After that, the
reference exists only on the local stack, which is accessible only to
one thread. As long as you finish initializing the object while it's
still only on the stack, there is no possibility of a conflict.
What about tp_init, then, you ask? Well it's simple: immutable_type
doesn't call it. In fact, it requires that tp_init, tp_setattro,
tp_mapping->mp_setitem, etc., are all null.
immutable_objects have no instance dict, so if you want to create
attributes in Python you have to use slots. immutable_object.__new__
accepts keyword arguments and initializes the slots with the value.
class Record(immutable_object, slots=['name', 'number']):
    def __new__(cls, name):
        number = db.lookup_number(name)
        return immutable_object.__new__(cls, name=name, number=number)
Carl Banks
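Something close to this Record is expressible in today's Python with __slots__ and a blocked __setattr__. This is my approximation of the idea, not the proposal's actual API:

```python
class Record:
    # Approximation of an "immutable_object": no instance dict,
    # attributes fixed at creation time.
    __slots__ = ('name', 'number')

    def __new__(cls, name, number):
        self = super().__new__(cls)
        # Bypass our own __setattr__ exactly once, during construction,
        # while the object is still visible to only one thread.
        object.__setattr__(self, 'name', name)
        object.__setattr__(self, 'number', number)
        return self

    def __setattr__(self, attr, value):
        raise AttributeError('Record instances are immutable')
```

Once __new__ returns, nothing can rebind the slots through normal attribute assignment, so such an instance could be shared between threads without a mutex.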
Perhaps you could trim your posts to quote only the relevant context?
Thanks.
Or you can argue that even when an argument is repeated indefinitely it
doesn't make it suddenly right.
--
mph
Guido will give up reference counting and the consequent immediate gc
when, but only when, *someone* presents an implementation that does so
and that runs faster. Given the current ferment in dynamic language
implementation, he just today wrote on the pydev list "CPython's
implementation strategy *will* evolve". Whether that means R.C. goes
away or not remains to be seen.
> Or you can argue that even when an argument is repeated indefinitely it
> doesn't make it suddenly right.
No, but it makes for a confirmation of Schluehr's law :-)
--
Tim Rowe
Then you are describing a language radically different from Python.
--
--Bryan
Bzzt.
Hint: aside from the reference count, most class instances are
immutable in Python *today*.
Carl Banks
That one threw me for a minute too, but I think the idea is that the
class instance itself is immutable, while its slots (specifically the
attribute dictionary) point to mutable objects.
That seems so utterly wrong that either you're an idiot or you're talking
at cross purposes to what Bryan and I think you're saying. Since I know
you're not an idiot, I can only imagine you have a different
understanding of what it means to be immutable than I do.
For example... is this instance immutable?
class Foo:
bar = None
f = Foo()
f.baz = True
If so, what do you mean by immutable?
--
Steven
If I understand Carl, yes, f is immutable. When you set f.baz, the
contents of f.__dict__ change but f itself does not change. It still
points to the same dictionary, etc.
The meaning of 'immutable' is well-established in the Python literature.
Python's immutable types include tuple, frozenset, and various kinds of
numbers and strings. Class instances, not so much.
What's more, this matters when considering a GIL-less implementation.
Typical method calls can traverse lots of mutable stuff just to find the
function to invoke.
--
--Bryan
Correct, and, getting back to the point, an instance itself would not
require a mutex. The dict would need it, of course.
It's customary to gloss over this technicality for convenience's sake
in most discussions, but it matters in this case.
Carl Banks
But we are talking about objects as they live in the C implementation,
not at the level where Python code deals with them.
Of course class instances aren't immutable types: they're not even
types. Let me suggest that there is a distinction between an
immutable type and an immutable object.
Immutable types are what you are talking about: it means that the type
provides no usable mutator methods. (Whether those methods would mutate
the object itself or some associated object doesn't matter.) Immutable objects
are a different thing: it means the object cannot change in memory.
Classes in Python are mutable types, usually. Class instances are
(except for the refcount) immutable objects, usually.
We usually talk about mutability of types, but mutability of objects
is appropriate for discussion as well. So I can't really agree with
your assessment that I am wrong to call class instances immutable
objects, aside from refcounts.
BTW, here's a minor brain bender: immutable types are mutable objects.
> What's more, this matters when considering a GIL-less implementation.
> Typical method calls can traverse lots of mutable stuff just to find the
> function to invoke.
Now that doesn't make sense at all. What is all this mutable stuff
you have to go through, and what does it have to do with the GIL-less
implementation? Can you explain further? Or are you just saying
it'll be slow?
Carl Banks
foo.bar() has to look up bar in foo's attribute dictionary.
>> That one threw me for a minute too, but I think the idea is that the
>> class instance itself is immutable, while its slots (specifically the
>> attribute dictionary) point to mutable objects.
>
> Correct, and, getting back to the point, an instance itself would not
> require a mutex. The dict would need it, of course.
The dict is part of the object and some important slots are mutable.
What's more, if your point was to do away with the GIL without changing
Python semantics or requiring heaping masses of locking, I fear you've
not fully grasped the problem.
Languages such as Java, C++, and C# do not require nearly as much
locking as Python because they are not nearly as dynamic. Consider how a
method is invoked. Java / C++ / C# can always resolve the method with no
locking; the data they need is fixed at link time. Python is much more
dynamic. A demo:
from __future__ import print_function
# A simple class hierarchy:
class Foo (object):
title = "Mr. Foo"
def identify(self):
print("I'm called", self.title)
class Bar (Foo):
title = "Ms. Bar"
class Jafo (Bar):
title = "Major Jafo"
dude = Jafo()
# Searches 5 dicts to find the function to call:
dude.identify()
# Class dicts are mutable:
def id(self):
print("I'm still called", self.title)
Jafo.identify = id
dude.identify()
# An object's class can change:
dude.__class__ = Bar
dude.identify()
# A class's base classes can change:
class Fu (object):
def identify(self):
print("Call me", self.title)
Bar.__bases__ = (Fu,)
dude.identify()
Result:
>>>
I'm called Major Jafo
I'm still called Major Jafo
I'm called Ms. Bar
Call me Ms. Bar
>>>
In that first simple call of dude.identify(), Python looked up "dude" in
the module's (mutable) dict to find the object. Then it looked in
object's (mutable) dict, and did not find "identify". So it looked at
the object's (mutable) __class__ slot, and in that class's (mutable)
dict. It still did not find "identify", so it looked in the class's
(mutable) __bases__ slot, following Python's depth-first "object
protocol" and thus looking in what other (mutable) class dicts and
(mutable) __bases__ slots were required.
An object's __dict__ slot is *not* mutable; thus we could gain some
efficiency by protecting the object and its dict with the same lock. I
do not see a major win in Mr. Banks' point that we do not need to lock
the object, just its dict.
--
--Bryan
If the dict contents don't change often, maybe we could use an
STM-like approach to eliminate locks when reading. That would of
course require rework to just about every C function that accesses
Python objects.
If that's what you think I thought, I fear you haven't read anything
I've written.
[snip]
> An object's __dict__ slot is *not* mutable; thus we could gain some
> efficiency by protecting the object and its dict with the same lock. I
> do not see a major win in Mr. Banks' point that we do not need to lock
> the object, just its dict.
I'm not sure where you got the idea that I was claiming this was a
major win. I'm not sure where you got the idea that I claimed that
having to lock all mutable objects wouldn't be slow. For Pete's sake,
you followed up to a post where I *agreed* that it would be slow.
Carl Banks
>>>> Then you are describing a language radically different from Python.
>>> That one threw me for a minute too, but I think the idea is that the
>>> class instance itself is immutable, while its slots (specifically the
>>> attribute dictionary) point to mutable objects.
>> The meaning of 'immutable' is well-established in the Python literature.
>> Python's immutable types include tuple, frozenset, and various kinds of
>> numbers and strings. Class instances, not so much.
>
> Of course class instances aren't immutable types: they're not even
> types.
Class instances may or may not be types, but that has nothing to do with
any point at issue here. I'm saying that class instances are usually
mutable, contrary to your claim, "class instances are usually immutable".
> Let me suggest that there is a distinction between an
> immutable type and an immutable object.
Let me further suggest that Python's documentation is entirely clear:
instances of immutable types are immutable objects. Instances of mutable
types are generally mutable objects. For example, tuple is an immutable
type, and thus tuples are immutable; list is a mutable type, and thus
lists are mutable.
> Immutable types are what you are talking about: it means that the type
> provides no usable mutator methods. (Whether those methods would mutate
> the object itself or some associated object doesn't matter.) Immutable objects
> are a different thing: it means the object cannot change in memory.
>
> Classes in Python are mutable types, usually. Class instances are
> (except for the refcount) immutable objects, usually.
There's where we disagree. I assert that class instances are usually
mutable objects.
> We usually talk about mutability of types, but mutability of objects
> is appropriate for discussion as well. So I can't really agree with
> your assessment that I wrong to call class instances immutable objects
> aside from refcounts.
That confusion disappears once one grasps that instances of immutable
types are immutable objects.
> BTW, here's a minor brain bender: immutable types are mutable objects.
Some brains are too easily bent. Python is one of the many
object-oriented languages that reifies types as run-time objects. I see
no point in going through Python's immutable types to examine if there
is any way to mutate the corresponding type objects.
>> What's more, this matters when considering a GIL-less implementation.
>> Typical method calls can traverse lots of mutable stuff just to find the
>> function to invoke.
>
> Now that doesn't make sense at all. What is all this mutable stuff
> you have to go through, and what does it have to do with the GIL-less
> implementation? Can you explain further? Or are you just saying
> it'll be slow?
I elaborated at some length in another strand of this thread.
--
--Bryan
Nope, you're dead wrong, nothing more to it. The bits of a class
instance never change. The __dict__ is a mutable object. The class
instance itself isn't. It's not reasonable to call an object whose
bits can't change a mutable object.
Anyway, all you're doing is distracting attention from my claim that
instance objects wouldn't need to be locked. They wouldn't, no matter
how mutable you insist these objects whose bits would never change
are.
> > BTW, here's a minor brain bender: immutable types are mutable objects.
>
> Some brains are too easily bent.
[Snip attempt to take this comment seriously]
And some brains are so stodgy they can't even take a lighthearted
comment lightheartedly.
Carl Banks
> On Jan 23, 11:45 pm, Bryan Olson <fakeaddr...@nowhere.org> wrote:
>> Carl Banks wrote:
>> > Classes in Python are mutable types, usually. Class instances are
>> > (except for the refcount) immutable objects, usually.
>>
>> There's where we disagree. I assert that class instances are usually
>> mutable objects.
>
> Nope, you're dead wrong, nothing more to it. The bits of a class
> instance never change. The __dict__ is a mutable object. The class
> instance itself isn't. It's not reasonable to call an object whose
> bits can't change a mutable object.
The "bits" of class instances can very well change.
>>> class X(object): pass
...
>>> x = X()
>>> d = x.__dict__
>>> x.__dict__ = {}
>>> map(id, [d, x.__dict__])
[170329876, 170330012]
The Python cookbook even describes patterns that depend on this
operation working. A class instance's contents can also change if
__slots__ is in use, or when its __class__ is assigned to (admittedly
the latter being a rare operation, but still).
> Anyway, all you're doing is distracting attention from my claim that
> instance objects wouldn't need to be locked. They wouldn't, no
> matter how mutable you insist these objects whose bits would never
> change are.
Only if you're not implementing Python, but another language that
doesn't support __slots__ and assignment to instance.__dict__.
Me too, I don't get what you mean. Consider a list instance: it contains
a count of allocated elements, and a pointer to some memory block. They
change when the list is resized. This counts as "mutable" to me. I really
don't understand your claim.
--
Gabriel Genellina
That's a fairly natural style in Lisp implementation and it is not
that difficult to code in. I've hacked inside Emacs and have written
another interpreter with a similar setup; it's certainly easier than
keeping track of refcounts in my experience. For one thing, you can
raise exceptions anywhere you want, and the stack unwind can clean up
the gc protection, but it can't know nearly as easily which refcounts
to adjust, unless you record all the increfs the same way as the
GCPROs.
Sometimes I start to find this eternal pedantry a little tedious. I
suspect it's time I once more dropped out of c.l.py for a while.
regards
Steve
> Or you can argue that even when an argument is repeated indefinitely it
> doesn't make it suddenly right.
This is no good.
It's a well known fact that anything I tell you three times is true.
To demonstrate:
Tim Rowe's post earlier in this thread was the funniest one in a long time.
Tim Rowe's post earlier in this thread was the funniest one in a long time.
Tim Rowe's post earlier in this thread was the funniest one in a long time.
see - it's true!
- Hendrik
Yeah, yeah, I know that, and in the bickering that ensued some aspects
of the original context were lost. I should really not have been
pulled into Bryan's strawman over the definition of immutable; since
it's just a label, I oughtn't give a damn what it's called, I only
care what it does. I didn't handle this repartee very well.
Anyway, it goes back to the original vision for a mark-and-sweep
Python language as I presented what seems like a long time ago.
I presented the type system that had three base metatypes instead of
the one base metatype we have now: immutable_type, mutable_type, and
mutable_dict_type. The default metatype for Python classes would be
mutable_dict_type, which is a type wherein the object itself would be
mutable but it would still have all the mutator methods __init__,
__setattr__, etc., but they could only act on the __dict__.
mutable_dict_types would not be allowed to define any slots, and
__dict__ wouldn't be reassignable. (However, it seems reasonable to
allow the base tp_new to accept a dict argument.)
OTOH, list's metatype would be mutable_type, so the type object
itself would be mutable.
Bryan claimed that that would be a very different language from
Python, apparently because it hadn't occurred to him that by-and-
large, the instance itself doesn't change, only the dict does.
Perhaps Bryan was thinking of __dict__'s reassignability (that
certainly didn't occur to me); if he was I apologize for my snideness.
HAVING SAID THAT, I still say what I proposed would not be a
radically different language from Python. A little different, of
course. Much slower, almost certainly.
Carl Banks
I am only going to say all Python types prior to 3.0 support classes
without __slots__, so while I agree that this would be a different
language, it wouldn't necessarily be "not Python".
(Python, of course, is what GvR says Python is, and he isn't going to
say that the language I presented is Python. No worries there! :)
I'm only saying that it is conceivably similar enough to be a
different version of Python. It would be a different language in the
same way that Python 2.6 is a different language from Python 3.0.)
Incidentally, the proposal does allow slots to be defined, but only
for actual mutable types, not for ordinary class instances.
Carl Banks
I made a mistake, and I don't want to risk confusion at this point.
"all Python ***versions*** prior to 3.0"
and I am talking about old-style classes, of course. Prior to 2.2 no
classes at all supported slots.
Carl Banks
Not wanting to risk confusion.
"The default metatype for Python classes would be mutable_dict_type,
which is a type wherein the object itself would be ***immutable*** but
it would still have all the mutator methods __init__, __setattr__,
etc., but they could only act on the __dict__."
Carl Banks
I'm a fan of lock-free data structures and software transactional memory,
but I'm also a realist. Heck, I'm one of this group's outspoken
advocates of threaded architectures. Theoretical breakthroughs will
happen, but in real world of today, threads are great but GIL-less
Python is a loser.
Wherever Python is going, let's recognize that a scripting language that
rocks is better than any other kind of language that sucks.
--
--Bryan