
The future of "frozen" types as the number of CPU cores increases


John Nagle

Feb 16, 2010, 3:15:05 PM
In the beginning, Python had some types which were "frozen", and some which
weren't. Now, there's a trend towards having both "frozen" and "unfrozen"
versions of built-in types. We now have "frozen" and "unfrozen" sets,
dictionaries, and byte arrays. It's becoming clear that the concept of
"frozen" is separate from the type involved.

Maybe "frozen" should be generalized, so that all "unfrozen" objects also
have "frozen" versions.

I'd suggest

p = freeze(q)

which would generate a "frozen" copy of q. (For lists, this would
return a tuple. For unfrozen sets, a frozen set. Etc.)

p = deepfreeze(q)

would generate "frozen" deep copy of q, for which each object it references
is also a "frozen" copy.

For objects,

class c(frozenobject):
    def __init__(self, someparam):
        self.p = someparam
        ...

would result in a class which returns frozen objects from the constructor.

In concurrent programs, frozen objects can be shared without locking.
This simplifies locking considerably. This may provide a way to get out
from the Global Interpreter Lock. One possible implementation would be
to have unfrozen objects managed by reference counting and locking as
in CPython. Frozen objects would live in a different memory space and be
garbage collected by a concurrent garbage collector.

If we add "synchronized" objects with built-in locking, like Java,
threading becomes much cleaner.

class d(synchronized):
    ...

A typical "synchronized" object would be a queue. Anything with
state shared across threads has to be synchronized. Only one
thread at a time can be active within a synchronized object,
due to a built-in implicit lock.
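
There's no such base class today, but the idea can be roughed out with
threading.RLock (the names here are invented for the sketch):

    import threading

    class synchronized(object):
        # Stand-in for the proposed base class: one implicit lock
        # per instance.
        def __init__(self):
            self._lock = threading.RLock()

    def locked_method(method):
        # Ensure only one thread at a time is active inside the object.
        def wrapper(self, *args, **kwargs):
            with self._lock:
                return method(self, *args, **kwargs)
        return wrapper

    class d(synchronized):
        def __init__(self):
            synchronized.__init__(self)
            self.items = []

        @locked_method
        def add(self, item):
            self.items.append(item)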

Semantics of synchronized objects:

"Synchronized" objects cannot be frozen. Trying to freeze
them just returns the object.

If a thread blocks on an explicit "wait", the synchronized
object is temporarily unlocked during the "wait". This
allows threads to wait for, say, a queue to get another
entry from another thread.
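
That unlock-during-wait behavior is what threading.Condition already
provides; a tiny queue sketch to make it concrete:

    import threading

    class TinyQueue(object):
        def __init__(self):
            self.cond = threading.Condition()   # lock plus wait/notify
            self.items = []

        def put(self, item):
            with self.cond:
                self.items.append(item)
                self.cond.notify()      # wake one waiting consumer

        def get(self):
            with self.cond:
                while not self.items:
                    self.cond.wait()    # implicit lock released while waiting
                return self.items.pop(0)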

If only synchronized objects and frozen objects can be shared across
threads, the GIL becomes unnecessary. Big performance win on multicore
CPUs.

"Synchronized" was a pain in Java because Java doesn't have "frozen"
objects. Too much effort went into synchronizing little stuff. But
in a language with frozen objects, the little stuff would be frozen,
and wouldn't need its own locks.

Threaded programs that already use queue objects to communicate with
each other are almost ready for this kind of architecture now. If
the queue class made a frozen copy of any unfrozen, unsynchronized
object placed on the queue, conversion to this approach could be almost
automatic.
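
As an illustration (not an existing feature), such a queue might look
like this, reusing the freeze() sketched earlier:

    import Queue    # the Python 2 module name

    class FreezingQueue(Queue.Queue):
        def put(self, item, block=True, timeout=None):
            # Freeze anything mutable before it crosses the thread boundary.
            Queue.Queue.put(self, freeze(item), block, timeout)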

What would probably happen in practice is that potential race conditions in
existing programs would raise "unshareable object" exceptions,
indicating that something needed to be synchronized. That's a good thing.
This approach makes threading much easier for the typical programmer.
Instead of race conditions and random errors, you get error messages.

And we get rid of the GIL.

I look at this as Python's answer to multicore CPUs and "Go".

John Nagle

Terry Reedy

Feb 16, 2010, 5:57:28 PM
On 2/16/2010 3:15 PM, John Nagle wrote:
> In the beginning, Python had some types which were "frozen",
> and some which weren't.

In the beginning, there was only one 'frozen' general-purpose collection
type, the tuple. And Guido resisted the suggestion that lists and tuples
were mutable and frozen versions of each other.

However, things have changed, and lists and tuples *are* effectively
mutable and hashable versions of each other, and we have the three other
pairs you mentioned. The idea of freeze() may have been floated (and
punctured) during Py3 discussions, but I think it a fairly good one.

Terry Jan Reedy

John Nagle

Feb 17, 2010, 12:09:27 AM

Yes, we're now at the point where all the built-in mutable types
have "frozen" versions. But we don't have that for objects. It's
generally considered a good thing in language design to offer,
for user-defined types, most of the functionality of built-in ones.

It's the concurrency aspect of this that interests me, though.
A language with immutable objects can potentially handle concurrency
more safely than one where everything is potentially mutable.
The language knows what can't change, which simplifies locking.

John Nagle

Paul Rubin

Feb 17, 2010, 12:52:04 AM
John Nagle <na...@animats.com> writes:
>> However, things have changed, and lists and tuple *are* effectively
>> mutable and hashable versions of each other...

> It's the concurrency aspect of this that interests me, though.
> A language with immutable objects can potentially handle concurrency
> more safely than one where everything is potentially mutable.
> The language knows what can't change, which simplifies locking.

I wonder how well that applies to tuples containing mutable objects,
e.g. t = ([1,2], [3,4])
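
A quick session showing the problem -- freezing stops at the top level,
and hashability goes with it:

>>> t = ([1, 2], [3, 4])
>>> t[0].append(5)      # the tuple is "frozen", but its elements are not
>>> t
([1, 2, 5], [3, 4])
>>> hash(t)
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'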

Alf P. Steinbach

Feb 17, 2010, 1:17:25 AM
* Paul Rubin:

One would need a collection type that's immutable "all the way".

Current frozen types don't have that.

However, there's also the question of the immutability of what names refer to.
It seems to me that the compiler can't really assume very much without
very deep analysis. And so, if it boils down to the programmer applying
assumptions about mutability/constantness, then types may not be The Answer.


Cheers,

- Alf

Steven D'Aprano

Feb 17, 2010, 2:35:49 AM
On Tue, 16 Feb 2010 21:09:27 -0800, John Nagle wrote:

> Yes, we're now at the point where all the built-in mutable types
> have "frozen" versions. But we don't have that for objects. It's
> generally considered a good thing in language design to offer, for user
> defined types, most of the functionality of built-in ones.


It's not hard to build immutable user-defined types. Whether they're
immutable enough is another story :)

>>> class FrozenMeta(type):
... def __new__(meta, classname, bases, classDict):
... def __setattr__(*args):
... raise TypeError("can't change immutable class")
... classDict['__setattr__'] = __setattr__
... classDict['__delattr__'] = __setattr__
... return type.__new__(meta, classname, bases, classDict)
...
>>>
>>> class Thingy(object):
... __metaclass__ = FrozenMeta
... def __init__(self, x):
... self.__dict__['x'] = x
...
>>>
>>>
>>> t = Thingy(45)
>>> t.x
45
>>> t.x = 42
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in __setattr__
TypeError: can't change immutable class


It's a bit ad hoc, but it seems to work for me. Unfortunately there's no
way to change __dict__ to a "write once, read many" dict.

--
Steven

John Nagle

Feb 17, 2010, 12:26:46 PM
Daniel Stutzbach wrote:

> On Tue, Feb 16, 2010 at 3:15 PM, John Nagle <na...@animats.com> wrote:
>
>> One possible implementation would be
>> to have unfrozen objects managed by reference counting and locking as
>> in CPython. Frozen objects would live in a different memory space and be
>> garbage collected by a concurrent garbage collector.
>>
>
> A concurrent garbage collector for frozen object would still need to walk
> unfrozen objects, to find references to frozen objects.

Right. But the reverse is not the case. Reference count updating
wouldn't need to look at frozen object space.

One way to implement locking is something like this:

Mutable objects have a reference count field, a lock field,
and an "owner" field. Initially, the owner of an object is its thread.
If an object's only reference is a field of a synchronized object, the
owner is the synchronized object. Mutable objects with an owner can
be manipulated by that owner without locking.

When an object reference is passed to another thread or
another synchronized object, the object becomes multi-owner and
the "owner" field is set to null. Thereafter, the object must
be locked during updates.

The headache with this is that when an object becomes multi-owner,
so does everything it points to. So there's a transient as the
system runs down the references, locking objects momentarily and
clearing owner fields.
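
A toy Python-level rendering of the idea (field names invented here; a
real implementation would live inside the interpreter):

    import threading

    class OwnedObject(object):
        def __init__(self):
            self.owner = threading.current_thread()  # single owner at first
            self.lock = threading.Lock()

        def mutate(self, update):
            if self.owner is threading.current_thread():
                update(self)            # sole owner: no locking needed
            else:
                with self.lock:         # multi-owner: lock during updates
                    update(self)

        def share(self):
            # Reference handed to another thread: object becomes
            # multi-owner. (The transient run-down of everything it
            # references is omitted from this sketch.)
            self.owner = None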

John Nagle

sjde...@yahoo.com

Feb 17, 2010, 5:17:47 PM
On Feb 17, 2:35 am, Steven D'Aprano

Which makes it not really immutable, as does the relative ease of
using a normal setattr:

>>> t.__dict__['x'] = "foo"
>>> print t.x
foo
>>> object.__setattr__(t, "x", 42)
>>> print t.x
42

John Nagle

Feb 18, 2010, 2:58:32 PM
John Nagle wrote:

> I look at this as Python's answer to multicore CPUs and "Go".

On that note, I went to a talk at Stanford yesterday by one of the
designers of Intel's Nehalem core. The four-core, eight-thread
version is out now. The six-core, twelve-thread version is working;
the speaker has one in his lab. The eight-core, sixteen-thread version
is some months away. This isn't an expensive CPU; this is Intel's
"converged" mainstream product. (Although there will be a whole range
of "economy" and "performance" versions, all with the same core but
with some stuff turned off.)

Python isn't ready for this. Not with the GIL.

Multiple processes are not the answer. That means loading multiple
copies of the same code into different areas of memory. The cache
miss rate goes up accordingly.

John Nagle

Ethan Furman

Feb 18, 2010, 4:19:16 PM


Will the new GIL in 3.2 make this workable? It would still be one
thread at a time, though, wouldn't it?

~Ethan~

Chris Rebert

Feb 18, 2010, 5:11:24 PM
On Thu, Feb 18, 2010 at 11:58 AM, John Nagle <na...@animats.com> wrote:
<snip>

>   On that note, I went to a talk at Stanford yesterday by one of the
> designers of Intel's Nelahem core.  The four-core, eight thread
> version is out now.  The six-core, twelve thread version is working;
> the speaker has one in his lab.  The eight-core, sixteen thread version
> is some months away.  This isn't an expensive CPU; this is Intel's
> "converged" mainstream product.  (Although there will be a whole range
> of "economy" and "performance" versions, all with the same core but
> with some stuff turned off.)
>
>   Python isn't ready for this.  Not with the GIL.

Is any language, save perhaps Erlang, really ready for it?

Cheers,
Chris
--
http://blog.rebertia.com

Steven D'Aprano

Feb 18, 2010, 5:12:02 PM
On Thu, 18 Feb 2010 11:58:32 -0800, John Nagle wrote:

> John Nagle wrote:
>
>> I look at this as Python's answer to multicore CPUs and "Go".
>
> On that note, I went to a talk at Stanford yesterday by one of the
> designers of Intel's Nelahem core. The four-core, eight thread version

> is out now. [...]


>
> Python isn't ready for this. Not with the GIL.

Pardon me, but Python is perfectly ready for this. Why aren't you using
Jython or IronPython, if the GIL is such a drag on your use-case?

--
Steven

Rhodri James

Feb 18, 2010, 7:12:10 PM

occam :-)


--
Rhodri James *-* Wildebeeste Herder to the Masses

Gregory Ewing

Feb 19, 2010, 12:23:30 AM
John Nagle wrote:

> One way to implement locking is something like this:
>
> Mutable objects have a reference count field, a lock field,
> and an "owner" field. Initially, the owner of an object is its thread.
> If an object's only reference is a field of a synchronized object, the
> owner is the synchronized object.

The trouble with that is that there will hardly ever be
"only one reference" to any object that you do anything
with, even just looking at it, because temporary references
come and go as the interpreter goes about its business.

A simple demonstration of this:

Python 2.5 (r25:51908, Apr 8 2007, 22:22:18)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1809)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> foo = (1, 2, 3)
>>> import sys
>>> sys.getrefcount(foo)
2

Even though you might think that there is only
one reference to foo, the very act of passing it
as a parameter to getrefcount() causes another
reference to come into existence temporarily.

--
Greg

Tim Roberts

Feb 20, 2010, 7:49:57 PM
Chris Rebert <cl...@rebertia.com> wrote:

>On Thu, Feb 18, 2010 at 11:58 AM, John Nagle <na...@animats.com> wrote:
>>

>> � Python isn't ready for this. �Not with the GIL.


>
>Is any language, save perhaps Erlang, really ready for it?

F# is. I only wish the syntax were a little less Perl-like. Too many
special characters.
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

sjde...@yahoo.com

Feb 20, 2010, 9:00:38 PM
On Feb 18, 2:58 pm, John Nagle <na...@animats.com> wrote:
>     Multiple processes are not the answer.  That means loading multiple
> copies of the same code into different areas of memory.  The cache
> miss rate goes up accordingly.

A decent OS will use copy-on-write with forked processes, which should
carry through to the cache for the code.
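
For instance, with os.fork() the child starts as a copy-on-write image
of the parent (a sketch; in CPython, even reads bump reference counts
and so dirty the pages they touch):

    import os

    data = range(10 ** 6)    # built before the fork

    pid = os.fork()
    if pid == 0:
        # Child: shares the parent's pages until it writes to them.
        total = sum(data)
        os._exit(0)
    else:
        os.waitpid(pid, 0)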

John Nagle

Feb 20, 2010, 9:58:42 PM

That doesn't help much if you're using the subprocess module. The
C code of the interpreter is shared, but all the code generated from
Python is not.

This will get even worse when Unladen Swallow starts generating vast
swaths of unshared x86 instructions.

John Nagle

Paul Rubin

Feb 20, 2010, 10:46:33 PM
John Nagle <na...@animats.com> writes:
>> A decent OS will use copy-on-write with forked processes, which should
>> carry through to the cache for the code.
>
> That doesn't help much if you're using the subprocess module. The
> C code of the interpreter is shared, but all the code generated from
> Python is not.

Emacs in days of yore used a hack called "unexec" to bring its Lisp code
into the pure text segment. I think it's not used any more because
machines are now big enough that the benefits aren't that great.
Basically in the Emacs build process, you'd start up the C portion of
Emacs, then tell it to load all its Lisp code, then call "unexec".
Unexec basically performed a core dump of the process, then ran
something that manipulated the original Emacs image and the core dump
into a new executable that had the static Lisp code and data now marked
as part of the (immutable) program area. Unexec necessarily had
platform dependencies, but this was managed with a bunch of ifdefs in a
reasonably clean and maintainable way. I could imagine doing something
similar with a large Python program.

sjde...@yahoo.com

Feb 21, 2010, 2:45:16 AM
On Feb 20, 9:58 pm, John Nagle <na...@animats.com> wrote:

> sjdevn...@yahoo.com wrote:
> > On Feb 18, 2:58 pm, John Nagle <na...@animats.com> wrote:
> >>     Multiple processes are not the answer.  That means loading multiple
> >> copies of the same code into different areas of memory.  The cache
> >> miss rate goes up accordingly.
>
> > A decent OS will use copy-on-write with forked processes, which should
> > carry through to the cache for the code.
>
>     That doesn't help much if you're using the subprocess module.  The
> C code of the interpreter is shared, but all the code generated from
> Python is not.

Of course. Multithreading also fails miserably if the threads all try
to call exec() or the equivalent.

It works fine if you use os.fork().

Paul Boddie

Feb 21, 2010, 1:28:46 PM

True, but the principal issue with CPython and copy-on-write is the
reference counting. Consider a large list shared with the child
processes which is to be processed (either in part or in its entirety)
by each of them. As soon as the child processes start referencing each
of the elements, the reference count for each element object will need
to be changed and pages will start to be copied for each child
process. That's why John wants to avoid the reference counting of
shared data and why the Unladen Swallow project has apparently
considered storing reference counts in separate regions of memory.

Such shared data is really "owned" by the parent process, so it makes
little sense for the child processes to manage it with a view to
collecting it later. After all, such data should have no cost to each
of the child processes (except, perhaps, for the size of the address
space referenced by each process, and any resource issues arising from
managing that).

Paul

John Nagle

Feb 22, 2010, 9:24:04 PM

Forking in multithreaded programs is iffy. What happens depends
on the platform, and it's usually not what you wanted to happen.
Solaris before version 10 forks all threads, and both new processes
have all the threads running. POSIX semantics are to fork only the
thread making the request. The problem is that this leaves the other
Python threads in the new process with Python state, locks and reference
counts set, but no OS thread to run them. See

"http://bugs.python.org/issue1683"
"http://bugs.python.org/issue874900"
"http://bugs.python.org/issue6721"
"http://bugs.python.org/issue7242"

There are at least two open bug reports in this area.

If you fork and "exec", which discards the state of Python in the child
process, there's less trouble. But if you're actually forking a
Python environment and running it, you're going to have at least a
memory leak, and probably further difficulties.

John Nagle

sjde...@yahoo.com

Feb 23, 2010, 1:27:54 AM
On Feb 22, 9:24 pm, John Nagle <na...@animats.com> wrote:
> sjdevn...@yahoo.com wrote:
> > On Feb 20, 9:58 pm, John Nagle <na...@animats.com> wrote:
> >> sjdevn...@yahoo.com wrote:
> >>> On Feb 18, 2:58 pm, John Nagle <na...@animats.com> wrote:
> >>>>     Multiple processes are not the answer.  That means loading multiple
> >>>> copies of the same code into different areas of memory.  The cache
> >>>> miss rate goes up accordingly.
> >>> A decent OS will use copy-on-write with forked processes, which should
> >>> carry through to the cache for the code.
> >>     That doesn't help much if you're using the subprocess module.  The
> >> C code of the interpreter is shared, but all the code generated from
> >> Python is not.
>
> > Of course.  Multithreading also fails miserably if the threads all try
> > to call exec() or the equivalent.
>
> > It works fine if you use os.fork().
>
>     Forking in multithreaded programs is iffy.  What happens depends
> on the platform, and it's usually not what you wanted to happen.

Well, yeah. And threading in multiprocess apps is iffy. In the real
world, though, multiprocessing is much more likely to result in a
decent app than multithreading--and if you're not skilled at either,
starting with multiprocessing is by far the smarter way to begin.

Basically, multiprocessing is always hard--but it's less hard to start
without shared everything. Going with the special case (sharing
everything, aka threading) is by far the stupider and more complex way
to approach multiprocessing.

And really, for real-world apps, it's much, much more likely that
fork() will be sufficient than that you'll need to explore the
vagaries of a multithreaded solution. Protected memory rocks, and in
real life, probably 95% of the time threads are only even considered
because the OS can't fork() and otherwise use processes well.

sjde...@yahoo.com

Feb 23, 2010, 1:31:12 AM
On Feb 22, 9:24 pm, John Nagle <na...@animats.com> wrote:
> sjdevn...@yahoo.com wrote:
> > On Feb 20, 9:58 pm, John Nagle <na...@animats.com> wrote:
> >> sjdevn...@yahoo.com wrote:
> >>> On Feb 18, 2:58 pm, John Nagle <na...@animats.com> wrote:
> >>>>     Multiple processes are not the answer.  That means loading multiple
> >>>> copies of the same code into different areas of memory.  The cache
> >>>> miss rate goes up accordingly.
> >>> A decent OS will use copy-on-write with forked processes, which should
> >>> carry through to the cache for the code.
> >>     That doesn't help much if you're using the subprocess module.  The
> >> C code of the interpreter is shared, but all the code generated from
> >> Python is not.
>
> > Of course.  Multithreading also fails miserably if the threads all try
> > to call exec() or the equivalent.
>
> > It works fine if you use os.fork().
>
>     Forking in multithreaded programs is iffy.

One more thing: the above statement ("forking in multithreaded
programs is iffy"), is absolutely true, but it's also completely
meaningless in modern multiprocessing programs--it's like saying
"gotos in structured programs are iffy". That's true, but it also has
almost no bearing on decently constructed modern programs.

mk

Feb 23, 2010, 10:38:28 AM

What about just using the subprocess module to run system commands in worker
threads? Is this likely to have problems?

Regards,
mk


John Nagle

Feb 23, 2010, 1:35:38 PM

The discussion above was about using "fork" to avoid duplicating the
entire Python environment for each subprocess. If you use the subprocess
module, you load a new program, so you don't get much memory sharing.
This increases the number of cache misses, which is a major bottleneck
in many-CPU systems with shared caches.

The issue being discussed was scaling Python for CPUs with many cores.
With Intel shipping 4 cores/8 hyperthread CPUs, the 6/12 part working,
and the 8/16 part coming along, this is more than a theoretical
issue.

John Nagle

Nobody

Feb 23, 2010, 8:03:23 PM
On Mon, 22 Feb 2010 22:27:54 -0800, sjde...@yahoo.com wrote:

> Basically, multiprocessing is always hard--but it's less hard to start
> without shared everything. Going with the special case (sharing
> everything, aka threading) is by far the stupider and more complex way
> to approach multiprocssing.

Multi-threading hardly shares anything (only dynamically-allocated
and global data), while everything else (the entire stack) is per-thread.

Yeah, I'm being facetious. Slightly. If you're going to write
multi-threaded code, it's a good idea to wean yourself off using global
variables and shared mutable state.

Steven D'Aprano

Feb 23, 2010, 8:30:17 PM
On Tue, 23 Feb 2010 10:35:38 -0800, John Nagle wrote:

> The issue being discussed was scaling Python for CPUs with many cores.
> With Intel shipping 4 cores/8 hyperthread CPUs, the 6/12 part working,
> and the 8/16 part coming along, this is more than a theoretical issue.

Have you tried parallelpython?


http://www.parallelpython.com/
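
The basic pattern there, going from the project's published examples (so
treat the details as approximate), is to submit functions to a job server
that farms them out across processes:

    import pp

    def busy(n):
        return sum(i * i for i in xrange(n))

    job_server = pp.Server()    # defaults to one worker per CPU
    jobs = [job_server.submit(busy, (10 ** 6,)) for _ in range(8)]
    results = [job() for job in jobs]   # calling a job waits for its result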


--
Steven

sste...@gmail.com

Feb 23, 2010, 8:53:37 PM

I had not. Have you used it successfully?

Thanks,

S

Steven D'Aprano

Feb 23, 2010, 9:21:41 PM
On Tue, 23 Feb 2010 20:53:37 -0500, sste...@gmail.com wrote:

>> Have you tried parallelpython?
>>
>> http://www.parallelpython.com/
>
> I had not, have you used this successfully?

Not personally, no.

--
Steven

sjde...@yahoo.com

Feb 24, 2010, 1:31:59 AM
On Feb 23, 8:03 pm, Nobody <nob...@nowhere.com> wrote:

> On Mon, 22 Feb 2010 22:27:54 -0800, sjdevn...@yahoo.com wrote:
> > Basically, multiprocessing is always hard--but it's less hard to start
> > without shared everything.  Going with the special case (sharing
> > everything, aka threading) is by far the stupider and more complex way
> > to approach multiprocssing.
>
> Multi-threading hardly shares anything (only dynamically-allocated
> and global data), while everything else (the entire stack) is per-thread.
>
> Yeah, I'm being facetious. Slightly.

I'm afraid I may be missing the facetiousness here.

The only definitional difference between threads and processes is that
threads share memory, while processes don't.

There are often other important practical implementation details, but
sharing memory vs not sharing memory is the key theoretical
distinction between threads and processes. On most platforms, whether
or not you want to share memory (and abandon memory protection wrt the
rest of the program) is the key factor a programmer should consider
when deciding between threads and processes--the only time that's not
true is when the implementation forces ancillary details upon you.
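
A short contrast of that distinction (multiprocessing in Python 2.6
deliberately mirrors the threading API; run it under a __main__ guard on
Windows):

    import threading, multiprocessing

    counter = [0]

    def work():
        counter[0] += 1

    t = threading.Thread(target=work)
    t.start(); t.join()
    print counter[0]    # 1 -- the thread shared our memory

    p = multiprocessing.Process(target=work)
    p.start(); p.join()
    print counter[0]    # still 1 -- the child mutated its own copy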
