
The future of Python immutability


John Nagle

Sep 3, 2009, 2:03:13 PM
to
Python's concept of immutability is useful, but it could be more
general.

In the beginning, strings, tuples, and numbers were immutable, and
everything else was mutable. That was simple enough. But over time,
Python has acquired more immutable types - immutable sets and immutable
byte arrays. Each of these is a special case.

Python doesn't have immutable objects as a general concept, but
it may be headed in that direction. There was some fooling around
with an "immmutability API" associated with NumPy back in 2007, but
that was removed. As more immutable types are added, a more general
approach may be useful.

Suppose, for discussion purposes, we had general "immutable objects".
Objects inherited from "immutableobject" instead of "object" would be
unchangeable once "__init__" had returned. Where does this take us?

Immutability is interesting for threaded programs, because
immutable objects can be shared without risk. Consider a programming
model where objects shared between threads must be either immutable or
"synchronized" in the sense that Java uses the term. Such programs
are free of most race conditions, without much programmer effort to
make them so.

Java "synchronized" turned out to be a headache partly because trying to
figure out how to lock all the "little stuff" being passed around
a headache. But Java doesn't have immutable objects. Python does,
and that can be exploited to make thread-based programming cleaner.

The general idea is that synchronized objects would have built in
locks which would lock at entry to any function of the object and
unlock at exit. The lock would also unlock at explicit waits. A
"Queue" object would be a good example of a synchronized object.

With this mechanism, multi-thread programs with shared data
structures can be written with little or no explicit locking by
the programmer. If the restrictions are made a bit stricter,
strict enough that threads cannot share mutable unsynchronized data,
removal of the "global interpreter lock" is potentially possible.
This is a route to improved performance on modern multi-core CPUs.

John Nagle

Nigel Rantor

Sep 3, 2009, 3:07:21 PM
to John Nagle, pytho...@python.org

John Nagle wrote:
> Python's concept of immutability is useful, but it could be more
> general.
>
> In the beginning, strings, tuples, and numbers were immutable, and
> everything else was mutable. That was simple enough. But over time,
> Python has acquired more immutable types - immutable sets and immutable
> byte arrays. Each of these is a special case.
>
> Python doesn't have immutable objects as a general concept, but
> it may be headed in that direction. There was some fooling around
> with an "immmutability API" associated with NumPy back in 2007, but
> that was removed. As more immutable types are added, a more general
> approach may be useful.
>
> Suppose, for discussion purposes, we had general "immutable objects".
> Objects inherited from "immutableobject" instead of "object" would be
> unchangeable once "__init__" had returned. Where does this take us?
>
> Immutability is interesting for threaded programs, because
> immutable objects can be shared without risk. Consider a programming
> model where objects shared between threads must be either immutable or
> "synchronized" in the sense that Java uses the term.

Yes, this is one of the reasons I am currently learning Haskell, I am
not yet anywhere near proficient but the reason I am looking into FP is
because of some of the claims of the FP community, particularly Erlang,
regarding the benefits of pure FP with respect to multi-threading.

It's a shame this post came right now since I'm not really up-to-speed
enough with Haskell to comment on it with respect to multi-threading.

<context>
I program Perl, Java and C++ for my day job, I've spent a lot of time
making multithreaded programs work correctly and have even experienced
the POE on a large project. So my comments below are based on experience
of these languages.
</context>

> Such programs are free of most race conditions, without much
> programmer effort to make them so.

I disagree. They are not free of most race conditions, and it still
takes a lot of effort. Where did you get this idea from? Have you been
reading some Java primer that attempts to make it sound easy?

> Java "synchronized" turned out to be a headache partly because
> trying to
> figure out how to lock all the "little stuff" being passed around
> a headache. But Java doesn't have immutable objects. Python does,
> and that can be exploited to make thread-based programming cleaner.

This is nothing to do with Java, any multithreaded language that has
mutable shared state has exactly the same problems. Can we talk about
threading rather than Java please? Additionally Java provides a lot more
than monitors (synchronized) for controlling multiple threads.

Java does have immutable objects. Strings in Java are immutable for
example. As are the object-based numeric types, Bytes, Characters etc.

There are lots and lots of immutable types in Java and you can make your
own by creating a class with no mutator methods and declaring it "final".

> The general idea is that synchronized objects would have built in
> locks which would lock at entry to any function of the object and
> unlock at exit. The lock would also unlock at explicit waits. A
> "Queue" object would be a good example of a synchronized object.
>
> With this mechanism, multi-thread programs with shared data
> structures can be written with little or no explicit locking by
> the programmer. If the restrictions are made a bit stricter,
> strict enough that threads cannot share mutable unsynchronized data,
> removal of the "global interpreter lock" is potentially possible.
> This is a route to improved performance on modern multi-core CPUs.

Right, this is where I would love to have had more experience with Haskell.

Yes, as soon as you get to a situation where no thread can access shared
state that is mutable, your problems go away, but you're also getting no
work done, because the threads, whilst they may be performing lots of
interesting calculations, have no way of letting the rest of the
program, or the operating system, know about it.

You can, today, in any language that provides threads, make any number
of threaded programs that do not contain any race conditions, it's just
that most of them are terribly dull and uninteresting.

I'd love for someone from the Haskell/Erlang/<other> pure FP community
to provide some canonical example of how this is achieved in pure FP. I'll
get there soon but I'm not going to skip ahead in my reading; I'm still
trying to learn the basics.

So, in response to your point of trying to get an immutable API so that
Python can easily have multi-threaded programs that do not present race
conditions I would say the following:

That is not the challenge, that's the easy part. The challenge is
getting useful information out of a system that has only been fed
immutable objects.

Regards,

Nigel

Stefan Behnel

Sep 3, 2009, 3:21:58 PM
to
Nigel Rantor wrote:

>
> John Nagle wrote:
>> Immutability is interesting for threaded programs, because
>> immutable objects can be shared without risk. Consider a programming
>> model where objects shared between threads must be either immutable or
>> "synchronized" in the sense that Java uses the term.
>> Such programs are free of most race conditions, without much
>> programmer effort to make them so.
>
> I disagree. They are not free of most race conditions, and it still
> takes a lot of effort. Where did you get this idea from? Have you been
> reading some Java primer that attempts to make it sound easy?

Read again what he wrote. In a language with only immutable data types
(which doesn't mean that you can't efficiently create modified versions of
a data container), avoiding race conditions is trivial. The most well known
example is clearly Erlang. Adding "synchronised" data structures to that
will not make writing race conditions much easier.

Stefan

Nigel Rantor

Sep 3, 2009, 3:30:40 PM
to Stefan Behnel, pytho...@python.org

My comment you quoted was talking about Java and the use of
synchronized. If that was unclear I apologise.

Please feel free to read the entirety of my post before replying.

n

Stefan Behnel

Sep 3, 2009, 3:37:03 PM
to
Nigel Rantor wrote:
> My comment you quoted was talking about Java and the use of
> synchronized. If that was unclear I apologise.

Well, it was clear. But it was also unrelated to what the OP wrote. He was
talking about the semantics of "synchronized" in Java, not the use.

Stefan

Stefan Behnel

Sep 3, 2009, 3:45:54 PM
to
John Nagle wrote:
> With this mechanism, multi-thread programs with shared data
> structures can be written with little or no explicit locking by
> the programmer. If the restrictions are made a bit stricter,
> strict enough that threads cannot share mutable unsynchronized data,
> removal of the "global interpreter lock" is potentially possible.
> This is a route to improved performance on modern multi-core CPUs.

The main problem with this is that the existing data structures are, well,
there. You can't replace them without breaking all existing Python code,
i.e. without basically inventing a new language. If that's required for
removing the GIL, I doubt that it will ever be done.

Stefan

Paul Rubin

Sep 3, 2009, 6:07:51 PM
to
Stefan Behnel <stef...@behnel.de> writes:
> Read again what he wrote. In a language with only immutable data types
> (which doesn't mean that you can't efficiently create modified versions of
> a data container), avoiding race conditions is trivial. The most well known
> example is clearly Erlang. Adding "synchronised" data structures to that
> will not make writing race conditions much easier.

Nonetheless, Erlang is subject to all kinds of traditional threading
problems such as deadlocks. Haskell's use of software transactional
memory may(?) avoid some of the problems but at a performance cost.

Gabriel Genellina

Sep 3, 2009, 6:38:05 PM
to pytho...@python.org
On Thu, 03 Sep 2009 15:03:13 -0300, John Nagle <na...@animats.com>
wrote:

> Python's concept of immutability is useful, but it could be more
> general.
>

> Immutability is interesting for threaded programs, because
> immutable objects can be shared without risk. Consider a programming
> model where objects shared between threads must be either immutable or
> "synchronized" in the sense that Java uses the term. Such programs
> are free of most race conditions, without much programmer effort to
> make them so.

In the current CPython implementation, every object has a reference count,
even immutable ones. This must be a writable field - and here you have
your race condition, even for immutable objects.

--
Gabriel Genellina

John Nagle

Sep 4, 2009, 12:20:55 AM
to

That's an implementation problem with CPython. I'm looking at this as
a language design issue. Shed Skin, which is garbage-collected, doesn't
have that problem. Look ahead to a new generation of Python implementations
that go fast and use multiprocessors effectively.

John Nagle

r

Sep 4, 2009, 3:22:08 AM
to
"""The future of Python immutability"""

Define future:
The future is a time period commonly understood to contain all
events that have yet to occur. It is the opposite of the past, and is
the time after the present

Define immutability:
Not subject or susceptible to change. In object-oriented and
functional programming, an immutable object is an object whose state
cannot be modified after it is created. This is in contrast to a
mutable object, which can be modified after it is created.

hmm, applying this logic i'd have to say about the same really.

;-)

Francesco Bochicchio

Sep 4, 2009, 3:40:45 AM
to
On Sep 3, 9:07 pm, Nigel Rantor <wig...@wiggly.org> wrote:
>
> Right, this is where I would love to have had more experience with Haskell.
>
> Yes, as soon as you get to a situation where no thread can access shared
> state that is mutable, your problems go away, but you're also getting no
> work done, because the threads, whilst they may be performing lots of
> interesting calculations, have no way of letting the rest of the
> program, or the operating system, know about it.
>

Threads could communicate only through channels, message queues, or
equivalent. That is what I do as much as I can, to avoid the headache of
sharing data between threads. It is less efficient than the shared-data
model and adds latency, but it ensures that each thread is self-contained,
making for safer programming and opening the way to better
parallelization. AFAIK Erlang processes and Scala actors implement a
similar model at the language level.
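
A small sketch of that style using only the standard library (Python 2
spelling; the Queue module is called queue in Python 3). The worker never
touches shared mutable state directly; it only consumes and produces
messages:

import threading
import Queue          # 'queue' in Python 3

def worker(inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:              # sentinel: time to shut down
            break
        outbox.put(item * item)       # stand-in for real work

inbox, outbox = Queue.Queue(), Queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for n in range(5):
    inbox.put(n)
inbox.put(None)
t.join()

while not outbox.empty():
    print outbox.get()                # 0, 1, 4, 9, 16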

In Python, there is Kamaelia, which implements a similar paradigm,
although it is more concerned with logical parallelization than with
multithreading performance issues.

I believe this kind of paradigm will bring us to the massively
multicore world more easily than FP. Consider that most FP languages
have accepted a compromise and become 'impure' (i.e. they have
constructs to change variables in place). Even Haskell, the only pure
FP language I know (sort of), has things like mutable arrays.
All these features expose current FP languages to the same 'shared
resource' risks as imperative ones, although admittedly to a lesser
degree. And FP languages have their own crop of problems - e.g. how to
deal efficiently with deep recursion, parameters passed by copy, huge
lists built in memory (if you use eager evaluation), or the build-up of
thunks (if you use lazy evaluation).

But then, I'm just a programmer, so I could be easily wrong :-)

Ciao
-----
FB


Ulrich Eckhardt

Sep 4, 2009, 4:26:09 AM
to
Nigel Rantor wrote:
> John Nagle wrote:
>> Immutability is interesting for threaded programs, because
>> immutable objects can be shared without risk. Consider a programming
>> model where objects shared between threads must be either immutable or
>> "synchronized" in the sense that Java uses the term.
>
> Yes, this is one of the reasons I am currently learning Haskell, I am
> not yet anywhere near proficient but the reason I am looking into FP is
> because of some of the claims of the FP community, particularly Erlang,
> regarding the benefits of pure FP with respect to multi-threading.

Actually, I wouldn't say that FP itself changes anything there. However, FP
usually (correct me if I'm wrong) implies immutability of objects, i.e. no
variable assignment.


> > Such programs are free of most race conditions, without much
> > programmer effort to make them so.
>
> I disagree. They are not free of most race conditions, and it still
> takes a lot of effort. Where did you get this idea from? Have you been
> reading some Java primer that attempts to make it sound easy?

[...]


>> With this mechanism, multi-thread programs with shared data
>> structures can be written with little or no explicit locking by
>> the programmer. If the restrictions are made a bit stricter,
>> strict enough that threads cannot share mutable unsynchronized data,
>> removal of the "global interpreter lock" is potentially possible.
>> This is a route to improved performance on modern multi-core CPUs.
>
> Right, this is where I would love to have had more experience with
> Haskell.
>
> Yes, as soon as you get to a situation where no thread can access shared
> state that is mutable, your problems go away, but you're also getting no
> work done, because the threads, whilst they may be performing lots of
> interesting calculations, have no way of letting the rest of the
> program, or the operating system, know about it.

I think it is the combination of immutable objects and message passing
(which he omitted mentioning) that is the point. I initially encountered
this principle in Erlang, but it also applies in other languages. For
example in C++, you can create MT-programs using std::auto_ptr<T> for
mutable, exclusively owned objects and boost::shared_ptr<T const> for
shared, immutable objects.


> So, in response to your point of trying to get an immutable API so that
> Python can easily have multi-threaded programs that do not present race
> conditions I would say the following:
>
> That is not the challenge, that's the easy part. The challenge is
> getting useful information out of a system that has only been fed
> immutable objects.

You're overly pessimistic. Immutable objects are a tool, not the holy grail.
Having them available can help you solve some problems, not all of them.
You obviously still need mutable data, but if you can restrict access to
them to just one thread, there is no need for synchronisation for them.
Lastly, for the message passing, you also need shared, mutable structures
(queues), so you can't live completely without conventional locking.

Uli

--
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

Paul Rubin

Sep 4, 2009, 4:41:34 AM
to
Ulrich Eckhardt <eckh...@satorlaser.com> writes:
> Lastly, for the message passing, you also need shared, mutable structures
> (queues), so you can't live completely without conventional locking.

But that can be completely behind the scenes in the language or
library implementation. The application programmer doesn't have to
think about those locks.

Hendrik van Rooyen

Sep 4, 2009, 6:58:05 AM
to pytho...@python.org
On Thursday 03 September 2009 21:07:21 Nigel Rantor wrote:

> That is not the challenge, that's the easy part. The challenge is
> getting useful information out of a system that has only been fed
> immutable objects.

Is it really that difficult? (completely speculative):

class MyAnswer(object):
    def __init__(self):
        self.finished = False
        self.Answer = None
        self.collected = False

Then when you start a thread, you give it an instance:

list_of_answers = []
ans = MyAnswer()
list_of_answers.append(ans)

worker_thread = thread.start_new_thread(worker, (ans, immutable_stuff))

def worker(ans, immutable_stuff):
    ans.Answer = do_some_stuff(immutable_stuff)
    ans.finished = True

Then to see if everybody is finished:

runbool = True
while runbool:
    runbool = False
    for answer in list_of_answers:
        if answer.finished and not answer.collected:
            do_something(answer.Answer)
            answer.collected = True
        elif not answer.collected:
            runbool = True

This scheme gives only one thread the license to make a change to the answer
instance, so how can it go wrong?

You can also extend it for long running threads by playing ping pong with the
two booleans - the collector sets the collected boolean and clears the
finished, and the worker must clear the collected boolean before starting a
new cycle. ( the worker waits for not finished, then clears collected)

I do similar stuff in assembler and I have no problems - Am I missing
something subtle?

Of course - working with immutable objects only is a bit like working with
read only memory, or worse, write only memory - not easy.

- Hendrik


Adam Skutt

Sep 4, 2009, 9:36:59 AM
to
On Sep 3, 2:03 pm, John Nagle <na...@animats.com> wrote:
>      Suppose, for discussion purposes, we had general "immutable objects".
> Objects inherited from "immutableobject" instead of "object" would be
> unchangeable once "__init__" had returned.  Where does this take us?

You can create this in various ways through metaclasses. I've done
it, mostly because I was curious to see how hard it would be and if it
actually gained me anything useful.
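
The code isn't shown here, but one way such a metaclass can work (a sketch,
not necessarily what was actually written; ImmutableMeta, Immutable and
Point are made-up names) is to set a "frozen" flag as soon as __init__
returns:

class ImmutableMeta(type):
    """Metaclass sketch: instances become read-only after __init__ returns."""
    def __call__(cls, *args, **kwargs):
        obj = super(ImmutableMeta, cls).__call__(*args, **kwargs)
        object.__setattr__(obj, '_frozen', True)
        return obj

def _no_setattr(self, name, value):
    if getattr(self, '_frozen', False):
        raise TypeError("%s instance is immutable" % type(self).__name__)
    object.__setattr__(self, name, value)

class Immutable(object):
    __metaclass__ = ImmutableMeta     # Python 2 spelling
    __setattr__ = _no_setattr

class Point(Immutable):
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
# p.x = 3  ->  TypeError: Point instance is immutable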

>      With this mechanism, multi-thread programs with shared data
> structures can be written with little or no explicit locking by
> the programmer.  If the restrictions are made a bit stricter,
> strict enough that threads cannot share mutable unsynchronized data,
> removal of the "global interpreter lock" is potentially possible.
> This is a route to improved performance on modern multi-core CPUs.

Nope, preventing mutation of the objects themselves is not enough.
You also have to forbid variables from being rebound (pointed at
another object). Consider this simple example:

---------- Thread 1 ----------   |   ---------- Thread 2 ----------
a = "Foo"                        |
spawn Thread 2                   |
a = "Bar"                        |   print "thread 2: %s" % a
print "thread 1: %s" % a         |

You could see (ignoring the fact the output isn't ordered):
"thread 1: Bar"
"thread 2: Foo"
or:
"thread 1: Bar"
"thread 2: Bar"

so the fact "Foo" and "Bar" are immutable isn't enough to solve the
problem. The variables themselves, since they obey pointer semantics,
must also be forbidden from being reseated (i.e., they must be
references in the C++ sense or become 'T const * const' pointers).
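
The same race written out with the threading module, for anyone who wants
to run it (a sketch; which output you get depends on the scheduler):

import threading

a = "Foo"

def thread2():
    # May see "Foo" or "Bar", depending on whether the main thread
    # has already rebound the global name 'a'.
    print "thread 2: %s" % a

t = threading.Thread(target=thread2)
t.start()
a = "Bar"
print "thread 1: %s" % a
t.join()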

Thanks,
Adam

Scott David Daniels

Sep 4, 2009, 11:04:25 AM
to
John Nagle wrote:
> ... Suppose, for discussion purposes, we had general "immutable objects".

> Objects inherited from "immutableobject" instead of "object" would be
> unchangeable once "__init__" had returned. Where does this take us?

Traditionally in Python we make that, "once __new__ had returned."

--Scott David Daniels
Scott....@Acm.Org

sturlamolden

Sep 4, 2009, 9:55:27 PM
to
On 4 Sep, 06:20, John Nagle <na...@animats.com> wrote:

> > In the current CPython implementation, every object has a reference
> > count, even immutable ones. This must be a writable field - and here you
> > have your race condition, even for immutable objects.
>
>     That's an implementation problem with CPython.  I'm looking at this as
> a language design issue.  Shed Skin, which is garbage-collected, doesn't
> have that problem.  Look ahead to a new generation of Python implementations
> that go fast and use multiprocessors effectively.

But in that case, the problem is reference counting, not mutable
objects. If you get rid of reference counts you get rid of the GIL. At
the same time you introduce far worse problems: memory use will
increase (garbage piles up), there will be long pauses while garbage is
collected (very bad for servers), and deallocator methods of extension
types will not be called when you expect them to be. Java's VM may be
faster than CPython's VM, but it runs less smoothly.


sturlamolden

Sep 4, 2009, 10:23:06 PM
to
On 3 Sep, 20:03, John Nagle <na...@animats.com> wrote:

>      Python doesn't have immutable objects as a general concept, but
> it may be headed in that direction.  There was some fooling around
> with an "immmutability API" associated with NumPy back in 2007, but
> that was removed.  As more immutable types are added, a more general
> approach may be useful.

I once did a test of NumPy's mutable arrays against Matlab's immutable
arrays on D4 wavelet transforms. On an array of 64 MB of double
precision floats, the Python/NumPy version was faster by an order of
magnitude. On the other hand, immutable arrays do make
multithreading easier. They are particularly interesting for GPGPUs
(OpenCL/CUDA) where multithreading is pervasive. Also they allow
removal of temporary arrays, which are created by NumPy's binary
operators.


Steven D'Aprano

Sep 4, 2009, 11:12:38 PM
to
On Fri, 04 Sep 2009 19:23:06 -0700, sturlamolden wrote:

> I once did a test of NumPy's mutable arrays against Matlab's immutable
> arrays on D4 wavelet transforms. On an array of 64 MB of double
> precision floats, the Python/NumPy version was faster by an order of
> magnitude.

Is the difference because of mutability versus immutability, or because
of C code in Numpy versus Matlab code? Are you comparing bananas and
pears?

Without knowing what the test consisted of, and what it actually
measures, this comparison is meaningless.


--
Steven

sturlamolden

Sep 4, 2009, 11:48:13 PM
to
On 5 Sep, 05:12, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

> Is the difference because of mutability versus immutability, or because
> of C code in Numpy versus Matlab code? Are you comparing bananas and
> pears?


It consisted of something like this


import numpy

def D4_Transform(x, s1=None, d1=None, d2=None):
    C1 = 1.7320508075688772
    C2 = 0.4330127018922193
    C3 = -0.066987298107780702
    C4 = 0.51763809020504137
    C5 = 1.9318516525781364
    if d1 == None:
        d1 = numpy.zeros(x.size/2)
        s1 = numpy.zeros(x.size/2)
        d2 = numpy.zeros(x.size/2)
    odd = x[1::2]
    even = x[:-1:2]
    d1[:] = odd[:] - C1*even[:]
    s1[:-1] = even[:-1] + C2*d1[:-1] + C3*d1[1:]
    s1[-1] = even[-1] + C2*d1[-1] + C3*d1[0]
    d2[0] = d1[0] + s1[-1]
    d2[1:] = d1[1:] + s1[:-1]
    even[:] = C5 * s1[:]
    odd[:] = C4 * d2[:]
    if x.size > 2:
        D4_Transform(even, s1[0:even.size/2],
                     d1[0:even.size/2], d2[0:even.size/2])


against Matlab:

function x = D4_Transform(x)
C1 = 1.7320508075688772;
C2 = 0.4330127018922193;
C3 = -0.066987298107780702;
C4 = 0.51763809020504137;
C5 = 1.9318516525781364;
s1 = zeros(ceil(size(x)/2));
d1 = zeros(ceil(size(x)/2));
d2 = zeros(ceil(size(x)/2));
odd = x(2:2:end);
even = x(1:2:end-1);
d1(:) = odd - C1*even;
s1(1:end-1) = even(1:end-1) + C2*d1(1:end-1) + C3*d1(2:end);
s1(end) = even(end) + C2*d1(end) + C3*d1(1);
d2(1) = d1(1) + s1(end);
d2(2:end) = d1(2:end) + s1(1:end-1);
x(1:2:end-1) = C5*s1;
x(2:2:end) = C4*d2;
if (length(x) > 2)
x(1:2:end-1) = D4_Transform(x(1:2:end-1));
end


I wasn't comparing bananas against pears. Mathworks informed me they
were using my code to investigate why Matlab was showing such slow-
downs. I am not sure what they found out, eventually. I have also
wondered if immutability vs. mutability was the reason, as NumPy
generates temporary arrays. But I have not found a better explanation
either. Anyhow, ~30 seconds for Matlab vs. ~3 seconds for Python is a
major difference. (After I did this test, Matlab acquired a JIT
compiler, which might change the timings. I haven't tested as I don't
have access to it.)


Sturla Molden

Steven D'Aprano

Sep 5, 2009, 12:06:18 AM
to
On Fri, 04 Sep 2009 06:36:59 -0700, Adam Skutt wrote:

> Nope, preventing mutation of the objects themselves is not enough. You
> also have to forbid variables from being rebound (pointed at another
> object). Consider this simple example:
>
> ---------- Thread 1 ----------   |   ---------- Thread 2 ----------
> a = "Foo"                        |
> spawn Thread 2                   |
> a = "Bar"                        |   print "thread 2: %s" % a
> print "thread 1: %s" % a         |
>
> You could see (ignoring the fact the output isn't ordered):
> "thread 1: Bar"
> "thread 2: Foo"
> or:
> "thread 1: Bar"
> "thread 2: Bar"
>
> so the fact "Foo" and "Bar" are immutable isn't enough to solve the
> problem.

This is a side-effect of writing code that relies on global variables.
Global variables are generally a bad idea. Global constants are fine.

> The variables themselves, since they obey pointer semantics,

What do you mean by "variables"? Do you mean names?

What are pointer semantics?

> must also be forbidden from being reseated (i.e., they must be
> references in the C++ sense or become 'T const * const' pointers).

Assuming you mean names must be forbidden from being rebound, no,
incorrect. It's only names which are shared between both threads which
must not be re-bound. If you have names local to the thread, you can
change them all you want without affecting any other thread.

--
Steven

Steven D'Aprano

Sep 5, 2009, 1:04:44 AM
to
On Fri, 04 Sep 2009 20:48:13 -0700, sturlamolden wrote:

> > Is the difference because of mutability versus immutability, or
> > because of C code in Numpy versus Matlab code? Are you comparing
> > bananas and pears?
>
> It consisted of something like this


Your code does a lot of unnecessary work if you're just trying to
demonstrate immutability is faster or slower than mutability. A simple
test that just adds one to each element would have much less overhead.

In any case, there's no doubt that immutable objects require extra time
to create compared to re-using an existing mutable object, and that time
is probably O(N) (until you approach the limits of available contiguous
memory, in which case you could see O(N**2) or worse behaviour). But an
order of magnitude difference?


> I wasn't comparing bananas against pears. Mathworks informed me they
> were using my code to investigate why Matlab was showing such slow-
> downs. I am not sure what they found out, eventually. I have also
> wondered if immutability vs. mutability was the reason, as NumPy
> generates temporary arrays. But I have not found a better explanation
> either. Anyhow, ~30 seconds for Matlab vs. ~3 seconds for Python is a
> major difference.

How does Matlab speed compare to Python in general? Ruby, for example, is
an order of magnitude slower than Python (at least it was last time I
looked), not because of immutable arrays, but just because of the speed
of the language.

--
Steven

sturlamolden

Sep 5, 2009, 1:27:17 AM
to
On 5 Sep, 07:04, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

> Your code does a lot of unnecessary work if you're just trying to
> demonstrate immutability is faster or slower than mutability.

No ... I was trying to compute D4 wavelet transforms. I wanted to see
how NumPy compared with Matlab.


> How does Matlab speed compare to Python in general?

Matlab is an example of a language that only has immutable types.
While it works well if you only play with small arrays, it is horrible
for complex data structures.

Also consider that appending to an array in Matlab always becomes
O(n**2), as the array cannot mutate like Python lists. Thus it is
impossible to amortize appends down to linear complexity.
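
A rough way to see the difference from Python (a sketch: numpy.append
copies the whole array on every call, much like growing a Matlab array,
while list.append is amortized O(1); the element count and timings are
just illustrative):

import time
import numpy

def grow_list(n):
    lst = []
    for i in xrange(n):
        lst.append(i)                  # amortized O(1) per append
    return lst

def grow_array(n):
    arr = numpy.array([], dtype=int)
    for i in xrange(n):
        arr = numpy.append(arr, i)     # copies the whole array: O(n) per append
    return arr

for fn in (grow_list, grow_array):
    t0 = time.time()
    fn(20000)
    print fn.__name__, time.time() - t0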


sturlamolden

Sep 5, 2009, 1:30:44 AM
to
On 5 Sep, 07:04, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

> How does Matlab speed compare to Python in general?

Speed-wise Matlab is slower, but it is not the interpreter limiting
the speed here.

Steven D'Aprano

Sep 5, 2009, 2:47:49 AM
to

How do you know?


--
Steven

sturlamolden

Sep 5, 2009, 7:41:08 AM
to
On 5 Sep, 08:47, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

> How do you know?

After more than 10 years of experience with scientific programming I just
do. When it comes to numerics I have a gut feeling for what is fast
and what is slow.

It's not difficult actually. You just have to imagine how often the
interpreter is invoked. Explicit loops are therefore evil in
numerical scripts. Vectorized array expressions tend to be comparable
to C in performance, because they do a lot of work independently of the
interpreter. What do we see here? Vectorized array expressions
throughout the code, and no significant amount of looping by the
interpreter (just the recursion at the end).
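
For example (a toy case, nothing to do with the wavelet code itself), both
computations below produce the same result, but only the first drags the
interpreter through the loop element by element:

import numpy

x = numpy.arange(1000000, dtype=float)

# Interpreter in the inner loop: one bytecode round trip per element.
y = numpy.empty_like(x)
for i in xrange(len(x)):
    y[i] = 2.0 * x[i] + 1.0

# Vectorized: the loop runs in compiled C inside NumPy.
y = 2.0 * x + 1.0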

Adam Skutt

Sep 5, 2009, 7:57:21 AM
to
On Sep 5, 12:06 am, Steven D'Aprano <st...@REMOVE-THIS-

cybersource.com.au> wrote:
> On Fri, 04 Sep 2009 06:36:59 -0700, Adam Skutt wrote:
> > Nope, preventing mutation of the objects themselves is not enough. You
> > also have to forbid variables from being rebound (pointed at another
> > object).  Consider this simple example:
>
> > ---------- Thread 1 ---------- | ---------- Thread 2 ----------
> > a = "Foo"
> > spawn Thread 2
> > a = "Bar"                        print "thread 2: %s" % a
> > print "thread 1: %s" % a
>
> > You could see (ignoring the fact the output isn't ordered):
> > "thread 1: Bar"
> > "thread 2: Foo"
> > or:
> > "thread 1: Bar"
> > "thread 2: Bar"
>
> > so the fact "Foo" and "Bar" are immutable isn't enough to solve the
> > problem.
>
> This is a side-effect of writing code that relies on global variables.
> Global variables are generally a bad idea. Global constants are fine.

Nope, the variables don't have to be global to have this problem, they
just have to be shared:

>>> a = 3
>>> b = lambda x: x + a
>>> print b(3)
6
>>> a = 4
>>> print b(3)
7

Passing any lambda between threads will cause the problem I described
above, regardless of whether the variables captured are global or
not.

> What do you mean by "variables"? Do you mean names?

In the case of python I mean the name and the value, since all
variables in Python are pointers. (Worrying about the difference
though, is semantics)
>
> What are pointer semantics?
Assignment to the variable causes it to point to another object (as
opposed to assigning a new value to the current object, like a C++
reference) and copying the variable does not create a copy of the
referred object (which necessarily implies their lifecycles are
independent).

> Assuming you mean names must be forbidden from being rebound, no,
> incorrect. It's only names which are shared between both threads which
> must not be re-bound. If you have names local to the thread, you can
> change them all you want without affecting any other thread.

What does it matter, seeing as Python lacks the ability altogether?

Thanks,
Adam

Terry Reedy

Sep 5, 2009, 11:29:46 AM
to pytho...@python.org
Adam Skutt wrote:

>> This is a side-effect of writing code that relies on global variables.
>> Global variables are generally a bad idea. Global constants are fine.
>
> Nope, the variables don't have to be global to have this problem, they
> just have to be shared:
>
> >>> a = 3
> >>> b = lambda x: x + a

This is a pointless replacement for 'def b(x): return x+a'

> >>> print b(3)
> 6
> >>> a = 4
> >>> print b(3)
> 7
>
> Passing any lambda

Python does not have lambda objects. It has lambda expressions that
produce function objects identical except for .__name__ to the
equivalent def statement output.

tjr

Adam Skutt

Sep 5, 2009, 5:09:57 PM
to
On Sep 5, 11:29 am, Terry Reedy <tjre...@udel.edu> wrote:
>
> This is a pointless replacement for 'def b(x): return x+a'

And? That has nothing to do with anything I was saying whatsoever.
Point is: any mutable shared state is bad, and making objects
immutable isn't enough to remove all shared state, or even reduce it
to be available only with TEH EBIL global variables.

> Python does not have lambda objects. It has lambda expressions that
> produce function objects identical except for .__name__ to the
> equivalent def statement output.

Sure sounds like python has lambda objects to me then... the fact
they're a special case of some more general construct is mostly
semantics, /especially/ in the context of the point I was actually
making, no?

Steven D'Aprano

Sep 5, 2009, 7:38:29 PM
to
On Sat, 05 Sep 2009 14:09:57 -0700, Adam Skutt wrote:

>> Python does not have lambda objects. It has lambda expressions that
>> produce function objects identical except for .__name__ to the
>> equivalent def statement output.
>
> Sure sounds like python has lambda objects to me then... the fact
> they're a special case of some more general construct is mostly
> semantics, /especially/ in the context of the point I was actually
> making, no?

No. Lambdas are a *syntactical construct*, not an object. You wouldn't
talk about "while objects" and "if objects" and "comment objects"
*because they're not objects*. Neither are lambdas -- they're syntax,
which creates ordinary functions:

>>> def f(x):
... return x
...
>>> g = lambda x: x
>>> type(f) is type(g)
True


Functions created with def and functions created with lambda are
*precisely* the same type of object. There is no such thing as a "lambda
object" which is a "special case" of ordinary functions, there are just
functions.

--
Steven

Steven D'Aprano

Sep 5, 2009, 7:48:03 PM
to
On Sat, 05 Sep 2009 04:57:21 -0700, Adam Skutt wrote:

>> > so the fact "Foo" and "Bar" are immutable isn't enough to solve the
>> > problem.
>>
>> This is a side-effect of writing code that relies on global variables.
>> Global variables are generally a bad idea. Global constants are fine.
>
> Nope, the variables don't have to be global to have this problem, they
> just have to be shared:

Yes, you're right, my bad. Globals are shared, but not all shared
variables are global.


>> What do you mean by "variables"? Do you mean names?
>
> In the case of python I mean the name and the value, since all variables
> in Python are pointers.

Let me see if I understand you...

You say that by "variable" you mean the name and the value.

Then you say that all variables are pointers.

In other words... "all names and values in Python are pointers".

Either you're not explaining yourself correctly, or you're badly confused
about Python. In Python, names are keys in namespaces, and values are
objects. Possibly some Python implementations use pointers in some way to
implement namespaces, or objects, but Python the language doesn't have
pointers, and the Python virtual machine that executes Python byte code
doesn't have pointers either.

> (Worrying about the difference though, is semantics)

Semantics are important. If we can't agree on what things mean, how can
we possibly communicate?

>> What are pointer semantics?
>
> Assignment to the variable causes it to point to another object (as
> opposed to assigning a new value to the current object, like a C++
> reference) and copying the variable does not create a copy of the
> referred object (which necessarily implies their lifecycles are
> independent).

I can *guess* what you mean by all that, but it's just a guess. You say
"assignment to the variable", but earlier you said that to you, variables
are names and values, so I'm left wondering if you mean assignment to the
name, assignment to the value, or both. Likewise for copying the variable.

Here is my guess... you're pointing out the similarities between Python
name binding and C pointers:

* when you bind a name to a new object, you cause the name be associated
with the new object rather than mutating the existing object to become
equal to the new object;

* assigning two names to a single object causes both names to be
associated with the same object, rather than each name being associated
with independent copies of the object;

while ignoring the differences between Python name binding and C pointers:

* Python names are keys, not numeric addresses, hence you can't perform
anything like pointer arithmetic on them;

* objects in Python have no idea what, if any, names are associated with
them, unlike C pointers, where you can always ask for the address of a
variable.

Okay, I accept that if you focus only on the similarities and completely
ignore the differences, Python name binding is just like pointer
semantics.
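
A quick interactive illustration of those two similarities (not from the
original post):

>>> a = [1, 2]
>>> b = a            # both names now refer to the same list object
>>> b.append(3)
>>> a
[1, 2, 3]
>>> b = [9]          # rebinding 'b' leaves the object named by 'a' alone
>>> a
[1, 2, 3]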

>> Assuming you mean names must be forbidden from being rebound, no,
>> incorrect. It's only names which are shared between both threads which
>> must not be re-bound. If you have names local to the thread, you can
>> change them all you want without affecting any other thread.
>
> What does it matter, seeing as Python lacks the ability altogether?

I don't understand what you mean. Python lacks the ability to do what
altogether?

If you mean that Python threads lack the ability to have local variables,
that's not true at all.


--
Steven

Terry Reedy

Sep 5, 2009, 10:34:32 PM
to pytho...@python.org
Adam Skutt wrote:
> On Sep 5, 11:29 am, Terry Reedy <tjre...@udel.edu> wrote:
>> This is a pointless replacement for 'def b(x): return x+a'
>
> And? That has nothing to do with anything I was saying whatsoever.

Agreed. However, posts are read by newbies.
Posts that promote bad habits are fair game for comment.


>
>> Python does not have lambda objects. It has lambda expressions that
>> produce function objects identical except for .__name__ to the
>> equivalent def statement output.

> Sure sounds like python has lambda objects to me then...

The idea that Python has 'lambda objects' had caused no end of mischief
over the years.

tjr

John Nagle

Sep 6, 2009, 1:06:55 AM
to
Steven D'Aprano wrote:
> On Fri, 04 Sep 2009 06:36:59 -0700, Adam Skutt wrote:
>
>> Nope, preventing mutation of the objects themselves is not enough. You
>> also have to forbid variables from being rebound (pointed at another
>> object).

Right. What's needed for safe concurrency without global locking looks
something like this:

Object categories:
    Immutable
    Mutable
        Synchronized
        Unsynchronized
            Owned by a thread
            Owned by synchronized object

"Synchronized" objects would be created with something like

class foo(synchronized):
    pass

Only one thread can be active within a synchronized object, as in Java.
So there's implicit locking at entry, unlocking at exit, and temporary
unlocking when the thread is blocked on a lock.
External access to non-function members of synchronized objects has to be
prohibited, since that would allow race conditions.

Everything else can be handled implicitly, without declarations or
annotation.

Here's the big problem:

class foo(synchronized):
    def __init__(self):
        self.items = []
    def putitem(self, item):
        self.items.append(item)     # adds item to object's list
    def getitem(self):
        return self.items.pop()     # removes item

def test():
    words = ["hello", "there"]      # a mutable object
    sobj = foo()                    # a synchronized object
    sobj.putitem(words)             # add to object
    words[0] = "goodbye"            # ERROR - no longer can access


The concept here is that objects have an "owner", which is either
a thread or some synchronized object. Locking is at the "owner"
level. This is simple until "ownership" needs to be transferred.
Can this be made to work in a Pythonic way, without explicit
syntax?

What we want to happen, somehow, is to
transfer the ownership of "words" from the calling thread to the object
in "putitem", and transfer it to the calling thread in "getitem".
How can this be done?

If ownership by a synchronized object has "priority" over ownership
by a thread, it works. When "putitem" above does the "append",
the instance of "foo" becomes the owner of "item". In general,
when a reference to an object is created, and the reference
is from an object owned by a synchronized object, the object
being referenced has to undergo an ownership change.

Once "putitem" has returned, after the ownership change,
it is an error (a "sharing violation?") for the calling thread to
access the object. That seems weird, but the alternative is
some explicit way of swapping ownership.
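
Purely to make the intended rule concrete (this is illustrative only, not a
proposed implementation; the Owned class and its methods are made up, and in
the real scheme both the check and the ownership transfer would be implicit),
a runtime flavour of it might look like:

import threading

class Owned(object):
    """Toy wrapper: a mutable payload plus a record of who may touch it."""
    def __init__(self, data):
        self.data = data
        self.owner = threading.current_thread()   # creator owns it at first

    def transfer(self, new_owner):
        # In the proposal this happens implicitly inside putitem/getitem.
        self.owner = new_owner

    def mutate(self, index, value):
        if self.owner is not threading.current_thread():
            raise RuntimeError("sharing violation: caller does not own object")
        self.data[index] = value

words = Owned(["hello", "there"])
words.mutate(0, "hi")                      # fine: the calling thread owns it
words.transfer("some synchronized object")
# words.mutate(0, "goodbye")  ->  RuntimeError: sharing violation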

What's wrong with this? It takes two reference counts and a pointer
for each mutable object, which is a memory cost. Worse, when a
collection of unsynchronized mutable objects is passed in this manner,
all the elements in the collection have to undergo an ownership change.
That's expensive. (If you're passing trees around, it's better if the
tree root is synchronized. Then the tree root owns the tree, and
the tree can be passed around or even shared, with locking controlled
by the root.)

A compiler smart enough to notice when a variable goes "dead" can
optimize out some of the checking.

None of this affects single-thread programs at all. This is purely
needed for safe, efficient concurrency.

It's kind of a pain, but if servers are going to have tens or hundreds
of CPUs in future, we're going to have to get serious about concurrency.

John Nagle

Adam Skutt

Sep 6, 2009, 9:18:23 AM
to
On Sep 5, 7:38 pm, Steven D'Aprano <st...@REMOVE-

> No. Lambdas are a *syntactical construct*, not an object. You wouldn't
> talk about "while objects" and "if objects" and "comment objects"
> *because they're not objects*.
This rhetoric precludes function objects as well and is entirely non-
compelling.

> Functions created with def and functions created with lambda are
> *precisely* the same type of object.

Which means you have lambda objects. The fact they're same as any
other function is irrelevant and not especially interesting.

> There is no such thing as a "lambda
> object" which is a "special case" of ordinary functions, there are just
> functions.

Hey, I was just trying to resolve tjr's view; he seemed to think that
the fact .__name__ is different is pretty important, and he's the one
you should take your objections up with, not me.

Adam Skutt

Sep 6, 2009, 9:21:12 AM
to
On Sep 5, 10:34 pm, Terry Reedy <tjre...@udel.edu> wrote:
> Adam Skutt wrote:
> > On Sep 5, 11:29 am, Terry Reedy <tjre...@udel.edu> wrote:
> >> This is a pointless replacement for 'def b(x): return x+a'
>
> > And?  That has nothing to do with anything I was saying whatsoever.
>
> Agreed.  However, posts are read by newbies.
> Posts that promote bad habits are fair game for comment.
There's nothing inappropriate about using a lambda for a function I
don't care to give a name. That's the entire reason they exist.

> The idea that Python has 'lambda objects' had caused no end of mischief
> over the years.

As near as I can tell, this is because you're insisting on creating a
semantic distinction where there just isn't one.

Adam

Bearophile

Sep 6, 2009, 9:39:01 AM
to
John Nagle:

> The concept here is that objects have an "owner", which is either
> a thread or some synchronized object.   Locking is at the "owner"
> level.  This is simple until "ownership" needs to be transferred.
> Can this be made to work in a Pythonic way, without explicit
> syntax?
>
> What we want to happen, somehow, is to
> transfer the ownership of "words" from the calling thread to the object
> in "putitem", and transfer it to the calling thread in "getitem".
> How can this be done?

Several people have found ways to implement such ownership of
variables, for example Bartosz Milewski:

http://bartoszmilewski.wordpress.com/2009/06/02/race-free-multithreading-ownership/

It requires some extra complexity, so statically typed languages
usually don't implement this idea, even though it avoids bugs in user
code.
Implementing it in Python in a simple enough way looks like a good
topic for advanced research :-)

Bye,
bearophile

Terry Reedy

Sep 6, 2009, 10:12:56 AM
to pytho...@python.org
Adam Skutt wrote:
> On Sep 5, 10:34 pm, Terry Reedy <tjre...@udel.edu> wrote:
>> Adam Skutt wrote:
>>> On Sep 5, 11:29 am, Terry Reedy <tjre...@udel.edu> wrote:
>>>> This is a pointless replacement for 'def b(x): return x+a'
>>> And? That has nothing to do with anything I was saying whatsoever.
>> Agreed. However, posts are read by newbies.
>> Posts that promote bad habits are fair game for comment.
> There's nothing inappropriate about using a lambda for a function I
> don't care to give a name. That's the entire reason they exist.

But you did give a name -- 'b' -- and that is when a lambda expression
is inappropriate and when a def statement should be used instead.

>> The idea that Python has 'lambda objects' had caused no end of mischief
>> over the years.
> As near as I can tell, this is because you're insisting on creating a
> semantic distinction where there just isn't one.

To the contrary, I am objecting to the spurious distinction 'lambda
object' as people often use it.

tjr

John Nagle

Sep 6, 2009, 11:29:47 PM
to

I hadn't seen that implementation for D. That's an interesting idea.
I'm certainly not the first to go down this road. But nobody seems to have
thought hard about this for Python yet. I looked at this years ago for C++,
but you get bogged down in the same problems that keep C++ "smart pointer"
implementations from being airtight.

In a static language like D, it takes a lot of declarations to make this
go. In Python, more of the ownership decisions have to be implicit, to keep
to a declaration-free Pythonic style.

That article has a policy that "every object has one owner, for life".
That allows compile-time checking, but it's very limiting. You can't,
for example, pass an array into a member function of a synchronized object
as a parameter and have the object keep a reference to it. (That's what
"Queue" does. for example.) Milewski would have you make a copy in
that situation. I felt that was too restrictive, but if you make
that restriction, it cuts down on overhead and simplifies the implementation.
The toughest part of this approach is doing a handoff from one owner to another.

Python has the advantage that a sizable fraction of its objects, especially
the very common ones like numbers and strings, are immutable. Immutable objects
can be shared without problems (other than storage management, but there are
ways to handle that.) So the programmer doesn't have to obsess over "who owns
a string" when a string parameter is passed to a function. So at least we
don't have to sweat the ownership issue for little stuff.

It's worth looking at this for Python because CPython's current thread model
behaves terribly on multiprocessors. We really do need something better.

John Nagle

Steven D'Aprano

Sep 7, 2009, 12:27:01 AM
to
On Sun, 06 Sep 2009 06:18:23 -0700, Adam Skutt wrote:

> On Sep 5, 7:38 pm, Steven D'Aprano <st...@REMOVE-
>> No. Lambdas are a *syntactical construct*, not an object. You wouldn't
>> talk about "while objects" and "if objects" and "comment objects"
>> *because they're not objects*.
> This rhetoric precludes function objects as well and is entirely non-
> compelling.

Functions ARE objects in Python. They even inherit from object:

>>> def f():
... return None
...
>>> isinstance(f, object)
True


Just because there is syntax for creating functions (at least two
different syntax forms actually) doesn't "preclude function objects".
There is syntax for dicts, and dict objects; syntax for lists, and list
objects; syntax for strings, and string objects. But there's syntax for
while loops, and no such thing as a while object.

Lambda expressions are syntax for creating function objects. That's all
there is to it, end of story.


>> Functions created with def and functions created with lambda are
>> *precisely* the same type of object.
> Which means you have lambda objects. The fact they're same as any other
> function is irrelevant and not especially interesting.

They're *function* objects, not "lambda" objects:

>>> type(lambda: None)
<type 'function'>


>> There is no such thing as a "lambda
>> object" which is a "special case" of ordinary functions, there are just
>> functions.
> Hey, I was just trying to resolve tjr's view, he seemed to think that
> .__name__ is different is pretty important, and he's the one you should
> take your objections up with, not me.

Nice try but no. It was YOU, not Terry, claiming that lambdas are a
special kind of object different from ordinary functions.

--
Steve

Steven D'Aprano

Sep 7, 2009, 12:32:12 AM
to
On Sun, 06 Sep 2009 10:12:56 -0400, Terry Reedy wrote:

> Adam Skutt wrote:

>> There's nothing inappropriate about using a lambda for a function I
>> don't care to give a name. That's the entire reason they exist.
>
> But you did give a name -- 'b' -- and that is when a lambda expression
> is inappropriate and when a def statement should be used instead


I think that's too strong a claim. Functions are first class objects, and
there no reason why you can't do this:

def f():
return None

g = f

So what's wrong with doing this?

g = lambda: None

>>> The idea that Python has 'lambda objects' had caused no end of
>>> mischief over the years.
>> As near as I can tell, this is because you're insisting on creating a
>> semantic distinction where there just isn't one.
>
> To the contrary, I am objecting to the spurious distinction 'lambda
> object' as people often use it.


Agreed.

--
Steven

Message has been deleted

Terry Reedy

Sep 7, 2009, 1:55:05 AM
to pytho...@python.org
Dennis Lee Bieber wrote:
> On Sun, 06 Sep 2009 20:29:47 -0700, John Nagle <na...@animats.com>
> declaimed the following in gmane.comp.python.general:

>
>> Python has the advantage that a sizable fraction of its objects, especially
>> the very common ones like numbers and strings, are immutable. Immutable objects
>
> We must have different ideas of "sizable"
>
> Numbers, strings, and tuples are immutable...
> Lists, dictionaries, and pretty much anything else (functions, class
> instances, etc.) are mutable in one way or another... I'd say the
> mutables are in the majority <G>

I think it depends on whether one counts classes or instances. Typical
programs have a lot of numbers and strings.

tjr


Simon Brunning

Sep 7, 2009, 4:09:29 AM
to Terry Reedy, python-list
2009/9/7 Terry Reedy <tjr...@udel.edu>:

> Dennis Lee Bieber wrote:
>> I'd say the
>> mutables are in the majority <G>
>
> I think it depends on whether one counts classes or instances. Typical programs have a lot of numbers and strings.

Ah, but immutable instances can be, and often are, interned. This will
cut down on their number considerably. ;-)

--
Cheers,
Simon B.

Graham Breed

Sep 7, 2009, 5:58:06 AM
to
John Nagle wrote:

> In the beginning, strings, tuples, and numbers were immutable, and
> everything else was mutable. That was simple enough. But over time,
> Python has acquired more immutable types - immutable sets and immutable
> byte arrays. Each of these is a special case.

<snip>

> Immutability is interesting for threaded programs, because
> immutable objects can be shared without risk. Consider a programming
> model where objects shared between threads must be either immutable or
> "synchronized" in the sense that Java uses the term. Such programs
> are free of most race conditions, without much programmer effort to
> make them so.

Of course, tuples would still be a special case because they
may contain mutable objects. You need to check they're
immutable all the way down.
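
A rough sketch of such a check for the built-in types (a hypothetical
helper, nothing the language provides; user-defined instances are
conservatively treated as mutable):

def deeply_immutable(obj):
    """True if obj is immutable all the way down (built-in types only)."""
    if isinstance(obj, (tuple, frozenset)):
        return all(deeply_immutable(item) for item in obj)
    return isinstance(obj, (int, long, float, complex,
                            str, unicode, type(None)))

print deeply_immutable((1, "a", (2, 3)))   # True
print deeply_immutable(([1], 2))           # False - contains a list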

Nothing to do with threading, but it's also the cause of
this weirdness:

http://bytes.com/topic/python/answers/752154-list-tuple

>>> a = ([1], 2)
>>> a[0] += [3]

succeeds in extending the list, but also raises a TypeError.


Graham

John Nagle

Sep 7, 2009, 2:26:02 PM
to
Graham Breed wrote:
> John Nagle wrote:
>
>> In the beginning, strings, tuples, and numbers were immutable, and
>> everything else was mutable. That was simple enough. But over time,
>> Python has acquired more immutable types - immutable sets and immutable
>> byte arrays. Each of these is a special case.
>
> <snip>
>
>> Immutability is interesting for threaded programs, because
>> immutable objects can be shared without risk. Consider a programming
>> model where objects shared between threads must be either immutable or
>> "synchronized" in the sense that Java uses the term. Such programs
>> are free of most race conditions, without much programmer effort to
>> make them so.
>
> Of course, tuples would still be a special case because they may contain
> mutable objects. You need to check they're immutable all the way down.

Right. Tracking mutability and ownership all the way down without
making the language either restrictive or slow is tough.

In multi-thread programs, though, somebody has to be clear on who owns
what. I'm trying to figure out a way for the language, rather than the
programmer, to do that job. It's a major source of trouble in threaded
programs.

John Nagle

Paul Rubin

Sep 7, 2009, 3:56:41 PM
to
John Nagle <na...@animats.com> writes:
> Right. Tracking mutability and ownership all the way down without
> making the language either restrictive or slow is tough.
>
> In multi-thread programs, though, somebody has to be clear on who owns
> what. I'm trying to figure out a way for the language, rather than the
> programmer, to do that job. It's a major source of trouble in threaded
> programs.

Python's threading system is modelled after Java's, which basically
uses explicit locks under programmer control. That has its hazards
but does manage to multiprocess effectively. CPython's
multiprocessing is limited by the notorious GIL, but that's an
implementation artifact.

Having the language automatically manage ownership of mutable objects
without a fancy type system sounds self-contradictory. Erlang (which
is typeless like Python) does it by not allowing mutation at all.
Haskell uses a type-based scheme around software transactional memory
(STM) primitives:

http://book.realworldhaskell.org/read/software-transactional-memory.html

The Microsoft Research "Singularity" OS may also be of interest.

Hendrik van Rooyen

Sep 8, 2009, 3:38:51 AM
to pytho...@python.org
On Monday 07 September 2009 20:26:02 John Nagle wrote:

> Right. Tracking mutability and ownership all the way down without
> making the language either restrictive or slow is tough.
>
> In multi-thread programs, though, somebody has to be clear on who owns
> what. I'm trying to figure out a way for the language, rather than the
> programmer, to do that job. It's a major source of trouble in threaded
> programs.

I think that trying to make the language instead of the programmer responsible
for this is a ball-buster. It is unlikely to be either easy or cheap.
I would rather have the programmer responsible for the mental model, and give
her the tools to do the job with.

In any case - if you do not actually like juggling with knives, then you
should not be mucking around with concurrency, and by making the language
safe, you are taking the fun out.

- Hendrik

Paul Rubin

Sep 8, 2009, 4:10:47 AM
to
Hendrik van Rooyen <hen...@microcorp.co.za> writes:
> In any case - if you do not actually like juggling with knives, then you
> should not be mucking around with concurrency, and by making the language
> safe, you are taking the fun out.

Oh come on, Erlang and Haskell both take care of it rather well.

Steven D'Aprano

Sep 8, 2009, 4:21:18 AM
to
On Tue, 08 Sep 2009 09:38:51 +0200, Hendrik van Rooyen wrote:

> On Monday 07 September 2009 20:26:02 John Nagle wrote:
>
>> Right. Tracking mutability and ownership all the way down without
>> making the language either restrictive or slow is tough.
>>
>> In multi-thread programs, though, somebody has to be clear on who
>> owns
>> what. I'm trying to figure out a way for the language, rather than the
>> programmer, to do that job. It's a major source of trouble in threaded
>> programs.
>
> I think that trying to make the language instead of the programmer
> responsible for this is a ball-buster. It is unlikely to be either easy
> or cheap. I would rather have the programmer responsible for the mental
> model, and give her the tools to do the job with.

That was the situation 20 years ago with memory management. I'm sure
people back then thought that the Right Solution was to give the
programmer tools to get the job done, and hope they could avoid
dereferencing nil pointers and memory leaks and all the other cruft of
hand-managing memory. Today, we have garbage collectors and high-level
languages like Ruby, Python, Haskell etc that manage that for you, and
even heavyweight garbage collectors are practical for the majority of
userspace applications.


> In any case - if you do not actually like juggling with knives, then you
> should not be mucking around with concurrency, and by making the
> language safe, you are taking the fun out.

If by "fun" you mean "screaming horrors", I agree.


--
Steven

John Nagle

Sep 8, 2009, 1:45:46 PM
to
Steven D'Aprano wrote:
> On Tue, 08 Sep 2009 09:38:51 +0200, Hendrik van Rooyen wrote:
>
>> On Monday 07 September 2009 20:26:02 John Nagle wrote:
>>
>>> Right. Tracking mutability and ownership all the way down without
>>> making the language either restrictive or slow is tough.
>>>
>>> In multi-thread programs, though, somebody has to be clear on who
>>> owns
>>> what. I'm trying to figure out a way for the language, rather than the
>>> programmer, to do that job. It's a major source of trouble in threaded
>>> programs.
>> I think that trying to make the language instead of the programmer
>> responsible for this is a ball-buster. It is unlikely to be either easy
>> or cheap. I would rather have the programmer responsible for the mental
>> model, and give her the tools to do the job with.
>
> That was the situation 20 years ago with memory management.

Good point.

The other big point is that CPython deals with concurrency by
preventing it. This is killing performance on multi-core CPUs.
Read "http://www.dabeaz.com/python/GIL.pdf", which demonstrates
just how awful the current GIL implementation is. Adding more
CPUs slows CPython down.
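
The flavour of the kind of CPU-bound benchmark discussed there is easy
to reproduce (a sketch; absolute numbers depend on the machine and
interpreter version):

import threading
import time

def count(n):
    # Pure CPU-bound work; the GIL keeps two threads from running this
    # loop in parallel.
    while n > 0:
        n -= 1

N = 10000000

start = time.time()
count(N)
count(N)
print("sequential:", time.time() - start)

start = time.time()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print("two threads:", time.time() - start)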

John Nagle

Daniel Fetchinson

Sep 8, 2009, 2:23:05 PM
to pytho...@python.org
>> > Is the difference because of mutability versus immutability, or
>> > because of C code in Numpy versus Matlab code? Are you comparing
>> > bananas and pears?
>>
>> It consisted of something like this
>
>
> Your code does a lot of unnecessary work if you're just trying to
> demonstrate immutability is faster or slower than mutability. A simple
> test that just adds one to each element would have much less overhead.
>
> In any case, there's no doubt that immutable objects require extra time
> to create compared to re-using an existing mutable object, and that time
> is probably O(N) (until you approach the limits of available contiguous
> memory, in which case you could see O(N**2) or worse behaviour). But an
> order of magnitude difference?
>
>
>> I wasn't comparing bananas against pears. Mathworks informed me they
>> were using my code to investigate why Matlab was showing such slow-
>> downs. I am not sure what they found out, eventually. I have also
>> wondered if immutability vs. mutability was the reason, as NumPy
>> generates temporary arrays. But I have not found a better explanation
>> either. Anyhow, ~30 seconds for Matlab vs. ~3 seconds for Python is a
>> major difference.
>
> How does Matlab speed compare to Python in general? Ruby, for example, is
> an order of magnitude slower than Python (at least it was last time I
> looked)

For what operations? Under what circumstances? I'm just being pedantic
because you mentioned comparing bananas and pears ......

>, not because of immutable arrays, but just because of the speed
> of the language.

Ummmm, what is 'speed of a language'? I thought ruby or python or
anything else, as a language, is separate from its implementations.
Implementations might have 'speed' but languages don't. Aren't you
comparing bananas and pears again?

Cheers,
Daniel


--
Psss, psss, put it down! - http://www.cafepress.com/putitdown

Steven D'Aprano

Sep 8, 2009, 6:44:17 PM
to
On Tue, 08 Sep 2009 11:23:05 -0700, Daniel Fetchinson wrote:

>> How does Matlab speed compare to Python in general? Ruby, for example,
>> is an order of magnitude slower than Python (at least it was last time
>> I looked)
>
> For what operations? Under what circumstances? I'm just being pedantic
> because you mentioned comparing bananas and pears ......

In general, Ruby was significantly slower than Python "most of the time",
although my recollection was wrong about it being an order of magnitude
difference -- it was more like a factor of 3-6 depending on the specific
benchmark being tested. This was for Ruby 1.8; Ruby 1.9 included some
significant speedups, including tail-call optimization, which makes it about
three times as fast as Python 2.5 for tail-recursive functions.

Of course there's a lot of hand-waving in the above. Language
implementations vary in their performance for specific pieces of code --
it's invalid to conclude that every single Ruby script will be exactly 3
times slower than every single Python script. But one can argue that Ruby
1.8 was, in general, at least three times slower than Python 2.5 when
comparing equivalent pieces of code.

There's nothing controversial or strange about the claim that a well-
written (but not heavily optimized) C program will be much faster than
the equivalent Python program, which in turn will be faster than the
equivalent PHP program. Nobody is surprised to learn that Numpy's C
implementation of some function will almost certainly out-perform the
same function written in pure Python. It's easy to oversell the idea that
"language X is faster than language Y", but that's not what I'm doing.
I'm asking, all else being equal, how does the speed of Matlab compare to
the speed of Numpy?
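
As a very rough illustration of the pure-Python-versus-C-implementation
gap (assuming NumPy is installed; the exact ratio varies by machine):

import timeit

setup = ("import numpy\n"
         "data = list(range(100000))\n"
         "arr = numpy.arange(100000)")

py_time = timeit.timeit("sum(x * x for x in data)", setup=setup, number=100)
np_time = timeit.timeit("(arr * arr).sum()", setup=setup, number=100)
print("pure Python: %.3fs    numpy: %.3fs" % (py_time, np_time))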



>>, not because of immutable arrays, but just because of the speed
>> of the language.
>
> Ummmm, what is 'speed of a language'? I thought ruby or python or
> anything else as a language is separate from their implementations.
> Implementations might have 'speed' but languages don't. Aren't you
> comparing bananas and pears again?

Good point -- of course you're right, and I was sloppy. I meant "the speed
of the specific implementation".

--
Steven
