
threading support in python


km

Sep 4, 2006, 10:15:03 AM
to pytho...@python.org
Hi all,

Is there any PEP to introduce true threading features into Python's
next version, as in Java? I mean without having the GIL.
Compared to other languages, Python is fun to code in, but I feel it
is lagging behind in threading.

regards,
KM

bayerj

Sep 4, 2006, 10:58:00 AM
Hi,

The GIL won't go away. You might want to read
http://blog.ianbicking.org/gil-of-doom.html .

Regards,
-Justin

km

Sep 4, 2006, 11:32:16 AM
to bayerj, pytho...@python.org
Hi all,
Are there any alternate ways of attaining true threading in Python?
If the GIL doesn't go, does that mean Python is useless for
computation-intensive scientific applications that need
parallelization in a threading context?

regards,
KM

bayerj

Sep 4, 2006, 11:43:36 AM
Hi,

You might want to split your calculation across different
worker processes.

Then you can use POSH [1] to share data and objects.
You might even want to go a step further and share the data via
sockets/XML-RPC or something like that. That makes it easy to throw
additional boxes at a specific calculation, because it can be set up
in almost no time.
You can even use Twisted Spread [2] and its Perspective Broker to do
this on a higher level.

If that's not what you want, you are left with Java I guess.

Regards,
-Justin

[1] http://poshmodule.sourceforge.net/
[2] http://twistedmatrix.com/projects/core/documentation/howto/pb.html

Richard Brodie

Sep 4, 2006, 11:49:31 AM

"km" <srikris...@gmail.com> wrote in message
news:mailman.10337.1157383...@python.org...

> if GIL doesnt go then does it mean that python is useless for
> computation intensive scientific applications which are in need of
> parallelization in threading context ?

No.


Sybren Stuvel

Sep 4, 2006, 11:48:14 AM
km enlightened us with:

> Is there any PEP to introduce true threading features into python's
> next version as in java? i mean without having GIL.

What is GIL? Except for the Dutch word for SCREAM that is...

> when compared to other languages, python is fun to code but i feel
> its is lacking behind in threading

What's wrong with the current threading? AFAIK it's directly linked to
the threading of the underlying platform.

Sybren
--
Sybren Stüvel
Stüvel IT - http://www.stuvel.eu/

Jean-Paul Calderone

Sep 4, 2006, 12:01:30 PM
to pytho...@python.org
On Mon, 4 Sep 2006 17:48:14 +0200, Sybren Stuvel <sybr...@yourthirdtower.com.imagination> wrote:
>km enlightened us with:
>> Is there any PEP to introduce true threading features into python's
>> next version as in java? i mean without having GIL.
>
>What is GIL? Except for the Dutch word for SCREAM that is...
>
>> when compared to other languages, python is fun to code but i feel
>> its is lacking behind in threading
>
>What's wrong with the current threading? AFAIK it's directly linked to
>the threading of the underlying platform.

Only one thread per process can execute Python bytecode, even on an SMP
system. The main bytecode eval loop is protected by a lock, the "Global
Interpreter Lock".

This doesn't prevent certain operations from being parallelized. For
example, many of the I/O calls in Python release the GIL so that while
they are blocked on the network or the disk, another thread can continue
to execute. Extension modules which perform computationally intensive
tasks can also release the GIL to allow better exploitation of SMP
resources.
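That overlap is easy to see in a small sketch (a hedged illustration,
not part of the original post; the timings are approximate):

```python
import threading
import time

# time.sleep() releases the GIL while blocked, so the waits overlap
# even though only one thread at a time executes Python bytecode.
def wait():
    time.sleep(0.2)

start = time.time()
threads = [threading.Thread(target=wait) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print("elapsed: %.1f seconds" % elapsed)  # roughly 0.2, not 0.8
```

Four threads each sleeping 0.2 seconds finish in about 0.2 seconds of
wall time; a CPU-bound loop in place of the sleep would not overlap
this way.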

Jean-Paul

Diez B. Roggisch

Sep 4, 2006, 12:11:39 PM
Sybren Stuvel wrote:

> km enlightened us with:
>> Is there any PEP to introduce true threading features into python's
>> next version as in java? i mean without having GIL.
>
> What is GIL? Except for the Dutch word for SCREAM that is...

The Global Interpreter Lock, which prevents Python from concurrently
modifying internal structures and causing segfaults.



>> when compared to other languages, python is fun to code but i feel
>> its is lacking behind in threading
>
> What's wrong with the current threading? AFAIK it's directly linked to
> the threading of the underlying platform.

There exist rare cases (see the link from bayerj) where the GIL is an
annoyance, and with the dawn of multi-core CPUs all over the place it
might be considered a good idea to remove it - maybe. But I doubt that
is something to be considered for py2.x.

Diez

Sandra-24

Sep 4, 2006, 12:33:46 PM
The trouble is there are some environments where you are forced to use
threads. Apache and mod_python are an example. You can't make use of
multiple CPUs unless you're on *nix and run with multiple processes, AND
your application doesn't store large amounts of data in memory (which
mine does), so you'd have to physically double the computer's memory for
a dual-core, or quadruple it for a quad-core. And forget about running a
Windows server; Apache will not even run with multiple processes there.

In years to come this will be more of an issue because single-core CPUs
will be harder to come by; you'll be throwing away half of every CPU
you buy.

-Sandra

Daniel Dittmar

Sep 4, 2006, 12:41:42 PM

Some of the technical problems:

- probably breaks compatibility of extensions at the source level in a
big way, although this might be handled by SWIG, boost and other code
generators
- reference counting will have to be synchronized, which means that
Python will become slower
- removing reference counting and relying on garbage collection alone
will break many Python applications (because they rely on files being
closed at end of scope etc.)

Daniel

Rob Williscroft

Sep 4, 2006, 2:53:05 PM
Daniel Dittmar wrote in news:edhl07$b2t$1...@news.sap-ag.de in
comp.lang.python:

> - removing reference counting and relying on garbage collection alone
> will break many Python applications (because they rely on files being
> closed at end of scope etc.)
>

They are already broken on at least 2 python implementations, so
why worry about another one.

Rob.
--
http://www.victim-prime.dsl.pipex.com/

sjde...@yahoo.com

Sep 4, 2006, 3:36:33 PM
Sandra-24 wrote:
> The trouble is there are some environments where you are forced to use
> threads. Apache and mod_python are an example. You can't make use of
> mutliple CPUs unless you're on *nux and run with multiple processes AND
> you're application doesn't store large amounts of data in memory (which
> mine does) so you'd have to physically double the computer's memory for
> a daul-core, or quadruple it for a quadcore.

You seem to be confused about the nature of multiple-process
programming.

If you're on a modern Unix/Linux platform and you have static read-only
data, you can just read it in before forking and it'll be shared
between the processes.
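A minimal sketch of that idea (Unix only; the data and names here are
illustrative, not from the post):

```python
import os

# Build the read-only structure once, before forking.  After fork(),
# the child sees it via copy-on-write pages rather than a fresh copy.
# (CPython's reference counting does write to object headers, so the
# sharing is imperfect in practice, but no explicit IPC is needed.)
TABLE = {i: i * i for i in range(1000)}

pid = os.fork()
if pid == 0:
    # Child: read the pre-fork data directly, no pipes or sockets.
    os._exit(0 if TABLE[30] == 900 else 1)
else:
    _, status = os.waitpid(pid, 0)
    print("child saw the data:", os.WEXITSTATUS(status) == 0)
```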

If it's read/write data or you're not on a Unix platform, you can use
shared memory to share it between many processes.

Threads are way overused in modern multiexecution programming. The
decision on whether to use processes or threads should come down to
whether you want to share everything, or whether you have specific
pieces of data you want to share. With processes + shm, you can gain
the security of protected memory for the majority of your code + data,
only sacrificing it where you need to share the data.

The entire Windows programming world tends to be so biased toward
multithreading that they often don't even acknowledge the existence of
generally superior alternatives. I think that's in large part because
historically on Windows 3.1/95/98 there was no good way to create
processes without running a new binary, and so a culture of threading
grew up. Even today many Windows programmers are unfamiliar with using
CreateProcessEx with SectionHandle=NULL for efficient copy-on-write
process creation.

> And forget about running a
> windows server, apache will not even run with multiple processes.

It used to run on windows with multiple processes. If it really won't
now, use an older version or contribute a fix.

Now, the GIL is independent of this; if you really need threading in
your situation (you share almost everything and have hugely complex
data structures that are difficult to maintain in shm) then you're
still going to run into GIL serialization. If you're doing a lot of
work in native code extensions this may not actually be a big
performance hit, if not it can be pretty bad.

Paul Rubin

Sep 4, 2006, 4:06:31 PM
"sjde...@yahoo.com" <sjde...@yahoo.com> writes:
> If it's read/write data or you're not on a Unix platform, you can use
> shared memory to shared it between many processes.
>
> Threads are way overused in modern multiexecution programming. The
> decision on whether to use processes or threads should come down to
> whether you want to share everything, or whether you have specific
> pieces of data you want to share.

Shared memory means there's a byte vector (the shared memory region)
accessible to multiple processes. The processes don't use the same
machine addresses to reference the vector. Any data structures
(e.g. those containing pointers) shared between the processes have to
be marshalled in and out of the byte vector instead of being accessed
normally. Any live objects such as open sockets have to be shared
some other way. It's not a matter of sharing "everything"; shared
memory is a pain in the neck even to share a single object. These
things really can be easier with threads.
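A sketch of that marshalling overhead, using an anonymous mmap region
as the byte vector (the region size and helper names are illustrative):

```python
import mmap
import pickle
import struct

# The shared region is just bytes; structured data must be serialized
# in and parsed back out rather than used in place.
region = mmap.mmap(-1, 4096)  # anonymous region, shareable across fork()

def put(obj):
    # Marshal the object into the byte vector: length prefix + payload.
    blob = pickle.dumps(obj)
    region.seek(0)
    region.write(struct.pack("I", len(blob)))
    region.write(blob)

def get():
    # Marshal it back out again on every access.
    region.seek(0)
    (size,) = struct.unpack("I", region.read(4))
    return pickle.loads(region.read(size))

put({"answer": 42})
print(get())  # the dict round-trips; an open socket could not make this trip
```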

Daniel Dittmar

Sep 4, 2006, 6:00:12 PM
Rob Williscroft wrote:
> Daniel Dittmar wrote in news:edhl07$b2t$1...@news.sap-ag.de in
> comp.lang.python:
>
>
>>- removing reference counting and relying on garbage collection alone
>>will break many Python applications (because they rely on files being
>>closed at end of scope etc.)
>>
>
>
> They are already broken on at least 2 python implementations, so
> why worry about another one.

I guess few applications or libraries are being ported from CPython to
Jython or IronPython as each is targeting a different standard library,
so this isn't that much of a problem yet.

Daniel

Sandra-24

Sep 4, 2006, 10:19:24 PM
> You seem to be confused about the nature of multiple-process
> programming.
>
> If you're on a modern Unix/Linux platform and you have static read-only
> data, you can just read it in before forking and it'll be shared
> between the processes..

Not familiar with *nix programming, but I'll take your word on it.

> If it's read/write data or you're not on a Unix platform, you can use
> shared memory to shared it between many processes.

I know how shared memory works, it's the last resort in my opinion.

> Threads are way overused in modern multiexecution programming. The

<snip>

> It used to run on windows with multiple processes. If it really won't
> now, use an older version or contribute a fix.

First of all I'm not in control of spawning processes or threads.
Apache does that, and apache has no MPM for windows that uses more than
1 process. Secondly "Superior" is definately a matter of opinion. Let's
see how you would define superior.

1) Port (a nicer word for rewrite) the worker MPM from *nix to Windows.
2) Alternately, switch to running Linux servers (which have their
pluses) but about which I know nothing. I've been using Windows since
I was 10 years old, I'm confident in my ability to build, secure, and
maintain a Windows server. I don't think anyone would recommend me to
run Linux servers with very little in the way of Linux experience.
3) Rewrite my codebase to use some form of shared memory. This would be
a terrible nightmare that would take at least a month of development
time and a lot of heavy rewriting. It would be very difficult, but I'll
grant that it may work if done properly with only small performance
losses. Sounds like a deal.

I would find an easier time, I think, porting mod_python to .net and
leaving that GIL behind forever. Thankfully, I'm not considering such
drastic measures - yet.

Why on earth would I want to do all of that work? Just because you want
to keep this evil thing called a GIL? My suggestion is in python 3
ditch the ref counting, use a real garbage collector, and make that GIL
walk the plank. I have my doubts that it would happen, but that's fine,
the future of python is in things like IronPython and PyPy. CPython's
days are numbered. If there was a mod_dotnet I wouldn't be using
CPython anymore.

> Now, the GIL is independent of this; if you really need threading in
> your situation (you share almost everything and have hugely complex
> data structures that are difficult to maintain in shm) then you're
> still going to run into GIL serialization. If you're doing a lot of
> work in native code extensions this may not actually be a big
> performance hit, if not it can be pretty bad.

Actually, I'm not sure I understand you correctly. You're saying that
in an environment like apache (with 250 threads or so) and my hugely
complex shared data structures, that the GIL is going to cause a huge
performance hit? So even if I do manage to find my way around in the
Linux world, and I upgrade my memory, I'm still going to be paying for
that darned GIL?

Will the madness never end?
-Sandra

Steve Holden

Sep 5, 2006, 2:49:44 AM
to pytho...@python.org
Sandra-24 wrote:
[Sandra understands shared memory]

>
> I would find an easier time, I think, porting mod_python to .net and
> leaving that GIL behind forever. Thankfully, I'm not considering such
> drastic measures - yet.
>
Quite right too. You haven't even sacrificed a chicken yet ...

> Why on earth would I want to do all of that work? Just because you want
> to keep this evil thing called a GIL? My suggestion is in python 3
> ditch the ref counting, use a real garbage collector, and make that GIL
> walk the plank. I have my doubts that it would happen, but that's fine,
> the future of python is in things like IronPython and PyPy. CPython's
> days are numbered. If there was a mod_dotnet I wouldn't be using
> CPython anymore.
>

You write as though the GIL was invented to get in the programmer's way,
which is quite wrong. It's there to avoid deep problems with thread
interaction. Languages that haven't bitten that bullet can bite you in
quite nasty ways when you write threaded applications.

Contrary to your apparent opinion, the GIL has nothing to do with
reference-counting.


>
>>Now, the GIL is independent of this; if you really need threading in
>>your situation (you share almost everything and have hugely complex
>>data structures that are difficult to maintain in shm) then you're
>>still going to run into GIL serialization. If you're doing a lot of
>>work in native code extensions this may not actually be a big
>>performance hit, if not it can be pretty bad.
>
>
> Actually, I'm not sure I understand you correctly. You're saying that
> in an environment like apache (with 250 threads or so) and my hugely
> complex shared data structures, that the GIL is going to cause a huge
> performance hit? So even if I do manage to find my way around in the
> Linux world, and I upgrade my memory, I'm still going to be paying for
> that darned GIL?
>

I think the suggestion was rather that abandoning Python because of the
GIL might be premature optimisation. But since you appear to be sticking
with it, that might have been unnecessary advice.

> Will the madness never end?

This reveals an opinion of the development team that's altogether too
low. I believe the GIL was introduced for good reasons.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Paul Rubin

Sep 5, 2006, 4:03:51 AM
Steve Holden <st...@holdenweb.com> writes:
> You write as though the GIL was invented to get in the programmer's
> way, which is quite wrong. It's there to avoid deep problems with
> thread interaction. Languages that haven't bitten that bullet can bite
> you in quite nasty ways when you write threaded applications.

And yet, Java programmers manage to write threaded applications all
day long without getting bitten (once they're used to the issues),
despite usually being less skilled than Python programmers ;-).

> Contrary to your apparent opinion, the GIL has nothing to do with
> reference-counting.

I think it does, i.e. one of the GIL's motivations was to protect the
management of reference counts in CPython, which otherwise wasn't
thread-safe. The obvious implementation of Py_INCREF has a race
condition, for example. The GIL documentation at

http://docs.python.org/api/threads.html

describes this in its very first paragraph.
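The hazard is the classic unsynchronized read-modify-write. A
Python-level analogue of the race (and of the locking that Py_INCREF
would need without the GIL) might look like this; the counter and
thread counts are illustrative:

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        # Without the lock, "counter += 1" is a read, an add, and a
        # write -- two threads can read the same old value and one
        # increment is lost.  This is the Py_INCREF race in miniature.
        with lock:
            counter += 1

threads = [threading.Thread(target=bump, args=(50000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000 with the lock; without it, often less
```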

> > Will the madness never end?
>
> This reveals an opinion of the development team that's altogether too
> low. I believe the GIL was introduced for good reasons.

The GIL was an acceptable tradeoff when it was first created in the
previous century. First of all, it gave a way to add threads to the
existing, non-threadsafe CPython implementation without having to
rework the old code too much. Second, Python was at that time
considered a "scripting language" and there was less concern about
writing complex apps in it, especially multiprocessing apps. Third,
multiprocessor computers were themselves exotic, so people who wanted
to program them probably had exotic problems that they were willing to
jump through hoops to solve.

These days, even semi-entry-level consumer laptop computers have dual
core CPU's, and quad Opteron boxes (8-way multiprocessing using X2
processors) are quite affordable for midrange servers or engineering
workstations, and there's endless desire to write fancy server apps
completely in Python. There is no point paying for all that
multiprocessor hardware if your programming language won't let you use
it. So, Python must punt the GIL if it doesn't want to keep
presenting undue obstacles to writing serious apps on modern hardware.

Felipe Almeida Lessa

Sep 5, 2006, 5:19:27 AM
to Sandra-24, pytho...@python.org
4 Sep 2006 19:19:24 -0700, Sandra-24 <sandra...@yahoo.com>:

> If there was a mod_dotnet I wouldn't be using
> CPython anymore.

I guess you won't be using it, then: http://www.mono-project.com/Mod_mono

--
Felipe.

Sandra-24

Sep 5, 2006, 10:40:55 AM

Steve Holden wrote:
> Quite right too. You haven't even sacrificed a chicken yet ...

Hopefully we don't get to that point.

> You write as though the GIL was invented to get in the programmer's way,
> which is quite wrong. It's there to avoid deep problems with thread
> interaction. Languages that haven't bitten that bullet can bite you in
> quite nasty ways when you write threaded applications.

I know it was put there because it is meant to be a good thing.
However, it gets in my way. I would be perfectly happy if it were gone.
I've never written code that assumes there's a GIL. I always write my
code with all shared writable objects protected by locks. It's far more
portable, and a good habit to get into. You realize that because of the
GIL, they were discussing (and may have already implemented) Java style
synchronized dictionaries and lists for IronPython simply because
python programmers just assume they are thread safe thanks to the GIL.
I always hated that about Java. If you want to give me thread safe
collections, fine, they'll be nice for sharing between threads, but
don't make me use synchronized collections for single-threaded code.
You'll notice the newer Java collections are not synchronized, it would
seem I'm not alone in that opinion.

> Contrary to your apparent opinion, the GIL has nothing to do with
> reference-counting.

Actually it does. Without the GIL reference counting is not thread
safe. You have to synchronize all reference count accesses, increments,
and decrements because you have no way of knowing which objects get
shared across threads. I think with Python's current memory management,
the GIL is the lesser evil.

I'm mostly writing this to provide a different point of view, many
people seem to think (previously linked blog) that there is no downside
to the GIL, and that's just not true. However, I don't expect that the
GIL can be safely removed from CPython. I also think that it doesn't
matter because projects like IronPython and PyPy are very likely the
way of the future for Python anyway. Once you move away from C there
are so many more things you can do.

> I think the suggestion was rather that abandoning Python because of the
> GIL might be premature optimisation. But since you appear to be sticking
> with it, that might have been unnecessary advice.

I would never abandon Python, and I hold the development team in very
high esteem. That doesn't mean there aren't a few things (like the GIL, or
super) that I don't like. But overall they've done an excellent job on
the 99% of things they've got right. I guess we don't say that enough.

I might switch from CPython sometime to another implementation, but it
won't be because of the GIL. I'm very fond of the .net framework as a
library, and I'd also rather write performance critical code in C# than
C (who wouldn't?) I'm also watching PyPy with interest.

-Sandra

Bryan Olson

Sep 5, 2006, 10:46:43 AM
bayerj wrote:

> Then you can use POSH [1] to share data and objects.

Do you use POSH? How well does it work with current Python?
Any major gotchas?

I think POSH looks like a great thing to have, but the latest
version is an alpha from over three years ago. Also, it only
runs on *nix systems.


--
--Bryan

Sandra-24

Sep 5, 2006, 10:57:39 AM
Oh I'm aware of that, but it's not what I'm looking for. Mod_mono just
lets you run ASP.NET on Apache. I'd much rather use Python :) Now if
there was a way to run IronPython on Apache I'd be interested.

-Sandra

sk...@pobox.com

Sep 5, 2006, 10:58:43 AM
to Sandra-24, pytho...@python.org

Sandra> However, I don't expect that the GIL can be safely removed from
Sandra> CPython.

It was removed at one point in the dim, dark past (circa Python 1.4) on an
experimental basis. Aside from the huge amount of work, it resulted in
significantly lower performance for single-threaded apps (that is, the
common case). Maybe more effort should have been put in at that time to
improve performance, but that didn't happen. Much more water has gone under
the bridge at this point, so extracting the GIL from the core would be
correspondingly more difficult.

Skip

km

Sep 5, 2006, 11:09:18 AM
to pytho...@python.org
Hi all,

> And yet, Java programmers manage to write threaded applications all
> day long without getting bitten (once they're used to the issues),
> despite usually being less skilled than Python programmers ;-).

> These days, even semi-entry-level consumer laptop computers have dual
> core CPU's, and quad Opteron boxes (8-way multiprocessing using X2
> processors) are quite affordable for midrange servers or engineering
> workstations, and there's endless desire to write fancy server apps
> completely in Python. There is no point paying for all that
> multiprocessor hardware if your programming language won't let you use
> it. So, Python must punt the GIL if it doesn't want to keep
> presenting undue obstacles to writing serious apps on modern hardware.

True.
The GIL implementation must have had its own good causes when it was
designed, but as the language evolves it is essential to broaden its
scope so that it fits many usage areas (e.g. scientific applications
using multiprocessors).

In the modern scientific age, where multiprocessor execution
environments are quite common, I feel there is a need to rethink the
introduction of true parallelization capabilities in Python.
I know many of my friends who did not choose Python for the obvious
reason of the nature of thread execution in the presence of the GIL,
which means that one is wasting sophisticated hardware resources.


##########################################
if __name__ == '__multiprocessor_execution_environment__':
    for python_version in range(python2.4.x, python3.x, x):
        if python_version.GIL:
            print 'unusable for computation-intensive multiprocessor architecture'
        else:
            print cmp(python, java)
############################################

regards,
KM

sjde...@yahoo.com

Sep 5, 2006, 11:23:45 AM
Sandra-24 wrote:
> > You seem to be confused about the nature of multiple-process
> > programming.
> >
> > If you're on a modern Unix/Linux platform and you have static read-only
> > data, you can just read it in before forking and it'll be shared
> > between the processes..
>
> Not familiar with *nix programming, but I'll take your word on it.

You can do the same on Windows if you use CreateProcessEx to create the
new processes and pass a NULL SectionHandle. I don't think this helps
in your case, but I was correcting your impression that "you'd have to
physically double the computer's memory for a dual core, or quadruple
it for a quadcore". That's just not even near true.

> > Threads are way overused in modern multiexecution programming. The
>
> <snip>
>
> > It used to run on windows with multiple processes. If it really won't
> > now, use an older version or contribute a fix.
>
> First of all I'm not in control of spawning processes or threads.
> Apache does that, and apache has no MPM for windows that uses more than
> 1 process.

As I said, Apache used to run on Windows with multiple processes; using
a version that supports that is one option. There are good reasons not
to do that, though, so you could be stuck with threads.

> Secondly "Superior" is definately a matter of opinion. Let's
> see how you would define superior.

Having memory protection is superior to not having it--OS designers
spent years implementing it, why would you toss out a fair chunk of it?
Being explicit about what you're sharing is generally better than not.


But as I said, threads are a better solution if you're sharing the vast
majority of your memory and have complex data structures to share.
When you're starting a new project, really think about whether they're
worth the considerable tradeoffs, though, and consider the merits of a
multiprocess solution.

> 3) Rewrite my codebase to use some form of shared memory. This would be
> a terrible nightmare that would take at least a month of development
> time and a lot of heavy rewriting. It would be very difficult, but I'll
> grant that it may work if done properly with only small performance
> losses.

It's almost certainly not worth rewriting a large established
codebase.


> I would find an easier time, I think, porting mod_python to .net and
> leaving that GIL behind forever. Thankfully, I'm not considering such
> drastic measures - yet.

The threads vs. processes thing isn't strongly related to the
implementation language (though a few languages like Java basically
take the decision out of your hands). Moving to .NET leaves you with
the same questions to consider before making the decision--just working
in C# doesn't somehow make threads the right choice all the time.

> Why on earth would I want to do all of that work? Just because you want
> to keep this evil thing called a GIL?

No, I agreed that the GIL is a bad thing for some applications.

> My suggestion is in python 3
> ditch the ref counting, use a real garbage collector

I disagree with this, though. The benefits of deterministic GC are
huge and I'd like to see ref-counting semantics as part of the language
definition. That's a debate I just had in another thread, though, and
don't want to repeat.

> > Now, the GIL is independent of this; if you really need threading in
> > your situation (you share almost everything and have hugely complex
> > data structures that are difficult to maintain in shm) then you're
> > still going to run into GIL serialization. If you're doing a lot of
> > work in native code extensions this may not actually be a big
> > performance hit, if not it can be pretty bad.
>
> Actually, I'm not sure I understand you correctly. You're saying that
> in an environment like apache (with 250 threads or so) and my hugely
> complex shared data structures, that the GIL is going to cause a huge
> performance hit?

I didn't say that. It can be a big hit or it can be unnoticeable. It
depends on your application. You have to benchmark to know for sure.

But if you're trying to make a guess: if you're doing a lot of heavy
lifting in native modules then the GIL may be released during those
calls, and you might get good multithreading performance. If you're
doing lots of I/O requests the GIL is generally released during those
and things will be fine. If you're doing lots of heavy crunching in
Python, the GIL is probably held and can be a big performance issue.

Since your app sounds like it's basically written, there's not much
cause to guess; benchmark it and see if it's fast enough or not. If
so, don't spend time and effort optimizing.

Bryan Olson

Sep 5, 2006, 11:28:13 AM
Paul Rubin wrote:
> "sjde...@yahoo.com" <sjde...@yahoo.com> writes:
>> If it's read/write data or you're not on a Unix platform, you can use
>> shared memory to shared it between many processes.
>>
>> Threads are way overused in modern multiexecution programming. The
>> decision on whether to use processes or threads should come down to
>> whether you want to share everything, or whether you have specific
>> pieces of data you want to share.
>
> Shared memory means there's a byte vector (the shared memory region)
> accessible to multiple processes. The processes don't use the same
> machine addresses to reference the vector. Any data structures
> (e.g. those containing pointers) shared between the processes have to
> be marshalled in and out of the byte vector instead of being accessed
> normally.

I think it's even worse. The standard Python library offers
shared memory, but not cross-process locks. Sharing read-write
memory looks like an automatic race condition. I guess one could
implement one of the primitive spin-lock based mutual exclusion
algorithms, but I think even that would depend on non-portable
assumptions about cache consistency.


--
--Bryan

Richard Brodie

Sep 5, 2006, 11:32:26 AM

"km" <srikris...@gmail.com> wrote in message
news:mailman.20.11574689...@python.org...

> I know many of my friends who did not choose python for obvious reasons
> of the nature of thread execution in the presence of GIL which means
> that one is wasting sophisticated hardware resources.

It would probably be easier to find smarter friends than to remove the
GIL from Python.


km

Sep 5, 2006, 11:43:57 AM
to Richard Brodie, pytho...@python.org
True; since smartness is a comparison, my friends who have chosen Java
over Python for considerations of true threading support in a
language are smarter, which makes me a dumbo! :-)

KM


Richard Brodie

Sep 5, 2006, 12:09:39 PM

"km" <srikris...@gmail.com> wrote in message
news:mailman.21.11574710...@python.org...

> True, since smartness is a comparison, my friends who have chosen java
> over python for considerations of a true threading support in a
> language are smarter, which makes me a dumbo ! :-)

No, but I think you are making unwise assumptions about performance.
You have to ask yourself: is Amdahl's law really hurting me?

In some situations Python could no doubt benefit from fine grained
locking. However, it's likely that scientific programming is not typically
one of them, because most of the heavy lifting is done in C or C++
extensions which can run in parallel if they release the GIL. Or you
are going to use a compute farm, and fork as many worker processes
as you have cores.

You might find these slides from SciPy 2004 interesting:
http://datamining.anu.edu.au/~ole/pypar/py4cfd.pdf


Steve Holden

unread,
Sep 5, 2006, 12:33:16 PM9/5/06
to pytho...@python.org
Given the effort that GIL-removal would take, I'm beginning to wonder if
PyPy doesn't offer a better way forward than CPython, in terms of
execution speed improvements returned per developer-hour.

sk...@pobox.com

unread,
Sep 5, 2006, 1:09:03 PM9/5/06
to Steve Holden, pytho...@python.org

Steve> Given the effort that GIL-removal would take, I'm beginning to
Steve> wonder if PyPy doesn't offer a better way forward than CPython,
Steve> in terms of execution speed improvements returned per
Steve> developer-hour.

How about execution speed improvements per hour of discussion about removing
the GIL? ;-)


Skip

sk...@pobox.com

unread,
Sep 5, 2006, 1:09:15 PM9/5/06
to Richard Brodie, pytho...@python.org

Richard> It would probably be easier to find smarter friends than to
Richard> remove the GIL from Python.

And if the friends you find are smart enough, they can remove the GIL for
you!

Skip

sjde...@yahoo.com

unread,
Sep 5, 2006, 1:11:38 PM9/5/06
to
Bryan Olson wrote:
> I think it's even worse. The standard Python library offers
> shared memory, but not cross-process locks.

File locks are supported by the standard library (at least on Unix,
I've not tried on Windows). They work cross-process and are a normal
method of interprocess locking even in C code.
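For concreteness, the kind of cross-process file lock being described can be sketched with the standard `fcntl` module (Unix only; the lock-file path here is an arbitrary choice, not anything from the thread):

```python
import fcntl
import os

# Open (or create) a lock file; any cooperating process that opens
# the same path contends for the same lock.
fd = os.open("/tmp/demo.lock", os.O_CREAT | os.O_RDWR)

fcntl.flock(fd, fcntl.LOCK_EX)      # blocks until the exclusive lock is held
try:
    pass  # critical section: touch the shared resource here
finally:
    fcntl.flock(fd, fcntl.LOCK_UN)  # release so other processes can proceed
    os.close(fd)
```

The lock is released automatically by the kernel if the process dies while holding it, which is one reason file locks are a common interprocess primitive.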

Lawrence Oluyede

unread,
Sep 5, 2006, 1:27:39 PM9/5/06
to
Sandra-24 <sandra...@yahoo.com> wrote:

> Oh I'm aware of that, but it's not what I'm looking for. Mod_mono just
> lets you run ASP.NET on Apache. I'd much rather use Python :) Now if
> there was a way to run IronPython on Apache I'd be interested.

Take a look here:
http://lists.ironpython.com/pipermail/users-ironpython.com/2006-March/002049.html
and this thread:
http://www.mail-archive.com/us...@lists.ironpython.com/msg01826.html

--
Lawrence - http://www.oluyede.org/blog
"Nothing is more dangerous than an idea
if it's the only one you have" - E. A. Chartier

Lawrence Oluyede

unread,
Sep 5, 2006, 1:36:53 PM9/5/06
to

sk...@pobox.com

unread,
Sep 5, 2006, 1:40:35 PM9/5/06
to Andre Meyer, pytho...@python.org

Andre> This seems to be an important issue and fit for discussion in the
Andre> context of Py3k. What is Guido's opinion?

Dunno. I've never tried channeling Guido before. You'd have to ask him.
Well, maybe Tim Peters will know. He channels Guido on a fairly regular
basis.

Skip

Sandra-24

unread,
Sep 5, 2006, 3:13:47 PM9/5/06
to
sjde...@yahoo.com wrote:
> You can do the same on Windows if you use CreateProcessEx to create the
> new processes and pass a NULL SectionHandle. I don't think this helps
> in your case, but I was correcting your impression that "you'd have to
> physically double the computer's memory for a dual core, or quadruple
> it for a quadcore". That's just not even near true.

Sorry, my bad. What I meant to say is that for my application I would
have to increase the memory linearly with the number of cores. I have
about 100mb of memory that could be shared between processes, but
everything else would really need to be duplicated.

> As I said, Apache used to run on Windows with multiple processes; using
> a version that supports that is one option. There are good reasons not
> to do that, though, so you could be stuck with threads.

I'm not sure it has done that since the 1.3 releases. mod_python will
work for that, but involves going way back in its release history as
well. I really don't feel comfortable with that, and I don't doubt I'd
give up a lot of things I'd miss.

> Having memory protection is superior to not having it--OS designers
> spent years implementing it, why would you toss out a fair chunk of it?
> Being explicit about what you're sharing is generally better than not.

Actually, I agree. If shared memory will prove easier, then why not use
it, if the application lends itself to that.

> But as I said, threads are a better solution if you're sharing the vast
> majority of your memory and have complex data structures to share.
> When you're starting a new project, really think about whether they're
> worth the considerable tradeoffs, though, and consider the merits of a
> multiprocess solution.

There are merits, the GIL being one of those. I believe I can fairly
easily rework things into a multi-process environment by duplicating
memory. Over time I can make the memory usage more efficient by sharing
some data structures out, but that may not even be necessary. The
biggest problem is learning my way around Linux servers. I don't think
I'll choose that option initially, but I may work on it as a project in
the future. It's about time I got more familiar with Linux anyway.

> It's almost certainly not worth rewriting a large established

> codebase.

Lazy me is in perfect agreement.

> I disagree with this, though. The benefits of deterministic GC are
> huge and I'd like to see ref-counting semantics as part of the language
> definition. That's a debate I just had in another thread, though, and
> don't want to repeat.

I just took it for granted that a GC like Java and .NET use is better.
I'll dig up that thread and have a look at it.

> I didn't say that. It can be a big hit or it can be unnoticeable. It
> depends on your application. You have to benchmark to know for sure.
>
> But if you're trying to make a guess: if you're doing a lot of heavy
> lifting in native modules then the GIL may be released during those
> calls, and you might get good multithreading performance. If you're
> doing lots of I/O requests the GIL is generally released during those
> and things will be fine. If you're doing lots of heavy crunching in
> Python, the GIL is probably held and can be a big performance issue.

I don't do a lot of work in native modules, other than the standard
library things I use, which doesn't count as heavy lifting. However I
do a fair amount of database calls, and either the GIL is released by
MySQLdb, or I'll contribute a patch so that it is. At any rate, I will
measure, and I suspect the GIL will not be an issue.

-Sandra

Paul Rubin

unread,
Sep 5, 2006, 4:19:03 PM9/5/06
to
sk...@pobox.com writes:
> It was removed at one point in the dim, dark past (circa Python 1.4) on an
> experimental basis. Aside from the huge amount of work, it resulted in
> significantly lower performance for single-threaded apps (that is, the
> common case).

That's probably because they had to put locking and unlocking around
every access to a reference count. A real GC might have fixed that.

Jean-Paul Calderone

unread,
Sep 5, 2006, 5:00:54 PM9/5/06
to pytho...@python.org

It would have made a difference, surely. Whether that difference would have
been positive, negative, or unnoticable is a matter for benchmarking and
profiling.

Even if you eliminate reference counting, you still have memory allocation
to deal with. Python allocates approximately a jillion objects every time
you sneeze, and each of these goes through CPython's allocator, which needs
a lock to protect it from being corrupted by concurrent invocations.

It's not a simple matter to make CPython free-threaded (even so simple as
replacing all reference counting with another form of garbage collection),
although perhaps if half as much time were spent cutting code as is spent
discussing the matter, we might learn if there were any value in doing so.

Jean-Paul

Paul Rubin

unread,
Sep 5, 2006, 8:31:11 PM9/5/06
to
"sjde...@yahoo.com" <sjde...@yahoo.com> writes:
> Having memory protection is superior to not having it--OS designers
> spent years implementing it, why would you toss out a fair chunk of it?
> Being explicit about what you're sharing is generally better than not.

Part of the win of programming in Python instead of C is having the
language do memory management for you--no more null pointers
dereferences or malloc/free errors. Using shared memory puts all that
squarely back in your lap.

> I disagree with this, though. The benefits of deterministic GC are
> huge and I'd like to see ref-counting semantics as part of the language
> definition. That's a debate I just had in another thread, though, and
> don't want to repeat.

That's ok, it can be summarized quickly: it lets you keep saying

def func(filename):
    f = open(filename)
    do_something_with(f)
    # exit from function scope causes f to automagically get closed,
    # unless the "do_something_with" didn't know about this expectation
    # and saved a reference for some reason.

instead of using the Python 2.5 construction

def func(filename):
    with open(filename) as f:
        do_something_with(f)
    # f definitely gets closed when the "with" block exits

which more explicitly shows the semantics actually desired. Not that
"huge" a benefit as far as I can tell. Lisp programmers have gotten
along fine without it for 40+ years...

Jean-Paul Calderone

unread,
Sep 5, 2006, 9:35:20 PM9/5/06
to pytho...@python.org
On 05 Sep 2006 17:31:11 -0700, Paul Rubin <"http://phr.cx"@nospam.invalid> wrote:
>
> def func(filename):
> with open(filename) as f:
> do_something_with(f)
> # f definitely gets closed when the "with" block exits
>
>which more explicitly shows the semantics actually desired. Not that
>"huge" a benefit as far as I can tell. Lisp programmers have gotten
>along fine without it for 40+ years...

Uh yea. No lisp programmer has ever written a with-* function... ever.

Jean-Paul

Paul Rubin

unread,
Sep 5, 2006, 9:49:14 PM9/5/06
to
Jean-Paul Calderone <exa...@divmod.com> writes:
> >which more explicitly shows the semantics actually desired. Not that
> >"huge" a benefit as far as I can tell. Lisp programmers have gotten
> >along fine without it for 40+ years...
>
> Uh yea. No lisp programmer has ever written a with-* function... ever.

The context was Lisp programmers have gotten along fine without
counting on the refcounting GC semantics that sjdevnull advocates
Python stay with. GC is supposed to make it look like every object
stays around forever, and any finalizer that causes an explicit
internal state change in an extant object (like closing a file or
socket) is not in the GC spirit to begin with.

Paul Rubin

unread,
Sep 5, 2006, 10:30:36 PM9/5/06
to

I may be missing your point but I didn't realize you could use file
locks to synchronize shared memory in any useful way. File locks are
usually made and released when the file is opened and closed, or at
best through flock or fcntl calls. Shared memory locks should
generally be done with mechanisms like futex, that in the no-wait case
should not involve any system calls.

sjde...@yahoo.com

unread,
Sep 6, 2006, 3:06:11 AM9/6/06
to

I disagree, strongly. If you want "every object stays around forever"
semantics, you can just not free anything. GC is actually supposed to
free things that are unreachable at least when memory becomes tight,
and nearly every useful garbage collected language allows destructors
that could have effects visible to the rest of the program. Reference
counting allows more deterministic semantics that can eliminate
repeating scope information multiple times.

sjde...@yahoo.com

unread,
Sep 6, 2006, 3:09:19 AM9/6/06
to
Paul Rubin wrote:
> "sjde...@yahoo.com" <sjde...@yahoo.com> writes:
> > Having memory protection is superior to not having it--OS designers
> > spent years implementing it, why would you toss out a fair chunk of it?
> > Being explicit about what you're sharing is generally better than not.
>
> Part of the win of programming in Python instead of C is having the
> language do memory management for you--no more null pointers
> dereferences or malloc/free errors. Using shared memory puts all that
> squarely back in your lap.

Huh? Why couldn't you use garbage collection with objects allocated in
shm? The worst theoretical case is about the same programmatically as
having garbage collected objects in a multithreaded program.

Python doesn't actually support that as of yet, but it could. In the
interim, if the memory you're sharing is array-like then you can
already take full advantage of multiprocess solutions in Python.
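That array-style sharing can be sketched with the standard `mmap` module and `os.fork` (Unix only; the backing-file path and size are arbitrary choices, not anything from the thread):

```python
import mmap
import os

SIZE = 1024
PATH = "/tmp/shared.buf"

# Back the mapping with a file so both processes map the same pages.
with open(PATH, "wb") as f:
    f.write(b"\0" * SIZE)

f = open(PATH, "r+b")
buf = mmap.mmap(f.fileno(), SIZE)   # MAP_SHARED is the default on Unix

pid = os.fork()
if pid == 0:
    buf[0:5] = b"hello"             # child writes into the shared pages
    os._exit(0)

os.waitpid(pid, 0)
assert buf[0:5] == b"hello"         # parent sees the child's write
```

This is exactly the "big shared byte array" model discussed later in the thread: the sharing works, but any structure layered on top of the bytes is the programmer's problem.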

sjde...@yahoo.com

unread,
Sep 6, 2006, 3:13:57 AM9/6/06
to
Paul Rubin wrote:
> "sjde...@yahoo.com" <sjde...@yahoo.com> writes:
> > > I think it's even worse. The standard Python library offers
> > > shared memory, but not cross-process locks.
> >
> > File locks are supported by the standard library (at least on Unix,
> > I've not tried on Windows). They work cross-process and are a normal
> > method of interprocess locking even in C code.
>
> I may be missing your point but I didn't realize you could use file
> locks to synchronize shared memory in any useful way.

You can, absolutely. If you're sharing memory through mmap it's
usually the preferred solution; fcntl locks ranges of an open file, so
you lock exactly the portions of the mmap that you're using at a given
time.

It's not an unusual use at all, Unix programs have used file locks in
this manner for upwards of a decade--things like the Apache public
runtime use fcntl or flock for interprocess mutexes, and they're quite
efficient. (The futexes you mentioned are a very recent Linux
innovation).
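The range-locking pattern described above can be sketched with `fcntl.lockf`, which locks a byte range of the open file backing the mapping (the path, file size, and locked range here are arbitrary choices):

```python
import fcntl
import os

fd = os.open("/tmp/shared.buf", os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, 4096)

# Lock only bytes 0..127 -- other processes can lock and work on
# disjoint ranges of the same file (and thus of the mmap) concurrently.
fcntl.lockf(fd, fcntl.LOCK_EX, 128, 0, os.SEEK_SET)
try:
    pass  # read/modify the corresponding slice of the mapping here
finally:
    fcntl.lockf(fd, fcntl.LOCK_UN, 128, 0, os.SEEK_SET)
    os.close(fd)
```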

Paul Rubin

unread,
Sep 6, 2006, 3:20:18 AM9/6/06
to
"sjde...@yahoo.com" <sjde...@yahoo.com> writes:
> > Part of the win of programming in Python instead of C is having the
> > language do memory management for you--no more null pointers
> > dereferences or malloc/free errors. Using shared memory puts all that
> > squarely back in your lap.
>
> Huh? Why couldn't you use garbage collection with objects allocated in
> shm? The worst theoretical case is about the same programatically as
> having garbage collected objects in a multithreaded program.

I'm talking about using a module like mmap or the now-AWOL shm module,
which gives you a big shared byte array that you have to do your own
memory management in. POSH is a slight improvement over this, since
it does its own ref counting, but that is slightly leaky, and POSH has
to marshal every object into the shared area.

> Python doesn't actually support that as of yet, but it could.

Well, yeah, with a radically different memory system that's even
more pie in the sky than the GIL and refcount removal that we've
been discussing.

> In the interim, if the memory you're sharing is array-like then you
> can already take full advantage of multiprocess solutions in Python.

But then you're back to doing your own memory management within that
array. Sure, that's tolerable for some applications (C programmers do
it for everything), but not exactly joy.

And as already mentioned, the stdlib currently gives no way to
implement shared memory locks (file locks aren't the same thing).
POSH and the old shm library do, but POSH is apparently not that
reliable, and nobody knows what happened to shm.

Paul Rubin

unread,
Sep 6, 2006, 3:28:07 AM9/6/06
to
"sjde...@yahoo.com" <sjde...@yahoo.com> writes:
> You can, absolutely. If you're sharing memory through mmap it's
> usually the preferred solution; fcntl locks ranges of an open file, so
> you lock exactly the portions of the mmap that you're using at a given
> time.

How can it do that without having to touch the PTE for every single
page in the range, which might be gigabytes? For that matter, how can
it do that on regions smaller than a page? And how does another
process query whether a region is locked, without taking a kernel trap
if it's locked? This sounds absolutely horrendous compared to a
futex, which should usually be just one or two user-mode instructions
and no context switches.

> It's not an unusual use at all, Unix programs have used file locks in
> this manner for upwards of a decade--things like the Apache public
> runtime use fcntl or flock for interprocess mutexes, and they're quite
> efficient. (The futexes you mentioned are a very recent Linux
> innovation).

Apache doesn't use shared memory in the same way that something like a
database does, so maybe it can more easily tolerate the overhead of
fcntl. Futex is just a somewhat standardized way to do what
programmers have done less portably since the dawn of multiprocessors.

Steve Holden

unread,
Sep 6, 2006, 3:46:51 AM9/6/06
to pytho...@python.org
Ah, right. So then we end up with processes that have to suspend because
they can't collect garbage? "Could" covers a multitude of sins, and
distributed garbage collection across shared memory is by no means a
trivial problem.

Steve Holden

unread,
Sep 6, 2006, 3:52:34 AM9/6/06
to pytho...@python.org
Clearly you guys are determined to disagree. It seemed obvious to me
that Paul's reference to making it "look like every object stays around
forever" doesn't exclude their being garbage-collected once the program
no longer contains any reference to them.

You simplify the problems involved with GC-triggered destructors to the
point of triviality. There are exceedingly subtle and difficult issues
here: read some of the posts to the python-dev list about such issues
and then see if you still feel the same way.

sjde...@yahoo.com

unread,
Sep 6, 2006, 5:03:19 AM9/6/06
to

No doubt that it's hard. On the other hand, current CPython captures
programmer-friendly behavior quite well. My main assertions are that:
1. Saying that GC is just freeing memory after it won't be referenced
anymore is disingenuous; it is _already_ common practice in Python (and
other languages) for destructors to close files, sockets, and otherwise
deallocate non-memory resources.
2. The ref-counting semantics are extremely valuable to the programmer.
Throwing them out without careful consideration is a bad
idea--ref-counting is _not_ simply one GC implementation among many, it
actually offers useful semantics and the cost of giving up those
semantics should be considered before throwing out refcounting.

I'm actually willing to be convinced on (2); I think that what
ref-counting offers is a massive improvement over nondeterministic GC,
and it seems that refcounting has historically been supportable, but if
there are real tangible benefits to python programmers from eliminating
it that outweigh the niceties of deterministic GC, then I'd be okay
with sacrificing it. It just seems like people are very cavalier about
giving up something that is a very nice feature in order to make other
implementations simpler.

(1) I think is here to stay, if you're going to tell programmers that
their destructors can't make program-visible changes (e.g. closing the
database connection when a dbconn is destroyed), that's a _huge_ change
from current practice that needs serious debate.

Paul Rubin

unread,
Sep 6, 2006, 5:14:28 AM9/6/06
to
"sjde...@yahoo.com" <sjde...@yahoo.com> writes:
> Throwing them out without careful consideration is a bad
>idea--ref-counting is _not_ simply one GC implementation among many, it
>actually offers useful semantics and the cost of giving up those
>semantics should be considered before throwing out refcounting.

It's too late to consider anything before throwing out refcounting.
Refcounting has already been thrown out (in Jython, IronPython, and
maybe PyPy). It's just an implementation artifact of CPython and MANY
other language implementations have gotten along perfectly well
without it.

> (1) I think is here to stay, if you're going to tell programmers that
> their destructors can't make program-visible changes (e.g. closing the
> database connection when a dbconn is destroyed), that's a _huge_ change
> from current practice that needs serious debate.

We had that debate already (PEP 343). Yes, there is some sloppy
current practice by CPython users that relies on the GC to close the
db conn. That practice already fails in several other Python
implementations and with PEP 343, we now have a clean way to fix it.
I don't understand why you're so fixated on keeping the sloppy method
around. The benefit is marginal at best. If you want stack-like
deallocation of something, ask for it explicitly.

mystilleef

unread,
Sep 6, 2006, 6:29:08 AM9/6/06
to
You can use multiple processes to simulate threads via an IPC
mechanism. I use D-Bus to achieve this.

http://www.freedesktop.org/wiki/Software/dbus

km wrote:
> Hi all,
> Are there any alternate ways of attaining true threading in python ?
> if GIL doesnt go then does it mean that python is useless for
> computation intensive scientific applications which are in need of
> parallelization in threading context ?
>
> regards,
> KM
> ---------------------------------------------------------------------------
> On 4 Sep 2006 07:58:00 -0700, bayerj <bay...@in.tum.de> wrote:
> > Hi,
> >
> > GIL won't go. You might want to read
> > http://blog.ianbicking.org/gil-of-doom.html .
> >
> > Regards,
> > -Justin
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list
> >
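The worker-process approach suggested in this thread comes down to something like the following sketch, using `os.fork` and a pipe for the IPC instead of D-Bus (Unix only; `crunch` is a placeholder task, not anything from the thread):

```python
import os

def crunch(n):
    # Placeholder for a CPU-bound task; each worker runs in its own
    # process, so one interpreter's GIL never blocks another's.
    return sum(i * i for i in range(n))

def in_worker(n):
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                  # child: compute, write result, exit
        os.close(r)
        os.write(w, str(crunch(n)).encode())
        os._exit(0)
    os.close(w)                   # parent: read the child's answer
    result = int(os.read(r, 64))
    os.waitpid(pid, 0)
    return result

print(in_worker(10))  # prints 285, computed in a child process
```

Scaling this up means forking one worker per core and distributing chunks of the input among them, which is the compute-farm arrangement mentioned earlier in the thread.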

Bryan Olson

unread,
Sep 6, 2006, 12:08:23 PM9/6/06
to

Ah, O.K. Like Paul, I was unaware how Unix file worked with
mmap.


--
--Bryan

sjde...@yahoo.com

unread,
Sep 6, 2006, 2:22:18 PM9/6/06
to
Paul Rubin wrote:
> "sjde...@yahoo.com" <sjde...@yahoo.com> writes:
> > (1) I think is here to stay, if you're going to tell programmers that
> > their destructors can't make program-visible changes (e.g. closing the
> > database connection when a dbconn is destroyed), that's a _huge_ change
> > from current practice that needs serious debate.
>
> We had that debate already (PEP 343). Yes, there is some sloppy
> current practice by CPython users that relies on the GC to close the
> db conn.

This point is unrelated to with or ref-counting. Even the standard
library will close file objects when they are GC'd. If this is not
acceptable, it's a major change; that's why I say (1) is here to stay.
But I think we're misunderstanding each other somehow on this point (I
don't think you're saying that the standard library is sloppily coded
in this regard), I just don't know how.

Bryan Olson

unread,
Sep 6, 2006, 2:22:53 PM9/6/06
to
I wrote:
> Ah, O.K. Like Paul, I was unaware how Unix file worked with
> mmap.

Insert "locking" after "file".


--
--Bryan


lcaamano

unread,
Sep 6, 2006, 2:39:05 PM9/6/06
to

Paul Rubin

unread,
Sep 6, 2006, 4:29:33 PM9/6/06
to
"sjde...@yahoo.com" <sjde...@yahoo.com> writes:
> > We had that debate already (PEP 343). Yes, there is some sloppy
> > current practice by CPython users that relies on the GC to close the
> > db conn.
>
> This point is unrelated to with or ref-counting. Even the standard
> library will close file objects when they are GC'd.

I don't think so. AFAIK, there is no such thing as "when they are
GC'd" in the language spec. There is, at best, "if they are GC'd",
not "when". You are not guaranteed that GC ever takes place. At
best, those destructors run when the application totally shuts down,
as the process is about to exit, but I'm not sure whether the spec
promises even that.

> If this is not acceptable, it's a major change;

It is guaranteeing GC running at any particular time that would be a
major change. There is no such guarantee right now; there is just an
implementation artifact in CPython that has led to careless habits
among some users.

Jean-Paul Calderone

unread,
Sep 6, 2006, 4:46:22 PM9/6/06
to pytho...@python.org
On 06 Sep 2006 13:29:33 -0700, Paul Rubin <"http://phr.cx"@nospam.invalid> wrote:
>"sjde...@yahoo.com" <sjde...@yahoo.com> writes:
>> > We had that debate already (PEP 343). Yes, there is some sloppy
>> > current practice by CPython users that relies on the GC to close the
>> > db conn.
>>
>> This point is unrelated to with or ref-counting. Even the standard
>> library will close file objects when they are GC'd.
>
>I don't think so. AFAIK, there is no such thing as "when they are
>GC'd" in the language spec. There is, at best, "if they are GC'd",
>not "when". You are not guaranteed that GC ever takes place. At
>best, those destructors run when the application totally shuts down,
>as the process is about to exit, but I'm not sure whether the spec
>promises even that.
>

It doesn't. Fortunately, the platform will close your files for you
when your process exits. ;)

>> If this is not acceptable, it's a major change;
>
>It is guaranteeing GC running at any particular time that would be a
>major change. There is no such guarantee right now; there is just an
>implementation artifact in CPython that has led to careless habits
>among some users.

Actually, there is an API for instructing the GC how frequently to run,
in addition to the explicit API for causing the GC to run when you invoke
it. See the gc module for more information.

Jean-Paul
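The API referred to here is the standard `gc` module; a minimal sketch of inspecting, tuning, and explicitly invoking the cycle collector:

```python
import gc

# The three thresholds control how many allocations (minus
# deallocations) may occur before each generation is examined.
print(gc.get_threshold())        # e.g. (700, 10, 10) by default in CPython

gc.set_threshold(1000, 15, 15)   # examine generation 0 less frequently

unreachable = gc.collect()       # run a full collection right now
print(unreachable)               # number of unreachable objects found
```

Note this only governs the cycle detector layered on top of reference counting; refcount-driven deallocation still happens immediately when a count hits zero.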

sjde...@yahoo.com

unread,
Sep 6, 2006, 5:00:08 PM9/6/06
to
Paul Rubin wrote:
> "sjde...@yahoo.com" <sjde...@yahoo.com> writes:
> > > We had that debate already (PEP 343). Yes, there is some sloppy
> > > current practice by CPython users that relies on the GC to close the
> > > db conn.
> >
> > This point is unrelated to with or ref-counting. Even the standard
> > library will close file objects when they are GC'd.
>
> I don't think so. AFAIK, there is no such thing as "when they are
> GC'd" in the language spec.

If they don't get GC'd, then "when they are GC'd" is never. The point
is that the standard library _does_ close files and take other
program-visible actions in __del__ methods; I'm unclear on if you think
that doing so is actually sloppy practice (as opposed to users of the
standard library relying on the GC running in some deterministic manner
or even at all, which you clearly do think is sloppy practice).

I originally thought that was what you meant when you said that "GC is
supposed to make it look like every object stays around forever, and
any finalizer that causes an explicit internal state change in an
extant object (like closing a file or socket) is not in the GC spirit
to begin with." but going back and reading it I'm not sure.

Paul Rubin

unread,
Sep 6, 2006, 5:19:39 PM9/6/06
to
"sjde...@yahoo.com" <sjde...@yahoo.com> writes:
> If they don't get GC'd, then "when they are GC'd" is never. The point
> is that the standard library _does_ close files and take other
> program-visible actions in __del__ methods; I'm unclear on if you think
> that doing so is actually sloppy practice

I don't know if I'd say "sloppy"; it's certainly messy and can lead to
complex bugs, but there are pragmatic reasons for wanting to do it in
some situations and it can be used to good purpose if you're careful.
But I think people sometimes expect too much from it, as Steve Holden
indicated.

It should not be compared with C++ destructors, since C++ deallocation
is explicit. I'm not sure what other languages with actual GC support
anything like it. I implemented it in Lisp once but found it
disconcerting since some of the stuff removed by GC in the system I
was working on were no-longer-referenced graphics objects on the
screen, which would thereby wink out of existence at surprising times.
I concluded that the application should keep track of stuff like that
instead of tossing them into the air for the GC to clean up.

> I originally thought that was what you meant when you said that "GC is
> supposed to make it look like every object stays around forever, and
> any finalizer that causes an explicit internal state change in an
> extant object (like closing a file or socket) is not in the GC spirit
> to begin with." but going back and reading it I'm not sure.

GC is supposed to just release resources that are no longer referenced
(since you can't tell whether they're still around, you can act as if
they are always stay around). For files, maybe closing them is
ok--the file abstraction mimics synchronous i/o and in the idealized
version, closing them is a do-nothing and the closure is undetectable.
For sockets (which are supposed to send particular messages over the
wire when the socket shuts down) it's less appealing.

Antoon Pardon

unread,
Sep 7, 2006, 4:02:57 AM9/7/06
to
On 2006-09-06, sjde...@yahoo.com <sjde...@yahoo.com> wrote:
> Paul Rubin wrote:
>> "sjde...@yahoo.com" <sjde...@yahoo.com> writes:
>> > (1) I think is here to stay, if you're going to tell programmers that
>> > their destructors can't make program-visible changes (e.g. closing the
>> > database connection when a dbconn is destroyed), that's a _huge_ change
>> > from current practice that needs serious debate.
>>
>> We had that debate already (PEP 343). Yes, there is some sloppy
>> current practice by CPython users that relies on the GC to close the
>> db conn.
>
> This point is unrelated to with or ref-counting. Even the standard
> library will close file objects when they are GC'd.

This is not totally true. My experience is that if you use the
tarfile module any tarfile that was opened for appending or
writing risks being corrupted if it isn't closed explicitly.

--
Antoon Pardon
