
multithreading


Rob Hall

May 19, 2002, 1:54:40 PM
Can anyone point me to a useful tutorial on multithreading?

Rob


Aahz

May 19, 2002, 3:00:44 PM
In article <3ce7e613$0$37...@echo-01.iinet.net.au>,

Rob Hall <bloke at ii dot net> wrote:
>
>Can anyone point me to a useful tutorial on multithreading?

Take a look at my web page.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"In the end, outside of spy agencies, people are far too trusting and
willing to help." --Ira Winkler

Nils Kassube

May 19, 2002, 4:05:30 PM
"Rob Hall" <bloke at ii dot net> writes:

> Can anyone point me to a useful tutorial on multithreading?

One piece of advice: Avoid multithreading like the plague.

Multithreading is a very big stability risk if you don't know exactly
what you are doing, i.e. most programmers most of the time.

Christopher Saunter

May 20, 2002, 7:08:24 AM
Nils Kassube (ni...@kassube.de) wrote:

I have found multithreaded programming under Python to be rock solid, if a
few simple rules are followed. Well, three in fact.

1. Preferably pass information to threads when they are created, not at
run time.

2. All run-time communication between threads goes through a dictionary,
with two functions that use semaphore protection to set variables in and
read them from the dictionary. Maybe not so good where you need a lot of
communication between threads?

3. Using wxPython, wxMutexGuiEnter() and wxMutexGuiLeave() are sufficient
and necessary for gui work from non main threads. I imagine most GUI
libraries have a similar concept.
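
Rule 2 can be sketched roughly as follows. This is only an illustration of the idea, not the poster's code: the names `_shared`, `_guard`, `set_var` and `get_var` are made up here, and the sketch uses the Python 3 `threading` module.

```python
import threading

# Hypothetical names: none of these come from the post itself.
_shared = {}
_guard = threading.Semaphore(1)  # binary semaphore used as a mutex


def set_var(name, value):
    """Store a value for other threads to read."""
    _guard.acquire()
    try:
        _shared[name] = value
    finally:
        _guard.release()


def get_var(name, default=None):
    """Read a value written by another thread."""
    _guard.acquire()
    try:
        return _shared.get(name, default)
    finally:
        _guard.release()


def worker(ident):
    # Each worker reports only through the guarded dictionary.
    set_var("worker-%d" % ident, ident * ident)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

results = sorted(get_var("worker-%d" % i) for i in range(5))
```

Because every access goes through the two functions, no thread ever touches the dictionary while another is in the middle of an update.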

Following these three rules, I have a program that can happily run 5
threads, all accessing the GUI, and with the addition of a few carefully
placed time.sleep(x) calls remains highly responsive.

---

cds

Aahz

May 20, 2002, 9:23:13 AM
In article <87sn4n6...@kursk.kassube.de>,

Depends what you're trying to do. If you're trying to create a
multi-threaded spider, that's pretty close to dead-simple, particularly
if you pay attention to my tutorial slides. GUIs can be a bit more
difficult, but using Queue takes care of most problems.
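
A minimal sketch of the Queue-based worker pattern that a threaded spider would use, assuming Python 3 (where the module is spelled `queue` rather than `Queue`). The `fetch` function is a hypothetical stand-in that does no network access, and `crawl`, `worker` and the example URLs are invented for illustration; this is not taken from the tutorial slides.

```python
import queue      # named Queue in the Python 2 of this thread
import threading


def fetch(url):
    # Hypothetical stand-in for a real download; no network access here.
    return len(url)


def worker(jobs, results):
    while True:
        url = jobs.get()
        if url is None:          # sentinel: no more work for this worker
            break
        results.put((url, fetch(url)))


def crawl(urls, num_workers=3):
    jobs = queue.Queue()
    results = queue.Queue()
    workers = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for url in urls:
        jobs.put(url)
    for _ in workers:
        jobs.put(None)           # one sentinel per worker
    for w in workers:
        w.join()
    # All workers have exited, so exactly len(urls) results are waiting.
    return dict(results.get() for _ in range(len(urls)))


pages = crawl(["http://a.example/", "http://bb.example/"])
```

The threads share nothing except the two queues, which is what keeps this kind of program simple.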

Jacob Hallen

May 20, 2002, 10:40:28 AM
In article <87sn4n6...@kursk.kassube.de>,
Nils Kassube <ni...@kassube.de> wrote:

I plan to hold a useful tutorial at the Europython Conference.

When you know some basic facts about the Threading module and know
a few standard idioms for threads, they can be very useful and not
error prone at all. However, they do require a great deal of discipline
from the user, all of the time.

Jacob Hallén


--

François Pinard

May 20, 2002, 10:54:19 AM
[Aahz]

> In article <87sn4n6...@kursk.kassube.de>,
> Nils Kassube <ni...@kassube.de> wrote:

> >Multithreading is a very big stability risk if you don't know exactly
> >what you are doing, i.e. most programmers most of the time.

> Depends what you're trying to do.

Multi-threading has been very welcome in some of my projects. However, I'm
rather uncomfortable about precisely knowing whether various Python usages
are atomic or not, and which parts of the Python library are thread-safe.
Someone once suggested: "Try, and you will see!". The fact that something
works never proves it is correct, nor that it will always work. Short of
precise documentation on these things, I feel a bit lost when I observe lack
of stability. So I sometimes overuse the thread-off option in my programs.

--
François Pinard http://www.iro.umontreal.ca/~pinard


Peter Hansen

May 20, 2002, 1:04:57 PM

Nothing is atomic except sending messages to a Queue. At least, if you start
with that premise you probably can't go wrong...

In what types of applications have you observed instability, and can you
describe the types of thread interaction you were using? Maybe it would
be instructive for beginners to learn alternative, known-clean approaches
to solving the same problems.

-Peter

Aahz

May 20, 2002, 2:37:07 PM
In article <mailman.1021906534...@python.org>,

My response is that instead of trying to take advantage of the few
atomic Python constructs, instead code defensively and always use
thread-safe mechanisms for passing information. Because Python has a
powerful and simple Queue, this is straightforward to accomplish.

David LeBlanc

May 20, 2002, 3:16:32 PM
Lock, Semaphore and friends mentioned in the pythondoc (2.2.1) aren't
atomic? If not, then they're misnamed since atomicity is a required property
of such operations.

David LeBlanc
Seattle, WA USA

<snip>


> Nothing is atomic except sending messages to a Queue. At least,
> if you start
> with that premise you probably can't go wrong...

<snip>
>
> -Peter
> --
> http://mail.python.org/mailman/listinfo/python-list

Aahz

May 20, 2002, 9:58:43 PM
In article <mailman.102192196...@python.org>,

David LeBlanc <whi...@oz.net> wrote:
>
>Lock, Semaphore and friends mentioned in the pythondoc (2.2.1) aren't
>atomic? If not, then they're misnamed since atomicity is a required
>property of such operations.

What Peter meant is that Queue is the only single atomic operation for
passing data between threads. Lock & Semaphore are atomic, but they
aren't by themselves sufficient for passing data.

David LeBlanc

May 20, 2002, 10:15:09 PM
Although, of course, you can create any access managed datatype as long as
the primitives are there...

David LeBlanc
Seattle, WA USA


Peter Hansen

May 21, 2002, 12:29:22 AM
David LeBlanc wrote (top-posting):

>
> Aahz wrote:
> > David LeBlanc <whi...@oz.net> wrote:
> > >Lock, Semaphore and friends mentioned in the pythondoc (2.2.1) aren't
> > >atomic? If not, then they're misnamed since atomicity is a required
> > >property of such operations.
> >
> > What Peter meant is that Queue is the only single atomic operation for
> > passing data between threads. Lock & Semaphore are atomic, but they
> > aren't by themselves sufficient for passing data.
>
> Although, of course, you can create any access managed datatype as long as
> the primitives are there...

At great risk of creating all kinds of difficulty like deadlocks
if you are not very experienced and knowledgeable about such things.

Queue is much, much safer for someone who isn't sure, which was the
point of my earlier post.
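
To make the trade-off concrete, here is a sketch of a one-slot hand-off built by hand from a Condition, next to the same hand-off done with a bounded Queue. The `Mailbox` class and the `transfer` helper are hypothetical and written for Python 3; they only illustrate how much more there is to get right in the hand-rolled version.

```python
import queue
import threading


class Mailbox:
    """One-slot hand-off built from a Condition (hypothetical class)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._item = None

    def put(self, item):
        with self._cond:
            while self._full:          # must re-check after every wakeup
                self._cond.wait()
            self._item, self._full = item, True
            self._cond.notify_all()    # wake a waiting consumer

    def get(self):
        with self._cond:
            while not self._full:
                self._cond.wait()
            item, self._item, self._full = self._item, None, False
            self._cond.notify_all()    # wake a waiting producer
            return item


def transfer(channel, values):
    """Send values through channel from a second thread; collect them here."""
    producer = threading.Thread(
        target=lambda: [channel.put(v) for v in values])
    producer.start()
    received = [channel.get() for _ in values]
    producer.join()
    return received


via_mailbox = transfer(Mailbox(), [1, 2, 3])
via_queue = transfer(queue.Queue(maxsize=1), [1, 2, 3])
```

Forgetting either `while` loop or either `notify_all()` call in `Mailbox` can produce a hang that shows up only occasionally, whereas the Queue version has nothing comparable to forget.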

-Peter

Cliff Wells

May 23, 2002, 4:21:58 PM

I think this is a bit of an overstatement. Many problems are best expressed as
multithreaded programs. Trying to solve a naturally multithreaded problem as a
single-threaded app can be more complex and error-prone than the natural
multithreaded solution. The argument about knowing exactly what you are doing
could be easily applied to any moderately complex programming endeavor (network
programming, GUI programming, etc). Should these be avoided as well? In fact,
programming itself poses the same risks. Learning to write threaded programs is
a natural step in a programmer's development.

--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308 (800) 735-0555 x308


Skip Montanaro

May 23, 2002, 4:41:07 PM
>> > Can anyone point me to a useful tutorial on multithreading?

>> One piece of advice: Avoid multithreading like the plague.

Cliff> I think this is a bit of an overstatement. Many problems are
Cliff> best expressed as multithreaded programs.

I agree. I have an xml-rpc server that was single-threaded for several
years. I always meant to multi-thread it, but I feared introducing
instability in the server. When I finally bit the bullet I discovered it
wasn't all that difficult. You have to exercise care to make sure all your
shared objects are protected by locks or are contained in Queue objects.
You may also have to internally cap the number of active threads. For
example, my xml-rpc server creates a thread to handle each connection. The
number of threads is proportional to the number of requests coming into the
websites it serves. The number of active threads is effectively capped by a
Queue object which contains a set of cached MySQLdb connection objects.

--
Skip Montanaro (sk...@pobox.com - http://www.mojam.com/)
"Excellant Written and Communications Skills required" - seen on chi.jobs


Peter Hansen

May 23, 2002, 8:13:21 PM
Skip Montanaro wrote:
>
> >> > Can anyone point me to a useful tutorial on multithreading?
>
> >> One piece of advice: Avoid multithreading like the plague.
>
> Cliff> I think this is a bit of an overstatement. Many problems are
> Cliff> best expressed as multithreaded programs.
>
> I agree. I have an xml-rpc server that was single-threaded for several
> years. I always meant to multi-thread it, but I feared introducing
> instability in the server. When I finally bit the bullet I discovered it
> wasn't all that difficult. You have to exercise care to make sure all your
> shared objects are protected by locks or are contained in Queue objects.
> You may also have to internally cap the number of active threads. For
> example, my xml-rpc server creates a thread to handle each connection. The
> number of threads is proportional to the number of requests coming into the
> websites it serves. The number of active threads is effectively capped by a
> Queue object which contains a set of cached MySQLdb connection objects.

Cool! Do you mean you prepare the Queue ahead of time with a fixed
number of those objects, then any attempt to create a new thread blocks
on the Queue until a previous thread has terminated and released its
resource back into the Queue with a put(), presumably inside a 'finally'
block?

I think something like that would have simplified a little server we
just wrote. Never thought of that approach before.

-Peter

Skip Montanaro

May 23, 2002, 11:08:26 PM

>> The number of active threads is effectively capped by a Queue object
>> which contains a set of cached MySQLdb connection objects.

Peter> Cool! Do you mean you prepare the Queue ahead of time with a
Peter> fixed number of those objects, then any attempt to create a new
Peter> thread blocks on the Queue until a previous thread has terminated
Peter> and released its resource back into the Queue with a put(),
Peter> presumably inside a 'finally' block?

More or less, yes. Once a thread is finished with a db connection, it
places it back on the queue. It is not obligated to terminate at that
point. It can try and grab other resources it needs.
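
The pool idea can be sketched as below. This follows Peter's description, including the `finally` block he suggests, rather than the actual server code, which is not shown in the thread. `FakeConnection`, `POOL_SIZE` and `handle_request` are hypothetical names, and the sketch assumes Python 3's `queue` module.

```python
import queue
import threading

POOL_SIZE = 2  # hypothetical cap on simultaneous database users


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""

    def __init__(self, ident):
        self.ident = ident

    def query(self, text):
        return "conn%d: %s" % (self.ident, text)


# Fill the pool once, up front; it never holds more than POOL_SIZE items.
pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put(FakeConnection(i))

answers = queue.Queue()


def handle_request(text):
    conn = pool.get()            # blocks while every connection is in use
    try:
        answers.put(conn.query(text))
    finally:
        pool.put(conn)           # always hand the connection back


threads = [threading.Thread(target=handle_request, args=("req %d" % n,))
           for n in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

replies = [answers.get() for _ in range(6)]
```

Six threads are started, but at most `POOL_SIZE` of them can be talking to the database at any moment; the rest wait inside `pool.get()`.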

Every chunk of code where I lock something looks something like:

    self.cache_lock.acquire()
    try:
        fiddle_the_cache...
    finally:
        self.cache_lock.release()
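
In Python versions later than the one discussed here (2.5 and up), lock objects work with the `with` statement, which expresses the same acquire/try/finally pattern more compactly. A small sketch, with a hypothetical `cache` dictionary and helper functions invented for illustration:

```python
import threading

cache = {}                      # hypothetical shared cache
cache_lock = threading.Lock()


def remember(key, value):
    # Entering the block acquires the lock; leaving it, even through an
    # exception, releases it, just like the try/finally form.
    with cache_lock:
        cache[key] = value


def failing_update():
    with cache_lock:
        raise ValueError("simulated failure while holding the lock")


remember("a", 1)
try:
    failing_update()
except ValueError:
    pass

# A non-blocking acquire succeeds only if the lock was released above.
lock_was_released = cache_lock.acquire(False)
if lock_was_released:
    cache_lock.release()
```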

The dance with the Queue object full of database connections doesn't
actually use try/finally because it has a fairly weird set of interactions -
restarting some queries if they fail for specific reasons, reconnecting if a
connection has logged too many errors, etc. I'm fairly careful to catch all
the possible exceptions (yes, I have a general except: clause).

Nils Kassube

May 24, 2002, 9:10:42 AM
Cliff Wells <logiplex...@earthlink.net> writes:

> I think this is a bit of an overstatement. Many problems are best
> expressed as multithreaded programs. Trying to solve a naturally
> multithreaded problem as a single-threaded app can be more complex
> and error-prone than the natural

man fork

Often you do not need shared memory space. Dealing with shared memory
and multiple threads can introduce subtle bugs that will be extremely
hard to reproduce. Using multiple processes is in many cases the
better solution (on real operating systems).
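
A minimal sketch of the process-based alternative, assuming a Unix-like system where `os.fork()` is available. The function name `square_in_child` is hypothetical; the point is only that the child works on its own copy of memory and reports back through a pipe, so there is no shared state to protect.

```python
import os


def square_in_child(n):
    """Compute n*n in a forked child and return it via a pipe (Unix only)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: has its own copy of memory and talks to the parent only
        # through the pipe.
        os.close(read_fd)
        os.write(write_fd, str(n * n).encode("ascii"))
        os.close(write_fd)
        os._exit(0)              # leave without running parent cleanup
    # Parent: read the answer, then reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data.decode("ascii"))


answer = square_in_child(12)
```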

François Pinard

May 24, 2002, 6:03:15 PM
[Aahz]

> My response is that instead of trying to take advantage of the few
> atomic Python constructs, instead code defensively and always use
> thread-safe mechanisms for passing information. Because Python has a
> powerful and simple Queue, this is straightforward to accomplish.

But abusing Queues for very simple things, a bit everywhere, might yield
code bloat, and impinge readability. This is a bit like if someone was
inviting everyone to abuse fixed point integer arithmetic all over as a
way to program defensively against floating point arithmetic.

The key point is proper documentation. Even saying that a behaviour is
undefined is good documentation, as it teaches what should be avoided.

One could be paranoid and set up queues and a server to serve `os.listdir()',
say, for fear that two threads could not use that library function
simultaneously. We can go overboard doing such things, maybe without any
real necessity. It is best to know how things work, that is, what is
guaranteed and can be relied upon, and what is not guaranteed and needs
synchronisation mechanisms. Mere testing is out of the question, as a working
program is no proof of correct usage. Overusing a few synchronisation
primitives is no good either: I hope to know Python well enough
to feel it is on my side, and not to have to program defensively as if
trying to understand how Python is meant to be used were a lost cause.

François Pinard

May 24, 2002, 5:49:14 PM
[Peter Hansen]

> Nothing is atomic except sending messages to a Queue. At least, if you
> start with that premise you probably can't go wrong...

But that premise looks much exaggerated. Python offers a nice variety
of threading primitives, and it would look excessive limiting myself to
Queue only. Aren't the others dependable? They should be. I surely like
using them whenever they seem appropriate.

> In what types of applications have you observed instability, and can
> you describe the types of thread interaction you were using? Maybe it
> would be instructive for beginners to learn alternative, known-clean
> approaches to solving the same problems.

It could yield an interesting discussion indeed. My applications might not
be simple enough for such a discussion to flow easily, however,
and the comments in the code alone are a bit lengthy already :-). But I'll
try to find some time to share the problems I see, and maybe make the
code available somewhere, if people feel like discussing this.

Aahz

May 28, 2002, 9:20:58 PM
In article <mailman.1022277883...@python.org>,

=?iso-8859-1?q?Fran=E7ois?= Pinard <pin...@iro.umontreal.ca> wrote:
>[Aahz]
>>
>> My response is that instead of trying to take advantage of the few
>> atomic Python constructs, instead code defensively and always use
>> thread-safe mechanisms for passing information. Because Python has a
>> powerful and simple Queue, this is straightforward to accomplish.
>
>But abusing Queues for very simple things, a bit everywhere, might yield
>code bloat, and impinge readability. This is a bit like if someone was
>inviting everyone to abuse fixed point integer arithmetic all over as a
>way to program defensively against floating point arithmetic.

You're both misreading me and overstating your point, IMO. I completely
agree that RLock() is a valuable and necessary tool for managing critical
sections of code. However, I stand by my claim that Queue() should be
the primary mechanism for passing data around. Getting the semantics of
Event() and Semaphore() correct (not even talking about Condition() --
<shudder>) can be extremely difficult for all but the simplest cases,
leading to application deadlock.
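
As an illustration of the critical-section use of RLock mentioned above, here is a small sketch in which one locked method calls another locked method on the same object. The `Account` class is hypothetical and the code assumes Python 3.

```python
import threading


class Account:
    """Hypothetical example class guarded by a reentrant lock."""

    def __init__(self, balance):
        self._lock = threading.RLock()
        self.balance = balance

    def deposit(self, amount):
        with self._lock:
            self.balance += amount

    def transfer_in(self, amounts):
        # Holds the lock and then calls deposit(), which takes it again.
        # With a plain Lock this nested acquire would deadlock.
        with self._lock:
            for amount in amounts:
                self.deposit(amount)


account = Account(100)
threads = [threading.Thread(target=account.transfer_in, args=([1] * 1000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No data is passed between the threads here; the lock only keeps the updates to `balance` from interleaving, which is the kind of job a lock is suited for.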

Queue is both powerful and simple, and I therefore invoke Python's
"There's Only One Way" principle.

Mark Hammond

May 28, 2002, 10:21:53 PM
Aahz wrote:
> In article <mailman.1022277883...@python.org>,
> =?iso-8859-1?q?Fran=E7ois?= Pinard <pin...@iro.umontreal.ca> wrote:
>
>>[Aahz]
>>
>>>My response is that instead of trying to take advantage of the few
>>>atomic Python constructs, instead code defensively and always use
>>>thread-safe mechanisms for passing information. Because Python has a
>>>powerful and simple Queue, this is straightforward to accomplish.
>>
>>But abusing Queues for very simple things, a bit everywhere, might yield
>>code bloat, and impinge readability. This is a bit like if someone was
>>inviting everyone to abuse fixed point integer arithmetic all over as a
>>way to program defensively against floating point arithmetic.
>
>
> You're both misreading me and overstating your point, IMO. I completely
> agree that RLock() is a valuable and necessary tool for managing critical
> sections of code. However, I stand by my claim that Queue() should be
> the primary mechanism for passing data around. Getting the semantics of
> Event() and Semaphore() correct (not even talking about Condition() --
> <shudder>) can be extremely difficult for all but the simplest cases,
> leading to application deadlock.
>
> Queue is both powerful and simple, and I therefore invoke Python's
> "There's Only One Way" principle.

You can invoke whatever you like, but it doesn't change anything for
anyone else ;)

I believe advocating the Queue module as the "one way" is naive. The
Queue module is very useful, and indeed has solved many threading
problems in an elegant way for me - however, in my experience, it has
been used in less than 50% of the times I have needed multi-threaded
synchronization.

If-your-only-tool-is-a-hammer-everything-starts-looking-like-a-thumb ly

Mark.

Aahz

May 28, 2002, 11:33:56 PM
In article <3CF43BA6...@skippinet.com.au>,

Mark Hammond <mham...@skippinet.com.au> wrote:
>Aahz wrote:
>> In article <mailman.1022277883...@python.org>,
>> =?iso-8859-1?q?Fran=E7ois?= Pinard <pin...@iro.umontreal.ca> wrote:
>>>[Aahz]
>>>
>>>>My response is that instead of trying to take advantage of the few
>>>>atomic Python constructs, instead code defensively and always use
>>>>thread-safe mechanisms for passing information. Because Python has a
>>>>powerful and simple Queue, this is straightforward to accomplish.
>>>
>>>But abusing Queues for very simple things, a bit everywhere, might yield
>>>code bloat, and impinge readability. This is a bit like if someone was
>>>inviting everyone to abuse fixed point integer arithmetic all over as a
>>>way to program defensively against floating point arithmetic.
>>
>>
>> You're both misreading me and overstating your point, IMO. I completely
>> agree that RLock() is a valuable and necessary tool for managing critical
>> sections of code. However, I stand by my claim that Queue() should be
>> the primary mechanism for passing data around. Getting the semantics of
>> Event() and Semaphore() correct (not even talking about Condition() --
>> <shudder>) can be extremely difficult for all but the simplest cases,
>> leading to application deadlock.
>>
>> Queue is both powerful and simple, and I therefore invoke Python's
>> "There's Only One Way" principle.
>
>You can invoke whatever you like, but it doesn't change anything for
>anyone else ;)

If you really want to argue about it, go take it up with Tim; I am
merely the acolyte.

>I believe advocating the Queue module as the "one way" is naive. The
>Queue module is very useful, and indeed has solved many threading
>problems in an elegant way for me - however, in my experience, it has
>been used in less than 50% of the times I have needed multi-threaded
>synchronization.

Once again, I'm talking about data passing. If pure synchronization is
what you're talking about, then Queue isn't necessarily relevant.
However, I'd claim that once data passing is involved, you'd be
hard-pressed to provide an example where what you used instead of Queue
is significantly better. That's the whole point of the "One Way"
paradigm -- use as few idioms as possible for the low-level solutions to
save brain-power for the higher-order problems.

François Pinard

May 28, 2002, 11:50:48 PM
[Aahz]

> [...] I therefore invoke Python's "There's Only One Way" principle.

That principle still existed with Python 1.5.2, but is fading out since
then, and does not reflect Python reality anymore. Python now offers
many ways in various fields of the language. I too, like you, feel a bit
nostalgic of the past simplicity :-). On the other hand, all the recent
improvements have their own value.

Aahz

May 29, 2002, 4:14:09 PM
In article <mailman.1022644310...@python.org>,

=?iso-8859-1?q?Fran=E7ois?= Pinard <pin...@iro.umontreal.ca> wrote:
>[Aahz]
>>
>> [...] I therefore invoke Python's "There's Only One Way" principle.
>
>That principle still existed with Python 1.5.2, but is fading out since
>then, and does not reflect Python reality anymore. Python now offers
>many ways in various fields of the language. I too, like you, feel a bit
>nostalgic of the past simplicity :-). On the other hand, all the recent
>improvements have their own value.

In terms of strict reality, the principle was never 100% accurate.
Don't forget that there are many Pythonic principles ("import this" in
Python 2.2+), and it is often not possible to satisfy all of them
simultaneously. But I'll bet you find it difficult to name a principle
that Queue fails to satisfy.

François Pinard

May 29, 2002, 6:48:15 PM
[Aahz]

> But I'll bet you find it difficult to name a principle that Queue fails
> to satisfy.

While I'm very happy to use Queue where I feel it well fits, it looks like
a bit of overkill in other circumstances.

Using Queue also means creating client-server relationships for handling data
movement. In a threaded project, implemented queues are quite fundamental in
the overall description of the project design and algorithms, but I would not
overly multiply queues for each and every tiny aspect of data movement, when
mere locks are sufficient. Multiplying queues for everything and everywhere
in a project might impair its overall legibility, drowning the fish.

So, the principle I would be tempted to invoke against overusing Queue is
the principle of simplicity. We both agree that when Queue is proper,
that is, when the appropriate client-server relation already exists in a
project, Queue represents a simple, straightforward and legible approach.

Kragen Sitaker

May 29, 2002, 10:07:25 PM
pin...@iro.umontreal.ca (François Pinard) writes:
> Using Queue also means creating client-server relationships for handling data
> movement. In a threaded project, implemented queues are quite fundamental in
> the overall description of the project design and algorithms, but I would not
> overly multiply queues for each and every tiny aspect of data movement, when
> mere locks are sufficient. Multiplying queues for everything and everywhere
> in a project might impair its overall legibility, drowning the fish.

I've been following this discussion with interest, but I'm afraid it's
a bit abstract for me. I haven't written a lot of multithreaded
systems. Can you give an example of a case where using a queue
instead of just a lock makes the code a lot more complicated?

Tim Peters

May 29, 2002, 11:24:53 PM
[Mark Hammond]

> You can invoke whatever you like, but it doesn't change anything for
> anyone else ;)
>
> I believe advocating the Queue module as the "one way" is naive. The
> Queue module is very useful, and indeed has solved many threading
> problems in an elegant way for me - however, in my experience, it has
> been used in less than 50% of the times I have needed multi-threaded
> synchronization.

That's because you're not thinking straight. Here's an ugly critical
section with a dangerous primitive lock:

    alock.acquire()
    do something critical
    alock.release()

Here's a beautiful critical section with a robust Queue q:

    q.get()
    do something critical
    q.put("this may look like a string, but it's a lock")

See? Just make sure every critical section tries to get from the queue
before putting something on it, and make sure nobody puts something on the
queue when they shouldn't. Also put something on q right after you create
it, so the first q.get() doesn't block forever. That's much easier than
remembering not to acquire a lock right after you create one.

> If-your-only-tool-is-a-hammer-everything-starts-looking-like-a-thumb ly

I can't imagine what you're on about.

when-all-you-have-are-thumbs-everything-looks-like-an-ass<wink>-ly y'rs
- tim
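
Read literally, the tongue-in-cheek recipe above does run, which is part of the joke. The sketch below puts both forms side by side under Python 3; the `counter` dictionary and the two `bump_*` functions are hypothetical. The queue version only works because the queue is primed with exactly one token and every user returns it.

```python
import queue
import threading

counter = {"lock": 0, "queue": 0}

alock = threading.Lock()

q = queue.Queue()
q.put("token")                   # prime the queue so the first get() returns


def bump_with_lock():
    for _ in range(1000):
        alock.acquire()
        try:
            counter["lock"] += 1
        finally:
            alock.release()


def bump_with_queue():
    for _ in range(1000):
        token = q.get()          # "acquire": blocks until the token is back
        try:
            counter["queue"] += 1
        finally:
            q.put(token)         # "release"


threads = [threading.Thread(target=f)
           for f in (bump_with_lock, bump_with_queue) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both counters end up correct, but the queue version needs the extra priming step and silently breaks if anyone puts a second token on the queue, which is why a plain lock is the natural tool for a critical section.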
