The problem is that for SQL databases, there is a substantial API that they can
all share. The implementations are primarily differentiated by other factors
(speed, in-memory or on-disk, embedded or server, the flavor of SQL, and so on)
and only secondarily by their extensions to the DB-API. With parallel
processing, the API itself is a key differentiator between toolkits and
approaches. Different problems require different APIs, not just different
implementations.
I suspect that one of the smaller implementations like processing.py might get
adopted into the standard library if the author decides to push for it. The ones
I am thinking of are relatively new, so I imagine that it might take a couple of
years of vigorous use by the community before it gets into the standard library.
My recommendation to you is to pick one of the smaller implementations that
solves the problems in front of you. Read and understand that module so you
could maintain it yourself if you had to. Post to this list about how you use
it. Blog about it if you blog. Write some Python Cookbook recipes to show how
you solve problems with it. If there is a lively community around it, that will
help it get into the standard library. Things get into the standard library
*because* they are supported, not the other way around.
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
Well, there is one parallel processing API that already *is* part of the
stdlib: the threading module. So the processing module would fit nicely into
the idea of a "standard" library.
Don't forget the select module and its siblings for I/O-bound work.
Hmm, when I think of "parallel processing", it's usually about processing, not
about I/O. If the task is I/O bound, it's worth considering single-threaded
processing instead.
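The select-based, single-threaded style mentioned above can be sketched with the stdlib alone; here a connected socket pair stands in for real network peers:

```python
import select
import socket

# A pair of connected sockets stands in for real network peers.
a, b = socket.socketpair()
b.send(b"ping")

# select blocks until at least one descriptor is readable (or the
# timeout expires); a single thread can multiplex many descriptors
# this way instead of dedicating a thread to each one.
readable, _, _ = select.select([a], [], [], 1.0)
data = b""
if a in readable:
    data = a.recv(4)
print(data)
a.close()
b.close()
```

This is the sense in which I/O-bound work often doesn't need parallelism at all: one thread waiting on many descriptors is simpler than many threads each waiting on one.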
On Dec 27, 2007 8:52 AM, Emin.shopper Martinian.shopper wrote:
> Dear Experts,
> Is there any hope of a parallel processing toolkit being incorporated into
> the python standard library? I've seen a wide variety of toolkits each with
> various features and limitations. Unfortunately, each has its own API. For
> coarse-grained parallelism, I suspect I'd be pretty happy with many of the
> existing toolkits, but if I'm going to pick one API to learn and program to,
> I'd rather pick one that I'm confident is going to be supported for a while.
> So is there any hope of adoption of a parallel processing system into the
> python standard library? If not, is there any hope of something like the
> db-api for coarse grained parallelism (i.e, a common API that different
> toolkits can support)?
True. I suspect that if any of them get into the standard library, it will be
one that, like the processing module, follows the threading API.
> My recommendation to you is to pick one of the smaller implementations that
> solves the problems in front of you. Read and understand that module so you
> could maintain it yourself if you had to. Post to this list about how you use
> it. Blog about it if you blog. Write some Python Cookbook recipes to show how
> you solve problems with it.
> That is a good suggestion, but for most of the coarse grained
> parallelism tasks I've worked on it would be easier to roll my own
> system than do that. To put it another way, why spend the effort to use
> a particular API if I don't know it's going to be around for a while?
> Since a lot of the value is in the API as opposed to the implementation,
> unless there is something special about the API ( e.g., it is an
> official or at least de facto standard) the learning curve may not be
> worth it.
And you think that you will encounter no learning curve writing your own code?
At least take the opportunity to see how other people have solved your problem.
Some of the implementations floating around now fit into one module. Surely, it
would take less time to understand one of them than write your own. And let's
not forget testing your module. The initial writing is never the timesink; it's
the maintenance.
> If there is a lively community around it, that will
> help it get into the standard library. Things get into the standard library
> *because* they are supported, not the other way around.
> You make a good point and in general I would agree with you. Isn't it
> possible, however, that there are cases where inclusion in the standard
> library would build a better community?
Not inclusion by itself, no. The standard library's APIs are only as supported
as there exist people willing to support them. *Their being in the standard
library does not create people out of thin air*. That's why the python-dev team
now has a hard requirement that new contributions must come with a guarantee of
support. Asking for inclusion without offering the corresponding guarantee will
be met with rejection, and rightly so.
> How would you or the rest of the community react to a proposal for a
> generic parallelism API? I suspect the response would be "show us an
> implementation of the code". I could whip up an implementation or adapt
> one of the existing systems, but then I worry that the discussion would
> devolve into an argument about the pros and cons of the particular
> implementation instead of the API. Even worse, it might devolve into an
> argument of the value of fine-grained vs. coarse-grained parallelism or
> the GIL. Considering that these issues seem to have been discussed quite
> a bit already and there are already multiple parallel processing
> implementations, it seems like the way forward lies in either a blessing
> of a particular package that already exists or adoption of an API
> instead of a particular implementation.
Well, you can't design a good API without having an implementation of it. If you
can't use the API in real problems, then you won't know what problems it has.
Preferably, for an API that's intended to have multiple "vendors", you should
have two implementations taking different approaches so you can get some idea of
whether the API generalizes well.
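To make the "common API, multiple vendors" idea concrete, here is one hypothetical shape it could take. The name pmap_threaded and its signature are invented for illustration; the point is that a second vendor could implement the same signature with processes or MPI:

```python
import threading

def pmap_threaded(func, items):
    # One hypothetical "vendor" of a common parallel-map API, backed
    # by threads. A competing vendor would keep the same signature
    # (func, items) -> results, but distribute work differently.
    results = [None] * len(items)

    def run(i, item):
        results[i] = func(item)

    threads = [threading.Thread(target=run, args=(i, it))
               for i, it in enumerate(items)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(pmap_threaded(lambda x: x * x, [1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Having two such backends in hand is exactly what exposes whether the API generalizes: anything that only works for the thread-backed version (shared mutable state, for instance) is a leak in the abstraction.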
> Is there any hope of a parallel processing toolkit being
> incorporated into the python standard library? I've seen a wide
> variety of toolkits each with various features and limitations.
> Unfortunately, each has its own API. For coarse-grained
> parallelism, I suspect I'd be pretty happy with many of the
> existing toolkits, but if I'm going to pick one API to learn and
> program to, I'd rather pick one that I'm confident is going to be
> supported for a while.
I don't think that parallel computing is mature enough to allow the
standardization of APIs, except within a given and well specified
parallel computing model such as message passing.
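The message-passing model can be illustrated in stdlib terms. In the sketch below, threads and queues stand in for separate processes and MPI-style channels; this is an illustration of the model, not any particular toolkit's API:

```python
import threading
import queue

# In the message-passing model, workers share nothing and communicate
# only through explicit messages. Here two queues act as the channels.
inbox = queue.Queue()
outbox = queue.Queue()

def worker():
    while True:
        msg = inbox.get()
        if msg is None:        # sentinel: no more messages
            break
        outbox.put(msg * 2)    # reply with a new message, no shared state

t = threading.Thread(target=worker)
t.start()
for n in [1, 2, 3]:
    inbox.put(n)
inbox.put(None)
t.join()
doubled = [outbox.get() for _ in range(3)]
print(doubled)  # [2, 4, 6]
```

Because all communication is explicit, this model is one of the few where the semantics are specified well enough that different implementations (MPI bindings, for example) can genuinely share an API.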
The Python Wiki has an impressive list of parallel processing options
for Python (see http://wiki.python.org/moin/ParallelProcessing). With
the exception of the various MPI interfaces, I don't think that any
two of them are based on the same parallel computing model. I don't
expect this situation to change any time soon, as parallel computing
is still very much experimental. Whereas sequential computing has
well-tested software engineering techniques, reliable libraries that
can be combined into programs, and ever improving testing techniques,
none of these exist for parallel computing.
For an overview of parallel computing models and for a more detailed
description of one of them as implemented in Python, please see my
recent article in "Computing in Science and Engineering":