thx
Run Lib/test/test_thread.py with your Python build. If you are on any
common platform then Python, unless specifically crippled, is
thread-enabled.
> Also, what is a reasonable number of threads to expect to be able to run
> before context switching overhead becomes a problem (I'm using a PIII 500
> Mhz with 512MB ram if that helps).
I have no experience putting that many threads to use.
Trent
--
Trent Mick
Tre...@ActiveState.com
I think you know that doesn't include NetBSD, and I believe it doesn't
include MacOS either. I'd call both of those common platforms, though
you can draw that line anywhere you want.
|> Also, what is a reasonable number of threads to expect to be able to run
|> before context switching overhead becomes a problem (I'm using a PIII 500
|> Mhz with 512MB ram if that helps).
|
| I have no experience putting that many threads to use.
Of course it would depend, too, on what "a problem" means, and on what
operating system the application is hosted on. And it might be worth
mentioning that the Python parts of the application will not execute
in parallel, anyway, thanks to a global interpreter lock that can be
held by only one thread at a time.
Donn Cave, do...@drizzle.com
Check out Stackless Python which supports micro threads (very small
context switching overhead ..)
Armin
It depends. For pure Python-based computation, more than one thread
will cost you more than you gain. If it's almost pure I/O (file or
socket stuff), you might be able to use as many as a couple of hundred
threads. Mostly, you're probably best off somewhere between five and
thirty threads, depending on what you're doing.
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/
Project Vote Smart: http://www.vote-smart.org/
I would have thought that a brave call. If you are performing lots of
IO and need many many connections, you are almost certainly better off
using an "asynchronous" approach, much like asyncore. With a large
number of threads (where "large" can be as low as single figures) the OS
overhead of switching will be greater than your internal overhead of a
more complicated design.
A "one thread per connection" model is doomed from the start -
therefore, in most applications, I would consider 5 threads a large
number - particularly when all are actively doing something. OTOH, if
you are willing to accept that your first version will accept a max of
~20 connections, the brain-power saved by using threads beats the
additional OS overhead by a mile ;)
Mark.
The new Linux native thread implementation was recently benchmarked
running 100,000 simultaneous threads on a 1 GB machine. Starting all
the threads and closing them took around 2 seconds.
Just because you *can* doesn't mean you *should* <wink>. Are you
suggesting you would use a threading model to support 100,000
simultaneous connections to your program on such a system?
Mark.
Yes BUT -- as of course you know, select, on Windows, can only deal with
sockets. Other kinds of asynchronous I/O require a completely different
architecture on Windows than on Unixoid systems -- file I/O, specifically,
performs best on Windows with asynchronous-I/O-with-callbacks-on-completion,
an arrangement very reminiscent of old VMS systems. Even for sockets, I
benchmarked, back in NT 3.something times, that "native"
(async-with-messages) winsock was vastly more scalable than "BSD-emulating
sockets and select" (not sure if that's still true these days).
twisted.internet does have multiple implementations of the Reactor
design patterns, including among others both a portable one based
on select, and a Windows-specific one with a specialized message loop
at its core (and others yet, such as ones working with GUI frameworks'
events, poll in lieu of select, etc) -- haven't examined implementation
in detail, but that does seem like the only sane approach if you need
servers that are BOTH cross-platform AND highly scalable.
> number of threads (where "large" can be as low as single figures) the OS
> overhead of switching will be greater than your internal overhead of a
> more complicated design.
I guess so, depending of course on the OS -- threading is more and
more likely to be well-optimized these days, but still the context
switch has to cost something compared to a "cooperative" asynchronous
architecture you can implement within one thread (or a very few
threads of your own -- twisted lets you do that, too).
> A "one thread per connection" model is doomed from the start -
Hear, hear.
> therefore, in most applications, I would consider 5 threads a large
> number - particularly when all are actively doing something. OTOH, if
> you are willing to accept that your first version will accept a max of
> ~20 connections, the brain-power saved by using threads beats the
> additional OS overhead by a mile ;)
I think it only LOOKS simple if you don't look into it deeply enough.
If the threads have almost no need to cooperate or share resources,
then you may well be right. If the cooperation needs are limited
enough that you can smoothly deal with them with Python "Queue"
instances, you still have a chance to be right. But generally, I
find that Ousterhout was right -- I'm sure you're familiar with his
talk on "Why threads are a bad idea for most purposes" (I guess one
can still find the slides on the net)... his point is more about
"brain-power saved" (or consumed by debugging in a just-too-hard
environment, with transient errors, race conditions, etc, etc) by
using event-driven (i.e., async) approaches rather than threading,
rather than about performance and scalability.
Python may not yet support event-driven programming quite as smoothly
and seamlessly as Ousterhout's Tcl, but, thanks to such developments
as twisted, I think we're drawing progressively closer. With a solid
enough underlying framework to dispatch events for you, I think event
driven programming (perhaps with a few simple specialized threads off
to one side, interfacing to some slow external resource, and getting
work requests off a Queue, posting results back to a Queue...) can
be made conceptually simpler as well as more scalable than threading.
After all, in event-driven / async programming, ONE thing is happening
at a time -- you only "switch" at known points, and between two such
points you need not worry about what is or isn't atomic, locking, &c.
It's also easier to debug, I think...
Alex
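The "specialized thread off to one side" pattern described above can be
sketched in a few lines (all names are mine, and slow_resource stands in for
whatever slow external resource you are wrapping):

```python
import queue
import threading

requests, results = queue.Queue(), queue.Queue()

def slow_resource(x):
    # Stand-in for the slow external resource (database, DNS, ...).
    return x * x

def worker():
    # The specialized thread: pull work requests off one Queue,
    # post results back on another.
    while True:
        job = requests.get()
        if job is None:          # sentinel: shut the worker down
            break
        results.put((job, slow_resource(job)))

t = threading.Thread(target=worker)
t.start()
for n in (2, 3):                 # the event loop posts work requests...
    requests.put(n)
requests.put(None)
t.join()
print(sorted(results.queue))     # ...and later collects [(2, 4), (3, 9)]
```

The event-driven part of the program never blocks on the slow resource; it
only touches the two Queues, which are safe to share between threads.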
> Python may not yet support event-driven programming quite as smoothly
> and seamlessly as Ousterhout's Tcl, but, thanks to such developments
> as twisted, I think we're drawing progressively closer. With a solid
> enough underlying framework to dispatch events for you, I think event
> driven programming (perhaps with a few simple specialized threads off
> to one side, interfacing to some slow external resource, and getting
> work requests off a Queue, posting results back to a Queue...) can
> be made conceptually simpler as well as more scalable than threading.
> After all, in event-driven / async programming, ONE thing is happening
> at a time -- you only "switch" at known points, and between two such
> points you need not worry about what is or isn't atomic, locking, &c.
> It's also easier to debug, I think...
Having written _many_ state machines in my life - which is what async
programming boils down to, in one form or another - I feel it depends on the
task. When you have an async event-driven model, the code that services a
"transaction-package" ends up spread across the application's code
landscape. This can frequently be a headache. As an example of a
"transaction-package", take all the steps involved in servicing an incoming
TCP connection, with custom authentication of a user, and perhaps a header
and data object transaction or two, finishing with a proper ending and
cleanup of the transaction, and with subsequent notification of all
necessary modules and data structures of the results of the transaction.
For dessert, add a centralized clean error handler that can log errors
properly and gracefully recover the app regardless of where in the
distributed event flow the error took place.
Contrast this with a simple procedure call where a linear series of function
calls are made, each with appropriate error handling and status adjustment,
in a pleasant centrally located manner.
There is an upcoming massively multiplayer game called EVE which is using
Stackless Python which makes this point quite well. Game AI suffers from
the same design considerations as mentioned above. Here's the link if
you're interested (I'm not associated with them in any way, just considering
taking a look at Stackless). Look for question "8.5 How is the game logic
implemented in EVE?":
http://www.eve-online.com/faq/faq_08.asp
thx
> Having written _many_ state machines in my life - which is what async
> programming boils down to, in one form or another - I feel it depends on
> the task. When you have an async event-driven model, the code that
> services a "transaction-package" ends up spread across the application's
> code landscape. This can frequently be a headache. As an example of a
> "transaction-package", take all the steps involved in servicing an
> incoming TCP connection, with custom authentication of a user, and
> perhaps a header and data object transaction or two, finishing with a
> proper ending and cleanup of the transaction, and with subsequent
> notification of all necessary modules and data structures of the results
> of the transaction. For dessert, add a centralized clean error handler
> that can log errors properly and gracefully recover the app regardless
> of where in the distributed event flow the error took place.
>
> Contrast this with a simple procedure call where a linear series of function
> calls are made, each with appropriate error handling and status adjustment,
> in a pleasant centrally located manner.
>
I wonder if generators can come to the rescue here?
Still trying to get them into my head...
Thomas
Any info or opinion on what would be the best way to implement a filesharing
app in Python - threaded, async, raw sockets, etc. - would be of interest to me.
NB! It would be nice to be able to limit the number of threads - actually the
number of concurrent connections - on the server too, so if anybody has any
comments on some way of doing that and still handle requests gracefully, it
would be great.
Best regards,
Thomas
"Aahz" <aa...@pythoncraft.com> wrote in message
news:amogmi$r1l$1...@panix1.panix.com...
I was thinking exactly the same thing, generators as
cooperative multithreading.
I wonder if a "task", or a "service", in an async server
could simply initialize and then "yield" when it's ready
to be paused, and between yield's do some processing.
The core server would simply call the service factory when
it's requested, receive a generator back, add it to its
pool of active services, and iteratively call every
generator's .next() method, so everybody gets a bit of
attention and love each cycle.
You could even use priorities, something simple, where high-
priority services would receive a bit more attention every
complete cycle.
I dunno, but it's an idea whose time has come, I think.
:-)
-gus
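A minimal sketch of that core loop (all names are mine, and a real server
would of course also multiplex I/O rather than spin):

```python
def service(name, steps):
    """A 'service': initialize, then do a slice of work between yields."""
    for i in range(steps):
        # ... one unit of processing would go here ...
        yield "%s step %d" % (name, i)

def server_loop(pool):
    """Round-robin: every generator gets a bit of attention each cycle."""
    log = []
    while pool:
        for gen in list(pool):          # copy: we mutate pool in the loop
            try:
                log.append(next(gen))
            except StopIteration:
                pool.remove(gen)        # service finished; drop it
    return log

print(server_loop([service("a", 2), service("b", 1)]))
# ['a step 0', 'b step 0', 'a step 1']
```

Priorities could be layered on by simply calling next() more than once per
cycle on the high-priority generators.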
> This might be just a stupid question, but I'm trying to implement a
> filesharing-app, based on the BaseHTTPServer in the standard python-distro.
> There will of course be heavy IO-processing involved. Will threading still be
> a bad idea? The project is aimed at small user-groups ( I do not think one
> person will have more than say 5-10 concurrent connections at a time, most
> of them downloading files 1MB + in size),
At those low load levels, threads will be fine.
> as if threading is a very bad idea no matter what, still it's used almost
> everywhere ( I got no statistics to back up that statement ;-) ).
You have to factor in the issue of complexity too though. The threading
approach is almost always much simpler and many applications don't need to
be high performance, so threading is acceptable. That said, the asynch
approach _is_ useful to know and understand - if for no other reason than
it makes you understand the problem from a very different angle. And after
awhile it really starts to "click" in your head and make sense. State
machines are nifty. :)
> NB! It would be nice to be able to limit the number of threads, actually the
> number of concurrent connections, on the server too, so if anybody has any
> comments on some way of doing that and still handle requests gracefully
Is the client side a web browser or a custom app? Your request handler
could return a nice "server is busy" (e.g. HTTP 503) message to a browser,
but if you want to explicitly cap the number of threads you'll need to
subclass the server. It's been awhile since I've looked at SocketServer,
but I think it has a check_request or verify_request or something method
that you could override to send back an HTTP 503 if the server is
overloaded (already has N threads running) and then return false (as in,
"don't process this request any further").
-Dave
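A rough sketch of that idea (the class name and the crude thread head-count
are my own inventions; verify_request is the actual SocketServer hook):

```python
import threading
import socketserver
from http.server import BaseHTTPRequestHandler, HTTPServer

class CappedHTTPServer(socketserver.ThreadingMixIn, HTTPServer):
    """Thread-per-request HTTP server that refuses work past a cap."""

    max_workers = 20   # hypothetical cap on concurrent handler threads

    def verify_request(self, request, client_address):
        # Called before a handler thread is spawned.  Crude head-count:
        # assume every thread beyond the main one is a request handler.
        if threading.active_count() - 1 >= self.max_workers:
            try:
                # Be polite to browsers before dropping the connection.
                request.sendall(b"HTTP/1.1 503 Service Unavailable\r\n"
                                b"Content-Length: 0\r\n\r\n")
            except OSError:
                pass
            return False   # don't process this request any further
        return True

# Usage: CappedHTTPServer(("", 8000), SomeHandler).serve_forever()
```

Returning False from verify_request makes the server close the connection
without ever dispatching it to a handler thread.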
[Thomas Heller]
> I wonder if generators can come to the rescue here?
Absolutely. Read the start of PEP 255.
> Still trying to get them into my head...
Think of them as resumable functions. You're familiar with functions that
remember their own data state across calls, via abusing globals, or via
declaring private static variables, in C. Methods on objects often do the
same by stuffing their state into instance attributes between calls. A
generator remembers its data state across resumptions simply because its
local variables don't vanish, and it also remembers its control-flow state
across resumptions. Remembering local data state by magic is a real
convenience, but can be simulated in many other ways with reasonable effort;
it's the ability to remember and restore control-flow state by magic that
can make generators more than merely convenient.
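A tiny illustration of both kinds of remembering (names are mine):

```python
def counter(start):
    # Both the local variable n and the point of execution survive
    # each pause; resumption continues right after the yield.
    n = start
    while True:
        yield n
        n += 1

c = counter(10)
print(next(c))   # 10
print(next(c))   # 11 -- local state was remembered across the resumption
```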
I _have_ written a kind of hardware description with it - what you
normally do with VHDL testbenches. This was fun!
There were several generators like this, manipulating clock, reset
and other signals in a counter device. The value yield'ed is simply
the time to wait:
def run_clock(self):
    while 1:
        self.clock = not self.clock
        yield 50  # wait 50 ns

def run_reset(self):
    self.reset = 1
    yield 200  # a 200 ns reset pulse
    self.reset = 0
The dispatcher goes like this: Creates the generators,
pushes them onto a priority queue (together with the time
value they will have to be served), serves them by calling
the next() method, and pushes the result back onto the queue:
import heapq

def run(*gens):
    queue = []     # priority queue of (time, tie-break, generator)
    serial = 0     # tie-break, so equal times never compare generators
    # insert all generators, scheduled for time 0
    for g in gens:
        queue.append((0, serial, g()))
        serial += 1
    heapq.heapify(queue)
    # and run them
    while queue:
        now, _, gen = heapq.heappop(queue)
        try:
            dt = gen.next()
        except StopIteration:
            pass
        else:
            serial += 1
            heapq.heappush(queue, (now + dt, serial, gen))
--------
OTOH, I've been missing the possibility to pass
values back _to_ the generators - something like maybe
'work = yield event', probably triggered by calling gen.next(work).
Wouldn't this be useful?
I fear I cannot explain this clearly, but I think
David Mertz's state machine example would also profit from it.
As I understand it, he passes this 'work' around in global variables.
Thomas
You want Twisted. http://twistedmatrix.com/
It's an asynchronous networking framework which has an elegant abstraction
of callbacks and error handlers that I find make event-based programming a
breeze. See http://twistedmatrix.com/documents/howto/defer for a detailed
description of how this works. It makes it trivial to sequence a series of
operations on some event, regardless of when it happens.
In fact, here's a breakdown of your steps and Twisted:
1. "servicing an incoming connection":
twisted.internet does that for you. It's dead easy.
2. "custom authentication of a user":
Use the twisted.cred package. This is exactly what it is designed for.
It has a standard asynchronous interface, with ready-made implementations
involving a simple python dict, and also a database-based implementation.
3. "perhaps a header and a data object transaction":
There isn't general transaction support in Twisted, but I imagine any
object database you'd be using would support them, e.g. the ZODB. And
again, Deferreds make it easy to catch exceptions and so on (see my next
points).
4. "centralized clean error handler that can log errors properly and
gracefully recover":
Twisted's Deferreds make that trivial. In fact, if you forget to install
an error handler, Twisted will log the error for you automatically.
A Deferred's callbacks and errbacks in many ways mirror your usual
try/except constructs, so structured error handling is straightforward.
See the doc I linked to above for more details. Also, Twisted's main loop
will never be crashed by an exception in one of the connections it
services... in general, if something causes one connection to crash, the
rest of Twisted merrily keeps on going regardless.
> Contrast this with a simple procedure call where a linear series of
> function calls are made, each with appropriate error handling and status
> adjustment, in a pleasant centrally located manner.
For instance, talking to a database asynchronously, Twisted-style:
# See also http://twistedmatrix.com/documents/howto/enterprise
deferred = db.runQuery('SELECT * FROM table')
Then hooking up a bunch of callbacks:
deferred.addCallbacks(gotResult, somethingBroke)
deferred.addCallbacks(doStep2, step2Rollback)
deferred.addCallback(doStep3) # step 3 doesn't care about rollbacks
# ...etc...
# Catch any uncaught errors
deferred.addErrback(logError)
The results of each callback get fed into the next, so you can perform a
series of data transformations or whatever, e.g. perhaps you want something
like:
# Note that addCallback returns the Deferred for your convenience
deferred.addCallback(recordSetToXML).addCallback(applyXSLT) # etc
Which looks like a linear series of function calls to me, and it certainly
has error handling (.addErrback) and status adjustment (you could easily be
updating an instance variable from within the callbacks).
And there's even more to Deferreds than I've described here! Go read the
documentation :)
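To make the callback/errback idea concrete, here is a deliberately tiny toy
- NOT Twisted's actual implementation, just an illustration of how results
and exceptions flow through a chain:

```python
class TinyDeferred:
    """Toy model of Twisted's callback/errback chaining (NOT its real code)."""

    def __init__(self):
        self.stages = []   # list of (callback, errback) pairs

    def addCallbacks(self, callback, errback=None):
        self.stages.append((callback, errback))
        return self        # allow chaining, like the real thing

    def addCallback(self, callback):
        return self.addCallbacks(callback)

    def addErrback(self, errback):
        return self.addCallbacks(lambda result: result, errback)

    def fire(self, result):
        # Feed the result through the chain; an exception at any stage
        # diverts flow to the errbacks until one of them recovers.
        for callback, errback in self.stages:
            try:
                if isinstance(result, Exception):
                    if errback is not None:
                        result = errback(result)
                else:
                    result = callback(result)
            except Exception as exc:
                result = exc
        return result

d = TinyDeferred()
d.addCallback(lambda rows: len(rows)).addCallback(lambda n: n * 2)
print(d.fire(["row1", "row2"]))   # 4
```

An exception raised in any callback simply becomes the "result" handed to
the next errback, which is the essence of the try/except analogy above.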
> There is an upcoming massively multiplayer game called EVE which is using
> Stackless Python which makes this point quite well. Game AI suffers from
> the same design considerations as mentioned above. Here's the link if
> you're interested (I'm not associated with them in any way, just
> considering taking a look at Stackless). Look for question "8.5 How is
> the game logic implemented in EVE?":
Interesting. Twisted is actually designed with exactly this goal in mind
(MMORPG). Stackless is nice, but I think for what you're talking about
Twisted already solves your problems in "pure" Python :)
Twisted is an excellent framework. I highly recommend you download it
(version 0.99.2 is still fresh off the press -- grab it while it's hot!),
play with it, bug us on the mailing list and irc channel (#twisted on
irc.freenode.net), and let us know what you think. We think you might just
like it.
-Andrew.
Not too bad.
How about 1,000,000 in 100 MB?
Started and shut down in 15 seconds, from Python?
This was just a rough test of new features, will
give exact timings when I write the article...
ciao - chris
--
Christian Tismer :^) <mailto:tis...@tismer.com>
Mission Impossible 5oftware : Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/
14109 Berlin : PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/
Any thoughts on defining the next method with a generic *args optional arg list
and making it visible to an executing generator function g as g.__args__ ?
g.__args__ would then just be () for current usage, and could be
backwards-compatibly ignored as desired.
I'm thinking it would, e.g., provide a channel to pass messages etc to
a running task-implemented-as-generator.
Or you could pass control info to a tokenizer, e.g. telling it to back up
one and switch to profuse-info mode before yielding again, or whatever.
Another thought is to pass optional return args in the args tuple attribute
of the StopIteration exception object. This could be used variously, but
might be used to return a final result from a child task, where ordinary
yields might be reserved for control info for a dispatcher. Or it could be
used to distinguish exceptional return conditions without using other
exceptions. Etc., etc.
This could also be reverse-compatibly ignored, but I think the two thoughts
above could open up a lot of possibilities. Perhaps they have both been
discussed before?
Regards,
Bengt Richter
I think you can get something like this using a class. For this solution,
you'll need to make the list visible as a parameter to the generator, or
else write it so that you derive a class for each generator, making the
generator a method. You may also be able to work something up involving
nested functions that lets the inner function see the argument list as
a variable in the outer scope.
class ArgsGenerator:
    def __init__(self, f, *args, **kw):
        self.args = []
        self.__iter = f(self.args, *args, **kw)

    def next(self, *args):
        self.args[:] = list(args)
        return self.__iter.next()

def primes(x):
    n = 2
    while 1:
        if x: n = x[0]
        if n == -1: return
        while 1:
            composite = 0
            for d in range(2, n-1):
                if n % d == 0:
                    composite = 1
                    break
            if not composite:
                yield n
                break
            n = n + 1
        n = n + 1

x = ArgsGenerator(primes)
print x.next()     # 2
print x.next(8)    # 11
print x.next()     # 13
print x.next(-1)   # raises StopIteration
Jeff
I've got no practical experience with asyncore, but I do with threads.
While I accept the conventional wisdom that asyncore will frequently do
better than threads for performance, I stand by my claims for what will
work well with threads.
>A "one thread per connection" model is doomed from the start -
>therefore, in most applications, I would consider 5 threads a large
>number - particularly when all are actively doing something. OTOH, if
>you are willing to accept that your first version will accept a max of
>~20 connections, the brain-power saved by using threads beats the
>additional OS overhead by a mile ;)
Overall, yes, but a lot of I/O is bursty, and with a number of blocked
threads, up to a hundred or so can work well. Note carefully that I
*am* assuming a thread pool to avoid startup costs.
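For concreteness, a bare-bones pre-started pool of the sort assumed above
(all names mine; real handlers would of course do I/O, not arithmetic):

```python
import queue
import threading

def handle(conn):
    # Stand-in for real per-connection work.
    return conn * 10

jobs, done = queue.Queue(), queue.Queue()

def worker():
    # Each pooled thread blocks cheaply on the Queue between bursts,
    # so there is no per-connection thread startup cost.
    while True:
        conn = jobs.get()
        if conn is None:        # sentinel: time to shut down
            break
        done.put(handle(conn))

pool = [threading.Thread(target=worker) for _ in range(4)]
for t in pool:
    t.start()
for conn in range(5):           # five "connections" arrive
    jobs.put(conn)
for t in pool:
    jobs.put(None)              # one sentinel per worker
for t in pool:
    t.join()
print(sorted(done.queue))       # [0, 10, 20, 30, 40]
```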
On the other hand, if the asynchronous handler starts a thread to handle its
task, then when the processing thread blocks the main asynchronous thread
can process further connections or inputs. Thus, I thought, there would be
benefits particularly when the requested tasks were likely to be relatively
lengthy.
> >A "one thread per connection" model is doomed from the start -
> >therefore, in most applications, I would consider 5 threads a large
> >number - particularly when all are actively doing something. OTOH, if
> >you are willing to accept that your first version will accept a max of
> >~20 connections, the brain-power saved by using threads beats the
> >additional OS overhead by a mile ;)
>
> Overall, yes, but a lot of I/O is bursty, and with a number of blocked
> threads, up to a hundred or so can work well. Note carefully that I
> *am* assuming a thread pool to avoid startup costs.
The thread pool obviously helps a great deal. I was sorry to see Mark
suggesting that "thread per connection" models won't be responsive, since
his word on that topic (especially for Windows) is likely to be
authoritative.
regards
-----------------------------------------------------------------------
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/pwp/
Previous .sig file retired to www.homeforoldsigs.com
-----------------------------------------------------------------------
It seems that Twisted Matrix lets you mix the two approaches quite nicely --
as one asyncore fan to another, I would suggest you look into Twisted, as "a
better asyncore than asyncore" in some ways:-). (I've only really looked
into the lower levels, so far, but I like what I see there, quite a lot.)
Alex
In that case maybe you could help me. I've always found twisted to be a bit
unapproachable, though I suspect it might be more to do with me than with
twisted. The impression I have is that "it's easy to understand if you
already understand it", but that the documentation doesn't really explain the
principles on which it's based.
Of course this is far from uncommon in the open-source world, but I guess
I'll take another look.
Yep. But now that you've carefully technical-reviewed the Nutshell
chapter on networking, o technical reviewer mine, surely that's
changed? I mean, I _do_ in that chapter cover Twisted.internet,
side by side with asyncore, asynchat and other approaches, with the
same example server shown for each... (pls mail me if you need a
copy of that chapter in the latest/current version!).
Alex