I (mostly) understand the nature of "threads" in Python. From my
understanding, the GIL locks the interpreter to executing only one
Python thread at a time, but C modules can take advantage of a Python
application being multithreaded, because they can operate independently
of the GIL. Presumably, this would mean that there is, in fact, a
benefit to using threads in Paste, because most network I/O bound
stuff happens within a C module.
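
A minimal sketch of that point, using only the standard library. It assumes
time.sleep is a fair stand-in for a blocking network read: both release the
GIL while waiting. The function names are made up for illustration.

```python
import threading
import time


def blocking_call(seconds):
    # time.sleep releases the GIL while it waits, much like a blocking
    # socket read inside a C module would.
    time.sleep(seconds)


def run_in_threads(count, seconds):
    """Run `count` blocking calls in parallel threads; return elapsed seconds."""
    threads = [
        threading.Thread(target=blocking_call, args=(seconds,))
        for _ in range(count)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Four threads that each block for 0.2 seconds should finish in roughly 0.2
seconds total rather than 0.8, because the waits overlap. CPU-bound Python
code would not overlap this way.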
Given this situation, I believe that despite paste making an effort to
be multithreaded, it would still be advantageous to run a cluster of
four Pylons instances and proxy to these using nginx.
Using our setup we'd have four pylons instances being proxied to by
four nginx worker threads.
In nginx you can set the processor affinity for each worker thread,
thus placing each worker on a different core 0..3.
Here's where things get tricky:
I've found a Python package that apparently allows Python applications
to set their processor affinity (I'm afraid it doesn't work on OS X):
http://pypi.python.org/pypi/affinity/0.1.0
Using this, what do you guys think of my idea to write a custom
cluster controller, perhaps using supervisord, that will start nginx
and the four worker processes, and then fork() my Pylons app into
a cluster of four?
Is this overkill? Is Paste more multithreaded than I'm giving it credit
for? Is there a better way to go about this? Does an alternative to
the 'affinity' package exist?
-Devin Torres
Separate processes are likely to work better. You might find one of the
flup forking servers to be better (using fastcgi), though I don't know
for sure. That will run each request in its own process, so you'll get
multiple processes without the same infrastructure complications of a
cluster of servers.
I don't think affinity should be that important. Doesn't the OS handle
that itself?
--
Ian Bicking : ia...@colorstudy.com : http://blog.ianbicking.org
Also, is it only fastcgi, not scgi as well?
-Devin
Hmm... well, yeah, that probably wouldn't work well -- I think each
request being a new fork won't get any shared connections. So perhaps a
cluster of servers would work better for you.
I'm reading this as meaning intranet, not internet?
> Given this situation, I believe that despite paste making an effort to
> be multithreaded, it would still be advantageous to run a cluster of
> four Pylons instances and proxy to these using nginx.
>
> Using our setup we'd have four pylons instances being proxied to by
> four nginx worker threads.
Unless you are expecting hundreds of req/s or are serving large static
files, I'd suggest just one or maybe two nginx workers. Nginx will not
be the bottleneck in this situation. One Nginx worker can easily handle
proxying four Pylons apps (or a hundred, for that matter).
> In nginx you can set the processor affinity for each worker thread,
> thus placing each worker on a different core 0..3.
Not true on Linux. This has been broken for some time.
> Here's where things get tricky:
> I've found a Python package that apparently allows Python applications
> to set their processor affinity (I'm afraid it doesn't work on OS X):
> http://pypi.python.org/pypi/affinity/0.1.0
>
> Using this, what do you guys think of my idea to write a custom
> cluster controller, perhaps using supervisord, that will start nginx
> and the four worker processes, and then fork() my Pylons app into
> a cluster of four?
>
> Is this overkill? Is Paste more multithreaded than I'm giving it credit
> for? Is there a better way to go about this? Does an alternative to
> the 'affinity' package exist?
I think it's overkill, but not for the reasons you seem to think. Much
easier is to simply run four Pylons processes from the command line,
each with a custom .ini file. Just use a shell script.
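
If you would rather stay in Python than write the shell script, here is a
rough sketch of the same idea. The .ini file names are hypothetical, and it
assumes each file configures a different port and that `paster` is on the
PATH. Nothing is started until launch() is called.

```python
import subprocess


def build_commands(ini_files):
    """Return one `paster serve` argument list per .ini file."""
    return [["paster", "serve", ini] for ini in ini_files]


def launch(commands):
    """Start every command as its own OS process and return the handles."""
    return [subprocess.Popen(cmd) for cmd in commands]


# Hypothetical file names; each .ini would set a different port.
INI_FILES = ["instance%d.ini" % n for n in range(1, 5)]
```

Calling launch(build_commands(INI_FILES)) would start the four instances;
the returned Popen handles can be used to wait on or terminate them.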
You can also set the CPU affinity for Pylons (or more specifically,
Python) from the command line using a small C program (and I'm sure
there are pre-written utilities or recipes you could follow).
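
On Linux the `taskset` utility is one such pre-written tool. From inside
Python, newer versions (3.3 and later) expose os.sched_setaffinity on Linux,
so a sketch like the following may be enough. It is not available on OS X,
and the helper name is made up for illustration.

```python
import os


def pin_current_process(core):
    """Pin the calling process to one core; return True on success."""
    # os.sched_setaffinity only exists on some platforms (e.g. Linux).
    if not hasattr(os, "sched_setaffinity"):
        return False
    # Refuse cores this process is not currently allowed to run on.
    if core not in os.sched_getaffinity(0):
        return False
    os.sched_setaffinity(0, {core})  # pid 0 means "the calling process"
    return True
```

Each Pylons process would call this once at startup with its own core
number. Whether pinning actually helps is a separate question; the
scheduler usually does a reasonable job on its own.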
You'll want to follow a shared-nothing approach or use something like
Memcached to share data between processes. It's probably also possible
to use Memcached as a secondary cache for SQLAlchemy (although I haven't
tried it). There's a thread about someone doing this here:
http://www.mail-archive.com/sqlalche...@lists.sourceforge.net/msg02499.html
Regards,
Cliff
Flup has both fcgi_fork and scgi_fork flavors. They are pre-fork so it
creates a pool of long running processes and it passes connections to
them. This is the same model that Apache uses and is in theory quite
efficient. You do NOT have to wait for a fork on every connection
because the pool of processes has been forked in advance and is ready
and waiting.
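
To make the pre-fork model concrete, here is a bare-bones sketch using only
the standard library (Unix only, since it relies on os.fork). This is not
flup's implementation, just the general technique: the parent opens one
listening socket, forks the workers up front, and every worker blocks in
accept() on that shared socket. The function name and the fixed
requests-per-worker limit are assumptions made to keep the example short.

```python
import os
import socket


def prefork(listener, num_workers, requests_per_worker):
    """Fork workers that all accept() on the same listening socket."""
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: serve a fixed number of connections, then exit.
            try:
                for _ in range(requests_per_worker):
                    conn, _addr = listener.accept()
                    with conn:
                        conn.recv(1024)
                        reply = "handled by pid %d\n" % os.getpid()
                        conn.sendall(reply.encode())
            finally:
                os._exit(0)  # never fall back into the parent's code
        pids.append(pid)
    return pids
```

The parent keeps the returned pids so it can wait on the children. A real
server would also respawn workers and adjust the pool size on demand.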
Do you happen to know the applicable setting to use when specifying
the size of that pool?
-Devin Torres
To just use fcgi_fork, do this:
[server:main]
use = egg:PasteScript#flup_fcgi_fork
host = 0.0.0.0
port = 5000
I've never changed the defaults for the pool but I think this is
supposed to be the right way to do it.
[server:main]
paste.server_factory = flup.server.fcgi_fork:factory
host = 0.0.0.0
port = 5000
maxChildren=50
maxSpare=5
minSpare=1
Those are the default pool settings so that SHOULD be the equivalent of
the first config section. The problem is it doesn't seem to work that
way. If I use the server_factory and start things up with 'paster
serve development.ini' it seems fine until I hit ctrl-c to stop it.
Then all hell breaks loose and it starts forking off children like mad
bringing the machine to its knees.
I was planning on moving an app from quixote using its preforked
scgi_server.py to pylons with flup_scgi_fork but apparently that's a bad
idea. Either I'm using the factory wrong or I need to figure out what's
up with flup.
I suppose another option is using a Paste#http instance for each
processor and nginx as a reverse proxy spreading the load over them.
>
> I suppose another option is using a Paste#http instance for each
> processor and nginx as a reverse proxy spreading the load over them.
That's what I do.
Cliff
You may consider apache with mod_wsgi, it can be simpler to manage in
such a context. In particular, WSGIDaemonProcess lets you set the number
of dedicated processes...
--
----------------------------------------------------------------------
| Marcin Kasperski | A process that is too complex will fail.
| http://mekk.waw.pl | (Booch)
| |
----------------------------------------------------------------------
Read.
Apache is starting to look attractive now. So I assume I'm not looking
for embedded mode, right? You say it performs better, but at the
cost of what? Using the worker MPM and, say, daemon-mode, using, say,
4 processes and 16 threads each, would my processes be dying as soon
as they're not needed? My application takes a while to load because I
autoload my database using SQLAlchemy. Is it that easy to configure
apache to start 4 by default and load balance between all of them?