Multiple Pylons instances, processor affinity, and "threads"


Devin Torres

Apr 23, 2008, 1:51:50 PM
to pylons-...@googlegroups.com
So we're using Pylons and Python in general for our new company
platform. We just bought a server with 4 cores to help us reach our
scalability goals, but there are a few questions I'm interested in
asking the Pylons community.

I (mostly) understand the nature of "threads" in Python. From my
understanding, the GIL locks the interpreter to executing only one
Python thread at a time, but C modules can take advantage of a Python
application being multithreaded, because they can operate independently
of the GIL. Presumably, this would mean that there is, in fact, a
benefit to using threads in Paste, because most network I/O bound
stuff happens within a C module.
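
To make the point about blocking calls concrete, here is a minimal sketch. It assumes `time.sleep` as a stand-in for a blocking network read (both release the GIL while waiting); the helper names `blocking_io` and `run_in_threads` are made up for illustration and are not part of Paste or Pylons.

```python
import threading
import time


def blocking_io(seconds):
    # Stand-in for a blocking socket read: time.sleep releases the GIL
    # while it waits, so other Python threads can run in the meantime.
    time.sleep(seconds)


def run_in_threads(worker, arg, count):
    """Run `worker(arg)` in `count` threads and return the wall-clock time."""
    threads = [threading.Thread(target=worker, args=(arg,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


# Four 0.2 s waits overlap, so this takes roughly 0.2 s rather than 0.8 s.
elapsed = run_in_threads(blocking_io, 0.2, 4)
print("4 blocking calls in threads took %.2f s" % elapsed)
```

Pure-Python CPU-bound work would not overlap like this under the GIL, which is the case where separate processes help.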

Given this situation, I believe that despite paste making an effort to
be multithreaded, it would still be advantageous to run a cluster of
four Pylons instances and proxy to these using nginx.

Using our setup we'd have four pylons instances being proxied to by
four nginx worker threads.

In nginx you can set the processor affinity for each worker thread,
thus placing each worker on a different core 0..3.

Here's where things get tricky:
I've found a Python package that apparently allows Python applications
to set their processor affinity (I'm afraid it doesn't work on OS X):
http://pypi.python.org/pypi/affinity/0.1.0

Using this, what do you guys think of my idea to write a custom
cluster controller, perhaps using supervisord, that will start nginx
and the four worker processes, and then fork() my Pylons app into
a cluster of four?

Is this overkill? Is Paste more multithreaded than I'm giving it credit
for? Is there a better way to go about this? Does an alternative to
the 'affinity' package exist?
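
On the question of alternatives: the Python standard library later gained `os.sched_setaffinity` and `os.sched_getaffinity` (Python 3.3 and newer, on platforms that support them, such as Linux), which postdates this thread. A minimal sketch, assuming such a Python is available; the helper name `pin_to_core` is hypothetical.

```python
import os


def pin_to_core(core, pid=0):
    """Try to restrict process `pid` (0 means the calling process) to one core.

    Returns True if the affinity call was made, False if the platform
    lacks the API or the requested core is not available to the process.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. OS X, where this API does not exist
    if core not in os.sched_getaffinity(pid):
        return False  # the core is not in the set we are allowed to use
    os.sched_setaffinity(pid, {core})
    return True


# Each worker process would call this once at startup with its own
# core number, for example worker i calling pin_to_core(i).
```

Whether pinning helps at all is a separate question; the kernel scheduler may well do a better job on its own.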

-Devin Torres

Ian Bicking

Apr 23, 2008, 1:56:56 PM
to pylons-...@googlegroups.com
Devin Torres wrote:
> So we're using Pylons and Python in general for our new company
> platform. We just bought a server with 4 cores to help us reach our
> scalability goals, but there are a few questions I'm interested in
> asking the Pylons community.
>
> I (mostly) understand the nature of "threads" in Python. From my
> understanding, the GIL locks the interpreter to executing only one
> Python thread at a time, but C modules can take advantage of a Python
> application being multithreaded, because they can operate independently
> of the GIL. Presumably, this would mean that there is, in fact, a
> benefit to using threads in Paste, because most network I/O bound
> stuff happens within a C module.
>
> Given this situation, I believe that despite paste making an effort to
> be multithreaded, it would still be advantageous to run a cluster of
> four Pylons instances and proxy to these using nginx.

Separate processes are likely to work better. You might find one of the
flup forking servers to be better (using fastcgi), though I don't know
for sure. That will run each request in its own process, so you'll get
multiple processes without the same infrastructure complications of a
cluster of servers.

I don't think affinity should be that important. Doesn't the OS handle
that itself?

--
Ian Bicking : ia...@colorstudy.com : http://blog.ianbicking.org

Devin Torres

Apr 23, 2008, 2:05:28 PM
to pylons-...@googlegroups.com
If I understand you correctly, there's a flup entry point that forks
the process instead of flup_fcgi_thread? I'm not sure that would have
good performance, but maybe you think forking is capable of good
performance in this case. After forking, would SQLAlchemy connections
stay persistent? Is that safe?

Also, is it only fastcgi, not scgi as well?

-Devin

Ian Bicking

Apr 23, 2008, 2:20:08 PM
to pylons-...@googlegroups.com
Devin Torres wrote:
> If I understand you correctly, there's a flup entry point that forks
> the process instead of flup_fcgi_thread? I'm not sure that would have
> good performance, but maybe you think forking is capable of good
> performance in this case. After forking, would SQLAlchemy connections
> stay persistent? Is that safe?

Hmm... well, yeah, that probably wouldn't work well -- I think each
request being a new fork won't get any shared connections. So perhaps a
cluster of servers would work better for you.
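
As a general rule, a database connection opened before a fork should not be reused by the child, since parent and child would then share one underlying socket or file handle. One common way to stay safe is to open connections lazily per process. The sketch below illustrates that idea with the standard library's `sqlite3` as a stand-in database; the class name `PerProcessConnection` is made up for illustration, and this is not how SQLAlchemy's pool is implemented.

```python
import os
import sqlite3


class PerProcessConnection:
    """Hand out one connection per process, reopening after a fork."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument callable that opens a connection
        self._pid = None
        self._conn = None

    def get(self):
        pid = os.getpid()
        if self._conn is None or self._pid != pid:
            # First use in this process, or we are a forked child that
            # inherited the parent's connection: open a fresh one.
            self._conn = self._connect()
            self._pid = pid
        return self._conn


holder = PerProcessConnection(lambda: sqlite3.connect(":memory:"))
conn = holder.get()          # opened on first use
assert holder.get() is conn  # reused within the same process
```

With long-running pre-forked workers, each worker pays the connection cost once and then keeps its connection for later requests.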

climbus

Apr 23, 2008, 2:43:14 PM
to pylons-discuss
Devin Torres wrote:

> Given this situation, I believe that despite paste making an effort to
> be multithreaded, it would still be advantageous to run a cluster of
> four Pylons instances and proxy to these using nginx.

We're using 2 instances of paster with a few threads each. It works
better than one instance. We have an Apache load balancer in front.

Configuration:

[server:main]
use = egg:paste#http
host = 0.0.0.0
port = 5000
use_threadpool = True
threadpool_workers = 10

[server:main2]
use = egg:paste#http
host = 0.0.0.0
port = 5001
use_threadpool = True
threadpool_workers = 10

Start commands:

paster serve production.ini --server-name=main --pid-file=main.pid --log-file=main.log --daemon start
paster serve production.ini --server-name=main2 --pid-file=main2.pid --log-file=main2.log --daemon start

Apache conf:

RewriteRule ^(.*)$ balancer://somename$1 [P,L]

<Proxy balancer://somename>
BalancerMember http://127.0.0.1:5000 retry=3
BalancerMember http://127.0.0.1:5001 retry=3
</Proxy>

You can use nginx too.
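
For more than two instances, the repeated `[server:...]` sections above can be generated rather than typed by hand. A minimal sketch using the standard library's `configparser`; the function names and the default of ten workers per instance are assumptions taken from the configuration shown above, not anything Paste provides.

```python
import configparser
import io


def build_server_sections(base_port, instances, workers=10):
    """Build one [server:mainN] section per instance on consecutive ports."""
    config = configparser.ConfigParser(interpolation=None)
    for i in range(instances):
        # Follows the naming used above: main, main2, main3, ...
        name = "server:main" if i == 0 else "server:main%d" % (i + 1)
        config[name] = {
            "use": "egg:paste#http",
            "host": "0.0.0.0",
            "port": str(base_port + i),
            "use_threadpool": "True",
            "threadpool_workers": str(workers),
        }
    return config


def render(config):
    """Return the sections as .ini text."""
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


ini_text = render(build_server_sections(5000, 4))
print(ini_text)
```

The output would be pasted into (or merged with) production.ini, with one BalancerMember line per generated port.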

Climbus

Cliff Wells

Apr 23, 2008, 2:53:56 PM
to pylons-...@googlegroups.com

On Wed, 2008-04-23 at 12:51 -0500, Devin Torres wrote:
> So we're using Pylons and Python in general for our new company
> platform.

I'm reading this as meaning intranet, not internet?

> Given this situation, I believe that despite paste making an effort to
> be multithreaded, it would still be advantageous to run a cluster of
> four Pylons instances and proxy to these using nginx.
>
> Using our setup we'd have four pylons instances being proxied to by
> four nginx worker threads.

Unless you are expecting hundreds of req/s or are serving large static
files, I'd suggest just one or maybe two nginx workers. Nginx will not
be the bottleneck in this situation. One Nginx worker can easily handle
proxying four Pylons apps (or a hundred, for that matter).

> In nginx you can set the processor affinity for each worker thread,
> thus placing each worker on a different core 0..3.

Not true on Linux. This has been broken for some time.

> Here's where things get tricky:
> I've found a Python package that apparently allows Python applications
> to set their processor affinity (I'm afraid it doesn't work on OS X):
> http://pypi.python.org/pypi/affinity/0.1.0
>
> Using this, what do you guys think of my idea to write a custom
> cluster controller, perhaps using supervisord, that will start nginx
> and the four worker processes, and then fork() my Pylons app into
> a cluster of four?
>
> Is this overkill? Is Paste more multithreaded than I'm giving it credit
> for? Is there a better way to go about this? Does an alternative to
> the 'affinity' package exist?

I think it's overkill, but not for the reasons you seem to think. Much
easier is to simply run four Pylons processes from the command line,
each with a custom .ini file. Just use a shell script.
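
The same thing can be done from Python instead of a shell script. A minimal sketch that only builds the command lines; the .ini file names are hypothetical, and the flags are the ones used in the `paster serve` commands earlier in this thread.

```python
import os


def paster_commands(ini_files):
    """Return one `paster serve` argument list per .ini file."""
    commands = []
    for ini in ini_files:
        # Name the pid and log files after the .ini file, e.g. app1.pid
        stem = os.path.splitext(os.path.basename(ini))[0]
        commands.append([
            "paster", "serve", ini,
            "--pid-file=%s.pid" % stem,
            "--log-file=%s.log" % stem,
            "--daemon", "start",
        ])
    return commands


# Hypothetical file names, one per Pylons process.
commands = paster_commands(["app1.ini", "app2.ini", "app3.ini", "app4.ini"])
for argv in commands:
    print(" ".join(argv))
```

Each list could then be passed to `subprocess.call` to actually start the process; each .ini file would need its own port.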

You can also set the CPU affinity for Pylons (or more specifically,
Python) from the command line using a small C program (and I'm sure
there are pre-written utilities or recipes you could follow).

You'll want to follow a shared-nothing approach or use something like
Memcached to share data between processes. It's probably also possible
to use Memcached as a secondary cache for SQLAlchemy (although I haven't
tried it). There's a thread about someone doing this here:

http://www.mail-archive.com/sqlalche...@lists.sourceforge.net/msg02499.html


Regards,
Cliff


Christopher Weimann

Apr 23, 2008, 4:20:57 PM
to pylons-...@googlegroups.com
Devin Torres wrote:
> If I understand you correctly, there's a flup entry point that forks
> the process instead of flup_fcgi_thread? I'm not sure that would have
> good performance, but maybe you think forking is capable of good
> performance in this case. After forking, would SQLAlchemy connections
> stay persistent? Is that safe?
>

Flup has both fcgi_fork and scgi_fork flavors. They are pre-fork so it
creates a pool of long running processes and it passes connections to
them. This is the same model that Apache uses and is in theory quite
efficient. You do NOT have to wait for a fork on every connection
because the pool of processes has been forked in advance and is ready
and waiting.
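
The pre-fork model described above can be sketched in a few lines of plain Python. This is an illustration of the general technique on a Unix system, not flup's actual implementation: the parent opens one listening socket, forks a fixed pool of children up front, and every child then accepts connections on the shared socket. The function names are made up for this sketch.

```python
import os
import signal
import socket


def start_prefork_pool(listener, num_workers):
    """Fork `num_workers` children that all accept() on `listener`.

    Returns the list of child PIDs. Requires os.fork (Unix only).
    """
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: restore default SIGTERM handling, then serve until killed.
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            try:
                while True:
                    conn, _addr = listener.accept()
                    with conn:
                        reply = "handled by pid %d\n" % os.getpid()
                        conn.sendall(reply.encode("ascii"))
            finally:
                os._exit(0)  # never fall back into the parent's code
        pids.append(pid)
    return pids


def stop_pool(pids):
    """Terminate the children and reap them."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    for pid in pids:
        os.waitpid(pid, 0)
```

A parent would create the socket with `bind` and `listen`, call `start_prefork_pool(listener, 4)`, and later `stop_pool(pids)`. No fork happens per connection; the kernel hands each incoming connection to whichever child is waiting in `accept()`.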

Devin Torres

Apr 23, 2008, 5:29:06 PM
to pylons-...@googlegroups.com
On Wed, Apr 23, 2008 at 3:20 PM, Christopher Weimann
<chris...@weimann.us> wrote:
> Flup has both fcgi_fork and scgi_fork flavors. They are pre-fork so it
> creates a pool of long running processes and it passes connections to
> them. This is the same model that Apache uses and is in theory quite
> efficient. You do NOT have to wait for a fork on every connection
> because the pool of processes has been forked in advance and is ready
> and waiting.

Do you happen to know the applicable setting to use when specifying
the size of that pool?

-Devin Torres

Graham Dumpleton

Apr 23, 2008, 8:00:10 PM
to pylons-discuss
Use Apache and mod_wsgi and you have all that you want except playing
with 'processor affinity'. This is because Apache is multi process by
design and thus can properly make use of multiple CPUs. A lot of what
goes on in Apache is also not implemented in Python and thus not
subject to GIL issues.

You might also have a read of the following:

http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modwsgi.html
http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html

These explain some of these issues about multiprocess web servers and
the GIL.

Not sure why you just wouldn't let the operating system handle
allocation of processes/threads across CPUs as it is likely in general
to do a better job. Are you sure you aren't trying to solve a problem
that doesn't really exist?

Graham

Christopher Weimann

Apr 23, 2008, 9:04:50 PM
to pylons-...@googlegroups.com
Devin Torres wrote:
> On Wed, Apr 23, 2008 at 3:20 PM, Christopher Weimann
>
> Do you happen to know the applicable setting to use when specifying
> the size of that pool?
>

To use fcgi_fork, just do this:

[server:main]
use = egg:PasteScript#flup_fcgi_fork


host = 0.0.0.0
port = 5000

I've never changed the defaults for the pool but I think this is
supposed to be the right way to do it.

[server:main]
paste.server_factory = flup.server.fcgi_fork:factory


host = 0.0.0.0
port = 5000

maxChildren=50
maxSpare=5
minSpare=1

Those are the default pool settings so that SHOULD be the equivalent of
the first config section. The problem is it doesn't seem to work that
way. If I use the server_factory and start things up with 'paster
serve development.ini' it seems fine until I hit ctrl-c to stop it.
Then all hell breaks loose and it starts forking off children like mad
bringing the machine to its knees.

I was planning on moving an app from quixote using its preforked
scgi_server.py to pylons with flup_scgi_fork but apparently that's a bad
idea. Either I'm using the factory wrong or I need to figure out what's
up with flup.

I suppose another option is using a Paste#http instance for each
processor and nginx as a reverse proxy spreading the load over them.

Cliff Wells

Apr 23, 2008, 9:36:56 PM
to pylons-...@googlegroups.com

On Wed, 2008-04-23 at 21:04 -0400, Christopher Weimann wrote:

>
> I suppose another option is using a Paste#http instance for each
> processor and nginx as a reverse proxy spreading the load over them.

That's what I do.

Cliff

Marcin Kasperski

Apr 24, 2008, 5:07:47 AM
to pylons-...@googlegroups.com
> Given this situation, I believe that despite paste making an effort to
> be multithreaded, it would still be advantageous to run a cluster of
> four Pylons instances and proxy to these using nginx.

You may consider Apache with mod_wsgi, it can be simpler to manage in
such a context. In particular, WSGIDaemonProcess lets you set the number
of dedicated processes...

--
----------------------------------------------------------------------
| Marcin Kasperski | A process that is too complex will fail.
| http://mekk.waw.pl | (Booch)
| |
----------------------------------------------------------------------

Devin Torres

Apr 24, 2008, 2:59:42 PM
to pylons-...@googlegroups.com
> Use Apache and mod_wsgi and you have all that you want except playing
> with 'processor affinity'. This is because Apache is multi process by
> design and thus can properly make use of multiple CPUs. A lot of what
> goes on in Apache is also not implemented in Python and thus not
> subject to GIL issues.
>
> You might also have a read of the following:
>
> http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modwsgi.html
> http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html

Read.

Apache is starting to look attractive now. So I assume I'm not looking
for embedded mode, right? You say it offers better performance, but at the
cost of what? Using the worker MPM and, say, daemon mode with, say,
4 processes and 16 threads each, would my processes be dying as soon
as they're not needed? My application takes a while to load because I
autoload my database using SQLAlchemy. Is it that easy to configure
Apache to start 4 by default and load balance between all of them?

Graham Dumpleton

Apr 26, 2008, 6:33:34 AM
to pylons-discuss
On Apr 25, 4:59 am, "Devin Torres" <devin.tor...@gmail.com> wrote:
> >  Use Apache and mod_wsgi and you have all that you want except playing
> >  with 'processor affinity'. This is because Apache is multi process by
> >  design and thus can properly make use of multiple CPUs. A lot of what
> >  goes on in Apache is also not implemented in Python and thus not
> >  subject to GIL issues.
>
> >  You might also have a read of the following:
>
> >  http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modws...
> >  http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html
>
> Read.
>
> Apache is starting to look attractive now. So I assume I'm not looking
> for embedded mode, right? You say it offers better performance, but at the
> cost of what? Using the worker MPM and, say, daemon mode with, say,
> 4 processes and 16 threads each, would my processes be dying as soon
> as they're not needed? My application takes a while to load because I
> autoload my database using SQLAlchemy. Is it that easy to configure
> Apache to start 4 by default and load balance between all of them?

If you are running a web site that requires the absolute best
performance possible, you would dedicate the Apache instance to
running just the one Python web application. That Apache instance
would be setup to use prefork MPM and you would use mod_wsgi embedded
mode. You would turn off keep alive for the Apache instance. You would
throw as much memory as possible into the system and you would use a
dedicated machine and not a VPS.

At the same time, all static media would be served from a distinct
nginx or lighttpd instance or via a content delivery provider. The
static media server would still use keep alive.

A typical default Apache prefork configuration is:

<IfModule mpm_prefork_module>
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
MaxRequestsPerChild 0
</IfModule>

That is, initially create 5 child processes for serving requests. To
support a maximum of 150 clients at a time, because each child process is
single threaded, it can theoretically create up to 150 child processes
to handle requests, if demand so requires. As demand drops off and
processes become unused, it will start to kill off additional child
processes. When this occurs, it will keep between 5 and 10 of
these additional servers around as spares for future bursts in traffic.

Apart from where it creates additional processes to meet demand and then
kills them off when no longer required, the child processes will be kept
around forever. This is because max requests per child is set to 0.
If you had a problem with memory creep in an application, you could
set max requests per child to some non-zero number and child processes
would be recycled after that number of requests.

Now, depending on how expensive loading the application is initially
and what your expected traffic volume is, you would customise these
values to keep as many persistent child processes around as possible
to meet average demand, plus some measure of bursts in traffic. What
the values would be you would have to experiment with.
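
A rough way to start that experimenting is simple arithmetic on process counts. The sketch below assumes a hypothetical per-process footprint of 40 MB, which is a made-up figure; the real number has to be measured for the application in question, and the function names are illustrative only.

```python
def prefork_concurrency(max_clients):
    # Prefork children are single threaded: one process per concurrent client.
    return max_clients


def daemon_concurrency(processes, threads):
    # Each daemon process can handle one request per thread.
    return processes * threads


def worst_case_memory_mb(process_count, per_process_mb):
    # Upper bound if every process holds a full copy of the application.
    return process_count * per_process_mb


PER_PROCESS_MB = 40  # hypothetical footprint, measure your own application

# Prefork with MaxClients 150: up to 150 processes.
print(prefork_concurrency(150), worst_case_memory_mb(150, PER_PROCESS_MB))

# Daemon mode with 4 processes of 16 threads each, as asked about above.
print(daemon_concurrency(4, 16), worst_case_memory_mb(4, PER_PROCESS_MB))
```

The daemon-mode figure ignores the memory of the Apache child processes that proxy requests to the daemons, so treat both numbers as ballpark estimates only.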

Anyway, that is the extreme end where performance is the most
important thing. In this case you would use prefork MPM and mod_wsgi
embedded mode.

The other extreme end is a memory constrained system, in which case
you would use the worker MPM with a small number of initial Apache child
processes, plus mod_wsgi daemon mode with a single daemon process
with a limited number of threads. Static media would be served on the same
Apache instance.

The limited number of threads would be to minimise the possibility of
memory blowing out due to multiple concurrent requests allocating a
lot of transient memory at the same time. To temper this, one would set
a maximum number of requests for a process and set inactivity timeouts
so that daemon processes are recycled if not doing anything, thus bringing
memory back to minimal levels.

Apache, through which MPM you use and how you configure it, plus
mod_wsgi and whether you use embedded mode or daemon mode, plus how
you configure daemon mode, provide a great deal of flexibility in
creating a setup anywhere between these extremes. What configuration
is going to be best really depends on a lot of different issues, many
of which you don't expand on, such as how important is performance,
how much memory is available, how much memory your applications
require etc etc etc.

Even when you think you have a good idea of what sort of configuration
will work, you need to then properly test it, as well as compare that
performance to alternate configurations.

Personally I'd probably just suggest you start out with mod_wsgi
embedded mode with either prefork or worker MPM and just see how it
goes and get a feel for how Apache works, especially with respect to
its use of multiple processes to handle requests.
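
One way to get that feel is a tiny WSGI application that reports which process and thread served each request. This is a minimal sketch, not a Pylons application: mod_wsgi looks for a callable named `application` by default, and `wsgi.multiprocess` and `wsgi.multithread` are standard WSGI environ keys.

```python
import os
import threading


def application(environ, start_response):
    """Report which process and thread handled this request."""
    lines = [
        "pid=%d" % os.getpid(),
        "thread=%s" % threading.current_thread().name,
        # Set by the server to describe how it runs the application.
        "multiprocess=%s" % environ.get("wsgi.multiprocess"),
        "multithread=%s" % environ.get("wsgi.multithread"),
    ]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Reloading the page a few times under different MPM and mod_wsgi settings shows the pid changing as requests are spread across processes.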

For most people's web site applications, the configuration doesn't
generally matter that much as their application never has high enough
demands on it to become an issue. The only problematic case which
affects a lot of people is trying to do too much in a memory
constrained VPS system because they don't want to put up the cash
to get a better system with more memory. :-)

BTW, it is assumed you are using UNIX type systems here. Apache on
Windows is not multiprocess and mod_wsgi on Windows doesn't support
daemon mode.

Graham