Deployment Question


Jonathan Vanasco

May 16, 2008, 4:38:24 PM
to pylons-discuss
I'm a little unclear on the better ways to deploy a Pylons app.

My production servers run nginx -- is it better to use some fastcgi
support (if so, how?) or just do a "paster serve" and proxy to that
port?

I've read a handful of how-to-deploy writeups, and they all seem
different. I've yet to see a comparison, or a "this is THE way to do it"
document.

Garland, Ken R

May 16, 2008, 4:46:22 PM
to pylons-...@googlegroups.com
From general chat on #pylons a lot of people prefer to proxy, or
simply run paster.

In my deployment Paster is serving directly to the world.

I'm not sure anyone has taken up a comparison of the kind you describe;
at least I have not come across one. I'm sure it would be a
welcome test.

Cliff Wells

May 16, 2008, 9:44:58 PM
to pylons-...@googlegroups.com

Probably because "THE" way can never satisfy everyone =)

Personally I proxy Nginx to paster or CherryPy's wsgiserver. I find it a bit
easier to debug than FastCGI.
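For readers new to that setup, a minimal sketch of the app side in Python
(the ini path and port are assumptions, not anyone's actual config): the
app listens only on localhost, and nginx proxies port 80 to it.

    # Minimal sketch: run the Pylons app on a localhost-only port so a
    # front-end proxy (nginx) can forward requests to it.
    # The ini path and port below are just examples.
    from paste.deploy import loadapp
    from paste import httpserver

    app = loadapp('config:/srv/myapp/production.ini')
    httpserver.serve(app, host='127.0.0.1', port=5000)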

Cliff


Ross Vandegrift

May 18, 2008, 3:44:43 PM
to pylons-...@googlegroups.com

All of my apps are deployed in a FastCGI environment with Apache. Our
live application server has a custom install of Python 2.4 with the
appropriate module versions, which we maintain alongside the ones that
Red Hat ships.

This works really well. It seems a lot of people hate FCGI for
different reasons, but I have found it to be pretty awesome. Apps are
very stable, no complicated proxying, and it's almost as performant as
mod_python.
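For anyone wondering what that looks like in code, here is a minimal sketch
of one way to expose a Pylons app over FastCGI with flup (the ini path and
socket are assumptions, not necessarily the setup described above); the
Apache side then points mod_fastcgi/mod_fcgid at this process.

    # Minimal sketch: serve a Pylons app over FastCGI using flup.
    # The ini path and socket location are placeholders.
    from paste.deploy import loadapp
    from flup.server.fcgi import WSGIServer

    app = loadapp('config:/srv/myapp/production.ini')
    # Apache's FastCGI module connects to this Unix socket.
    WSGIServer(app, bindAddress='/tmp/myapp-fcgi.sock').run()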

I have considered converting our deployments to mod_python, but only
recently acquired a practical staging environment to test things like
that.

--
Ross Vandegrift
ro...@kallisti.us

"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37

Antonio Beamud Montero

May 19, 2008, 6:45:36 AM
to pylons-...@googlegroups.com

El vie, 16-05-2008 a las 13:38 -0700, Jonathan Vanasco escribió:
> I'm a little unclear on the better ways to deploy a Pylons app.
>
> My production servers run nginx -- is it better to use some fastcgi
> support (if so, how?) or just do a "paster serve" and proxy to that
> port?

Apache + mod_wsgi is another option, powerful and easy.
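For reference, the mod_wsgi route mostly comes down to a small WSGI script
that an Apache WSGIScriptAlias directive points at; a minimal sketch (the
ini path is an assumption):

    # myapp.wsgi -- minimal sketch of a mod_wsgi entry point for a Pylons app.
    # mod_wsgi looks for a module-level callable named "application".
    from paste.deploy import loadapp

    application = loadapp('config:/srv/myapp/production.ini')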

Greetings.

lapc...@gmail.com

May 19, 2008, 1:43:06 PM
to pylons-discuss
What is your platform?

I tried mod_fcgid on CentOS 5.1 not long ago, with
mod_fcgid-2.1-3.el5 from the EPEL repo + Apache (the version that comes
with CentOS 5.1) + Python 2.4, following this wiki:
http://wiki.pylonshq.com/display/pylonscookbook/Production+Deployment+Using+Apache%2C+FastCGI+and+mod_rewrite%2C+alternate+version

There seems to be a bug, and I can reproduce it easily. A fresh project
created by "paster create -t pylons project" runs with no problem, but any
project with SQL access just crashes. I tried the quickwiki
demo and it doesn't work either. I also tried mod_wsgi from the EPEL repo,
but the same thing occurs.

I was too lazy to try recompiling everything, so I ended up running it
behind a proxy....

rgds,
Vincent

On May 19, 3:44 am, Ross Vandegrift <r...@kallisti.us> wrote:
> On Fri, May 16, 2008 at 01:38:24PM -0700, Jonathan Vanasco wrote:
>
> > I'm a little unclear on the better ways to deploy a Pylons app.
>
> > My production servers run nginx -- is it better to use some fastcgi
> > support (if so, how?) or just do a "paster serve" and proxy to that
> > port?
>
> > I've read a handful of ways on how-to-deploy apps, and all seem
> > different. I've yet to see a comparison or "this is THE way to do it"
> > document.
>
> All of my apps are deployed in a FastCGI environment with Apache. Our
> live application server has a custom install of python2.4 with
> appropriate module versions that we can care for beside the ones that
> RedHat wants.
>
> This works really well. It seems a lot of people hate FCGI for
> different reasons, but I have found it to be pretty awesome. Apps are
> very stable, no complicated proxying, and it's almost as performant as
> mod_python.
>
> I have considered converting our deployments to mod_python, but only
> recently acquired a practical staging environment to test things like
> that.
>
> --
> Ross Vandegrift
> r...@kallisti.us

Ross Vandegrift

May 19, 2008, 2:06:10 PM
to pylons-...@googlegroups.com
On Mon, May 19, 2008 at 10:43:06AM -0700, lapc...@gmail.com wrote:
>
> what is your platform?
>
> i just tried fcgid on centos 5.1 not long ago, with
> mod_fcgid-2.1-3.el5 from epel.repo + apache (version come with
> centos2.5) + python 2.4, following this wiki
> http://wiki.pylonshq.com/display/pylonscookbook/Production+Deployment+Using+Apache%2C+FastCGI+and+mod_rewrite%2C+alternate+version

Apache 2.0.52 from RHEL4, Python 2.4, Flup 0.5, some version of fcgi
that appears to have been custom installed and I cannot figure out the
version for the life of me....

Ross

--
Ross Vandegrift
ro...@kallisti.us

Vasco Rodrigues

May 19, 2008, 2:48:50 PM
to pylons-...@googlegroups.com

I'm also using apache2 + mod_wsgi 2.0
http://www.modwsgi.org/

on a production server, and it works well and fast. It's easy to use, too.

Vasco


Mike Orr

May 19, 2008, 3:36:11 PM
to pylons-...@googlegroups.com
On Sun, May 18, 2008 at 12:44 PM, Ross Vandegrift <ro...@kallisti.us> wrote:
>
> On Fri, May 16, 2008 at 01:38:24PM -0700, Jonathan Vanasco wrote:
>>
>> I'm a little unclear on the better ways to deploy a Pylons app.
>>
>> My production servers run nginx -- is it better to use some fastcgi
>> support (if so, how?) or just do a "paster serve" and proxy to that
>> port?
>>
>> I've read a handful of ways on how-to-deploy apps, and all seem
>> different. I've yet to see a comparison or "this is THE way to do it"
>> document.

There is no THE way to do it. There are several ways which perform
well, and some of them may even work on your platform. I prefer HTTP
proxying because it's the closest to native request handling.

> This works really well. It seems a lot of people hate FCGI for
> different reasons, but I have found it to be pretty awesome.

People hate FCGI because it was buggy and error-prone for years.
Maybe it has gotten better now.

> Apps are
> very stable, no complicated proxying, and it's almost as performant as
> mod_python.

As you see, "complicated" is in the eye of the beholder. :) I would
say proxying is less complicated than *CGI.

> I have considered converting our deployments to mod_python, but only
> recently acquired a practical staging environment to test things like
> that.

There was a point to using mod_python before mod_wsgi existed. Now
that mod_wsgi exists, is more directly suited to the task, and has a
better track record of reliability, why not use it?

--
Mike Orr <slugg...@gmail.com>

Jonathan Vanasco

May 20, 2008, 12:10:27 AM
to pylons-discuss
So is Apache considered to be a good thing (through mod_wsgi,
mod_python, or something else)?

I've been doing mod_perl dev for years, and have had some experience
with mod_python -- generally speaking, my experience is that if you
can avoid Apache you're better off. I guess that's what is throwing
me off: I equate Apache with "isn't there a better way now?"

Mike Orr

May 20, 2008, 1:33:00 AM
to pylons-...@googlegroups.com

It's stable, gives you HTTPS and rewriting and named virtual hosts, and
gives clients a warm fuzzy feeling that you're using something they've
heard of. People say it also has better knowledge of the quirky
user agents out there and can correct malformed requests better than
just exposing PasteHTTPServer or CherryPy directly, though I don't
know how true that is.

There are a few newer servers now (nginx, lighttpd, cherokee) that
claim to be smaller, more efficient, and better organized than Apache.
On my production server I've found Apache sufficient, so I haven't
bothered with them. But I do have a virtual server for our local
Python group, with Apache running Mailman and MoinMoin, and Apache
regularly dies there with an out-of-memory error, or the kernel's OOM
killer kills it and the box then hangs. So I've been meaning to
try nginx there and see if it works better.

--
Mike Orr <slugg...@gmail.com>

Graham Dumpleton

May 20, 2008, 1:58:09 AM
to pylons-discuss
It really depends on what you want to do.

If you are running some small site, more or less anything will do, as it
generally isn't the web server that is your bottleneck. Of course, if
you are running in a memory-constrained VPS you wouldn't use Apache
unless you properly investigate how you need to configure it
to work under such a constraint.

If you are going to run a large site which needs to respond well to
bursts in traffic, running Python embedded in Apache with the prefork
MPM, with huge amounts of memory in the box, is generally the best
approach. Although memory usage will be high, being
non-multithreaded means you can use all cpus/cores to best advantage, plus
you benefit from Apache's ability to dynamically create more processes
to handle demand when required and then reap them when no longer
required.

In other words, it is impossible to answer your question without really
knowing what your site is doing, how big it is, the amount of traffic,
etc. etc.

Graham

lasizoillo

May 20, 2008, 7:40:19 AM
to pylons-...@googlegroups.com
> There are a few newer servers now (nginx, lighthttpd, cherokee) that
> claim to be smaller, more efficient, and better organized than Apache.

Apache is a process/thread-based server. Nginx, for example, is an
event-driven server. If you have worked with Twisted, you know that
event-driven code is harder to write. Nginx, lighttpd, and friends will
never have the same number of modules as Apache, because non-blocking code
is harder to write.

But in front of Pylons you don't need thousands of modules. Nginx
is a very simple option, with a fast static file server, a load-balancing
proxy, and FastCGI support (if you like to use it). Another good feature
of nginx: if you upload a file, it first writes it to disk and then
delegates to Pylons. No more idle threads waiting for a slow client
upload.

Another good option is Varnish[1], a very fast reverse proxy with a
purgeable cache and (limited but interesting) ESI support.

[1] http://varnish.projects.linpro.no/

Excuse my poor english:
Javi

ru...@decrop.net

May 20, 2008, 9:20:01 AM
to pylons-discuss
I am running Apache and mod_scgi on my production server. Why?
Because when I looked at flup I saw I had the choice between SCGI and
FastCGI. I tried SCGI first and it worked like a charm.

Using Apache was a natural choice: I still have some PHP-based
content running on the same server.
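The SCGI flavour looks almost identical on the Python side; a minimal
sketch with flup (the ini path and port are assumptions), which Apache's
mod_scgi then forwards requests to:

    # Minimal sketch: serve a Pylons app over SCGI using flup.
    # The ini path and port are placeholders.
    from paste.deploy import loadapp
    from flup.server.scgi import WSGIServer

    app = loadapp('config:/srv/myapp/production.ini')
    WSGIServer(app, bindAddress=('127.0.0.1', 4000)).run()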

Ruben

Jonathan Vanasco

May 20, 2008, 11:09:03 AM
to pylons-discuss
On May 20, 1:33 am, "Mike Orr" <sluggos...@gmail.com> wrote:
> People say it also has a better knowledge of the quirky
> useragents out there and can correct misformed requests better than
> just exposing PasteHTTPServer or CherryPy directly, though I don't
> know how true it is.

That's pretty much true.

> There are a few newer servers now (nginx, lighthttpd, cherokee) that
> claim to be smaller, more efficient, and better organized than Apache.

Those aren't claims.

> On my production server I've found Apache sufficient so I haven't
> bothered with them.

I've just been running nginx -> paster for personal projects &
internal dev. We're looking to launch a 100k requests/day (minimum) project
here, and I've got a client whom I've sold on Pylons and who is looking
at building their entire web-service startup on it. Apache is pretty
much out of the question... they'd need too many servers to handle
its memory hogging and speed limitations.


On May 20, 1:58 am, Graham Dumpleton <Graham.Dumple...@gmail.com>
wrote:

> It really depends on what you want to do.
Yes, that is correct. Sorry for being vague.

> If you are going to run a large site which is able to respond well to
> bursts in traffic, running Python embedded in Apache running prefork
> MPM, with huge amounts of memory in the box is generally the best
> approach. This is because although memory usage will be high, being
> non multithreaded you can use any cpu/cores to best advantage, plus
> you benefit from Apache's ability to create dynamically more processes
> to handle demand when required and then reap them when no longer
> required.

This is for a startup that will initially have 1 Pylons + 1 Postgres
server, and scale out accordingly. They're a video startup, and have
some bigger names backing them, so I'd expect them to scale large
quickly.

However... I have years of experience with mod_perl, and have found
the overhead of Apache to be nearly worthless. By proxying static
stuff off of Apache onto nginx, and offloading code portions into
nginx/php or Twisted, we were able to gain a lot of efficiency.
Apache does its job exceedingly well, but it's bloated.



On May 20, 7:40 am, lasizoillo <lasizoi...@gmail.com> wrote:
> Nginx, lighthttpd, ... never have
> the same number of modules than apache, because non-blocking code is
> harder to write.

I think it's also because they're only a few years old, and still the
underdogs.

Ah... see, I ditched PHP off Apache years ago. It's very unnatural to
me. Running off of lighttpd or nginx I saw between 5 and 10x more r/s
possible. The only reason why I still use Apache is for mod_perl
projects, and being able to program the server -- not the webapp.

Jose Galvez

May 20, 2008, 1:21:47 PM
to pylons-...@googlegroups.com
Anyone using mod_wsgi with Apache? How good is that for deployment -- better/worse than mod_proxy with paster?
Jose

Ross Vandegrift

May 20, 2008, 1:42:27 PM
to pylons-...@googlegroups.com

Good question. I have done some rudimentary testing on mod_python,
but not with mod_wsgi. It does make a lot of sense to me - I should
definitely test that out before I make any decisions on production
deployment changes.

Ross Vandegrift

May 20, 2008, 1:49:22 PM
to pylons-...@googlegroups.com

I don't mean to come off as sounding curt, but I've often heard that
from people who haven't really maintained the alternatives, or who
have an application that gets deployed in a highly isolated environment,
where interoperability means nothing.

Frankly, the flexibility of Apache is what makes it so much better
than the alternatives. It has solid support for just about anything
related to the web. It has excellent, complete documentation
surrounded by a large, knowledgeable user community.

I equate Apache with "There is no better general solution", even
though various half-assed alternatives have been cooked up.

Dalius Dobravolskas

May 20, 2008, 3:03:36 PM
to pylons-...@googlegroups.com
Jose Galvez wrote:
> Anyone using mod_wsgi with Apache? how good is that for deployment,
> better/worse then mod_proxy with paster?
> Jose
I'm using mod_wsgi with Apache for my personal server. I'm hosting two
Pylons projects, Trac and my public Mercurial repositories (all WSGI
based). Together with MySQL this configuration requires about 100 MB of
RAM -- that's a pretty low number, and it works fast enough. On the other
hand, the traffic on my server is not very high, about 35,000 visits per
month (at least Google Analytics reports such a number) -- that averages
out to less than one visit a minute (while one user request can require up
to 20 files).

If you are interested in my configuration, there is no magic:
I am using Apache's worker MPM (the default on Debian) -- prefork eats too
much memory. I have also set ThreadStackSize to 500000. That's all!
It works just perfectly.

Actually there was a similar discussion on this group already. Follow what
Graham Dumpleton says and you will be on the right path.

Regards,
Dalius
http://blog.sandbox.lt

Mike Orr

May 20, 2008, 4:33:38 PM
to pylons-...@googlegroups.com
On Tue, May 20, 2008 at 8:09 AM, Jonathan Vanasco <jona...@findmeon.com> wrote:
> I've just been running nginx -> paster for personal projects &
> internal dev. We're looking to launch a 100k requests/day min project
> here, and I've got a client who I've sold onto Pylons and is looking
> at building their entire web-service startup on it. Apache is pretty
> much out-of-the-question... they'll need too many servers to handle
> it's memory hogging and speed limitations

One thing to keep in mind is that new sites often have unrealistic
assumptions about their growth and the hardware required for it. A
video site has special requirements due to the huge files it handles,
but for ordinary sites with text and JPGs and documents up to 15 MB
each, 100,000 requests/day is not that many. That's 4166/hour or
70/minute. Any non-anemic server can do that in its sleep. Our
server has two sites each doing more than that several times a day,
plus three smaller sites.

Granted, the server is ridiculously overpowered: 2 CPU, 2 GB RAM, 300
GB HD, Ubuntu 7.10. But one site inefficiently duplicates a 120 MB
object database per process (and runs 3 processes minimum, so that's
240 MB overhead), so that's where some of the memory is going. The
server load right now is 0.11; free memory is 18 MB (+269 MB in
discardable buffers and 1.2 GB "cached"). That's with Apache -> scgi
-> Quixote for the largest site and Apache -> mod_proxy ->
PasteHTTPServer -> Pylons for some of the others. I'm migrating all
the sites to Pylons one by one.

We use PHP only for PHPMyAdmin, for which I've been unable to find an
adequate alternative. We use it only so our non-technical appadmins
can make occasional changes and reports.

--
Mike Orr <slugg...@gmail.com>

Cliff Wells

May 20, 2008, 5:01:26 PM
to pylons-...@googlegroups.com

On Tue, 2008-05-20 at 13:49 -0400, Ross Vandegrift wrote:
> On Mon, May 19, 2008 at 09:10:27PM -0700, Jonathan Vanasco wrote:
> > so is Apache considered to be a good thing (through mod_wsgi ,
> > mod_python , or other ?)
> >
> > i've been doing mod_perl dev for years, and have had some experience
> > with mod_python -- generally speaking, my experience is that if you
> > can avoid apache you're better off. i guess that's what is throwing
> > me off. i equate apache with "isn't there a better way now?"
>
> I don't mean to come off as sounding curt, but I've often heard that
> from people who haven't really maintained the alternatives. Or that
> have an application that gets deployed in a highly isolated environment,
> where interoperability means nothing.

Funny, over the last couple of years, I've deployed Nginx in front of
nearly a hundred websites, ranging from Pylons and TurboGears to PHP and
Wordpress down to simple static HTML.

According to Netcraft, Nginx is now deployed in front of over 1 million
domains. Not nearly as much as Apache, but clearly not all of those are
"highly isolated environments". In fact, many sites with heavy traffic
are moving to Nginx due to its vastly superior scalability.

Some notables that use Nginx:

wordpress.com
youtube.com
hulu.com
rambler.ru
torrentreactor.net
kongregate.com

Where did you get your research from? (Actually, don't answer that, I
can guess).

> Frankly, the flexability of Apache is what makes it so much better
> than the alternatives. It has solid support for just about anything
> related to the web. It has excellent, complete documentation
> surrounded by a large, knowledgeable user community.

I'd qualify this paragraph as "some of Apache's strengths are", rather
than a blanket "it's better". For some people, in some settings, it is
better. For others it isn't. If you need high scalability, it isn't
the best. If you need a small memory footprint it's not the best. If
you prefer a sane configuration syntax it isn't the best. If you need
all three then it's arguably amongst the worst.

> I equate Apache with "There is no better general solution", even
> though various half-assed alternatives have been cooked up.

I certainly intend to sound curt:

There are places where Apache is the only possible solution. Personally
I've found that to be the minority of deployments where you need
specialized modules that are only available on Apache. When you choose
Apache you gain a wide array of modules and add-ons that might neatly
solve a particular problem, but you sacrifice efficiency, scalability,
and simplicity of configuration. Choosing a web server, like any other
software selection, is a matter of weighing pros and cons and selecting
the one that fits your needs the best.

We all have strong opinions about software, but try to stick to reasoned
arguments and leave the insults at home, especially when it's abundantly
clear you've never even tried (or perhaps even read about) the
alternatives.

Cliff


Graham Dumpleton

May 20, 2008, 8:48:14 PM
to pylons-discuss
Cliff Wells wrote:
> On Tue, 2008-05-20 at 13:49 -0400, Ross Vandegrift wrote:
> > On Mon, May 19, 2008 at 09:10:27PM -0700, Jonathan Vanasco wrote:
> > > so is Apache considered to be a good thing (through mod_wsgi ,
> > > mod_python , or other ?)
> > >
> > > i've been doing mod_perl dev for years, and have had some experience
> > > with mod_python -- generally speaking, my experience is that if you
> > > can avoid apache you're better off. i guess that's what is throwing
> > > me off. i equate apache with "isn't there a better way now?"
> >
> > I don't mean to come off as sounding curt, but I've often heard that
> > from people who haven't really maintained the alternatives. Or that
> > have an application that gets deployed in a highly isolated environment,
> > where interoperability means nothing.

Using mod_perl as an indicator is actually a bad idea. This is because
it tends to be the worst of the bunch when it comes to bloating out
Apache. Thus saying that all solutions which embed an interpreter in
Apache are bad based on what one sees with mod_perl is not a good
idea.

The mod_python module does carry a bit more bloat in its base-level
memory profile than it needs to. This has been identified and
discussed on the mod_python lists, although nothing has been done about it
yet, even though the changes to address it are relatively simple.

The main reason though why mod_python has got a bad reputation for
memory usage is threefold.

The first is that older versions of mod_python leaked memory. This has
been addressed in later versions.

The second is that the Python installation being used didn't provide a
usable shared library for Python itself; instead there was only a
static library. This meant that a static copy of Python was being
linked into the mod_python module. Because of shared library issues this
was being converted to local process memory rather than being shared, and
so base-level process sizes were larger than they needed to be. So,
install your Python properly and you will avoid this issue. It has
also only been in the last year or so that Linux distributions have been
fixing their Python installations to provide a usable Python shared
library.

The third reason is simply that Python web applications can get fat,
especially when the people developing the applications don't take memory
usage into consideration at all. Unfortunately people see the large
process sizes caused by their own application, especially when
viewed in conjunction with the fact that Apache is a multi-process
server, and so there are multiple copies, and wrongly conclude it is
mod_python that is causing their problems. Instead it is their own fat
application and a lack of understanding of how Apache works and how to
configure it so as to limit resource usage.

So, although mod_python is at fault a little bit, in the main it is
actually factors outside of mod_python.

FWIW, mod_wsgi uses even less base level memory than mod_python. As
with mod_python, it is still affected by poor Python installations and
fat applications being run on it. The mod_wsgi module provides better
flexibility however in being able to control memory usage through
process recycling.

> > Frankly, the flexability of Apache is what makes it so much better
> > than the alternatives. It has solid support for just about anything
> > related to the web. It has excellent, complete documentation
> > surrounded by a large, knowledgeable user community.
>
> I'd qualify this paragraph as "some of Apache's strengths are", rather
> than a blanket "it's better". For some people, in some settings, it is
> better. For others it isn't. If you need high scalability, it isn't
> the best. If you need a small memory footprint it's not the best. If
> you prefer a sane configuration syntax it isn't the best. If you need
> all three then it's arguably amongst the worst.

What is silly is people thinking that any one web server is going to
be the complete solution anyway. One gets truly sick of people asking
what is the fastest web server to use, as there is so much more to it than
that. Speed does not equate to scalability, and just because one
server is fast at serving static files doesn't mean it is good for
dynamic web applications.

So, servers like nginx and lighttpd will be better than Apache for
handling serving of static files. Just because they are though,
doesn't mean you should also then use them to host your dynamic web
application code.

For a Python web site with a requirement to also serve up a lot of
static media files, one would definitely want to use a separate nginx
or lighttpd server, or even use a specialised CDN. This should be run
on its own servers and not on the same server as your web application.

As to your dynamic Python code, it would generally be best served, as I
said before, by running Apache prefork MPM with the code running directly
in the Apache processes, i.e., the embedded mode of mod_wsgi. That server
would be dedicated to running just the Python web application code: no
static file serving, no sharing the server with PHP or any other
stuff.

With a dynamic application, you can also benefit from doing things
like turning off keep-alive, as in general one page will not generate
successive requests against the Python web application. You would
still use keep-alive on the static media servers, though, as that is
where you would benefit from it. Using it for the dynamic web application,
you are only going to tie up connections longer than you need
to and thus waste resources.

Anyway, there are a lot of other things you can do as well, especially
in your application space and how you construct your HTML code so as
to allow browsers to parallelise loading as much as possible.

So, things are not as simple as most think it is going to be. There is
a lot to building what is required for large scalable sites and
starting with the belief that a single server type is going to do
everything you want isn't going to help as down the line it may just
cause you to hit any performance wall quicker.

In summary, use each tool for what it is good at. Don't try and make
one thing do everything.

Graham

Graham Dumpleton

May 20, 2008, 9:13:17 PM
to pylons-discuss
On May 21, 1:09 am, Jonathan Vanasco <jonat...@findmeon.com> wrote:
> > If you are going to run a large site which is able to respond well to
> > bursts in traffic, running Python embedded in Apache running prefork
> > MPM, with huge amounts of memory in the box is generally the best
> > approach. This is because although memory usage will be high, being
> > non multithreaded you can use any cpu/cores to best advantage, plus
> > you benefit from Apache's ability to create dynamically more processes
> > to handle demand when required and then reap them when no longer
> > required.
>
> This is for a startup that will initially have 1pylons + 1 postgres
> server, and scale out accordingly.  They're a video startup, and have
> some bigger names backing them, so I'd expect them to scale large
> quickly.
>
> However... I have years of experience with mod_perl, and have found
> the overhead of apache to be nearly worthless. By proxying static
> stuff off of apache onto nginx, and offloading code portions into
> nginx/php or twisted, we were able to gain a lot of efficiency.
> Apache does its job exceedingly well , but its bloated.

Also see my comments in my other post, but when you say 'proxy' I hope
you don't really mean 'proxy'.

Although I explained that it is a good idea to offload static media
to a separate server, I really don't understand why, as a lot of
people do, you would also then want to make that static media
server the front end and 'proxy' requests for the dynamic web
application through it to something else, such as Apache, or even
fastcgi processes.

All you are doing here is adding additional latency to your
requests due to the extra hop. If you are building a large site and are
using a separate media server, then use different IPs for each. Preferably
this should be a completely different machine. If it has to be the
same machine and you can't get additional IPs to allow port 80 to be
used for each, then use different ports.

If one really has to use a software proxy, then also perhaps look at
dedicated solutions like Pound. It may be the case that nginx serves
okay as a proxy, but it isn't purpose built for that and so solutions
like Pound may provide a more flexible solution which is easier to
configure and manage when load balancing across many machines. Better
still perhaps is to use hardware based solutions, or DNS features, if
needing to load balance across a cluster of machines.

Proxying also adds additional overhead, even in the fastcgi-like case.
This is in part why Apache prefork MPM and an embedded solution is
better if you want to squeeze out as much performance as possible.
That is, the process accepting the connection is the one handling the
request, not some backend process it is in turn talking to. Yes, you
lose the ability to run code as a non-Apache user, but if the Apache is
dedicated to the application, you can make the whole server run as the
required user instead of the Apache user anyway, so it's not an issue.
Graham

Jonathan Vanasco

May 21, 2008, 2:42:45 AM
to pylons-discuss


On May 20, 9:13 pm, Graham Dumpleton <Graham.Dumple...@gmail.com>
wrote:

> Also see my comments in other post, but when you say 'proxy' I hope
> you don't really mean 'proxy'.

I wrote 'proxying' when I meant 'pushing'.

Let me rephrase...

In my standard setups, nginx on port 80:
- maps static content routes to webroots (i.e. the public Pylons folder)
- proxies other requests to application servers for dynamic content, to
  free them up for requests

nginx is a great app -- it does proxying and static HTML serving really
well.

I was able to get a mod_perl app to go from:
50 r/s - 1 mod_perl server
90 r/s - vanilla Apache on :80 for static, proxying to Apache on :8081
for dynamic
140 r/s - nginx on :80 for static, proxying to :8081 for dynamic

Granted, in there I did a lot of juggling with min/max servers and
memory... but the continual offloading from Apache freed up more
resources and let me do that.

Jonathan Vanasco

May 21, 2008, 2:54:16 AM
to pylons-discuss

On May 20, 4:33 pm, "Mike Orr" <sluggos...@gmail.com> wrote:

> each, 100,000 requests/day is not that many.  That's 4166/hour or
> 70/minute.  Any non-anemic server can do that in its sleep.  Our
> server has two sites each doing more than that several times a day,
> plus three smaller sites.

When you take peak times into account, though, it's really 200/minute
during some hours and 10-30/min during others (at least for US-only
targeted sites).

Most sites do have unrealistic expectations -- but this one is tied to
an online and offline marketing campaign, so I'm trying to be
prepared.

> We use PHP only for PHPMyAdmin, for which I've been unable to find an
> adequate alternative.  We use it only so our non-technical appadmins
> can make occasional changes and reports.

There's nothing wrong with having PHP running. I still use PHP for
random stuff too.

The only issue can be how you're running it. If it's mod_php, you'll be
much better off dropping to the PHP-as-(f)cgi option, or running
things through nginx. I only need to budget < 90 MB if I run it on
nginx -> 4 FCGI processes + a 64 MB accelerator cache. The only
performance issue I ever encountered with PHP has been it bloating/
slowing Apache. I only ran it as FCGI a few times for tests, though;
it was easier to install/maintain on nginx and had a faster bench --
it wasn't worth tweaking any more.


Cliff Wells

May 21, 2008, 3:25:30 AM
to pylons-...@googlegroups.com

On Tue, 2008-05-20 at 18:13 -0700, Graham Dumpleton wrote:
> If one really has to use a software proxy, then also perhaps look at
> dedicated solutions like Pound. It may be the case that nginx serves
> okay as a proxy, but it isn't purpose built for that and so solutions
> like Pound may provide a more flexible solution which is easier to
> configure and manage when load balancing across many machines.

I often agree with you in general (despite disagreeing on particular
details <wink>). However you really ought to quit making unfounded
generalizations about Nginx. It is, in fact, purpose-built to be a
proxy. It's also far better than Pound (which I also used before
switching to Nginx) as a proxy. It's true it doesn't (currently) have
very sophisticated load-balancing (round-robin only, unless you use a
3rd party module), but it's almost stupidly simple to configure and
easily outperforms Pound, especially under load.

Regards,
Cliff

Mike Orr

May 21, 2008, 4:09:23 AM
to pylons-...@googlegroups.com
On Tue, May 20, 2008 at 11:54 PM, Jonathan Vanasco
<jona...@findmeon.com> wrote:
>
>
> On May 20, 4:33 pm, "Mike Orr" <sluggos...@gmail.com> wrote:
>
>> each, 100,000 requests/day is not that many. That's 4166/hour or
>> 70/minute. Any non-anemic server can do that in its sleep. Our
>> server has two sites each doing more than that several times a day,
>> plus three smaller sites.
>
> when you take in peak times though, its really 200/minute during some
> hours and 10-30/min on others. (at least for US only targeteed sites)
>
> most sites do have unrealistic expectations -- but this one is tied to
> an online and offline marketing campaign. so i'm trying to be
> prepared.

Well, for a Pylons site with Postgres that wants to be scalable up
front, a three-server setup makes sense. One for the Pylons app, one
for the static content, and one for the database. By putting them on
three servers, you've already proven they can be split up. Then you
can add a second server to any of the trio when it starts approaching
capacity. A second static server is of course easy. A second Pylons
server means you have to write the app carefully; e.g., store state
data in cookies, cookie-based sessions, or the database, not on the
local filesystem.
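To make the cookie-based session option concrete, here is a minimal sketch
using Beaker middleware (the keys and cookie name are placeholders): with
the session carried entirely in a signed cookie, any app server behind the
proxy can handle any request.

    # Minimal sketch: cookie-based sessions with Beaker, so no session
    # state lives on any single app server's filesystem.
    # The keys and cookie name are placeholders.
    from beaker.middleware import SessionMiddleware

    session_opts = {
        'session.type': 'cookie',                 # keep the session in the cookie itself
        'session.encrypt_key': 'CHANGE-ME',       # placeholder: encrypts cookie contents
        'session.validate_key': 'CHANGE-ME-TOO',  # placeholder: signs cookie contents
        'session.key': 'myapp_session',
    }

    def make_app(wsgi_app):
        # Wrap any WSGI app (e.g. a Pylons app) with cookie-backed sessions.
        return SessionMiddleware(wsgi_app, session_opts)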

A second database server is usually the hardest part; you'd need a
database engine that propagates changes from any server to all the
others (I think it's called replication). Or designate one server the
master, but then again it's a single point of failure. If you have
some tables that are mostly-read and others that are highly
interactive but there's no relations between the two, you can put them
in different databases on different servers with different
master/slave and replication policies. This all shows the advantage
of having the database on its own server in the first place; you can
avoid the replication problem as long as possible.

The problem with static files is that often they have to be password
protected and only given to certain people (and their usage logged for
statistics). That rules out using a static server for them unless
your auth/logging system is in the webserver rather than in Pylons.
And those systems are usually specific to the webserver, which then
makes it harder to switch webservers.

--
Mike Orr <slugg...@gmail.com>

Graham Dumpleton

May 21, 2008, 4:13:55 AM
to pylons-discuss
I stand corrected. I will admit that I am not an nginx expert, nor even a
Pound expert.

I would say however that mod_proxy module in Apache is also purpose
built for proxying, that doesn't mean it is a good idea to use it.

What I was trying to say was that where there are solutions whose only
purpose is to do one thing, also look at them as well as those which
may do it as one function of many. This is because the solutions which
try to do just the one job often do come with a better feature set for
that one task, which, depending on what you are doing, may make more
sense. Yes, it may mean a drop in performance for some aspects of the
problem being solved, but this may be made up for in other ways through
the other features it provides.

Sorry, I am generalising again of course, but when you really don't
know the exact details of what a person is wanting to setup and why,
it is hard to do anything else. :-)

Graham

Jonathan Vanasco

May 21, 2008, 10:31:45 AM
to pylons-discuss


On May 21, 4:09 am, "Mike Orr" <sluggos...@gmail.com> wrote:
> On Tue, May 20, 2008 at 11:54 PM, Jonathan Vanasco
> Well, for a Pylons site with Postgres that wants to be scalable up
> front, a three-server setup makes sense.  One for the Pylons app, one
> for the static content, and one for the database.

I'd disagree. You generally shouldn't need a dedicated box for
static content -- just use a lightweight static server like nginx or
lighttpd. You should be able to just do nginx + app on server 1, and
the database on server 2 -- the DB will almost always create a
'bottleneck' of sorts in web development that needs to be addressed
first. As your app needs to grow, the HTTP box becomes more of a load-
balancer/gateway, and you network webservers and db servers behind
it. I've done that a dozen times with ease.

> A second Pylons server means you have to write the app carefully; e.g., store state
> data in cookies, cookie-based sessions, or the database, not on the
> local filesystem.
That's less careful programming and more smart planning and
forethought.

> A second database server is usually the hardest part; you'd need a
> database engine that propagates changes from any server to all the
> others
I always spec apps to be replication-ready: separate handles for read
& write, with strict documentation and code reviews as to
which handles are used where. Usually a 'write' should only be used
in account-management situations. I even go so far as to make sure
all logging functionality is on a different handle (sometimes even to
a different DB). Approaches like this at the start make clustering
and replication simple.
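As a rough illustration of what separate read and write handles can look
like with the SQLAlchemy of that era (the connection URLs and queries are
placeholders, not any particular production setup):

    # Minimal sketch: distinct read and write "handles" as two SQLAlchemy
    # engines.  The URLs and SQL are placeholders; on a single box they
    # can simply point at the same database.
    from sqlalchemy import create_engine

    write_engine = create_engine('postgres://app@db-master/myapp')   # master: writes
    read_engine = create_engine('postgres://app@db-replica/myapp')   # replica: reads

    # Account-management style changes go through the write handle...
    write_engine.execute("UPDATE users SET email = 'a@example.com' WHERE id = 1")

    # ...while read-mostly queries use the replica.
    rows = read_engine.execute("SELECT id, email FROM users").fetchall()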

Most file-access issues can be handled with an 'authticket' style
approach across servers. If you're dealing with a specific per-file,
per-access approach, then yeah -- you're likely better off having your
Pylons appserver handle that (though you might be able to speed things
up with some custom plugins or hooks in nginx/lighttpd).
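The general idea behind an 'authticket' style scheme is a signed, expiring
token that any front-end server sharing a secret can verify without calling
back into the app. A generic sketch of that idea (not mod_auth_tkt's actual
ticket format; the secret is a placeholder):

    # Generic sketch of an authticket-style token: an HMAC-signed
    # "user:expiry" string that any server sharing SECRET can verify.
    # Illustrative only; not mod_auth_tkt's wire format.
    import hashlib
    import hmac
    import time

    SECRET = 'shared-secret-placeholder'

    def make_ticket(user_id, ttl=3600):
        expires = int(time.time()) + ttl
        payload = '%s:%d' % (user_id, expires)
        sig = hmac.new(SECRET, payload, hashlib.sha1).hexdigest()
        return '%s:%s' % (payload, sig)

    def check_ticket(ticket):
        parts = ticket.split(':')
        if len(parts) != 3:
            return None
        user_id, expires, sig = parts
        payload = '%s:%s' % (user_id, expires)
        if hmac.new(SECRET, payload, hashlib.sha1).hexdigest() != sig:
            return None
        if int(expires) < time.time():
            return None
        return user_id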

Jonathan Vanasco

May 21, 2008, 11:02:34 AM
to pylons-discuss


On May 20, 8:48 pm, Graham Dumpleton <Graham.Dumple...@gmail.com>
wrote:

> Using mod_perl as an indicator is actually a bad idea. This is because
> it tends to be the worst of the bunch when it comes to bloating out
> Apache. Thus saying that all solutions which embed an interpreter in
> Apache are bad based on what one sees with mod_perl is not a good
> idea.

I'm not using mod_perl as an indicator -- I'm using it as a reference
point and a learning example. Configuring and optimizing mod_perl
taught me the range of bloat and inefficiency in Apache, and taught me
how to best optimize the server. When I HAVE to run Apache, I
routinely run multiple instances on each system (single binary/install
-- different .conf files / ports / startups) to better optimize
memory usage and resources. Because Apache is a slower server that tends
to eat a bit of memory on its own -- and lots when you turn modules on
-- you can get much better performance by segmenting your
functionality.

> The mod_python module does carry a bit more bloat in its base level
> memory profile than it needs to and this has been identified and
> discussed on mod_python lists, although nothing down about it yet even
> though changes relatively simple to address it.

mod_perl doesn't have bad bloat; I wouldn't say mod_python does
either. But in the Perl world, the system achieves massive speed
increases by compiling code into in-memory functions, cloning memory from
the master, and reserving memory as more variables and functions are
called.

a typical child cycle is:
start - 9MB
request 1 - 18MB
request 2 - 24MB
request 3 - 39MB
request 500 - 40MB

then it dies, and we start from scratch

IIRC, you see something similar with mod_php; by pushing mod_php to
fcgi and using some aggressive memory sharing + accelerator caching,
you can make PHP 4x faster; eliminating Apache can make you 10x
faster.

> So, servers like nginx and lighttpd will be better than Apache for
> handling serving of static files. Just because they are though,
> doesn't mean you should also then use them to host your dynamic web
> application code.

That's true.

But they're also better for fastcgi integration, for misc modules, and
many other tasks.

Apache has a ton of documentation, knowledge, and a rich module
library & community. It's a standard, but that's it.

I think this is where we'll have to agree to disagree -- you seem very
experienced in Apache and I'm grateful for your insight -- but your
knowledge and understanding of the alternative systems is very different.
You've misunderstood the capabilities and focus of servers like Nginx
and Lighttpd, which makes it hard to weigh your insight in terms
of comparison.


> With a dynamic application, you can also benefit from doing things
> like turning off keep alive as in general one page will not generate
> successive requests against the Python web application. You would
> still use keep alive on on the static media servers though as that is
> where you would benefit from it.

If you put nginx/lighty/apache on port 80, and proxy to your
web application on port 8080 or in a cluster, you can get tremendous
improvements through keepalive tweaking and general request
architecture. You can have your port 80 connection slurp the request,
push it to the app, and slurp the response -- making your appserver
handle only the processing time, instead of waiting for slow clients to
send/receive the request.

> In summary, use each tool for what it is good at. Don't try and make one thing do everything.

That's absolutely true.

I think you should really look into nginx and lighttpd -- I think they
might surprise you. They've quickly become essential components
of all the high-performance web applications, as servers, proxies,
balancers or gateways -- and for a reason.

Ross Vandegrift

May 21, 2008, 11:55:16 AM
to pylons-...@googlegroups.com
On Tue, May 20, 2008 at 02:01:26PM -0700, Cliff Wells wrote:
> According to Netcraft, Nginx is now deployed in front of over 1 million
> domains. Not nearly as much as Apache, but clearly not all of those are
> "highly isolated environments". In fact, many sites with heavy traffic
> are moving to Nginx due to it's vastly superior scalability.
>
> Some notables that use Nginx:
>
> wordpress.com
> youtube.com
> hulu.com
> rambler.ru
> torrentreactor.net
> kongregate.com
>
> Where did you get your research from? (Actually, don't answer that, I
> can guess).

Sites that are amongst the largest on the internet fall into a corner
case in my mind. As Mike pointed out, sites have an unrealistic
expectation of traffic. I've been involved in the average cases.

My claims come from years in the service provider industry,
watching various deployments. I've been an Apache fan for a long
time, and have seen and deployed hundreds of servers, serving
thousands of sites on Apache. None are youtube.com - and I agree that
this is an important point.

Comparing my Apache deployments with deployments of other servers,
year after year, Apache won hands down:

1) Users of other HTTP servers are always fiddling with them,
restarting after crashes. This may be due to misuse, non-optimal
config - I'm not sure. But I've never had stability issues like this
with Apache.

2) Apache is well-understood by many more folks. There's an army of
support reps downstairs that are competent, if not expert, at
maintaining and troubleshooting it. The other servers come across as
mysteries (despite often being highly trivial), and end up escalated
instead of fixed.

3) Documentation for Apache is thorough, searchable, and
understandable. It's full of examples and is available for multiple
versions of httpd. I have seen the Apache documentation turn
motivated people from competent to expert just by googling for
it. I'm not saying that other servers don't have decent docs -- but
Apache's are amongst the best docs for any software I have ever read,
and I have seen them function in production for years.


> I'd qualify this paragraph as "some of Apache's strengths are", rather
> than a blanket "it's better". For some people, in some settings, it is
> better. For others it isn't. If you need high scalability, it isn't
> the best. If you need a small memory footprint it's not the best. If
> you prefer a sane configuration syntax it isn't the best. If you need
> all three then it's arguably amongst the worst.

Yeah, you're right -- I'm tacitly assuming that we're talking about the
average cases. Other HTTP servers definitely excel at things, especially
for workloads they have been specially designed for. But for every YouTube
there are tens of thousands of websites with more average traffic and
control needs.

Mike Orr

May 21, 2008, 12:49:32 PM
to pylons-...@googlegroups.com
On Wed, May 21, 2008 at 7:31 AM, Jonathan Vanasco <jona...@findmeon.com> wrote:
>
>
>
> On May 21, 4:09 am, "Mike Orr" <sluggos...@gmail.com> wrote:
>> On Tue, May 20, 2008 at 11:54 PM, Jonathan Vanasco
>> Well, for a Pylons site with Postgres that wants to be scalable up
>> front, a three-server setup makes sense. One for the Pylons app, one
>> for the static content, and one for the database.
>
> I'd disagree. You generally shouldn't need to do a dedicated box for
> static content - just use a lightweight static server like nginx or
> lighttpd.

He did say scalable and video, which I took to mean ultra-heavy use of
very large files, and overkill didn't matter. The disk space alone is
one reason why static content might want to be on a separate box, so
it can be plugged into a large disk array or replicated easier, etc.

>> A second database server is usually the hardest part; you'd need a
>> database engine that propagates changes from any server to all the
>> others
> I always spec apps to be replication-ready: separate handles for read
> & write, and maintaining strict documentation and code reviews as to
> which handles are used where. Usually a 'write' should only be used
> in account management situations. I even go so far as to make sure
> all logging functionality is on a different handle (sometimes even to
> a different DB). Approaches like this at the start make clustering
> and replication simple.

By handle you mean a database connection?

So how do you handle writes? You direct them all to one master server
and let it propagate the changes to the slaves? Have you found a good
replicable database among the free ones that work with SQLAlchemy?

> Most file-access isssues can be handled with an 'authticket' style
> approach across servers. If you're dealing with a specific per-file
> per-access approach, then yeah - you're likely better to have your
> pylons appserver handle that ( though you might be able speed things
> up with some custom plugins or hooks in nginx/lighttpd)

What do you mean by authticket?

--
Mike Orr <slugg...@gmail.com>

Cliff Wells

May 21, 2008, 1:22:09 PM
to pylons-...@googlegroups.com

On Wed, 2008-05-21 at 01:13 -0700, Graham Dumpleton wrote:
> On May 21, 5:25 pm, Cliff Wells <cl...@develix.com> wrote:

> I would say however that mod_proxy module in Apache is also purpose
> built for proxying, that doesn't mean it is a good idea to use it.

The advantage Nginx brings over even purpose-built proxies is its async
model. Pound is quite capable and easy to use but has limited
scalability due to being threaded. Nginx can handle many thousand
requests per second while using only a few megabytes of RAM and a
relatively small amount of CPU.

> What I was trying to say was that where there are solutions whose only
> purpose is to do something, also look at them as well as those which
> may do it as one function of many. This is because the solutions which
> try to do just the one job do often come with a better feature set for
> that one task, that depending on what you are doing may make more
> sense. Yes it may mean a drop in performance for some aspects of the
> problem being solved, but this may be made up in other ways through
> the other features it provides.

Again, in general I'd agree (and I really used to like Pound a lot), but
in this case, unless you need sophisticated load-balancing algorithms,
it's hard to beat Nginx.

> Sorry, I am generalising again of course, but when you really don't
> know the exact details of what a person is wanting to setup and why,
> it is hard to do anything else. :-)

Just because we don't necessarily know exactly what we're arguing about
doesn't mean we have to stop ;-)

Regards,
Cliff

Jonathan Vanasco

May 21, 2008, 2:09:58 PM
to pylons-discuss


On May 21, 12:49 pm, "Mike Orr" <sluggos...@gmail.com> wrote:
> He did say scalable and video, which I took to mean ultra-heavy use of
> very large files, and overkill didn't matter. The disk space alone is
> one reason why static content might want to be on a separate box, so
> it can be plugged into a large disk array or replicated easier, etc.

Ah yes (that was me, btw). At first it's not going to be much
content in quantity -- just a few large files. A separate box might be a
good idea. Thanks for looking out for me!

> By handle you mean a database connection?
Yep: Read/Write/Log/Session.
I'm still trying to figure out a way to consolidate them all into
'cloned' handles under Pylons -- for small apps hitting one DB it's a bit
of overkill. The last few times I had to cluster an app built
with this forethought, though, all I had to do was create

> So how do you handle writes?  You direct them all to one master server
> and let it propagate the changes to the slaves? Have you found a good
> replicable database among the free ones that work with SQLAlchemy?
Postgres (through extensions) -- though I haven't used replication under
SQLAlchemy or Pylons yet. I don't anticipate any problems.

This doesn't solve the need to partition -- but let's be honest... if
you're successful you need to cluster/replicate; if you're SUCCESSFUL!!!
you need to partition -- and by that time you have the resources to
have a team of engineers handle that exclusively.

> What do you mean by authticket?

http://www.openfusion.com.au/labs/mod_auth_tkt/

It's a great plugin for Apache; Perl support is official, and there are
Python, PHP and other contribs.

Ross Vandegrift

May 21, 2008, 2:55:00 PM
to pylons-...@googlegroups.com
On Wed, May 21, 2008 at 09:49:32AM -0700, Mike Orr wrote:
> So how do you handle writes? You direct them all to one master server
> and let it propagate the changes to the slaves? Have you found a good
> replicable database among the free ones that work with SQLAlchemy?

Postgres was mentioned; I'll add that MySQL's replication is also
quite good for splitting out reads against replicas if your
application isn't write-bound.

MySQL Cluster is now GA, which provides HA database servers and
performance scalability without splitting reads and writes between
different servers. The downside is that you need at least four
servers....

Cliff Wells

May 21, 2008, 3:20:18 PM
to pylons-...@googlegroups.com

On Wed, 2008-05-21 at 11:55 -0400, Ross Vandegrift wrote:
> On Tue, May 20, 2008 at 02:01:26PM -0700, Cliff Wells wrote:
> > According to Netcraft, Nginx is now deployed in front of over 1 million
> > domains. Not nearly as much as Apache, but clearly not all of those are
> > "highly isolated environments". In fact, many sites with heavy traffic
> > are moving to Nginx due to it's vastly superior scalability.
> >
> > Some notables that use Nginx:
> >
> > wordpress.com
> > youtube.com
> > hulu.com
> > rambler.ru
> > torrentreactor.net
> > kongregate.com
> >
> > Where did you get your research from? (Actually, don't answer that, I
> > can guess).
>
> Sites that are amongst the largest on the internet fall into a corner
> case in my mind. As Mike pointed out, sites have an unrealistic
> expectation of traffic. I've been involved in the average cases.

As have I. But I'm going to disassemble this argument below.

> My claims come from years in the service provider industry,
> watching various deployments. I've been an Apache fan for a long
> time, and have seen and deployed hundreds of servers, serving
> thousands of sites on Apache.

I think this is true for all of us. The difference is that the world
has changed in the last couple of years and now there are more options to
choose from. And by "options" I don't mean "a smaller, less capable
Apache clone"; I mean a paradigm shift in how to handle high loads.
It's well known that threaded/process-based servers cannot scale beyond
a certain point. Nginx and Lighttpd are async and are specifically
written to address the C10K problem.

As you point out, not all sites need this sort of scalability (certainly
none that I've written or hosted), however there's a fallout benefit to
this work: these servers can scale specifically because they don't use
threads which means they also use *considerably* less memory and also
tend to use much less CPU. I challenge you to name any system that
won't benefit from reduced RAM and CPU utilization.

To give you a concrete example: last week a colleague of mine converted
a server running Apache and mod_php to Nginx and FastCGI. Prior to the
conversion the server was using almost 1.2GB of RAM. After the
conversion he was using 200MB. Is it probable that Apache was
misconfigured? Hard to know unless you have spent years tuning Apache,
but I'll concede it's possible. Is Nginx misconfigured? Well, frankly
it doesn't seem to matter ;-)

> None are youtube.com - and I agree that
> this is an important point.
>
> Comparing my Apache deployments with deployments of other servers,
> year after year, Apache won hands down:
>
> 1) Users of other HTTP servers are always fiddling with them,
> restarting after crashes. This may be due to misuse, non-optimal
> config - I'm not sure. But I've never had stability issues like this
> with Apache.

I had many issues with Lighttpd, but I've had none with Nginx. I'd also
have to question your use of "always" in the above sentence. I strongly
suspect you aren't speaking from experience here, but rather from hearsay.

> 2) Apache is well-understood by many more folks. There's an army of
> support reps downstairs that are compentent, if not experts, at
> maintaining and troubleshooting it. The other servers come across as
> mysteries (despite often being highly trivial), and end up escalated
> instead of fixed.

And it's poorly understood by just as many, if not more. I first
switched from Apache not due to scalability concerns (like you, I've not
encountered them), but because I find Apache's configuration to be
overwhelming and convoluted. When I first started using Nginx, all the
documentation was in Russian and yet I managed to convert an entire
shared hosting box from a mix of Apache, Pound, and Lighttpd to Nginx in
two days by simply reading the examples. I'd challenge any newcomer to
Apache to do the same.
The fact that you need an army of support reps isn't really advancing
your argument ;-)

> 3) Documentation for Apache is thorough, searchable, and
> understandable. It's full of examples, is available for multiple
> versions of httpd. I have seen the Apache documentation turn
> motivated people from competent levels to expert just by googling for
> it. I'm not saying that other servers don't have decent docs - but
> Apache's are amongst the best docs for any software I have ever read,
> and I have seen them function in production for years.

I'd never argue it doesn't. In fact, Apache's documentation is clearly
far more extensive than Nginx's. I'd expect no less given that it's been
the workhorse of LAMP for the last decade. I'll happily admit (well, not
too happily) that Apache's documentation is far better than Nginx's.

Unfortunately superior documentation doesn't make for superior software.
Documentation is fixable, but Apache's process model isn't.

> > I'd qualify this paragraph as "some of Apache's strengths are", rather
> > than a blanket "it's better". For some people, in some settings, it is
> > better. For others it isn't. If you need high scalability, it isn't
> > the best. If you need a small memory footprint it's not the best. If
> > you prefer a sane configuration syntax it isn't the best. If you need
> > all three then it's arguably amongst the worst.
>
> Yea, you're right - I'm tacitly assuming that we're talking about the
> average cases. Other http servers definitely excel at things, especially
> for workloads they have been specially designed. But for every youtube
> there's tens of thousands of websites with more average traffic and control
> needs.

Well, here's where I find that most Apache proponents' arguments fall
apart: either they claim that Nginx is best for small-scale websites or
they claim that it's only needed for large-scale websites. They are
both wrong. It's best for both. Nginx scales both up *and* down. It
can run youtube.com or you can embed it in a cellphone. The challenges
of running Apache in a 96MB VPS have been documented on this very list.
The challenges of getting Apache to deal with C10K aren't often
discussed because it isn't possible without getting into very high-end
hardware.

This makes Apache best for... medium-sized sites that don't care about
resource utilization? This is a ridiculous claim, so I'll assert
instead that Apache is best if you need a *specialized* service, such as
mod_svn or mod_jakarta. Apache proponents will point out the wealth of
modules as evidence that Apache is the best for general purpose web
serving. But being best at fronting *particular* applications doesn't
make it best *in general*. So it's not Nginx that's specialized for a
particular workload, it's Apache that's specialized.

Nginx is like a finely-balanced chef's knife: suitable for a variety of
tasks, large and small, as long as they all involve slicing. Apache, on
the other hand, is the swiss-army knife of webservers: bulky, full of
odd specialty tools, and on occasion, marginally useful as a knife.

In either case, apparently they both make for a funny lump in some
people's pockets ;-)

Anyway, I think we've gone way OT for long enough. We can continue
offlist if you like.

Regards,
Cliff

Peter Hansen

unread,
May 21, 2008, 5:39:26 PM5/21/08
to pylons-...@googlegroups.com
Cliff Wells wrote:
> Anyway, I think we've gone way OT for long enough. We can continue
> offlist if you like.

While it may be off-topic** I want to say I've found the background
discussion from everyone involved (who are all more experienced and
knowledgeable in this area than I am) to be very interesting and
educational and valuable.

I'm running Pylons "direct" for a beta site, and behind lighttpd for
some others, but this is one of the few threads in which I've read all
the content lately, and I hope it isn't really considered so off-topic
that you feel you have to stop. And if you do, well, thanks for all the
fish while it lasted, anyway.

-Peter

Mike Orr

unread,
May 21, 2008, 6:06:30 PM5/21/08
to pylons-...@googlegroups.com
On Wed, May 21, 2008 at 2:39 PM, Peter Hansen <pe...@engcorp.com> wrote:
>
> Cliff Wells wrote:
>> Anyway, I think we've gone way OT for long enough. We can continue
>> offlist if you like.
>
> While it may be off-topic** I want to say I've found the background
> discussion from everyone involved (who are all more experienced and
> knowledgeable in this area than I am) to be very interesting and
> educational and valuable.

Yes, this is a good thread. The discussion about what kind of
deployment is best for what kind of site, and how much it matters,
affects all of us. Plus the points about database replication and
multi-server authorization, which I thank Jonathan for.

The only noise was about whether Apache is too bloated for its own
good, and whether Nginx is better than everything else. But knowing
what other Pylons sysadmins think of the merits of each is still
worthwhile, even though we don't want *too* many messages on the
topic. All this info would be good to put into one of the deployment
articles in the Pylons Cookbook, if anybody feels inclined. (I would
but I've got my hands full with Pylons tasks right now.)

--
Mike Orr <slugg...@gmail.com>

Jonathan Vanasco

unread,
May 21, 2008, 6:31:31 PM5/21/08
to pylons-discuss
On May 21, 6:06 pm, "Mike Orr" <sluggos...@gmail.com> wrote:
> The only noise was about whether Apache is too bloated for its own
> good, and whether Nginx is better than everything else.  But knowing
> what other Pylons sysadmins think of the merits of each is still
> worthwhile, even though we don't want *too* many messages on the
> topic.  All this info would be good to put into one of the deployment
> articles in the Pylons Cookbook, if anybody feels inclined.  (I would
> but I've got my hands full with Pylons tasks right now.)

I've always believed in the 'right tool for the right job' -- which
has had me in the past offloading mod_perl apps via fancy URL
dispatching into PHP and Python 'microsites' (with a whole lot of
session mangling in between!) to get optimal server performance.

nginx and lighttpd are stripped-down, streamlined webservers.
They're not full-featured and can be a PITA to get going. They also
have varying levels of support and different strengths/weaknesses.
That being said, they are designed to be speedy little demons that are
lightweight and robust. You don't get nearly the number of features
you do with Apache -- but if you're doing some sort of webapp that is
going to be handled by fcgi or a daemon -- you can drop a lot of dead
weight.

We could go into hook functions, process models, the request phase and
all of that... but we can also just look at the size of the source:
apache:   5.9 MB
lighttpd: 0.796 MB
nginx:    0.509 MB

They're all different projects, all great servers, and every web
engineer should be familiar with them all -- they each exist as
amazing tools in your kit.

In terms of databases-

I'm hugely biased against MySQL. I had some major issues with
integrity in the 4.x branch. I know the 5.x branches have addressed
this - but truth be told, I lost my trust in them -- and I don't think
they'll ever be able to gain it back. That's because it wasn't an
issue of 'bugs' but of the project's culture and prioritization of
goals.

For postgres stuff,
http://slony.info - replication system
https://developer.skype.com/SkypeGarage/DbProjects/ - Skype released
their connection pooler, partitioning system, extended PL/Python and
other neat stuff. They're very Python friendly ;)

Graham Dumpleton

unread,
May 21, 2008, 8:53:07 PM5/21/08
to pylons-discuss
On May 22, 5:20 am, Cliff Wells <cl...@develix.com> wrote:
> > Sites that are amongst the largest on the internet fall into a corner
> > case in my mind.  As Mike pointed out, sites have an unrealistic
> > expectation of traffic.  I've been involved in the average cases.
>
> As have I.  But I'm going to disassemble this argument below.
>
> > My claims come from years in the service provider industry,
> > watching various deployments.  I've been an Apache fan for a long
> > time, and have seen and deployed hundreds of servers, serving
> > thousands of sites on Apache.
>
> I think this is true for all of us.  The difference is that the world
> has changed in the last couple of years and now there's more options to
> choose from.  And by "options" I don't mean "a smaller, less capable
> Apache clone", I mean a paradigm shift in how to handle high loads.
> It's well known that threaded/process based servers cannot scale beyond
> a reasonable point.  Nginx and Lighttpd are async and are specifically
> written to address the C10K problem.

There are two approaches one can use for addressing scalability:
vertical scaling and horizontal scaling.

In vertical scaling you just upgrade your existing single machine to
a bigger, more capable machine. For this path then yes, nginx and
lighttpd may give you more headroom than Apache. The problems with
vertical scaling are cost, plus the fact that you will hit the limit
of what the hardware can achieve much sooner than with horizontal
scaling.

With horizontal scaling you keep your existing machine and just add
more machines. For horizontal scaling, the limit is going to be how
easy it is to accommodate your application across a growing number of
machines. The scalability of Apache here isn't generally going to be
an issue as you would have sufficient machines to spread the load so
as to not unduly overload a single machine.

Although one is buying more hardware with horizontal scaling, the cost/
performance curve generally increases at a lesser rate than with
vertical scaling. This, however, is tempered by increasing maintenance
costs from having to support multiple machines. If the machines are all
identical, though, and treated as appliances which are either rebuilt
or replaced when a failure occurs, even that isn't really a problem.

Of course, there is still a whole lot more to it than that as you need
to consider power costs, networking costs for hardware/software load
balancing, failover and possible need for multiple data centres
distributed over different geographic locations.

So, what limitations exist and what other issues come into
consideration really depend on how you scale up your system.

One thing that keeps troubling me about some of the discussions here
as to which solution may be better than another is that they appear to
focus on what solution may be best for static file serving or proxying,
etc. One has to keep in mind that Python web applications have
different requirements than these use cases. Python web applications
also have different needs than PHP applications.

As I originally pointed out, for Python web applications, in general
any solution will do as it isn't the network or the web server
arrangement that will be the bottleneck. What does it matter if one
solution is twice as fast as another for a simple hello world
program, when the actual request time saved by that solution when
applied to a real world application is far less than 1% of overall
request time?

For non-database Python web applications, issues such as the GIL and
how multithreading and/or multiple processes are used are going to be
a bigger concern and have more impact on performance, inasmuch as
running a single multithreaded process isn't going to cut it when
scaling. Thus ease of configuring multiple processes is more
important, as is the ability to recycle processes to avoid issues with
increasing memory usage. There is also the balance between having a
fixed number of processes, as is necessary when using fastcgi-like
approaches, and the ability in something like Apache to dynamically
adjust the number of processes to handle requests. Add databases into
the mix and you get into a whole new bunch of issues, which others are
already discussing.

Memory usage in all of this is a big issue, and granted, for static
file serving nginx and lighttpd will consume less memory. The
difference for a dynamic Python web application, though, isn't going
to be that marked. If you are running an 80MB Python web application
process, it is still going to be about that size whatever hosting
solution you use. This is because the memory usage comes from the
Python web application, not the underlying web server. The problem is
more to do with how you manage the multiple instances of that 80MB
process.

There have been discussions over on the Python WEB-SIG about making
WSGI better support asynchronous web servers. Part of their rationale
was that it gave better scalability because it could handle more
concurrent requests and wouldn't be restricted by the number of
threads being used. The problem that was pointed out to them, which
they then didn't address, is that when one is handling more concurrent
requests, the transient memory requirements of your process can
theoretically be higher. At least where you have a set number of
threads you can get a handle on what maximum memory usage may be by
looking at the maximum transient requirements of your worst request
handler. With an asynchronous model, where theoretically an unbounded
number of concurrent requests could be handled at the same time, you
could really blow out your memory requirements if they all hit the
same memory-hungry request handler at the same time. Thus a more
traditional synchronous model can give you more predictability, which
for large systems can in itself be an important consideration.

Anyway, this is getting a fair bit off topic and since others are
seeing my rambles as such, I'll try and refrain in future. :-)

Graham

Cliff Wells

unread,
May 22, 2008, 3:43:01 AM5/22/08
to pylons-...@googlegroups.com

On Wed, 2008-05-21 at 17:53 -0700, Graham Dumpleton wrote:
> On May 22, 5:20 am, Cliff Wells <cl...@develix.com> wrote:
> > I think this is true for all of us. The difference is that the world
> > has changed in the last couple of years and now there's more options to
> > choose from. And by "options" I don't mean "a smaller, less capable
> > Apache clone", I mean a paradigm shift in how to handle high loads.
> > It's well known that threaded/process based servers cannot scale beyond
> > a reasonable point. Nginx and Lighttpd are async and are specifically
> > written to address the C10K problem.
>
> There are two approaches one can use for addressing scalability, they
> are vertical scaling and horizontal scaling.
>
> In vertical scaling one just upgrades your existing single machine
> with a bigger more capable machine. For this path then yes, nginx and
> lighttpd may give your more head room than Apache. The problem with
> vertical scaling is cost, plus that you will hit the limit of what the
> hardware can achieve much sooner than with horizontal scaling.

Except that vertical scaling doesn't preclude horizontal scaling, it
merely postpones the necessity for implementing it (if not the planning)
and helps limit the scope of it. If Nginx provides superior vertical
scaling, then it will also provide superior horizontal scaling since
vertically scaled systems are the building blocks of a horizontally
scaled system.

> With horizontal scaling you keep your existing machine and just add
> more machines. For horizontal scaling, the limit is going to be how
> easy it is to accommodate your application across a growing number of
> machines. The scalability of Apache here isn't generally going to be
> an issue as you would have sufficient machines to spread the load so
> as to not unduly overload a single machine.
> Although one is buying more hardware with horizontal scaling, the cost/
> performance curve would generally increases at a lessor rate than with
> vertical scaling.

Again, I think this contrast is artificial. You are setting up vertical
scaling and horizontal scaling as mutually exclusive when they are
anything but, and unless you have endlessly deep pockets, you should
prefer to control the growth of your horizontal scaling.

> Of course, there is still a whole lot more to it than that as you need
> to consider power costs, networking costs for hardware/software load
> balancing, failover and possible need for multiple data centres
> distributed over different geographic locations.

Absolutely. And while hardware costs are dropping, hosting and power
costs are going up. My colocation fees have increased an average of 10%
per year, and power fees have quadrupled since I started. I don't
expect this trend to change any time soon.

> One thing that keeps troubling me about some of the discussions here
> as to what solution may be better than another is that it appear to
> focus on what solution may be best for static file sharing or proxying
> etc. One has to keep in mind that Python web applications have
> different requirements than these use cases. Python web applications
> also have different needs to PHP applications.

Given that an average web page is probably 70% or more static or cached
content, I think this is a critical aspect.

> As I originally pointed out, for Python web applications, in general
> any solution will do as it isn't the network or the web server
> arrangement that will be the bottleneck. What does it matter if one
> solution is twice as fast than another for a simple hello world
> program, when the actual request time saved by that solution when
> applied to a real world application is far less than 1% of overall
> request time.

If you try to scale a dynamic application and are going to pass part of
the request off to Python on every request you are going to either fail
spectacularly or spend an awful lot of money scaling horizontally.
There's a reason people have successfully deployed huge Rails apps and
it's not often by having 300 servers. They manage it by making sure
that Rails is only called when absolutely necessary and letting a fast
webserver handle most of the load.

In any case, the same techniques are going to be applied regardless of
which web server you choose. The question is more "how much of my
limited and expensive resources is this single part of my stack going to
consume and what benefit will I be getting for it?" Unless you require
a specific module, Nginx and Apache are more-or-less functionally
equivalent, except that one uses a fraction of the resources of the
other.

> For non database Python web applications issues such as the GIL, and
> how multithreading and/or multiple processes are used is going to be a
> bigger concern and have more impact on performance. This is in as much
> as running a single multithreaded process isn't going to cut it when
> scaling. Thus ease of configuring use of multiple processes is more
> important as is the ability to recycle processes to avoid issues with
> increasing memory usage.

I'd consider "increasing memory usage" to be a bug in the application
and outside the scope of discussion. As far as ease of configuring
multiple processes, I use Nginx's built-in load balancing and a 4 line
shell script to start my application. Don't get me wrong, I think
Apache's process management is quite nice and I'd like to see something
similar added to Nginx, but it's hardly a show-stopper.
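
To give a rough idea of what I mean -- the ports, paths and names below
are made up for illustration rather than copied from my real setup --
the Nginx side is just an upstream block, and the "process management"
is a trivial loop:

    # nginx.conf fragment: round-robin across two paster instances
    upstream pylonsapp {
        server 127.0.0.1:5000;
        server 127.0.0.1:5001;
    }

    server {
        listen       80;
        server_name  example.com;

        location / {
            proxy_pass        http://pylonsapp;
            proxy_set_header  Host $host;
            proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }

    #!/bin/sh
    # start one daemonized paster per ini file; each ini sets its own port
    for ini in /srv/myapp/prod-5000.ini /srv/myapp/prod-5001.ini; do
        paster serve --daemon --pid-file="$ini.pid" --log-file="$ini.log" "$ini"
    done

A matching loop with "paster serve --stop-daemon" handles restarts.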

> There is a also the balance between having
> fixed numbers of processes as is necessary when using fastcgi like
> approaches, or the ability in something like Apache to dynamically
> adjust the number of processes to handle requests.

Remember you said this (see below*).

> Add databases into
> the mix and you get into a whole new bunch of issues, which others are
> already discussing.
> Memory usage in all of this is a big issue and granted that for static
> file serving nginx and httpd will consume less memory. The difference
> though for a dynamic Python web application isn't going to be that
> marked.

I disagree. As I mentioned earlier, someone I know recently took an
Apache/mod_php application consuming 1.2GB of RAM down to 200MB using
Nginx/FastCGI with no loss in performance or functionality. It's not
clear to me why a Python application would be much different.

> If you are running a 80MB Python web application process, it
> is still going to be about that size whatever hosting solution you
> use. This is because the memory usage is from the Python web
> application, not the underlying web server. The problem is more to do
> with how you manage the multiple instances of that 80MB process.

Sort of. However consider this: if I am running Nginx I can reasonably
*fill* a single server with Python processes and not worry too much
about how much memory Nginx consumes. The resources are available for
running the *application* rather than the webserver. Because the Python
application will undoubtedly be one of the first bottlenecks (database
next), the ability to horizontally scale the application (by running
multiple instances) is critical. By using up system resources, Apache
limits the number of instances of the application that can be run on a
single machine, and by extension across multiple machines.

> There have been discussions over on the Python WEB-SIG about making
> WSGI better support asynchronous web servers. Part of their rational
> was that it gave better scalability because it could handle more
> concurrent requests and wouldn't be restricted by number of threads
> being used. The problem that was pointed out to them which they then
> didn't address is that where one is handling more concurrent requests,
> the transient memory requirements of your process then theoretically
> can be more.

> At least where you have a set number of threads you can
> get a handle on what maximum memory usage may be by looking at the
> maximum transient requirements of your worst request handler.

Then you agree that dynamically adjusting the process pool size is bad
since it would have the same net effect? This appears (to me) to
contradict what you claimed as a feature earlier [*].

> With an
> asynchronous model where theoretically an unbounded number of
> concurrent requests could be handled at the same time, you could
> really blow out your memory requirements if they all hit the same
> memory hungry request handler at the same time. Thus a more
> traditional synchronous model can thus give you more predictability,
> which for large systems in itself can be an important consideration.

Of course, this is where your earlier suggestion of using a hardware
load-balancer would be a good idea. I think a much better use of
resources (read "money") would be spending some of it on a dedicated
load-balancing solution which can control how requests are distributed
rather than repurposing inefficiency into a feature.

At any rate, I don't actually think the above has much to do with Nginx
vs Apache as Pylons deployment options. Because Pylons tends to be run
as a threaded app (is anyone doing otherwise?), we still have the same
predictability. In fact our predictability is easier since we don't
need to calculate the cost of the web server's memory explosion in
addition to our application's needs.

In all of the above, I haven't seen any explanation from you as to why
Apache would be superior to Nginx as a deployment option, only that it
wouldn't be the worst bottleneck in your application stack. Not
terribly convincing. If we were discussing a closed-source solution
versus an open source solution, this might be sufficient ("good
enough"), but that's not the case here.

I'll give you a quick list of actual benefits I see from using Nginx:

1) low CPU overhead
2) small memory footprint
3) consistent latency for responses
4) scalable in all directions
5) simple and syntactically consistent configuration

Benefits I see for Apache:

1) excellent documentation
2) wide array of modules, especially esoteric ones
3) mod_wsgi provides a slightly more efficient communication gateway to
Python backends
4) automatic process management (restarting backends)

Of Apache's benefits I see

1) as mostly moot due to Nginx's simplicity
2) completely moot since I don't use them
3) not enough to overcome the efficiency lost elsewhere
4) as mostly moot because it's simple to solve in other ways

This probably doesn't exactly match other people's requirements and
certainly there are other considerations that might tip the scales one
way or the other.

> Anyway, this is getting a fair bit off topic and since others are
> seeing my rambles as such, I'll try and refrain in future. :-)

Please don't. You happen to be one of the few feather-heads I don't
mind hearing from, even if I find your arguments kind of slippery ;-)
And incidentally, congrats on your baby =)

For people who care more about numbers than theoretical discussions (aka
"obstinate") please refer to the following which provides a fairly
decent overview of resource utilization between the two servers:

http://www.joeandmotorboat.com/2008/02/28/apache-vs-nginx-web-server-performance-deathmatch/


Regards,
Cliff

Cliff Wells

unread,
May 22, 2008, 4:16:57 AM5/22/08
to pylons-...@googlegroups.com

On Thu, 2008-05-22 at 00:43 -0700, Cliff Wells wrote:

> If you try to scale a dynamic application and are going to pass part of
> the request off to Python on every request you are going to either fail
> spectacularly or spend an awful lot of money scaling horizontally.
> There's a reason people have successfully deployed huge Rails apps and
> it's not often by having 300 servers. They manage it by making sure
> that Rails is only called when absolutely necessary and letting a fast
> webserver handle most of the load.

Since I think it's of specific interest, here's an interesting approach
that could probably be made to work with Apache as well:

http://blog.kovyrin.net/2007/08/05/using-nginx-ssi-and-memcache-to-make-your-web-applications-faster/
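
The gist, as a rough sketch (the "backend" upstream, fragment URLs and
memcached address here are placeholders I've invented, not taken from
that article): the page shell is served with SSI enabled, and each
included fragment is tried against memcached first, falling back to the
application only on a miss:

    location / {
        ssi         on;                    # process <!--#include virtual="..." --> tags
        proxy_pass  http://backend;
    }

    # fragments: served from memcached when present, else from the app
    location /fragments/ {
        set             $memcached_key  $uri;
        memcached_pass  127.0.0.1:11211;
        error_page      404 = @fragment_miss;
    }

    location @fragment_miss {
        proxy_pass  http://backend;
    }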

Regards,
Cliff

Alberto Valverde

unread,
May 22, 2008, 4:20:12 AM5/22/08
to pylons-...@googlegroups.com
Jose Galvez wrote:
> Anyone using mod_wsgi with Apache? how good is that for deployment,
> better/worse then mod_proxy with paster?

I'm using Apache2 + mod_wsgi 2.0 as a process controller and nginx to
serve static content and proxy dynamic requests to apache2.

Apache2 uses the worker (threaded) MPM and is configured to be pretty
lightweight [1] by only loading a minimal set of modules, turning
keep-alive off and having a limited number of threads. This is possible
because it runs behind nginx, which takes care of spoon-feeding the slow
clients over a keep-alive connection, while dynamic requests to the
Apache backend are quite fast since they're local. Since nginx does some
caching of the response, these requests don't tie up a heavy Python
process for too long, so a small pool of workers can handle moderately
high loads.

The reason I use it over paster+supervisord is that I find it *much*
easier to set up and maintain and more powerful (mod_wsgi can be
configured to spawn WSGI applications into separate processes under
their own user/group, restart them if they crash, kill them if they
deadlock, isolate them in their own virtualenv, etc...).

As you can see, it's more or less the typical supervisord + paster +
nginx setup, but replacing paster and supervisord with apache+mod_wsgi
because I find the latter much easier to configure and maintain.
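
Roughly, the two halves look like this (a sketch only -- the ports,
paths and names are invented for illustration, not lifted from my
actual config):

    # nginx: serve static files itself, proxy everything else to the local Apache
    server {
        listen       80;
        server_name  example.com;

        location /static/ {
            alias  /srv/myapp/myapp/public/;
        }

        location / {
            proxy_pass        http://127.0.0.1:8080;
            proxy_set_header  Host $host;
        }
    }

    # apache (worker MPM, KeepAlive Off): the app runs in its own daemon processes
    Listen 127.0.0.1:8080
    WSGIDaemonProcess myapp user=myapp group=myapp processes=2 threads=10 maximum-requests=1000
    WSGIProcessGroup myapp
    WSGIScriptAlias / /srv/myapp/app.wsgi

    # app.wsgi: hand the Pylons app to mod_wsgi
    from paste.deploy import loadapp
    application = loadapp('config:/srv/myapp/production.ini')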

Alberto

[1] The master process only eats around 4M resident size on my machine.
The slave processes which host the Python app take up much more, but I
guess that's roughly the same as they would take using paster or any
other Python webserver, since it's the actual Python app.

Jonathan Vanasco

unread,
May 22, 2008, 9:02:57 AM5/22/08
to pylons-discuss
On May 22, 4:20 am, Alberto Valverde <albe...@toscat.net> wrote:
> The reason I use it over paster+supervisord is because I find it *much*
> easier to set up and maintain and more powerful (mod_wsgi can be
> configured to spawn wsgi applications into separate processes under
> their own user/group, restart them if they crash, kill them if they
> deadlock, isolate them in their own virtualenv, etc...).

That's one of the best rationales people have mentioned here ;)

Jonathan Vanasco

unread,
May 22, 2008, 9:23:41 AM5/22/08
to pylons-discuss

On May 22, 3:43 am, Cliff Wells <cl...@develix.com> wrote:

> Again, I think this contrast is artificial.  You are setting up vertical
> scaling and horizontal scaling as mutually exclusive when they are
> anything but, and unless you have endlessly deep pockets, you should
> prefer to control the growth of your horizontal scaling.

Horizontal scaling is often better, but much more expensive: you need
2x the hardware (one set for real traffic, one for redundancy) and 2x
the dev hours.

It's also moot until you 'need' it - and you pretty much won't need it
until you can afford it.

> > As I originally pointed out, for Python web applications, in general
> > any solution will do as it isn't the network or the web server
> > arrangement that will be the bottleneck.
> If you try to scale a dynamic application and are going to pass part of
> the request off to Python on every request you are going to either fail
> spectacularly or spend an awful lot of money scaling horizontally.

The web server is often the bottleneck. The only real bottleneck in an
app should be DB blocking and wait times, but when you have bloated
frontends, or a small pool of workers, the server can be the bottleneck.
Tools like nginx help because they can stave off the slow clients
and let fastcgi/apache handle the dynamic request all at once --
making them more efficient. They also give you more effective
workers, because they have less bloat.

Two things to note:
1- There is a comparison from a few years ago that shows apache +
lighty + nginx + thttpd + litespeed performance for every 100 r/s on
static content. You got to see where their strengths were.

2- There is a law of diminishing marginal utility with workers. On
my mod_perl setup, every worker I add after the 1st gets me 80 more
r/s; the 7th and 8th workers get me 20 r/s. A 9th will get me 0.
Anything more will degrade performance.


> I'd consider "increasing memory usage" to be a bug in the application
> and outside the scope of discussion.  

Perhaps not. In many apache versions, memory is allocated to the
workers as a speed boost. Like in mod_perl: each child will retain/
reserve memory for each called function/variable so as to speed up
future requests. It's a tradeoff of speed vs memory. Sometimes the
speed isn't as necessary... and you'd rather have the mem. But you
can't turn that off.

> I disagree.  As I mentioned earlier, someone I know recently took an
> Apache/mod_php application consuming 1.2GB of RAM down to 200MB using
> Nginx/FastCGI with no loss in performance or functionality.  It's not
> clear to me why a Python application would be much different.

Most likely that happened because of the phenomenon above: you had
each apache mod_php process bloating on RAM. Running apache's PHP via
fastcgi can improve that, as you get better control of the memory
allocation and use... but it's usually not as dramatic as going
straight to nginx.

>  By using up system resources, Apache
> limits the number of instances of the application that can be run on a
> single machine, and by extension across multiple machines.

very well articulated!

Cliff Wells

unread,
May 22, 2008, 2:59:00 PM5/22/08
to pylons-...@googlegroups.com
One more option I've not seen mentioned is Cherokee:

http://www.cherokee-project.com/

I've never used it in production (last time I experimented with it was a
couple years ago and it wasn't mature enough), but it's reported to be
quite fast, even edging out Nginx in several benchmarks.

http://www.alobbs.com/news/104

It also has native SCGI support and a management interface written in
Python.
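
For what it's worth, exposing a Pylons app over SCGI is only a few
lines with flup (a sketch -- the config path and port are placeholders):

    # scgi_server.py: serve a Pylons app over SCGI for Cherokee (or lighttpd/nginx)
    from paste.deploy import loadapp
    from flup.server.scgi import WSGIServer

    app = loadapp('config:/srv/myapp/production.ini')
    WSGIServer(app, bindAddress=('127.0.0.1', 4000)).run()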

The documentation isn't what it could be, but I expect the admin
interface helps out quite a bit on that count.

Regards,
Cliff

On Fri, 2008-05-16 at 13:38 -0700, Jonathan Vanasco wrote:
> I'm a little unclear on the better ways to deploy a Pylons app.
>
> My production servers run nginx -- is it better to use some fastcgi
> support (if so, how?) or just do a "paster serve" and proxy to that
> port?
>
> I've read a handful of ways on how-to-deploy apps, and all seem
> different. I've yet to see a comparison or "this is THE way to do it"
> document.
> >

Shannon -jj Behrens

unread,
May 22, 2008, 5:12:00 PM5/22/08
to pylons-...@googlegroups.com
Here's my two cents:

Has anyone tried out the mod_wsgi module for *Nginx*? Yeah, I know,
weird: http://wiki.codemongers.com/NginxNgxWSGIModule

Being asynchronous rules! That's why Erlang, Squid, IronPort servers,
Nginx, etc. are able to handle so many concurrent requests so easily.
Here's the link to the C10K paper referenced earlier:
http://www.kegel.com/c10k.html. It explains why a thread or process
model doesn't cut it if you want to handle 10K simultaneous requests.

If you're interested in doing asynchronous programming in Python but
without the painful callback style approach used by Twisted, check out
http://wiki.secondlife.com/wiki/Eventlet. It's based on the same
tricks used by Yahoo Groups, IronPort, and Slide.

As usual, I recommend that anyone who wants to talk about scalability
read "Scalable Internet Architectures". Gees, I probably sound like a
broken record concerning that book ;)

Finally, a plug for my article (if you don't mind). If you want to
learn more about concurrency approaches in Python, check out my
article: http://www.ddj.com/linux-open-source/206103078

Thanks for your patience ;)
-jj

--
I, for one, welcome our new Facebook overlords!
http://jjinux.blogspot.com/

Jonathan Vanasco

unread,
May 22, 2008, 5:45:19 PM5/22/08
to pylons-discuss
On May 22, 2:59 pm, Cliff Wells <cl...@develix.com> wrote:
> One more option I've not seen mentioned is Cherokee:
>
> http://www.cherokee-project.com/
>
> I've never used it in production (last time I experimented with it was a
> couple years ago and it wasn't mature enough),

I remember that... and having the same experience.

When I first touched it, it had just been announced and was more like
hobbyware.

Some other high-performance webservers have been:

boa - http://www.boa.org/
okws - http://www.okws.org/doku.php
aolserver/naviserver

and a few more I can't recall. Some of them are a bit 'tailored' for
very specific purposes... like okws, which is for building C apps



Lawrence Oluyede

unread,
May 23, 2008, 3:32:40 AM5/23/08
to pylons-...@googlegroups.com
On Thu, May 22, 2008 at 11:12 PM, Shannon -jj Behrens <jji...@gmail.com> wrote:
> Has anyone tried out the mod_wsgi module for *Nginx*? Yeah, I know,
> weird: http://wiki.codemongers.com/NginxNgxWSGIModule

I personally know the author and I definitely recommend it. He's
focused and competent.

> Being asynchronous rules! That's why Erlang, Squid, IronPort servers,
> Nginx, etc. are able to handle so many concurrent requests so easily.
> Here's the link to the C10K paper referenced earlier:
> http://www.kegel.com/c10k.html. It explains why a thread or process
> model doesn't cut it if you want to handle 10K simultaneous requests.

There's just one little problem with async: it does not scale across
multicore architectures or multiple nodes.
At least not by itself. You have to mix it with other kinds of
concurrency approaches to take advantage of those.
Erlang is async, but it does scale everywhere because of the way it was built.

> If you're interested in doing asynchronous programming in Python but
> without the painful callback style approach used by Twisted, check out
> http://wiki.secondlife.com/wiki/Eventlet. It's based on the same
> tricks used by Yahoo Groups, IronPort, and Slide.

I will definitely look into it! Thanks. Have you tried it for
something "real-worldish" or just examples?

> As usual, I recommend that anyone who wants to talk about scalability
> read "Scalable Internet Architectures". Gees, I probably sound like a
> broken record concerning that book ;)

Noted. I am reading Building Scalable Websites at the moment but I
will buy it afterwards


--
Lawrence, stacktrace.it - oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair

Shannon -jj Behrens

unread,
May 23, 2008, 5:05:24 AM5/23/08
to pylons-...@googlegroups.com
On Fri, May 23, 2008 at 12:32 AM, Lawrence Oluyede <l.ol...@gmail.com> wrote:
>
> On Thu, May 22, 2008 at 11:12 PM, Shannon -jj Behrens <jji...@gmail.com> wrote:
>> Has anyone tried out the mod_wsgi module for *Nginx*? Yeah, I know,
>> weird: http://wiki.codemongers.com/NginxNgxWSGIModule
>
> I personally know the author and I definitely recommend it. He's
> focused and competent.

Oh, cool!

>> Being asynchronous rules! That's why Erlang, Squid, IronPort servers,
>> Nginx, etc. are able to handle so many concurrent requests so easily.
>> Here's the link to the C10K paper referenced earlier:
>> http://www.kegel.com/c10k.html. It explains why a thread or process
>> model doesn't cut it if you want to handle 10K simultaneous requests.
>
> There's just only a little problem with async: it does not scale on
> multicore architectures or multiple nodes.
> At least not by itself. You have to mix it with other kinds of
> concurrency approaches to gain advantage of that.
> Erlang is async but it does scale everywhere by the way it was built.

Haha, when you said it didn't scale for multiple cores or multiple
nodes, I was going to knee jerk and say, "What about Erlang!" You
beat me to the punch ;) Yes, async is a technique. Just like event
based programming in general is a technique. You still have to use
that technique in smart ways to build big systems.

>> If you're interested in doing asynchronous programming in Python but
>> without the painful callback style approach used by Twisted, check out
>> http://wiki.secondlife.com/wiki/Eventlet. It's based on the same
>> tricks used by Yahoo Groups, IronPort, and Slide.
>
> I will definitely look into it! Thanks. Have you tried it for
> something "real-worldish" or just examples?

I haven't used Eventlet yet. It's based on the same ideas that
IronPort uses, and clearly, IronPort servers are all over the world.
It's nice because it "feels" like threads, but it "acts" like async.

>> As usual, I recommend that anyone who wants to talk about scalability
>> read "Scalable Internet Architectures". Gees, I probably sound like a
>> broken record concerning that book ;)
>
> Noted. I am reading Building Scalable Websites at the moment but I
> will buy it afterwards

I read that one too. It's a bit long and boring, eh? "Scalable
Internet Architectures" was a bit more to the point, and it includes a
lot more stuff about scalability per se.

I blogged about both of them:
http://jjinux.blogspot.com/search?q=scalable+internet+architectures
http://jjinux.blogspot.com/2006/11/book-review-building-scalable-web.html

Happy Hacking!

Lawrence Oluyede

unread,
May 23, 2008, 5:19:28 AM5/23/08
to pylons-...@googlegroups.com
On Fri, May 23, 2008 at 11:05 AM, Shannon -jj Behrens <jji...@gmail.com> wrote:
>> I personally know the author and I definitely recommend it. He's
>> focused and competent.
>
> Oh, cool!

He also gave a talk about nginx's mod_wsgi at the PyCon Italy
<http://www.pycon.it/pycon2/schedule/talk/una-implementazione-di-wsgi-per-nginx/>
(webpage in italian)

> Haha, when you said it didn't scale for multiple cores or multiple
> nodes, I was going to knee jerk and say, "What about Erlang!" You
> beat me to the punch ;) Yes, async is a technique. Just like event
> based programming in general is a technique. You still have to use
> that technique in smart ways to build big systems.

Right.

> I haven't used Eventlet yet. It's based on the same ideas that
> IronPort uses, and clearly, IronPort servers are all over the world.
> It's nice because it "feels" like threads, but it "acts" like async.

Like pyprocessing, which feels like threads but acts like processes ;)
I am fond of libraries/frameworks like that, which lower the bar for
adopting scalability.
I think it's something we as developers, of any kind, should not ignore anymore.

>> Noted. I am reading Building Scalable Websites at the moment but I
>> will buy it afterwards
>
> I read that one too. It's a big long and boring, eh?

Yup. There's some interesting stuff, but it's a little bit too focused
on PHP+MySQL (after all, it is based on the Flickr experience).
Sometimes it feels like "Python does not have this kind of problem,
let's move along".

> "Scalable
> Internet Architectures" was a bit more to the point, and it includes a
> lot more stuff about scalability per se.

Wow. What about High Performance Websites by Souders? I bought it at
the O'Reilly booth 2 weeks ago at PyCon Italy

Nice

> http://jjinux.blogspot.com/2006/11/book-review-building-scalable-web.html

"I was tearing my
hair out when Cal spent five pages explaining what source control is and
listing its basic features." hahahaha I thought the same thing.
I did find the deep intro to encodings and UTF nice, by the way.

Bye

Ross Vandegrift

unread,
May 23, 2008, 10:05:28 AM5/23/08
to pylons-...@googlegroups.com
On Wed, May 21, 2008 at 12:20:18PM -0700, Cliff Wells wrote:
> On Wed, 2008-05-21 at 11:55 -0400, Ross Vandegrift wrote:
> > 1) Users of other HTTP servers are always fiddling with them,
> > restarting after crashes. This may be due to misuse, non-optimal
> > config - I'm not sure. But I've never had stability issues like this
> > with Apache.
>
> I had many issues with Lighttpd, but I've had none with Nginx. I'd also
> have to question your use of "always" in the above sentence. I strongly
> suspect you aren't speaking from experience here, but rather from hearsay.

Hmmm, now that you mention it, I think all of those deployments may
have been lighttpd. I had to hear a lot of the fallout - lighttpd
was being used to generate tokens on servers that would be used for
instantiating authentication credentials in a single sign-on server
for admins.

So there was much gnashing of teeth whenever this would crap out.
I was fortunate enough to not have this be my baby, and mostly didn't
have to deal with it.

But I probably shouldn't continue to take that experience as
indicative of everything that's not Apache.

> And it's poorly understood by just as many, if not more. I first
> switched from Apache not due to scalability concerns (like you, I've not
> encountered them), but because I find Apache's configuration to be
> overwhelming and convoluted.

Really? I can see it being overwhelming, but it seems very
understandable to me. Paired with their documentation, I don't think
I've ever had a real problem getting Apache to do something I knew it
could.

Well, unless you count kinda crazy, obscure mod_rewrite stuff - but of
course that's a black art just because the rabbit hole goes as deep as
you care to follow :).

> The fact that you need an army of support reps isn't really advancing
> your argument ;-)

Heh, well, for every change needed to Apache, there's 1000 people that
need help configuring their POP3 client. Apache is hardly the reason
there's an army :).

> This makes Apache best for... medium-sized sites that don't care about
> resource utilization? This is a ridiculous claim, so I'll assert
> instead that Apache is best if you need a *specialized* service, such as
> mod_svn or mod_jakarta.

I don't think that's such a ridiculous claim! Consider the
application server that hosts the apps that I write for my company's
internal use. It hosts four or six Pylons applications and one Rails
app. One of these apps handles around 1000 users a day, one around
100, one around 10. The Rails app is an AJAX form that just pushes
collected data to the browser, so is usually busy despite only having
an average of 1 user a day.

The server these apps are housed on is gratuitously overpowered.
Apache's flexibility makes this use-case trivial.

Maybe this deployment pattern is uncommon?

> Apache proponents will point out the wealth of
> modules as evidence that Apache is the best for general purpose web
> serving. But being best at fronting *particular* applications doesn't
> make it best *in general*. So it's not Nginx that's specialized for a
> particular workload, it's Apache that's specialized.

Eh, I wouldn't make that claim about Apache modules. Many of them are
irrelevant to me, some seem downright pointless.

> Nginx is like a finely-balanced chef's knife: suitable for a variety of
> tasks, large and small, as long as they all involve slicing. Apache, on
> the other hand, is the swiss-army knife of webservers: bulky, full of
> odd specialty tools, and on occasion, marginally useful as a knife.
>
> In either case, apparently they both make for a funny lump in some
> people's pockets ;-)

I wouldn't want the lump of a chef's knife anywhere near my pocket,
lest I be bleeding out all over the floor!

> Anyway, I think we've gone way OT for long enough. We can continue
> offlist if you like.

I'm more or less done - I think you've convinced me that Nginx is
probably worth another look at some point. After all, there's nothing
wrong with having another tool around to solve some problem, even if
Apache is where I'd go first.

Cliff Wells

unread,
May 23, 2008, 11:34:58 AM5/23/08
to pylons-...@googlegroups.com

On Fri, 2008-05-23 at 10:05 -0400, Ross Vandegrift wrote:
> I don't think that's such a ridiciulous claim! Consider the
> application server that hosts the apps that I write for my company's
> internal use. It hosts four or six Pylons applications and one Rails
> app. One of these apps handles around 1000 uses a day, one around
> 100, one around 10. The Rails app is an AJAX form that just pushes
> collected data to the browser, so is usually busy despite only having
> an average of 1 user a day.
>
> The server these apps are housed on is gratuitously overpowered.
> Apache's flexability makes this use-case trivial.
>
> Maybe this deployment pattern is uncommon?

No, I think it's quite common, however I'm one of those obstinate types
who refuses to equate popularity with correctness =)

> I'm more or less done - I think you've convinced me that Nginx is
> probably worth another look at some point. After all, there's nothing
> wrong with having another tool around to solve some problem, even if
> Apache is where I'd go first.

Well, I don't want to discount the value of experience. If you know
Apache well then that can be a perfectly valid reason to stick with it,
especially if you aren't hitting any limitations with it.

My main concern in this thread has been to dispel the idea that Nginx
is only appropriate in specialized deployments or the inverse that
Apache is the best general-purpose webserver. I believe neither to be
true, but that doesn't mean that I believe Apache is a *bad* choice,
only that unless you are already heavily invested in Apache (existing
deployments, trained staff, etc) then perhaps you should consider
alternatives.

Regards,
Cliff

lasizoillo

unread,
May 23, 2008, 11:39:32 AM5/23/08
to pylons-...@googlegroups.com
Hi

2008/5/22 Shannon -jj Behrens <jji...@gmail.com>:


>
> Here's my two cents:
>
> Has anyone tried out the mod_wsgi module for *Nginx*? Yeah, I know,
> weird: http://wiki.codemongers.com/NginxNgxWSGIModule

But you need to run a cooperative WSGI app :-(

Twisted's people handle this issue by running the WSGI call in a thread pool.

>
> If you're interested in doing asynchronous programming in Python but
> without the painful callback style approach used by Twisted, check out
> http://wiki.secondlife.com/wiki/Eventlet. It's based on the same
> tricks used by Yahoo Groups, IronPort, and Slide.
>

Reading http://wiki.secondlife.com/wiki/Eventlet/Documentation is interesting.

They discuss the NginxNgxWSGIModule and its use. The monkey-patching of
Python sockets is interesting too, as is the problem with C-implemented
database sockets. They solve that issue with a thread pool (like
Twisted's people do).

They don't have eventlet monkey patching for file access (I think).
Don't use it if you access files over NFS.

It seems a good tip, but not an easy way to work :-(
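
For the blocking-database case, the thread-pool escape hatch mentioned
above looks roughly like this (a sketch only; the connection details
and query are invented, and eventlet's tpool API may differ between
versions):

    from eventlet import tpool
    import psycopg2

    # placeholder connection parameters
    conn = psycopg2.connect(host='127.0.0.1', user='app', database='appdb')

    def run_query(sql):
        # the blocking C-level socket work happens in a real OS thread,
        # so the coroutine hub isn't stalled while we wait
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()

    rows = tpool.execute(run_query, "SELECT 1")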

>
> Thanks for your patience ;)

Thanks for your tip, and excuse my poor English.

Javi

Mike Orr

unread,
May 23, 2008, 3:48:24 PM5/23/08
to pylons-...@googlegroups.com
On Fri, May 23, 2008 at 8:39 AM, lasizoillo <lasiz...@gmail.com> wrote:
>> If you're interested in doing asynchronous programming in Python but
>> without the painful callback style approach used by Twisted, check out
>> http://wiki.secondlife.com/wiki/Eventlet. It's based on the same
>> tricks used by Yahoo Groups, IronPort, and Slide.
>>
>
> Reading http://wiki.secondlife.com/wiki/Eventlet/Documentation is interesting

This is quite interesting. I've been looking for a way to build a
site scraper (something analogous to an aggregator but more
site-specific) that could eventually become asynchronous, and this
looks a lot easier than Twisted. It's like a cron scheduler for
interruptible functions, which fits my brain. The WSGI interface and
backdoor (interactive) interface could also serve as a UI for
accessing the data down the road, although I'm not sure that needs to
be in the same process, or whether to use a web or GUI interface or
both.
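
To make that concrete, a fetch loop in eventlet reads like ordinary
blocking code. A sketch using current eventlet names (the API has moved
around between releases, and the URLs are obviously placeholders):

    import eventlet
    from eventlet.green import urllib2   # cooperative drop-in for urllib2

    urls = ['http://example.com/a', 'http://example.com/b']

    def fetch(url):
        # looks blocking, but yields to other coroutines while waiting on the socket
        return url, urllib2.urlopen(url).read()

    pool = eventlet.GreenPool(20)         # at most 20 fetches in flight
    for url, body in pool.imap(fetch, urls):
        print url, len(body)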

One thing that's unclear is what "executed within the main loop's
coroutine" means (the call_after() function). Does that mean it's
inserting Python statements into the other routine, or using its locals
and globals? The example shows a delayed timeout, where the exception
is delivered to the current coroutine (the caller) rather than to the
main loop's coroutine. Or is everything "executed in the main loop's
coroutine"?

> They comment the NginxNgxWSGIModule and his use. Its interesting too the
> path to python sockets. And the problem with database C implemented sockets.
> They solve the issue whith a threaded pool (like Twisted's people do).

I wonder if SQLAlchemy can use it. There are engine args 'creator'
and 'pool' which provide some support for external connection
factories and customized pools.
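
At minimum the 'creator' hook is easy to experiment with -- a sketch
(connection details invented); whether the C-level driver socket then
cooperates with the coroutine hub is the open question:

    import psycopg2
    from sqlalchemy import create_engine

    def connect():
        # any callable returning a DB-API connection will do
        return psycopg2.connect(host='127.0.0.1', user='app',
                                password='secret', database='appdb')

    # the URL only selects the dialect; actual connections come from 'creator'
    engine = create_engine('postgres://', creator=connect)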

--
Mike Orr <slugg...@gmail.com>

Jonathan Vanasco

unread,
May 23, 2008, 5:26:56 PM5/23/08
to pylons-discuss
On May 23, 3:48 pm, "Mike Orr" <sluggos...@gmail.com> wrote:
> This is quite interesting.  I've been looking for a way to build a
> site scraper (something analogous to an aggregator but more
> site-specific) that could eventually become asynchronous, and this
> looks a lot easier than Twisted.  

FindMeOn's spiders are Twisted-based, importing contacts/relationships/
profiles from 40+ social networks.

We're looking at redoing it in Erlang - Twisted was too slow.

I'd be happy to share some code with you privately if it'll help you
get your own project on track.

Shannon -jj Behrens

unread,
May 24, 2008, 5:10:12 AM5/24/08
to pylons-...@googlegroups.com

Bob Ippolito was telling me once that he took a server in Twisted and
rewrote it in stackless. He got some performance gains, but then he
rewrote it in Erlang. It dropped from 40% CPU utilization to almost
nothing, and it was a heck of a lot faster. In some situations,
Erlang is a really nice tool. Now, if only the syntax wasn't so
ridiculously ugly ;)

Jonathan Vanasco

unread,
May 24, 2008, 5:18:10 PM5/24/08
to pylons-discuss

On May 24, 5:10 am, "Shannon -jj Behrens" <jji...@gmail.com> wrote:
> Bob Ippolito was telling me once that he took a server in Twisted and
> rewrote it in stackless.  He got some performance gains, but then he
> rewrote it in Erlang.  It dropped from 40% CPU utilization to almost
> nothing, and it was a heck of a lot faster.  In some situations,
> Erlang is a really nice tool.  Now, if only the syntax wasn't so
> ridiculously ugly ;)

I think his performance gain was quite a bit more... Bob is a good old
friend (turned boss for a few years, turned good friend again); I was
privy to daily updates on his transitions and performance gains - and
he completely sold me.

Twisted is really great (though the syntax can be a bit of a
bitch)... we just outgrew it (it became the bottleneck, not our DB or
network architecture). I think it maxed out at indexing 1MM online
identities a day (which translated to about 15MM web queries) (per
machine, of course). At one point we had 100MM profiles tracked, and
a backlog of 500MM to get to -- under Twisted there was no way we
could start re-queuing and updating our index as fast as we'd like.
The next level would be C... or Erlang.