In my deployment Paster is serving directly to the world.
I'm not sure anyone has done a comparison of the kind you describe;
at least I haven't come across one. I'm sure it would be a welcome
test.
Probably because "THE" way can never satisfy everyone =)
Personally I proxy Nginx to paster or CP's wsgiserver. I find it a bit
easier to debug than FastCGI.
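As a rough illustration of that setup (the server name, port, and headers below are my own placeholders, not from the post), the Nginx side can be as small as:

```nginx
# Hypothetical minimal proxy: forward everything to a paster (or CherryPy
# wsgiserver) instance listening on localhost:5000.
server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```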
Cliff
All of my apps are deployed in a FastCGI environment with Apache. Our
live application server has a custom install of Python 2.4 with the
appropriate module versions, which we can maintain alongside the ones
RedHat ships.
This works really well. It seems a lot of people hate FCGI for
different reasons, but I have found it to be pretty awesome. Apps are
very stable, no complicated proxying, and it's almost as performant as
mod_python.
I have considered converting our deployments to mod_python, but only
recently acquired a practical staging environment to test things like
that.
--
Ross Vandegrift
ro...@kallisti.us
"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37
Apache + mod_wsgi is another option: powerful and easy.
Greetings.
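A minimal sketch of that option, assuming a Pylons app under mod_wsgi's daemon mode (every path and name below is a placeholder); the `.wsgi` script itself would just call Paste Deploy's `loadapp` on the production `.ini`:

```apache
# Hypothetical mod_wsgi mount for a Pylons application.
WSGIDaemonProcess myapp processes=2 threads=15
WSGIScriptAlias / /srv/myapp/myapp.wsgi

<Directory /srv/myapp>
    WSGIProcessGroup myapp
    Order allow,deny
    Allow from all
</Directory>
```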
Apache 2.0.52 from RHEL4, Python 2.4, Flup 0.5, some version of fcgi
that appears to have been custom installed and I cannot figure out the
version for the life of me....
Ross
on a production server, and it works well and fast. And easy to use.
Vasco
There is no THE way to do it. There are several ways which perform
well, and some of them may even work on your platform. I prefer HTTP
proxying because it's the closest to native request handling.
> This works really well. It seems a lot of people hate FCGI for
> different reasons, but I have found it to be pretty awesome.
People hate FCGI because it was buggy and error-prone for years.
Maybe it has gotten better now.
> Apps are
> very stable, no complicated proxying, and it's almost as performant as
> mod_python.
As you see, "complicated" is in the eye of the beholder. :) I would
say proxying is less complicated than *CGI.
> I have considered converting our deployments to mod_python, but only
> recently acquired a practical staging environment to test things like
> that.
There was a point in using mod_python before mod_wsgi existed. Now
that mod_wsgi exists, is more directly related to the task, and has a
better history of being reliable, why not use it?
--
Mike Orr <slugg...@gmail.com>
It's stable, gives you HTTPS and rewriting and named virtual hosts, and
gives clients a warm fuzzy feeling that you're using something they've
heard of. People say it also has better knowledge of the quirky
user agents out there and can correct malformed requests better than
just exposing PasteHTTPServer or CherryPy directly, though I don't
know how true that is.
There are a few newer servers now (nginx, lighttpd, Cherokee) that
claim to be smaller, more efficient, and better organized than Apache.
On my production server I've found Apache sufficient so I haven't
bothered with them. But I do have a virtual server for our local
Python group, with Apache running Mailman and MoinMoin, and Apache
regularly dies there with an Out of Memory error, or the kernel's
out-of-memory killer takes it down and then hangs. So I've been meaning to
try nginx there and see if it works better.
Apache is a process/thread-based server. Nginx, for example, is an
event-driven server. If you have worked with Twisted, you know that
event-driven code is harder to write. Nginx, lighttpd, and the like
never have as many modules as Apache, because non-blocking code is
harder to write.
But in front of Pylons you don't need thousands of modules. Nginx is
a very simple option with a fast static file server, a load-balanced
proxy, and FastCGI support (if you like to use it). Another nice
feature of nginx: if a user uploads a file, nginx first writes it to
disk and only then delegates to Pylons. No more idle threads waiting
on a slow client upload.
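A hedged sketch of the features described above (paths, ports, and sizes invented for illustration): static files served directly by nginx, and uploads buffered to disk before the request ever reaches Pylons.

```nginx
server {
    listen 80;

    # Static files are served by nginx itself, never touching Pylons.
    location /static/ {
        root /srv/mysite;
    }

    location / {
        # nginx buffers the request body (e.g. a file upload) to disk
        # before proxying, so no Pylons thread idles on a slow client.
        client_max_body_size 20m;
        proxy_pass http://127.0.0.1:5000;
    }
}
```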
Another good option is Varnish[1], a very fast reverse proxy with a
purgeable cache and (limited but interesting) ESI support.
[1] http://varnish.projects.linpro.no/
Excuse my poor English.
Javi
Good question. I have done some rudimentary testing on mod_python,
but not with mod_wsgi. It does make a lot of sense to me - I should
definitely test that out before I make any decisions on production
deployment changes.
I don't mean to come off as sounding curt, but I've often heard that
from people who haven't really maintained the alternatives, or who
have an application deployed in a highly isolated environment where
interoperability means nothing.
Frankly, the flexibility of Apache is what makes it so much better
than the alternatives. It has solid support for just about anything
related to the web. It has excellent, complete documentation
backed by a large, knowledgeable user community.
I equate Apache with "There is no better general solution", even
though various half-assed alternatives have been cooked up.
If you are interested in my configuration there is no magic:
I am using Apache's worker MPM (the default on Debian) - prefork eats
too much memory. I have also set ThreadStackSize to 500000. That's all!
Works perfectly.
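In Apache config terms that amounts to something like the fragment below; only the ThreadStackSize value comes from the post, the other worker MPM directives are illustrative defaults:

```apache
<IfModule mpm_worker_module>
    StartServers          2
    MaxClients          150
    ThreadsPerChild      25
    # Smaller per-thread stacks keep memory usage down (value from the post).
    ThreadStackSize  500000
</IfModule>
```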
Actually there was a similar discussion on this group already. Follow
what Graham Dumpleton says and you will be on the right path.
Regards,
Dalius
http://blog.sandbox.lt
One thing to keep in mind is that new sites often have unrealistic
assumptions about their growth and the hardware required for it. A
video site has special requirements due to the huge files it handles,
but for ordinary sites with text and JPGs and documents up to 15 MB
each, 100,000 requests/day is not that many. That's 4166/hour or
70/minute. Any non-anemic server can do that in its sleep. Our
server has two sites each doing more than that several times a day,
plus three smaller sites.
Granted, the server is ridiculously overpowered: 2 CPU, 2 GB RAM, 300
GB HD, Ubuntu 7.10. But one site inefficiently duplicates a 120 MB
object database per process (and runs 3 processes minimum, so that's
240 MB overhead), so that's where some of the memory is going. The
server load right now is 0.11; free memory is 18 MB (+269 MB in
discardable buffers and 1.2 GB "cached"). That's with Apache -> scgi
-> Quixote for the largest site and Apache -> mod_proxy ->
PasteHTTPServer -> Pylons for some of the others. I'm migrating all
the sites to Pylons one by one.
We use PHP only for phpMyAdmin, for which I've been unable to find an
adequate alternative. It's just so our non-technical appadmins can
make occasional changes and reports.
Funny, over the last couple of years, I've deployed Nginx in front of
nearly a hundred websites, ranging from Pylons and TurboGears to PHP and
Wordpress down to simple static HTML.
According to Netcraft, Nginx is now deployed in front of over 1 million
domains. Not nearly as much as Apache, but clearly not all of those are
"highly isolated environments". In fact, many sites with heavy traffic
are moving to Nginx due to it's vastly superior scalability.
Some notables that use Nginx:
wordpress.com
youtube.com
hulu.com
rambler.ru
torrentreactor.net
kongregate.com
Where did you get your research from? (Actually, don't answer that, I
can guess).
> Frankly, the flexibility of Apache is what makes it so much better
> than the alternatives. It has solid support for just about anything
> related to the web. It has excellent, complete documentation
> backed by a large, knowledgeable user community.
I'd qualify this paragraph as "some of Apache's strengths are", rather
than a blanket "it's better". For some people, in some settings, it is
better. For others it isn't. If you need high scalability, it isn't
the best. If you need a small memory footprint it's not the best. If
you prefer a sane configuration syntax it isn't the best. If you need
all three then it's arguably amongst the worst.
> I equate Apache with "There is no better general solution", even
> though various half-assed alternatives have been cooked up.
I certainly intend to sound curt:
There are places where Apache is the only possible solution. Personally
I've found that to be the minority of deployments where you need
specialized modules that are only available on Apache. When you choose
Apache you gain a wide array of modules and add-ons that might neatly
solve a particular problem, but you sacrifice efficiency, scalability,
and simplicity of configuration. Choosing a web server, like any other
software selection, is a matter of weighing pros and cons and selecting
the one that fits your needs the best.
We all have strong opinions about software, but try to stick to reasoned
arguments and leave the insults at home, especially when it's abundantly
clear you've never even tried (or perhaps even read about) the
alternatives.
Cliff
I often agree with you in general (despite disagreeing on particular
details <wink>). However you really ought to quit making unfounded
generalizations about Nginx. It is, in fact, purpose-built to be a
proxy. It's also far better than Pound (which I also used before
switching to Nginx) as a proxy. It's true it doesn't (currently) have
very sophisticated load-balancing (round-robin only, unless you use a
3rd party module), but it's almost stupidly simple to configure and
easily outperforms Pound, especially under load.
Regards,
Cliff
Well, for a Pylons site with Postgres that wants to be scalable up
front, a three-server setup makes sense. One for the Pylons app, one
for the static content, and one for the database. By putting them on
three servers, you've already proven they can be split up. Then you
can add a second server to any of the trio when it starts approaching
capacity. A second static server is of course easy. A second Pylons
server means you have to write the app carefully; e.g., store state
data in cookies, cookie-based sessions, or the database, not on the
local filesystem.
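For the "state in cookies" part, a hypothetical Beaker configuration in the Pylons `.ini` (the key names are real Beaker options, the values are placeholders) keeps sessions off any one server's filesystem:

```ini
[app:main]
# Store the entire session in a signed client-side cookie, so any app
# server behind the balancer can handle any request.
beaker.session.type = cookie
beaker.session.key = mysite
beaker.session.validate_key = replace-with-a-long-random-secret
```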
A second database server is usually the hardest part; you'd need a
database engine that propagates changes from any server to all the
others (I think it's called replication). Or designate one server the
master, but then it becomes a single point of failure. If you have
some tables that are mostly-read and others that are highly
interactive but there's no relations between the two, you can put them
in different databases on different servers with different
master/slave and replication policies. This all shows the advantage
of having the database on its own server in the first place; you can
avoid the replication problem as long as possible.
The problem with static files is that often they have to be password
protected and only given to certain people (and their usage logged for
statistics). That rules out using a static server for them unless
your auth/logging system is in the webserver rather than in Pylons.
And those systems are usually specific to the webserver, which then
makes it harder to switch webservers.
Sites that are amongst the largest on the internet fall into a corner
case in my mind. As Mike pointed out, sites have an unrealistic
expectation of traffic. I've been involved in the average cases.
My claims come from years in the service provider industry,
watching various deployments. I've been an Apache fan for a long
time, and have seen and deployed hundreds of servers, serving
thousands of sites on Apache. None are youtube.com - and I agree that
this is an important point.
Comparing my Apache deployments with deployments of other servers,
year after year, Apache won hands down:
1) Users of other HTTP servers are always fiddling with them,
restarting after crashes. This may be due to misuse, non-optimal
config - I'm not sure. But I've never had stability issues like this
with Apache.
2) Apache is well-understood by many more folks. There's an army of
support reps downstairs that are competent, if not experts, at
maintaining and troubleshooting it. The other servers come across as
mysteries (despite often being highly trivial), and end up escalated
instead of fixed.
3) Documentation for Apache is thorough, searchable, and
understandable. It's full of examples and is available for multiple
versions of httpd. I have seen the Apache documentation turn
motivated people from competent levels to expert just by googling for
it. I'm not saying that other servers don't have decent docs - but
Apache's are amongst the best docs for any software I have ever read,
and I have seen them function in production for years.
> I'd qualify this paragraph as "some of Apache's strengths are", rather
> than a blanket "it's better". For some people, in some settings, it is
> better. For others it isn't. If you need high scalability, it isn't
> the best. If you need a small memory footprint it's not the best. If
> you prefer a sane configuration syntax it isn't the best. If you need
> all three then it's arguably amongst the worst.
Yeah, you're right - I'm tacitly assuming that we're talking about the
average cases. Other HTTP servers definitely excel at things, especially
for workloads they have been specially designed for. But for every youtube
there are tens of thousands of websites with more average traffic and
control needs.
He did say scalable and video, which I took to mean ultra-heavy use of
very large files, and overkill didn't matter. The disk space alone is
one reason why static content might want to be on a separate box, so
it can be plugged into a large disk array or replicated easier, etc.
>> A second database server is usually the hardest part; you'd need a
>> database engine that propagates changes from any server to all the
>> others
> I always spec apps to be replication-ready: separate handles for read
> & write, and maintaining strict documentation and code reviews as to
> which handles are used where. Usually a 'write' should only be used
> in account management situations. I even go so far as to make sure
> all logging functionality is on a different handle (sometimes even to
> a different DB). Approaches like this at the start make clustering
> and replication simple.
By handle you mean a database connection?
So how do you handle writes? You direct them all to one master server
and let it propagate the changes to the slaves? Have you found a good
replicable database among the free ones that work with SQLAlchemy?
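The "separate handles for read & write" discipline quoted above can be sketched in a few lines of Python. Here both handles point at one stdlib SQLite connection so the example is self-contained; in a real deployment the read handle would connect to a replica and the write handle to the master (all names are my own invention):

```python
import sqlite3

# Both "handles" are the same SQLite database here; a real setup would
# open the read handle against a replica's address instead.
_db = sqlite3.connect(":memory:")
write_handle = _db
read_handle = _db

def get_handle(for_write=False):
    # Every call site declares its intent, so pointing reads at a
    # replica later is a one-line change rather than an audit.
    return write_handle if for_write else read_handle

# Writes (e.g. account management) go through the write handle only.
conn = get_handle(for_write=True)
conn.execute("CREATE TABLE accounts (id INTEGER, name TEXT)")
conn.execute("INSERT INTO accounts VALUES (1, 'alice')")
conn.commit()

# Ordinary page views read through the read handle.
names = [row[0] for row in get_handle().execute("SELECT name FROM accounts")]
print(names)  # ['alice']
```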
> Most file-access issues can be handled with an 'authticket' style
> approach across servers. If you're dealing with a specific per-file
> per-access approach, then yeah - you're likely better to have your
> pylons appserver handle that ( though you might be able speed things
> up with some custom plugins or hooks in nginx/lighttpd)
What do you mean by authticket?
> I would say however that mod_proxy module in Apache is also purpose
> built for proxying, that doesn't mean it is a good idea to use it.
The advantage Nginx brings over even purpose-built proxies is its async
model. Pound is quite capable and easy to use but has limited
scalability due to being threaded. Nginx can handle many thousand
requests per second while using only a few megabytes of RAM and a
relatively small amount of CPU.
> What I was trying to say was that where there are solutions whose only
> purpose is to do something, also look at them as well as those which
> may do it as one function of many. This is because the solutions which
> try to do just the one job do often come with a better feature set for
> that one task, that depending on what you are doing may make more
> sense. Yes it may mean a drop in performance for some aspects of the
> problem being solved, but this may be made up in other ways through
> the other features it provides.
Again, in general I'd agree (and I really used to like Pound a lot), but
in this case, unless you need sophisticated load-balancing algorithms,
it's hard to beat Nginx.
> Sorry, I am generalising again of course, but when you really don't
> know the exact details of what a person is wanting to setup and why,
> it is hard to do anything else. :-)
Just because we don't necessarily know exactly what we're arguing about
doesn't mean we have to stop ;-)
Regards,
Cliff
Postgres was mentioned; I'll add that MySQL's replication is also
quite good for splitting out reads against replicas if your
application isn't write bound.
MySQL Cluster is now GA, which provides HA database servers and
performance scalability without splitting reads and writes between
different servers. The downside is that you need at least four
servers....
As have I. But I'm going to disassemble this argument below.
> My claims come from years in the service provider industry,
> watching various deployments. I've been an Apache fan for a long
> time, and have seen and deployed hundreds of servers, serving
> thousands of sites on Apache.
I think this is true for all of us. The difference is that the world
has changed in the last couple of years and now there's more options to
choose from. And by "options" I don't mean "a smaller, less capable
Apache clone", I mean a paradigm shift in how to handle high loads.
It's well known that threaded/process based servers cannot scale beyond
a reasonable point. Nginx and Lighttpd are async and are specifically
written to address the C10K problem.
As you point out, not all sites need this sort of scalability (certainly
none that I've written or hosted), however there's a fallout benefit to
this work: these servers can scale specifically because they don't use
threads which means they also use *considerably* less memory and also
tend to use much less CPU. I challenge you to name any system that
won't benefit from reduced RAM and CPU utilization.
To give you a concrete example: last week a colleague of mine converted
a server running Apache and mod_php to Nginx and FastCGI. Prior to the
conversion the server was using almost 1.2GB of RAM. After the
conversion he was using 200MB. Is it probable that Apache was
misconfigured? Hard to know unless you have spent years tuning Apache,
but I'll concede it's possible. Is Nginx misconfigured? Well, frankly
it doesn't seem to matter ;-)
> None are youtube.com - and I agree that
> this is an important point.
>
> Comparing my Apache deployments with deployments of other servers,
> year after year, Apache won hands down:
>
> 1) Users of other HTTP servers are always fiddling with them,
> restarting after crashes. This may be due to misuse, non-optimal
> config - I'm not sure. But I've never had stability issues like this
> with Apache.
I had many issues with Lighttpd, but I've had none with Nginx. I'd also
have to question your use of "always" in the above sentence. I strongly
suspect you aren't speaking from experience here, just from hearsay.
> 2) Apache is well-understood by many more folks. There's an army of
> support reps downstairs that are competent, if not experts, at
> maintaining and troubleshooting it. The other servers come across as
> mysteries (despite often being highly trivial), and end up escalated
> instead of fixed.
And it's poorly understood by just as many, if not more. I first
switched from Apache not due to scalability concerns (like you, I've not
encountered them), but because I find Apache's configuration to be
overwhelming and convoluted. When I first started using Nginx, all the
documentation was in Russian and yet I managed to convert an entire
shared hosting box from a mix of Apache, Pound, and Lighttpd to Nginx in
two days by simply reading the examples. I'd challenge any newcomer to
Apache to do the same.
The fact that you need an army of support reps isn't really advancing
your argument ;-)
> 3) Documentation for Apache is thorough, searchable, and
> understandable. It's full of examples and is available for multiple
> versions of httpd. I have seen the Apache documentation turn
> motivated people from competent levels to expert just by googling for
> it. I'm not saying that other servers don't have decent docs - but
> Apache's are amongst the best docs for any software I have ever read,
> and I have seen them function in production for years.
I'd never argue it doesn't. In fact, Apache's documentation is clearly
far more extensive than Nginx's. I'd expect no less after being the
workhorse of LAMP for the last decade. I'll happily admit (well, not
too happily) that Apache's documentation is far better than Nginx's.
Unfortunately superior documentation doesn't make for superior software.
Documentation is fixable, but Apache's process model isn't.
> > I'd qualify this paragraph as "some of Apache's strengths are", rather
> > than a blanket "it's better". For some people, in some settings, it is
> > better. For others it isn't. If you need high scalability, it isn't
> > the best. If you need a small memory footprint it's not the best. If
> > you prefer a sane configuration syntax it isn't the best. If you need
> > all three then it's arguably amongst the worst.
>
> Yeah, you're right - I'm tacitly assuming that we're talking about the
> average cases. Other HTTP servers definitely excel at things, especially
> for workloads they have been specially designed for. But for every youtube
> there are tens of thousands of websites with more average traffic and
> control needs.
Well, here's where I find that most Apache proponents' arguments fall
apart: either they claim that Nginx is best for small-scale websites or
they claim that it's only needed for large-scale websites. They are
both wrong. It's best for both. Nginx scales both up *and* down. It
can run youtube.com or you can embed it in a cellphone. The challenges
of running Apache in a 96MB VPS have been documented on this very list.
The challenges of getting Apache to deal with C10K aren't often
discussed because it isn't possible without getting into very high-end
hardware.
This makes Apache best for... medium-sized sites that don't care about
resource utilization? This is a ridiculous claim, so I'll assert
instead that Apache is best if you need a *specialized* service, such as
mod_svn or mod_jakarta. Apache proponents will point out the wealth of
modules as evidence that Apache is the best for general purpose web
serving. But being best at fronting *particular* applications doesn't
make it best *in general*. So it's not Nginx that's specialized for a
particular workload, it's Apache that's specialized.
Nginx is like a finely-balanced chef's knife: suitable for a variety of
tasks, large and small, as long as they all involve slicing. Apache, on
the other hand, is the swiss-army knife of webservers: bulky, full of
odd specialty tools, and on occasion, marginally useful as a knife.
In either case, apparently they both make for a funny lump in some
people's pockets ;-)
Anyway, I think we've gone way OT for long enough. We can continue
offlist if you like.
Regards,
Cliff
While it may be off-topic** I want to say I've found the background
discussion from everyone involved (who are all more experienced and
knowledgeable in this area than I am) to be very interesting and
educational and valuable.
I'm running Pylons "direct" for a beta site, and behind lighttpd for
some others, but this is one of the few threads in which I've read all
the content lately, and I hope it isn't really considered so off-topic
that you feel you have to stop. And if you do, well, thanks for all the
fish while it lasted, anyway.
-Peter
Yes, this is a good thread. The discussion about what kind of
deployment is best for what kind of site, and how much it matters,
affects all of us. Plus the points about database replication and
multi-server authorization, which I thank Jonathan for.
The only noise was about whether Apache is too bloated for its own
good, and whether Nginx is better than everything else. But knowing
what other Pylons sysadmins think of the merits of each is still
worthwhile, even though we don't want *too* many messages on the
topic. All this info would be good to put into one of the deployment
articles in the Pylons Cookbook, if anybody feels inclined. (I would
but I've got my hands full with Pylons tasks right now.)
Except that vertical scaling doesn't preclude horizontal scaling, it
merely postpones the necessity for implementing it (if not the planning)
and helps limit the scope of it. If Nginx provides superior vertical
scaling, then it will also provide superior horizontal scaling since
vertically scaled systems are the building blocks of a horizontally
scaled system.
> With horizontal scaling you keep your existing machine and just add
> more machines. For horizontal scaling, the limit is going to be how
> easy it is to accommodate your application across a growing number of
> machines. The scalability of Apache here isn't generally going to be
> an issue as you would have sufficient machines to spread the load so
> as to not unduly overload a single machine.
> Although one is buying more hardware with horizontal scaling, the cost/
> performance curve generally increases at a lesser rate than with
> vertical scaling.
Again, I think this contrast is artificial. You are setting up vertical
scaling and horizontal scaling as mutually exclusive when they are
anything but, and unless you have endlessly deep pockets, you should
prefer to control the growth of your horizontal scaling.
> Of course, there is still a whole lot more to it than that as you need
> to consider power costs, networking costs for hardware/software load
> balancing, failover and possible need for multiple data centres
> distributed over different geographic locations.
Absolutely. And while hardware costs are dropping, hosting and power
costs are going up. My colocation fees have increased an average of 10%
per year, and power fees have quadrupled since I started. I don't
expect this trend to change any time soon.
> One thing that keeps troubling me about some of the discussions here
> as to what solution may be better than another is that it appears to
> focus on what solution may be best for static file sharing or proxying
> etc. One has to keep in mind that Python web applications have
> different requirements than these use cases. Python web applications
> also have different needs to PHP applications.
Given that an average web page is probably 70% or more static or cached
content, I think this is a critical aspect.
> As I originally pointed out, for Python web applications, in general
> any solution will do as it isn't the network or the web server
> arrangement that will be the bottleneck. What does it matter if one
> solution is twice as fast as another for a simple hello world
> program, when the actual request time saved by that solution when
> applied to a real world application is far less than 1% of overall
> request time.
If you try to scale a dynamic application and are going to pass part of
the request off to Python on every request you are going to either fail
spectacularly or spend an awful lot of money scaling horizontally.
There's a reason people have successfully deployed huge Rails apps and
it's not often by having 300 servers. They manage it by making sure
that Rails is only called when absolutely necessary and letting a fast
webserver handle most of the load.
In any case, the same techniques are going to be applied regardless of
which web server you choose. The question is more "how much of my
limited and expensive resources is this single part of my stack going to
consume and what benefit will I be getting for it?" Unless you require
a specific module, Nginx and Apache are more-or-less functionally
equivalent, except that one uses a fraction of the resources of the
other.
> For non database Python web applications issues such as the GIL, and
> how multithreading and/or multiple processes are used is going to be a
> bigger concern and have more impact on performance. This is in as much
> as running a single multithreaded process isn't going to cut it when
> scaling. Thus ease of configuring use of multiple processes is more
> important as is the ability to recycle processes to avoid issues with
> increasing memory usage.
I'd consider "increasing memory usage" to be a bug in the application
and outside the scope of discussion. As far as ease of configuring
multiple processes, I use Nginx's built-in load balancing and a 4 line
shell script to start my application. Don't get me wrong, I think
Apache's process management is quite nice and I'd like to see something
similar added to Nginx, but it's hardly a show-stopper.
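For what it's worth, Nginx's built-in load balancing mentioned above is just an `upstream` block; a sketch with four hypothetical Pylons backends (ports invented here) that the startup script would launch:

```nginx
upstream pylons_pool {
    # One entry per application process; round-robin by default.
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
    server 127.0.0.1:5002;
    server 127.0.0.1:5003;
}

server {
    listen 80;
    location / {
        proxy_pass http://pylons_pool;
    }
}
```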
> There is also the balance between having
> fixed numbers of processes as is necessary when using fastcgi like
> approaches, or the ability in something like Apache to dynamically
> adjust the number of processes to handle requests.
Remember you said this (see below*).
> Add databases into
> the mix and you get into a whole new bunch of issues, which others are
> already discussing.
> Memory usage in all of this is a big issue and granted that for static
> file serving nginx and httpd will consume less memory. The difference
> though for a dynamic Python web application isn't going to be that
> marked.
I disagree. As I mentioned earlier, someone I know recently took an
Apache/mod_php application consuming 1.2GB of RAM down to 200MB using
Nginx/FastCGI with no loss in performance or functionality. It's not
clear to me why a Python application would be much different.
> If you are running an 80MB Python web application process, it
> is still going to be about that size whatever hosting solution you
> use. This is because the memory usage is from the Python web
> application, not the underlying web server. The problem is more to do
> with how you manage the multiple instances of that 80MB process.
Sort of. However consider this: if I am running Nginx I can reasonably
*fill* a single server with Python processes and not worry too much
about how much memory Nginx consumes. The resources are available for
running the *application* rather than the webserver. Because the Python
application will undoubtedly be one of the first bottlenecks (database
next), the ability to horizontally scale the application (by running
multiple instances) is critical. By using up system resources, Apache
limits the number of instances of the application that can be run on a
single machine, and by extension across multiple machines.
> There have been discussions over on the Python WEB-SIG about making
> WSGI better support asynchronous web servers. Part of their rationale
> was that it gave better scalability because it could handle more
> concurrent requests and wouldn't be restricted by number of threads
> being used. The problem that was pointed out to them which they then
> didn't address is that where one is handling more concurrent requests,
> the transient memory requirements of your process then theoretically
> can be more.
> At least where you have a set number of threads you can
> get a handle on what maximum memory usage may be by looking at the
> maximum transient requirements of your worst request handler.
Then you agree that dynamically adjusting the process pool size is bad
since it would have the same net effect? This appears (to me) to
contradict what you claimed as a feature earlier [*].
> With an
> asynchronous model, where theoretically an unbounded number of
> concurrent requests could be handled at the same time, you could
> really blow out your memory requirements if they all hit the same
> memory-hungry request handler at once. A more traditional
> synchronous model can thus give you more predictability, which for
> large systems can in itself be an important consideration.
Of course, this is where your earlier suggestion of using a hardware
load-balancer would be a good idea. I think a much better use of
resources (read "money") would be spending some of it on a dedicated
load-balancing solution which can control how requests are distributed
rather than repurposing inefficiency into a feature.
At any rate, I don't actually think the above has much to do with Nginx
vs Apache as Pylons deployment options. Because Pylons tends to be run
as a threaded app (is anyone doing otherwise?), we still have the same
predictability. In fact our predictability is easier since we don't
need to calculate the cost of the web server's memory explosion in
addition to our application's needs.
In all of the above, I haven't seen any explanation from you as to why
Apache would be superior to Nginx as a deployment option, only that it
wouldn't be the worst bottleneck in your application stack. Not
terribly convincing. If we were discussing a closed-source solution
versus an open source solution, this might be sufficient ("good
enough"), but that's not the case here.
I'll give you a quick list of actual benefits I see from using Nginx:
1) low CPU overhead
2) small memory footprint
3) consistent latency for responses
4) scalable in all directions
5) simple and syntactically consistent configuration
Benefits I see for Apache:
1) excellent documentation
2) wide array of modules, especially esoteric ones
3) mod_wsgi provides a slightly more efficient communication gateway to
Python backends
4) automatic process management (restarting backends)
Of Apache's benefits I see
1) as mostly moot due to Nginx's simplicity
2) completely moot since I don't use them
3) not enough to overcome the efficiency lost elsewhere
4) as mostly moot because it's simple to solve in other ways
This probably doesn't exactly match other people's requirements and
certainly there are other considerations that might tip the scales one
way or the other.
> Anyway, this is getting a fair bit off topic and since others are
> seeing my rambles as such, I'll try and refrain in future. :-)
Please don't. You happen to be one of the few feather-heads I don't
mind hearing from, even if I find your arguments kind of slippery ;-)
And incidentally, congrats on your baby =)
For people who care more about numbers than theoretical discussions (aka
"obstinate") please refer to the following which provides a fairly
decent overview of resource utilization between the two servers:
http://www.joeandmotorboat.com/2008/02/28/apache-vs-nginx-web-server-performance-deathmatch/
Regards,
Cliff
> If you try to scale a dynamic application and are going to pass part of
> the request off to Python on every request you are going to either fail
> spectacularly or spend an awful lot of money scaling horizontally.
> There's a reason people have successfully deployed huge Rails apps and
> it's not often by having 300 servers. They manage it by making sure
> that Rails is only called when absolutely necessary and letting a fast
> webserver handle most of the load.
Since I think it's of specific interest, here's an interesting approach
that could probably be made to work with Apache as well:
Regards,
Cliff
I'm using Apache2 + mod_wsgi 2.0 as a process controller and nginx to
serve static content and proxy dynamic requests to apache2.
Apache2 uses the worker (threaded) MPM and is configured to be pretty
lightweight [1] by only loading a minimal set of modules, turning
keep-alive off and having a limited amount of threads. This is possible
because it runs behind nginx which takes care of spoon-feeding the slow
clients over a keep-alive connection making dynamic requests to the
Apache backend quite fast since they're local. Since nginx does some
caching of the response, these requests are quite fast and don't tie up
a heavy python process for too long so a small pool of workers can
handle moderately high loads.
The reason I use it over paster+supervisord is because I find it *much*
easier to set up and maintain and more powerful (mod_wsgi can be
configured to spawn wsgi applications into separate processes under
their own user/group, restart them if they crash, kill them if they
deadlock, isolate them in their own virtualenv, etc...).
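For reference, here's a sketch of the kind of daemon-mode configuration I mean; the paths, names and tuning values are placeholders, not recommendations:

```apache
# Hypothetical mod_wsgi daemon-mode setup; "myapp" and all paths and
# numbers are placeholders.
WSGIDaemonProcess myapp user=myapp group=myapp \
    processes=2 threads=15 \
    deadlock-timeout=30 \
    python-path=/srv/myapp/venv/lib/python2.4/site-packages
WSGIScriptAlias / /srv/myapp/myapp.wsgi
<Directory /srv/myapp>
    WSGIProcessGroup myapp
    Order deny,allow
    Allow from all
</Directory>
```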
As you can see, it's more or less the typical supervisord + paster +
nginx setup, but replacing paster and supervisord with apache+mod_wsgi
because I find the latter much easier to configure and maintain.
Alberto
[1] The master process only eats around 4M resident size on my machine.
The slave processes which host the python app take up much more, but I
guess it's roughly the same as they would take under paster or any other
python webserver, since the bulk of it is the actual python app.
http://www.cherokee-project.com/
I've never used it in production (last time I experimented with it was a
couple years ago and it wasn't mature enough), but it's reported to be
quite fast, even edging out Nginx in several benchmarks.
http://www.alobbs.com/news/104
It also has native SCGI support and a management interface written in
Python.
The documentation isn't what it could be, but I expect the admin
interface helps out quite a bit on that count.
Regards,
Cliff
On Fri, 2008-05-16 at 13:38 -0700, Jonathan Vanasco wrote:
> I'm a little unclear on the better ways to deploy a Pylons app.
>
> My production servers run nginx -- is it better to use some fastcgi
> support (if so, how?) or just do a "paster serve" and proxy to that
> port?
>
> I've read a handful of ways on how-to-deploy apps, and all seem
> different. I've yet to see a comparison or "this is THE way to do it"
> document.
> >
Has anyone tried out the mod_wsgi module for *Nginx*? Yeah, I know,
weird: http://wiki.codemongers.com/NginxNgxWSGIModule
Being asynchronous rules! That's why Erlang, Squid, IronPort servers,
Nginx, etc. are able to handle so many concurrent requests so easily.
Here's the link to the C10K paper referenced earlier:
http://www.kegel.com/c10k.html. It explains why a thread or process
model doesn't cut it if you want to handle 10K simultaneous requests.
If you're interested in doing asynchronous programming in Python but
without the painful callback style approach used by Twisted, check out
http://wiki.secondlife.com/wiki/Eventlet. It's based on the same
tricks used by Yahoo Groups, IronPort, and Slide.
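If you're curious, the basic trick can be sketched with nothing but stdlib generators. This is not Eventlet's API, just an illustration of the cooperative-scheduling idea underneath it:

```python
from collections import deque

def scheduler(tasks):
    """Round-robin over generator-based 'green threads': each yield is a
    voluntary switch point, so no locks or OS threads are involved."""
    queue, log = deque(tasks), []
    while queue:
        task = queue.popleft()
        try:
            log.append(next(task))   # run the task until it yields
            queue.append(task)       # then requeue it, cooperative style
        except StopIteration:
            pass                     # task finished
    return log

def worker(name, steps):
    for i in range(steps):
        yield "%s:%d" % (name, i)    # pretend this is a non-blocking I/O wait

out = scheduler([worker("a", 2), worker("b", 3)])
# the two workers interleave: ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Eventlet hides the yields behind socket operations, which is why it "feels" like threads.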
As usual, I recommend that anyone who wants to talk about scalability
read "Scalable Internet Architectures". Gees, I probably sound like a
broken record concerning that book ;)
Finally, a plug for my article (if you don't mind). If you want to
learn more about concurrency approaches in Python, checkout my
article: http://www.ddj.com/linux-open-source/206103078
Thanks for your patience ;)
-jj
--
I, for one, welcome our new Facebook overlords!
http://jjinux.blogspot.com/
I personally know the author and I definitely recommend it. He's
focused and competent.
> Being asynchronous rules! That's why Erlang, Squid, IronPort servers,
> Nginx, etc. are able to handle so many concurrent requests so easily.
> Here's the link to the C10K paper referenced earlier:
> http://www.kegel.com/c10k.html. It explains why a thread or process
> model doesn't cut it if you want to handle 10K simultaneous requests.
There's just one little problem with async: it does not scale across
multicore architectures or multiple nodes.
At least not by itself. You have to mix it with other kinds of
concurrency approaches to take advantage of them.
Erlang is async, but it scales everywhere by the way it was built.
> If you're interested in doing asynchronous programming in Python but
> without the painful callback style approach used by Twisted, check out
> http://wiki.secondlife.com/wiki/Eventlet. It's based on the same
> tricks used by Yahoo Groups, IronPort, and Slide.
I will definitely look into it! Thanks. Have you tried it for
something "real-worldish" or just examples?
> As usual, I recommend that anyone who wants to talk about scalability
> read "Scalable Internet Architectures". Gees, I probably sound like a
> broken record concerning that book ;)
Noted. I am reading Building Scalable Websites at the moment but I
will buy it afterwards
--
Lawrence, stacktrace.it - oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair
Oh, cool!
>> Being asynchronous rules! That's why Erlang, Squid, IronPort servers,
>> Nginx, etc. are able to handle so many concurrent requests so easily.
>> Here's the link to the C10K paper referenced earlier:
>> http://www.kegel.com/c10k.html. It explains why a thread or process
>> model doesn't cut it if you want to handle 10K simultaneous requests.
>
> There's just one little problem with async: it does not scale across
> multicore architectures or multiple nodes.
> At least not by itself. You have to mix it with other kinds of
> concurrency approaches to take advantage of them.
> Erlang is async, but it scales everywhere by the way it was built.
Haha, when you said it didn't scale for multiple cores or multiple
nodes, I was going to knee jerk and say, "What about Erlang!" You
beat me to the punch ;) Yes, async is a technique. Just like event
based programming in general is a technique. You still have to use
that technique in smart ways to build big systems.
>> If you're interested in doing asynchronous programming in Python but
>> without the painful callback style approach used by Twisted, check out
>> http://wiki.secondlife.com/wiki/Eventlet. It's based on the same
>> tricks used by Yahoo Groups, IronPort, and Slide.
>
> I will definitely look into it! Thanks. Have you tried it for
> something "real-worldish" or just examples?
I haven't used Eventlet yet. It's based on the same ideas that
IronPort uses, and clearly, IronPort servers are all over the world.
It's nice because it "feels" like threads, but it "acts" like async.
>> As usual, I recommend that anyone who wants to talk about scalability
>> read "Scalable Internet Architectures". Gees, I probably sound like a
>> broken record concerning that book ;)
>
> Noted. I am reading Building Scalable Websites at the moment but I
> will buy it afterwards
I read that one too. It's a bit long and boring, eh? "Scalable
Internet Architectures" was a bit more to the point, and it includes a
lot more stuff about scalability per se.
I blogged about both of them:
http://jjinux.blogspot.com/search?q=scalable+internet+architectures
http://jjinux.blogspot.com/2006/11/book-review-building-scalable-web.html
Happy Hacking!
He also gave a talk about nginx's mod_wsgi at the PyCon Italy
<http://www.pycon.it/pycon2/schedule/talk/una-implementazione-di-wsgi-per-nginx/>
(webpage in italian)
> Haha, when you said it didn't scale for multiple cores or multiple
> nodes, I was going to knee jerk and say, "What about Erlang!" You
> beat me to the punch ;) Yes, async is a technique. Just like event
> based programming in general is a technique. You still have to use
> that technique in smart ways to build big systems.
Right.
> I haven't used Eventlet yet. It's based on the same ideas that
> IronPort uses, and clearly, IronPort servers are all over the world.
> It's nice because it "feels" like threads, but it "acts" like async.
Like pyprocessing feels like threads but acts like processes ;)
I am fond of libraries/frameworks like that: they lower the bar for
adopting scalable techniques.
I think it is something we as developers, of any kind, should not
ignore anymore.
>> Noted. I am reading Building Scalable Websites at the moment but I
>> will buy it afterwards
>
> I read that one too. It's a bit long and boring, eh?
Yup. There's some interesting stuff, but it's a little too focused
on PHP+MySQL (after all, it's based on the Flickr experience).
Sometimes it feels like "Python doesn't have this kind of problem,
let's move along."
> "Scalable
> Internet Architectures" was a bit more to the point, and it includes a
> lot more stuff about scalability per se.
Wow. What about High Performance Web Sites by Souders? I bought it at
the O'Reilly booth two weeks ago at PyCon Italy.
> I blogged about both of them:
> http://jjinux.blogspot.com/search?q=scalable+internet+architectures
Nice
> http://jjinux.blogspot.com/2006/11/book-review-building-scalable-web.html
"I was tearing my
hair out when Cal spent five pages explaining what source control is and
listing its basic features." hahahaha I thought the same thing.
I found nice the deep intro to encodings and utf, by the way
Bye
Hmmm, now that you mention it, I think all of those deployments may
have been lighttpd. I had to hear a lot of the fallout - lighttpd
was being used to generate tokens on servers that would be used for
instantiating authentication credentials in a single sign-on server
for admins.
So there was much gnashing of teeth whenever this would crap out.
I was fortunate enough to not have this be my baby, and mostly didn't
have to deal with it.
But I probably shouldn't continue to take that experience as
indicative of everything that's not Apache.
> And it's poorly understood by just as many, if not more. I first
> switched from Apache not due to scalability concerns (like you, I've not
> encountered them), but because I find Apache's configuration to be
> overwhelming and convoluted.
Really? I can see it being overwhelming, but it seems very
understandable to me. Paired with their documentation, I don't think
I've ever had a real problem getting Apache to do something I knew it
could.
Well, unless you count kinda crazy, obscure mod_rewrite stuff - but of
course that's a black art just because the rabbit hole goes as deep as
you care to follow :).
> The fact that you need an army of support reps isn't really advancing
> your argument ;-)
Heh, well, for every change needed to Apache, there are 1000 people who
need help configuring their POP3 client. Apache is hardly the reason
there's an army :).
> This makes Apache best for... medium-sized sites that don't care about
> resource utilization? This is a ridiculous claim, so I'll assert
> instead that Apache is best if you need a *specialized* service, such as
> mod_svn or mod_jakarta.
I don't think that's such a ridiculous claim! Consider the
application server that hosts the apps I write for my company's
internal use. It hosts four to six Pylons applications and one Rails
app. One of these apps handles around 1000 users a day, one around
100, one around 10. The Rails app is an AJAX form that just pushes
collected data to the browser, so it stays fairly busy despite
averaging only one user a day.
The server these apps are housed on is gratuitously overpowered.
Apache's flexibility makes this use-case trivial.
Maybe this deployment pattern is uncommon?
> Apache proponents will point out the wealth of
> modules as evidence that Apache is the best for general purpose web
> serving. But being best at fronting *particular* applications doesn't
> make it best *in general*. So it's not Nginx that's specialized for a
> particular workload, it's Apache that's specialized.
Eh, I wouldn't make that claim about Apache modules. Many of them are
irrelevant to me, some seem downright pointless.
> Nginx is like a finely-balanced chef's knife: suitable for a variety of
> tasks, large and small, as long as they all involve slicing. Apache, on
> the other hand, is the swiss-army knife of webservers: bulky, full of
> odd specialty tools, and on occasion, marginally useful as a knife.
>
> In either case, apparently they both make for a funny lump in some
> people's pockets ;-)
I wouldn't want the lump of a chef's knife anywhere near my pocket,
lest I be bleeding out all over the floor!
> Anyway, I think we've gone way OT for long enough. We can continue
> offlist if you like.
I'm more or less done - I think you've convinced me that Nginx is
probably worth another look at some point. After all, there's nothing
wrong with having another tool around to solve some problem, even if
Apache is where I'd go first.
No, I think it's quite common, however I'm one of those obstinate types
who refuses to equate popularity with correctness =)
> I'm more or less done - I think you've convinced me that Nginx is
> probably worth another look at some point. After all, there's nothing
> wrong with having another tool around to solve some problem, even if
> Apache is where I'd go first.
Well, I don't want to discount the value of experience. If you know
Apache well then that can be a perfectly valid reason to stick with it,
especially if you aren't hitting any limitations with it.
My main concern in this thread has been to dispel the idea that Nginx
is only appropriate in specialized deployments, or the inverse, that
Apache is the best general-purpose webserver. I believe neither to be
true, but that doesn't mean that I believe Apache is a *bad* choice,
only that unless you are already heavily invested in Apache (existing
deployments, trained staff, etc) then perhaps you should consider
alternatives.
Regards,
Cliff
2008/5/22 Shannon -jj Behrens <jji...@gmail.com>:
>
> Here's my two cents:
>
> Has anyone tried out the mod_wsgi module for *Nginx*? Yeah, I know,
> weird: http://wiki.codemongers.com/NginxNgxWSGIModule
But then you need to run a cooperative WSGI app :-(
Twisted's people handle this issue by running the WSGI call in a thread pool.
>
> If you're interested in doing asynchronous programming in Python but
> without the painful callback style approach used by Twisted, check out
> http://wiki.secondlife.com/wiki/Eventlet. It's based on the same
> tricks used by Yahoo Groups, IronPort, and Slide.
>
Reading http://wiki.secondlife.com/wiki/Eventlet/Documentation is interesting.
It discusses the NginxNgxWSGIModule and its use. The way it patches
Python sockets is interesting too, as is the problem with database
drivers whose sockets are implemented in C: they solve that issue with
a thread pool (like Twisted's people do).
They don't have an Eventlet monkey patch for file access (I think), so
don't use it if you access files over NFS.
It seems a good tip, but not an easy way to work :-(
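The thread-pool workaround looks roughly like this. The sketch uses the modern stdlib concurrent.futures (Twisted's deferToThread and Eventlet's tpool play the same role), and blocking_db_call is a made-up stand-in for a C-level driver that can't yield to the event loop:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_db_call(query):
    # Stand-in for a database driver whose socket is implemented in C
    # and therefore blocks the whole event loop if called directly.
    time.sleep(0.01)
    return "rows for " + query

pool = ThreadPoolExecutor(max_workers=4)

# The event loop submits the call and keeps servicing other requests;
# only this worker thread blocks while waiting on the database.
future = pool.submit(blocking_db_call, "SELECT 1")
result = future.result()
pool.shutdown()
```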
>
> Thanks for your patience ;)
Thanks for your tip, and excuse my poor English.
Javi
This is quite interesting. I've been looking for a way to build a
site scraper (something analogous to an aggregator but more
site-specific) that could eventually become asynchronous, and this
looks a lot easier than Twisted. It's like a cron scheduler for
interruptible functions, which fits my brain. The WSGI interface and
backdoor (interactive) interface could also serve as a UI for
accessing the data down the road, although I'm not sure that needs to
be in the same process, or whether to use a web or GUI interface or
both.
One thing that's unclear is what "executed within the main loop's
coroutine" means (the call_after() function). Does that mean it's
inserting Python statements into the other routine, or using its locals
and globals? The example shows a delayed timeout, where the exception
is delivered to the current coroutine (the caller) rather than to the
main loop's coroutine. Or is everything "executed in the main loop's
coroutine"?
> It discusses the NginxNgxWSGIModule and its use. The way it patches
> Python sockets is interesting too, as is the problem with database
> drivers whose sockets are implemented in C: they solve that issue with
> a thread pool (like Twisted's people do).
I wonder if SQLAlchemy can use it. There are engine args 'creator'
and 'pool' which provide some support for external connection
factories and customized pools.
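For what it's worth, a "creator" is just a zero-argument callable that returns a new connection, which is the shape SQLAlchemy's create_engine(creator=...) expects. A stdlib-only sketch of the pattern (TinyPool is made up for illustration, not SQLAlchemy's actual pool):

```python
import sqlite3
from collections import deque

def creator():
    # Zero-argument connection factory; a green-socket DB driver's
    # connect call would slot in here the same way.
    return sqlite3.connect(":memory:")

class TinyPool:
    """A hypothetical, minimal pool built around a creator callable."""
    def __init__(self, creator, size=2):
        self._creator = creator
        self._idle = deque()
        self._size = size

    def get(self):
        # Reuse an idle connection if one exists, else make a new one.
        return self._idle.popleft() if self._idle else self._creator()

    def put(self, conn):
        # Keep up to `size` idle connections; close the overflow.
        if len(self._idle) < self._size:
            self._idle.append(conn)
        else:
            conn.close()

pool = TinyPool(creator)
conn = pool.get()
value = conn.execute("SELECT 40 + 2").fetchone()[0]
pool.put(conn)
```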
--
Mike Orr <slugg...@gmail.com>
Bob Ippolito was telling me once that he took a server in Twisted and
rewrote it in stackless. He got some performance gains, but then he
rewrote it in Erlang. It dropped from 40% CPU utilization to almost
nothing, and it was a heck of a lot faster. In some situations,
Erlang is a really nice tool. Now, if only the syntax wasn't so
ridiculously ugly ;)