Scalability / Traffic-Volume Numbers

Jimmy

May 9, 2007, 9:21:44 AM

to django...@googlegroups.com

I am trying to get good stats for sites run off of Django to prove that
Django can really be a business solution and can scale well.

So far the only good articles I have found are:
http://wiki.rubyonrails.com/rails/pages/Framework+Performance
http://www.alrond.com/en/2007/jan/25/performance-test-of-6-leading-frameworks/

Does anyone know of any other posts on this topic? Or, if you run a
big Django website and would care to share your traffic load and server
setup with me, that would be awesome.

Jimmy

Tom

May 9, 2007, 11:44:53 AM

to Django users

I too would love to see some ideas on what others are doing for
scaling in all aspects. There was another recent post looking for
ideas on scaling the back end. There have also been a number of
discussions centered on the concept of multiple database support,
SQLRelay, SQLAlchemy ...

I'll start the discussion using my site and ideas as a reference.

So here is my Django powered site http://www.sodahead.com.
I wrote about the initial development in a blog located at
http://www.sodahead.com/blog/30/
Please read the blog, it covers the core of the technology stack.

SodaHead is a reasonably conventional social site, 100% content
driven. We are ~6 weeks old, so we have not experienced heavy traffic
yet, but we built for capacity. We have changed a number of aspects of
Django to enable capacity and to scale. The biggest performance
improvements were: making sure the schema was object centric and
heavily leveraged surrogate keys; triggers for counting functions;
Jetty/Solr for full-text search; and memcache for all object caching
AND all template caching. We have a reasonably conventional LAMP
infrastructure located in a colo in LA. We are currently monitoring
performance and tuning.

I can throw out stats for the site, but as with all benchmarks and
stats they can be tweaked for presentation. Solutions really depend on
the problem domain and what you are trying to do.

On May 9, 6:21 am, Jimmy <jimj...@syska-inc.com> wrote:
> I am trying to get good stats for sites run off of Django to prove that
> Django can really be a business solution and can scale well.
>
> So far the only good articles I have found are:
> http://wiki.rubyonrails.com/rails/pages/Framework+Performance
> http://www.alrond.com/en/2007/jan/25/performance-test-of-6-leading-fr...

Joseph Heck

May 9, 2007, 3:04:10 PM

to django...@googlegroups.com

Back in February, TrenchMice was hit with a reasonable load from a
front-page Slashdot article, and they wrote it up at
http://www.cogitooptimus.com/2007/02/11/wow-we-made-it/

I helped them with some load testing before the site ever went live,
and it looked to be able to handle it with aplomb. They are making use
of memcached and anonymous view caching, by the way.
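
For readers who have not set this up, a minimal sketch of pointing Django's cache framework at memcached might look like the fragment below. It uses the URI-style setting names from Django releases of this period (later versions replaced them with a CACHES dictionary). The host, port and timeout are placeholder assumptions, not TrenchMice's actual configuration.

```python
# settings.py fragment: a sketch only, using Django 0.96-era setting names.
# The address and timeout below are illustrative assumptions.

# Store cached data in a memcached daemon listening on localhost.
CACHE_BACKEND = "memcached://127.0.0.1:11211/"

# How long (in seconds) the cache middleware keeps each cached page.
CACHE_MIDDLEWARE_SECONDS = 300
```

Site-wide page caching additionally requires Django's cache middleware to be enabled in the project's middleware list.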

-joe

David Cancel

May 10, 2007, 3:01:27 PM

to Django users

We use Django for all our Compete.com websites, and performance is
great. We've been Dugg and Slashdotted several times. We are averaging
200-300k users per month and about 25k Django-powered pageviews per
day.

Cheers,
David

Ray Dookie

May 16, 2007, 9:03:47 PM

to Django users

Guys, guide me a bit here.
I'm interested to know how much load a Django box with, say, 512MB of
RAM can handle.
I'm bringing out a site soon, and was planning on taking either the
256MB or 512MB (VPS) setup that www.slicehost.com offers.

What I was wondering is, with a box with those specs (on average),
how many visits/pageviews could I expect to handle on that single
server (VPS) setup before I need to get more servers?

Also, from your experience, for any single-server (dedicated or VPS)
setup, what specs can handle what load?

Thanks.
Cheers

Kelvin Nicholson

May 16, 2007, 9:44:30 PM

to django...@googlegroups.com

Ray Dookie wrote:
> Guys guide me a bit here..
> I'm interested to know how much load can a django box say with 512MB
> ram handle.
> I'm bringing out a site soon, and was planning on taking either the
> 256mb or 512mb (VPS) setup that www.slicehost.com offers.
>

I'm certainly not a guru, but one question popped up automatically: what
type of data are you going to serve (how dynamic will your site be), and
what will your level of caching be?

> what i was wondering, is with a box with those specs (on average),
> what i could expect in terms of what is the visits/pageviews i could
> handle on that single server (VPS) set up before i have need to get
> more servers.
>

You might be interested to see the following benchmark, yet remember it
is just a benchmark.

http://superjared.com/entry/quick-django-benching/

Cheers,

Kelvin

--
Kelvin Nicholson
Voice: +886 9 52152 336
Voice: +1 503 715 5535
GPG Keyid: 27449C8C
Data: kel...@kelvinism.com
Skype: yj_kelvin
Site: http://www.kelvinism.com

James Bennett

May 16, 2007, 9:54:49 PM

to django...@googlegroups.com

On 5/16/07, Ray Dookie <rayd...@gmail.com> wrote:
> I'm interested to know how much load can a django box say with 512MB
> ram handle.

The biggest question to ask here is what level of traffic you're
expecting, and how quickly you expect that traffic to grow; there are
lots of things you can do to your setup to optimize for different
traffic levels, so it's very hard to give a concise answer to "what
can this server handle".

Generally your biggest RAM sinks are going to be (not necessarily in
this order):

* memcached, if you're using it
* Apache/mod_python, if you opt for that as the web server setup
* Your database server

For most situations, focus first on how much RAM you give to the
database, because the database is almost always the first bottleneck
in scaling (and because a huge memcached instance with a blazing-fast
web server is no good if you can't get at the data to put it in the
cache), then look at other concerns.

If memory is a significant concern, you may also want to look at using
lighttpd and FastCGI; lighttpd isn't always the ideal choice, but you
can squeeze a little extra performance out of it by taking advantage
of its relatively low memory use compared to Apache with prefork MPM
(though bear in mind that you'll still have memory overhead from
FastCGI processes).

--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

Ray Dookie

May 16, 2007, 10:31:07 PM

to Django users

Thanks... I'll check out the article.
As for what kind of data I'm serving, the site would be pretty
dynamic, and as for the level of caching, I'm not sure as yet.

Ray Dookie

May 16, 2007, 10:38:48 PM

to Django users

I'm starting off with everything on 1 server.
As for traffic, at the start I'm looking at 2000-3000 hits / 10K-20K
page views per day.

Maybe I should have asked this instead:
If you used one of www.slicehost.com's VPSs (say the 512MB RAM one),
from any experience you may have on similar setups, running everything
on a single server, how would you configure it for optimal
performance?
Assume it is a very dynamic Django + MySQL site.

And what do you think, visits/pageviews wise, this server would be
able to handle (just off the top of your head, no hard facts
needed)?

thanks
cheers.

Caz

May 17, 2007, 2:44:25 PM

to Django users

(Posting attempt no 3, apologies for duplicates)
I run 3 sites on my vps at www.redwoodvirtual.com

Here's some stats for a vps with 64meg ram and 64meg swap
Here's the output of top:

top - 16:31:30 up 53 days, 4:19, 1 user, load average: 0.09, 0.35, 0.79
Tasks: 64 total, 1 running, 57 sleeping, 0 stopped, 6 zombie
Cpu(s): 0.0% user, 0.7% system, 0.0% nice, 99.3% idle
Mem: 60368k total, 55452k used, 4916k free, 4344k buffers
Swap: 65528k total, 39296k used, 26232k free, 15992k cached

Here's my apache config
<IfModule worker.c>
StartServers 2
MaxClients 50
MinSpareThreads 10
MaxSpareThreads 25
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>

It still starts up 29 processes as given by:
# ps -e | grep apache2 | wc
29 116 928
If someone could point me to an explanation of how it gets to 29
processes with that config I would appreciate it.

I ran the performance test remotely from a server with a 20ms ping
time to my vps. (Itself also vps but with another provider)

The test basically fetches the specified URL n times in each of m
threads.
Here's the code: http://dpaste.com/hold/10514/
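
The linked script is not reproduced in this thread. As a rough idea of what such a tester can look like, here is a minimal sketch in Python 3 that runs a fixed number of requests in each of several threads and times the whole run. The names fetch and run_load are hypothetical and this is not Caz's http_getter.py.

```python
import threading
import time
import urllib.request


def fetch(url):
    """Fetch one URL and return its HTTP status; urlopen raises on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()
        return resp.status


def run_load(fetch_once, n_threads, requests_per_thread):
    """Call fetch_once() requests_per_thread times in each of n_threads threads.

    Returns (total_requests, failures, elapsed_seconds).
    """
    failures = 0
    lock = threading.Lock()

    def worker():
        nonlocal failures
        for _ in range(requests_per_thread):
            try:
                fetch_once()
            except Exception:
                # Count server errors and network failures alike.
                with lock:
                    failures += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return n_threads * requests_per_thread, failures, elapsed


# Example (placeholder URL): 5 threads, 40 requests each.
# total, failed, secs = run_load(lambda: fetch("http://example.com/"), 5, 40)
```

Because all threads run on one client machine, a tester like this measures the client and the network as much as the server.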

The page i'm fetching is about 12kb
-rw-rw-r-- 1 caz caz 12801 May 17 16:32 index.html
It includes 15 news entries from the database, each one chopped after
a couple of words. It has a quote of the day, which gets fetched from
the db. Regular Django templating. It uses PostgreSQL.
No caching enabled whatsoever. No optimisation.

This obviously ignores fetching all the images and other static
content. However those kinda stats should be available. And i'm not
using django to serve them.

The times mentioned are seconds.

Using 25 threads and 40 requests per thread:
$ python http_getter.py 25 40
Some 50 requests were not served but returned 500:Server error and it
had to swap out severely.
Total requests:1000 Time:143.0

Lowering the concurrency to a fifth speeds things up, with no more
server errors and no more swapping. I need to tune my config some more,
it would seem.
$ python http_getter.py 5 40
Total requests:200 Time:28.2

And it scales pretty linearly.
$ python http_getter.py 5 400
Total requests:2000 Time:214.6

Here's some calls to a similar page on the same vps but on another
virtual host. This one has file caching enabled in django tho and the
page is 7kb in size:
$ python http_getter.py 25 40
Total requests:1000 Time:38.7
Looks like about 26 requests per second.

With caching enabled I suspect serving all 10000 requests for your day
inside an hour will be fine: 10000/3600 = 2.8 requests per second
needed, and my cached tiny vps manages 26/second. It might even cope
with the other requests for static content, which should be cached
after the first hit.
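
The arithmetic behind these figures can be checked with a few lines of Python. The run totals and times are the ones quoted above; the function name and labels are just for illustration.

```python
def requests_per_second(total_requests, seconds):
    """Average throughput over a whole test run."""
    return total_requests / seconds


# (total requests, elapsed seconds) as reported in this post.
runs = {
    "uncached, 25 threads x 40": (1000, 143.0),
    "uncached, 5 threads x 40": (200, 28.2),
    "uncached, 5 threads x 400": (2000, 214.6),
    "file cache, 25 threads x 40": (1000, 38.7),
}

for label, (total, secs) in runs.items():
    print("%-28s %5.1f req/s" % (label, requests_per_second(total, secs)))

# 10000 page views squeezed into a single hour.
needed = 10000 / 3600.0
# How many times over the cached run covers that rate.
headroom = requests_per_second(1000, 38.7) / needed
print("needed: %.1f req/s, headroom: %.1fx" % (needed, headroom))
```

The uncached runs work out to roughly 7 to 9 requests per second and the cached run to about 26, a little over nine times the 2.8 per second needed. The first uncached figure also includes the roughly 50 requests that came back as errors.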

All of the above relies heavily on my little load testing script. And
the load is all from the same machine. Please have a look at the
script before taking these stats as meaningful :)

Tim Chase

May 17, 2007, 2:51:18 PM

to django...@googlegroups.com

> <IfModule worker.c>
> StartServers 2
> MaxClients 50
> MinSpareThreads 10
> MaxSpareThreads 25
> ThreadsPerChild 25
> MaxRequestsPerChild 0
> </IfModule>
>
> It still starts up 29 processes as given by:
> # ps -e | grep apache2 | wc
> 29 116 928

Well, my first thought is, if you omit the "| wc", what do you
see? What's the output of

# ps -e | grep apache2

Frequently, one of those processes is the grep process. That
gets you down to 28. There might be other items in those results
that give a forehead-smacking "duh" to the answer.

-tim

Caz

May 18, 2007, 4:56:59 AM

to Django users

Good point, though I ran ps -e, which only lists the process name, not
its params. So grep's not included.

Here's the count specifically excluding greps...
$ ps -ef | grep apache2 | grep -v grep | wc
29 319 2349

Here's the straight grepped output with f added
caz@pilot:~$ ps -ef | grep apache2
root 686 1 0 May13 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22687 686 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22688 22687 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22690 22688 0 May17 ? 00:00:01 /usr/sbin/apache2 -k start -DSSL
www-data 22691 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22692 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22693 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22694 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22695 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22696 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22697 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22698 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22699 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22700 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22701 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22702 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22703 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22704 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22705 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22706 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22707 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22708 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22709 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22710 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22711 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22712 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22713 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22714 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL
www-data 22715 22688 0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL

Looks like there's a parent process (686) with a single child (22687),
which has a single child of its own (22688), which in turn has 26
children...
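
One way to double-check the parent/child structure is to group the rows of the listing by their PPID column. The sketch below rebuilds the rows from the PID/PPID pairs shown above (PIDs 22690 to 22715 all have parent 22688); the function name is hypothetical.

```python
from collections import Counter


def children_per_parent(ps_lines):
    """Count rows per PPID in `ps -ef` style output (UID PID PPID ...)."""
    counts = Counter()
    for line in ps_lines:
        fields = line.split()
        # Skip header or malformed lines where PID/PPID are not numeric.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        counts[int(fields[2])] += 1
    return counts


tail = "0 May17 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL"
rows = [
    "root 686 1 0 May13 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL",
    "www-data 22687 686 " + tail,
    "www-data 22688 22687 " + tail,
]
# The remaining rows in the listing: PIDs 22690..22715, parent 22688.
rows += ["www-data %d 22688 %s" % (pid, tail) for pid in range(22690, 22716)]

counts = children_per_parent(rows)
for ppid, n in sorted(counts.items()):
    print("parent %5d has %2d child row(s)" % (ppid, n))
```

Grouped this way, the 29 rows break down as one row each under parents 1, 686 and 22687, and 26 rows under 22688.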

I'm assuming apache's setup is running worker mpm since i get worker
when i list apache's modules

# apache2 -l
Compiled in modules:
core.c
mod_access.c
mod_auth.c
mod_log_config.c
mod_logio.c
mod_env.c
mod_setenvif.c
worker.c
http_core.c
mod_mime.c
mod_status.c
mod_autoindex.c
mod_negotiation.c
mod_dir.c
mod_alias.c
mod_so.c

From the Apache docs (http://httpd.apache.org/docs/2.0/mod/worker.html):
"A single control process (the parent) is responsible for launching
child processes. Each child process creates a fixed number of server
threads as specified in the ThreadsPerChild directive,...."

So thats 1 Process. The parent.

"The number of processes that will initially launched is set by the
StartServers directive. Then during operation, Apache assesses the
total number of idle threads in all processes, and forks or kills
processes to keep this number within the boundaries specified by
MinSpareThreads and MaxSpareThreads.....The maximum number of active
child processes is determined by the MaxClients directive divided by
the ThreadsPerChild directive."

According to my config, the max active child processes should be
MaxClients(50)/ThreadsPerChild(25)=2 child processes, which totals 3
when you include the parent. However, after startup it goes up to 29
total. It used to be somewhere in the 50s until I tuned MaxClients
down to 50, MaxSpareThreads down to 25 and MinSpareThreads down to 10.
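
Spelled out as a small calculation, using only the rule quoted from the Apache docs above and the values from this config (the function name is made up for illustration):

```python
def max_worker_children(max_clients, threads_per_child):
    """Upper bound on worker MPM child processes: MaxClients / ThreadsPerChild."""
    return max_clients // threads_per_child


# Values from the <IfModule worker.c> block posted earlier in the thread.
config = {"StartServers": 2, "MaxClients": 50, "ThreadsPerChild": 25}

children = max_worker_children(config["MaxClients"], config["ThreadsPerChild"])
total_processes = children + 1  # plus the single parent control process
server_threads = children * config["ThreadsPerChild"]

print("child processes:", children)
print("processes including parent:", total_processes)
print("server threads across children:", server_threads)
```

So the docs predict at most 3 processes, while the 50 server threads inside them would only appear in ps as separate entries if threads are listed individually.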

Caz

May 18, 2007, 5:58:43 AM

to Django users

I suspect I've found a clue. My vps is a User-mode Linux (UML) vps. I'm
beginning to suspect that under UML, or at least the one I'm operating
under, the threads are handled 'special', since I can see the apache
threads on my real box using ps.

In fact, I ran my load tester locally on the vps and each thread showed
up as a separate process in ps, where on a normal box they show up as
threads...
