Django Performance Discoveries Part 1


Prairie Dogg

Apr 27, 2008, 1:16:28 AM
to Django users
Hey Everybody,

I've been using django for almost a year now and I've been spending
some time recently trying to optimize the slicehost VPS(s) that I use
to run several django sites I've developed. I wanted to share my
findings with the larger group in hopes that my oversights can be
pointed out and whatever 'findings' I've made can be useful to folks
who are just starting off. I've been developing a blow-by-blow of my
slicehost setup - I gained a lot from the "dreamier django dream
server" blog post a while back. But to make things brief for the
first post, I'll just summarize my setup here:

512 meg slicehost slice w/ Hardy Heron
memcached with cmemcached bindings doin' its cache thang with 256 megs
of RAM
nginx on port 80 serving static files
apache mpm worker on 8080 w/ mod_wsgi serving dynamic content
postgres 8.3 w/ geo libraries
django_gis (thanks justin!)
my application

I'll keep it to 3 sections of musings for this post:

triage troubles
memcached musings
context-processor conundrum

triage troubles

At pycon someone asked Jacob KM what he used to performance test his
websites and he said "siege". A quick google search turned it up
(http://www.joedog.org/JoeDog/Siege).
I seem to recall Jacob mentioning that this was his preferred method
because it was more of a "real life" test than perhaps benchmarking
tools that would profile the code. Compiling and using siege was a
snap. My test was of a site I wrote that does a lot of database
queries to draw up any given page (mostly because of a complex
sidebar). When I turned siege on a dev server, real easy like, the
server crumbled with only 10 simultaneous users and anything higher
than 5 clicks per user.

Observation #1: Make sure your debug settings are turned off.
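
For reference, a minimal sketch of the line in question in a project's settings.py. DEBUG is Django's own setting name; the comment describes the general cost of leaving it on, not anything measured on this particular site.

```python
# settings.py (fragment): turn debugging off before load testing.
# With DEBUG = True, Django records the SQL it executes on
# django.db.connection.queries, which costs memory and time under load.
DEBUG = False
```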

After I turned debug settings off, performance maybe doubled, but
still was nothing that could handle even moderate traffic gracefully.
20 simultaneous users on 3 clicks per user were getting up into the
20+ second wait for a response range. Basically awful. Not shocked,
because I knew that my db querying was horrendously inefficient. This
was OK, because I had memcached up my sleeve. One observation from
the first test held constant throughout all subsequent tests: initial
queries were the fastest and subsequent queries became progressively
slower. I'm assuming this is because of something like queries
queuing up at the db, or running through memory, but I don't have
enough context or knowledge of the whole stack to isolate the
problem. More on this later.
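
One way to picture the queuing guess is a toy model. It is only a sketch under stated assumptions: every request arrives at the same moment, a fixed number of workers handle them in parallel, and each request takes the same time once it starts. The function name and parameters are made up for illustration, and it does not claim to be what this stack was actually doing.

```python
def response_times(n_requests, workers, service_time):
    """Toy queue: n_requests arrive together, `workers` serve them in
    parallel, and each takes `service_time` seconds once it starts.
    Returns the total wait seen by each request, in arrival order."""
    # Request i starts after i // workers earlier batches have finished.
    return [(i // workers + 1) * service_time for i in range(n_requests)]


print(response_times(6, 2, 1.0))
# [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

Even in this simple model the first requests come back quickest and the last ones wait the longest, which matches the shape of what siege reported.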

memcached musings

I went on and compiled cmemcache because the consensus opinion on the
internets is that it's the fastest. I'll just assume that's so
because it has 'c' in the name and if you read it on the internets,
it must be true.

I put in all the cache settings, put in the Cache middleware and ran
siege again, waiting for the glorious results. Blam. Exactly the
same. Actually, a little worse. I scratched my head for about 3
hours before I realized that I had mistyped the memcached port number
in the settings. After that, much improved. I could do 300
simultaneous visitors doing 3-5 clicks apiece with tolerable
performance. 1000 visits doing 1 click each also held up very well,
the longest response time being in the 4-6 second range. Without
fail, the earliest requests had the shortest waits, many well under a
second, and the last requests had the longest waits. Also, as I
ratcheted up pressure from siege, I was running top on the 'besieged'
server watching the running processes. I noticed a ton of postgres
processes. This challenged my notion of how memcached worked. I
thought that memcached would take the resulting page for a given view
and spit it back out if the url was requested again with no database
involved. I was still hitting the db _a lot_.

Observation #2 Is this thing on?: Memcached really does dramatically
improve your site's responsiveness under load. If you don't see
massive improvement, you haven't gotten memcached configured
correctly.
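
A quick round-trip check would have caught the mistyped port in seconds rather than hours. The sketch below assumes the cache client treats an unreachable server as a silent miss rather than raising an error, which would fit the "exactly the same" result described above. DictCache, UnreachableCache and cache_is_live are hypothetical names used only so the example runs without a memcached server.

```python
class DictCache:
    """Stand-in for a working cache backend (illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, timeout=None):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class UnreachableCache:
    """Stand-in for a cache whose server cannot be reached:
    writes are dropped and every read is a miss."""

    def set(self, key, value, timeout=None):
        pass

    def get(self, key, default=None):
        return default


def cache_is_live(cache, key="cache-selftest", value="ok"):
    """Write a value and read it straight back; True only if it survived."""
    cache.set(key, value, 30)
    return cache.get(key) == value


print(cache_is_live(DictCache()))         # True
print(cache_is_live(UnreachableCache()))  # False
```

In a real project the same function can be called from a Django shell with the object imported by `from django.core.cache import cache`, which exposes the same `set` and `get` methods.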

context-processor conundrum

Then I remembered that I had written a custom context processor that
was doing the bulk of the nasty database querying. I reckon that
whatever the order of operations was for request / response handling,
the result of the context processing was not getting cached. So I
wrote 4-5 lines to check / set the cache in my custom
context_processors.py and voila, that instantly knocked all queries
to the db down to zero. Despite the absence of postgres processes
stacking up, the same phenom of early queries fast, subsequent queries
slow still applied. At this point I'm not exactly sure what's causing
it. It's not that it's surprising, it's just that I'd like to
understand exactly why it's happening.

Observation #3: Low level cachin' works well in cases like
context_processors, or other expensive non-view functions.
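
The original 4-5 lines aren't shown in this post, so the following is only a guess at the general shape of a check / set in a context processor. build_sidebar, the cache key and the 300 second timeout are all hypothetical, and DictCache stands in for Django's cache object so the sketch runs on its own.

```python
class DictCache:
    """Stand-in for django.core.cache.cache (illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, timeout=None):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


cache = DictCache()  # in a Django project: from django.core.cache import cache


def build_sidebar():
    """Hypothetical expensive function; imagine many db queries here."""
    build_sidebar.calls += 1
    return {"sidebar_items": ["a", "b", "c"]}


build_sidebar.calls = 0


def sidebar(request):
    """Context processor: try the cache first, rebuild only on a miss."""
    context = cache.get("sidebar_context")
    if context is None:
        context = build_sidebar()
        cache.set("sidebar_context", context, 300)  # keep for 5 minutes
    return context


sidebar(None)
sidebar(None)
print(build_sidebar.calls)  # 1
```

The second call never reaches build_sidebar, which is the effect described above: once the value is cached, the expensive queries stop.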

OK - I'll stop here for now, I hope this was useful or at least
amusing. I'd love to hear stories from other "optimization" newbies
or suggestions from the experts about how folks go about optimizing
their own projects.

Perhaps more on this to come.

Almir Karic

Apr 27, 2008, 2:32:24 AM
to django...@googlegroups.com
On Sun, Apr 27, 2008 at 7:16 AM, Prairie Dogg <wiley....@gmail.com> wrote:
> Perhaps more on this to come.

yes please :-)

thanks for posting it

--
error: one bad user found in front of screen

rich

Apr 27, 2008, 7:17:37 AM
to Django users
Thanks for sharing!

My setup is similar to yours except I don't use nginx at all - just
another apache virtual host for media.mysite.com. Not sure which is
best, but one less moving part from my point of view?

I haven't done any load testing, but I really like the way mod_wsgi
works; I use it in daemon mode (with worker MPM Apache) - it's never
caused me a problem and **feels** tidier than fcgi.

Also I have much less memcached - only 16MB, but I'm on a 256MB
slicehost slice, for now; I haven't explored any optimisations here as
I'm still building core features in my first django project.

I've had one drama where Gutsy crashed (out of memory); unfortunately
I didn't realise until all log evidence fell off the end of the syslog
cliff.

Happy optimising
Rich

Prairie Dogg

Apr 27, 2008, 9:08:15 AM
to Django users
I'm still trying to wrap my head around what the advantages of
worker MPM are, I've read a couple articles that have started me
down this road - the consensus view seems to be worker MPM
w/ mod_wsgi is the best way to go from a memory and separation
of concerns POV, the only potential drawback being that your
django app needs to be 'thread safe'. Sadly I'm too much of
a novice to really understand what that means in terms of my
code or what sorts of patterns I should be using or avoiding.

rich

Apr 27, 2008, 11:52:09 PM
to Django users
Yes, I too am at a similar level of confusion as to when django is not
thread safe.

I assume this could happen only if I explicitly create new threads
myself, or if I use some non-django module that isn't itself thread
safe.

Would be fantastic if someone could clarify this!

many thanks

Richard

Christian Vest Hansen

Apr 28, 2008, 3:38:19 AM
to django...@googlegroups.com
On 4/28/08, rich <atki...@gmail.com> wrote:
>
> Yes, I too am at a similar level of confusion as to when django is not
> thread safe.

With the python GIL, is it even possible to create a python program
that isn't thread-safe? I thought that was the whole point of having a
GIL in the first place; make concurrency a non-issue.

But maybe mod_wsgi throws that assumption out the door. I wouldn't
know about that.


--
Venlig hilsen / Kind regards,
Christian Vest Hansen.

Jarek Zgoda

Apr 28, 2008, 3:44:12 AM
to django...@googlegroups.com
Christian Vest Hansen wrote:

> On 4/28/08, rich <atki...@gmail.com> wrote:
>> Yes, I too am at a similar level of confusion as to when django is not
>> thread safe.
>
> With the python GIL, is it even possible to create a python program
> that isn't thread-safe? I thought that was the whole point of having a
> GIL in the first place; make concurrency a non-issue.

Yes, it is still possible. Create an object with global state, alter it
from different threads without locking and there you go: the state of
the object is not consistent (threads cannot rely on the state), and you
might even get a race condition. The GIL protects only the internal
state of the VM, not your objects' state.
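
A minimal sketch of that point, with hypothetical names throughout (Counter, run). The unsafe method splits the update into a separate read and write, so another thread can slip in between the two. Whether updates are actually lost on a given run depends on the interpreter and on timing, so no particular result is claimed for it. The locked version always comes out right.

```python
import threading


class Counter:
    """Shared object with global state, altered from several threads."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        current = self.value      # read
        self.value = current + 1  # write; another thread may have
                                  # updated value in between

    def safe_increment(self):
        with self._lock:          # read and write happen as one step
            self.value += 1


def run(increment, n_threads=8, per_thread=10_000):
    """Call `increment` per_thread times from each of n_threads threads."""
    def work():
        for _ in range(per_thread):
            increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


safe = Counter()
run(safe.safe_increment)
print(safe.value)    # 80000

unsafe = Counter()
run(unsafe.unsafe_increment)
print(unsafe.value)  # at most 80000; can be lower when updates are lost
```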

--
Jarek Zgoda
Skype: jzgoda | GTalk: zg...@jabber.aster.pl | voice: +48228430101

"We read Knuth so you don't have to." (Tim Peters)

Graham Dumpleton

Apr 28, 2008, 7:21:22 AM
to Django users
On Apr 28, 5:44 pm, Jarek Zgoda <jarek.zg...@sensisoft.com> wrote:
> Christian Vest Hansen wrote:
>
> > On 4/28/08, rich <atkins...@gmail.com> wrote:
> >>  Yes, I too am at a similar level of confusion as to when django is not
> >>  thread safe.
>
> > With the python GIL, is it even possible to create a python program
> > that isn't thread-safe? I thought that was the whole point of having a
> > GIL in the first place; make concurrency a non-issue.
>
> Yes, it is still possible. Create an object with global state, alter it
> from different threads without locking and there you go: the state of
> the object is not consistent (threads cannot rely on the state), and you
> might even get a race condition. The GIL protects only the internal
> state of the VM, not your objects' state.

Correct. For some background on when multithreading issues apply with
mod_wsgi, see:

http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading

In particular, for any configuration where wsgi.multithread is True,
you need to know your code is thread safe.

The document includes a brief summary at the end about building
portable applications that can deal with both multithread and
multiprocess web servers.
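
To see which case applies to a given deployment, a tiny WSGI application can report the flags the server passes in. This is only a sketch: `wsgi.multithread` and `wsgi.multiprocess` are standard WSGI environ keys, while the response format here is made up for illustration.

```python
def application(environ, start_response):
    """Report whether the server may run this app in multiple threads
    and/or multiple processes, according to the WSGI environ."""
    multithread = bool(environ.get("wsgi.multithread"))
    multiprocess = bool(environ.get("wsgi.multiprocess"))
    body = ("multithread=%s multiprocess=%s\n"
            % (multithread, multiprocess)).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

If the response says multithread=True, the thread-safety concerns discussed above apply to the application code.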

Graham

James Matthews

Apr 28, 2008, 4:13:06 PM
to django...@googlegroups.com