Sticky sessions/session affinity with mod_wsgi.

Graham Dumpleton

May 19, 2009, 12:03:29 AM
to mod...@googlegroups.com
Since I am not sure when I might get to blog and/or document this
properly, here is an example configuration for how to implement sticky
sessions, also referred to as session affinity, when using mod_wsgi
daemon mode.

Note that this just shows the basic concept. One could get more
elaborate by using a 'prg' RewriteMap, which runs a separate program to
determine which process should be used rather than choosing at random.
You might, for example, choose the process that currently has the fewest
requests being passed through to it for existing sessions.
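As a rough illustration of the 'prg' idea: Apache starts a prg map program once, writes each lookup key as a newline-terminated line to its standard input, and reads one newline-terminated answer back (the string NULL meaning "no value"). The sketch below is hypothetical and not part of the setup in this post. It cannot see real request load, so "fewest new sessions handed out so far" stands in for "least requests".

```python
# Hypothetical picker for a 'prg' RewriteMap; not part of the original setup.
# Process groups defined by the WSGIDaemonProcess directives.
GROUPS = ["sticky01", "sticky02", "sticky03", "sticky04", "sticky05"]

# New sessions handed to each group so far. This only counts assignments
# made by this map program; it cannot see actual request load.
assigned = {name: 0 for name in GROUPS}


def pick_group():
    """Return the group that has been given the fewest new sessions."""
    # min() keeps the first of any tied groups, so ties go to the earliest name.
    name = min(GROUPS, key=lambda g: assigned[g])
    assigned[name] += 1
    return name


def serve(stdin, stdout):
    """Answer one lookup per input line, as a prg: RewriteMap must."""
    for line in iter(stdin.readline, ""):
        key = line.strip()
        if key == "processes":
            stdout.write(pick_group() + "\n")
        else:
            # Unknown key: the string NULL tells mod_rewrite there is no value.
            stdout.write("NULL\n")
        # mod_rewrite waits for each answer, so never leave output buffered.
        stdout.flush()
```

Run as a script, the entry point would just call `serve(sys.stdin, sys.stdout)`, and the map line would become something like `RewriteMap sticky prg:/Users/grahamd/Sites/sticky/picker.py` (hypothetical file name; the script needs a `#!` line and execute permission, and Apache 2.2 also wants a RewriteLock directive when prg maps are used). Because the rules below only expand `${sticky:processes}` when the cookie is missing or invalid, the counts track new sessions rather than requests.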

If anyone wants to discuss it, by all means go ahead. My expectation
at the moment is that only those who know what session stickiness is
all about might do that. In other words, I don't necessarily want to
have to go and explain what it is. If you don't know, then do a Google
search on 'session affinity' or similar.

# Define multiple process groups each with a single multithreaded process.

WSGIDaemonProcess sticky01 processes=1 threads=10 display-name=%{GROUP}
WSGIDaemonProcess sticky02 processes=1 threads=10 display-name=%{GROUP}
WSGIDaemonProcess sticky03 processes=1 threads=10 display-name=%{GROUP}
WSGIDaemonProcess sticky04 processes=1 threads=10 display-name=%{GROUP}
WSGIDaemonProcess sticky05 processes=1 threads=10 display-name=%{GROUP}

# Mount our WSGI application at the sub URL of '/sticky'.

WSGIScriptAlias /sticky /Users/grahamd/Sites/sticky/site.wsgi

# Specify a rewrite map which randomly selects one of the process groups.
# The contents of the file should be:
#
# processes sticky01|sticky02|sticky03|sticky04|sticky05
#
# That is, a single key whose value lists each of the named process groups,
# separated by '|', which is the format a random (rnd:) rewrite map expects.

RewriteMap sticky rnd:/Users/grahamd/Sites/sticky/processes.txt

<Directory /Users/grahamd/Sites/sticky>

# Lots of rewrite magic to follow.

RewriteEngine On

# Extract the name of the process group from the cookie sent in the request.

RewriteCond %{HTTP_COOKIE} process=([^;]+)
RewriteRule . - [E=PROCESS:%1]

# Validate the name of the process group and, if it is not valid (or no
# cookie was sent), set it to one of the process groups at random.

RewriteCond %{ENV:PROCESS} !^sticky0[12345]$
RewriteRule . - [E=PROCESS:${sticky:processes}]

# Set the cookie so that its lifetime is always pushed out while the
# session is active. The cookie will expire after the defined number of
# minutes of inactivity, at which point stickiness is lost. Using a
# stickiness timeout of 60 minutes.

RewriteRule . - [CO=process:%{ENV:PROCESS}:%{HTTP_HOST}:60:/sticky]

# For the WSGI application, indicate that the process group should be
# selected based on the value of the cookie or, where appropriate, as set
# randomly. As an extra security measure, even though we validate the name
# above, limit which process groups could be selected in case there might
# be others configured for the same server.

WSGIRestrictProcess sticky01 sticky02 sticky03 sticky04 sticky05
WSGIProcessGroup %{ENV:PROCESS}

</Directory>

Graham
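To check that the rules above behave as intended, the application mounted at /sticky could simply report which daemon process handled each request. This is a hypothetical site.wsgi (the real file was not posted). It echoes the `mod_wsgi.process_group` key that mod_wsgi puts in the WSGI environ, the process ID and the incoming cookie; repeated requests from one browser should keep showing the same group and PID until the cookie expires after 60 idle minutes.

```python
import os


def application(environ, start_response):
    # mod_wsgi reports the daemon process group that handled this request;
    # fall back to an empty string if the key is absent (e.g. embedded mode).
    group = environ.get("mod_wsgi.process_group", "")
    cookie = environ.get("HTTP_COOKIE", "")
    body = "group=%s pid=%d cookie=%s\n" % (group, os.getpid(), cookie)
    data = body.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]
```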

gert

May 19, 2009, 6:24:31 PM
to modwsgi
On May 19, 6:03 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:
First stupid question:
What's wrong with WSGIDaemonProcess sticky processes=5 threads=10
display-name=%{GROUP}?
They are identical processes, so why stick with process 1 when
process 2 has nothing to do?

Second stupid question:
I fail to see how this can be used to cluster more than one Apache
server?

And final question:
Bubble drawings please :)



Graham Dumpleton

May 19, 2009, 7:02:28 PM
to mod...@googlegroups.com
2009/5/20 gert <gert.c...@gmail.com>:

First stupid response. Did you do any research about what sticky
sessions or session affinity is intended to achieve? Can you explain
what the benefit is of having subsequent requests from same user going
back to the same process?

> Second stupid question:
> I fail to see how this can be used to cluster more than one Apache
> server?

Second stupid response. It is not intended to cluster Apache servers.

> And final question:
> Bubble drawings please :)

OoOoOoOoO vs O O O O O

:-)

Graham

gert

May 19, 2009, 7:30:22 PM
to modwsgi
On May 20, 1:02 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:
> 2009/5/20 gert <gert.cuyk...@gmail.com>:
The benefit would be that you can store session IDs in a global variable
array instead of a database. Which raises the question: why would you
do that?
1) Your WSGI code only works on servers that can do this
2) You can create bottlenecks for some users because they are on the
same process as the one user who is doing this select * from GIGANTIC
3) It makes clustering Apache servers impossible

Graham Dumpleton

May 19, 2009, 7:45:56 PM
to mod...@googlegroups.com
2009/5/20 gert <gert.c...@gmail.com>:

Keeping session data in memory, and consequently making access to it a
lot faster, is one thing you could do. Yes, that would bind you to that
architecture. But there are other things that could be done which
wouldn't.

Start thinking about caching in general. Imagine that a database query
was done which was specific to a user, but that the data set was large
enough that it could not easily be returned in one page. On a subsequent
request to get more of the results, if it can go back to the same
process it may be able to benefit from any caching of database results
implemented by a database client or ORM layer and not actually have to
resubmit the query. If the request could have gone to another process,
then there would be no choice but to do the query again.

So, there can be benefits in the area of transient caching of data to
avoid database queries. This can improve performance, but may also
avoid having multiple copies of the same transient cached information
kept in multiple processes for the same multi-part user request,
thereby reducing overall memory usage.

> 2) You can create bottlenecks for some users because they are on the
> same process as the one user who is doing this select * from GIGANTIC

That would only be an issue if using single threaded daemon processes
and not multithreaded daemon processes.

> 3) It makes clustering Apache servers impossible

Not everyone uses clusters of Apache servers. Where they do, they use
something like Pound or similar in front, which implements session
stickiness in a dedicated proxy/load balancer. Even with such a thing in
front, it is probably entirely reasonable to create two-tier stickiness.
That is, Pound gets you to the correct Apache instance and the example
I gave gets you to the actual process.
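A toy model of the two tiers, assuming for illustration that the front-end balancer also pins clients with a cookie (Pound and similar balancers offer other session-tracking methods too). The `server` cookie, the instance names and `route()` are hypothetical. The second tier mirrors the rewrite rules in the first post: keep a cookie value that matches `^sticky0[12345]$`, otherwise pick a group at random, and always send the cookie back so its lifetime is refreshed.

```python
import random
import re

# Tier 1: which Apache instance (hypothetical names, chosen by the balancer).
SERVERS = ["apache-a", "apache-b"]
# Tier 2: which daemon process group inside that instance (the rewrite rules).
VALID_GROUP = re.compile(r"^sticky0[12345]$")
GROUPS = ["sticky01", "sticky02", "sticky03", "sticky04", "sticky05"]


def route(cookies, rng=random):
    """Return (server, group, cookies_to_set) for one request."""
    server = cookies.get("server")
    if server not in SERVERS:
        server = rng.choice(SERVERS)      # new or unknown client: pick one
    group = cookies.get("process", "")
    if not VALID_GROUP.match(group):
        group = rng.choice(GROUPS)        # same idea as the rnd: RewriteMap
    # Both cookies are re-sent every time so their expiry keeps moving out.
    return server, group, {"server": server, "process": group}
```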

Graham

gert

May 19, 2009, 8:09:21 PM
to modwsgi
On May 20, 1:45 am, Graham Dumpleton <graham.dumple...@gmail.com>
OK, you win, but only if you can show me WSGI code that can tell a
request to go to another process in case there are no more threads, with
some sort of ENV transfer of the session id :P

Michael Schurter

May 19, 2009, 8:12:11 PM
to mod...@googlegroups.com
On Tue, May 19, 2009 at 5:09 PM, gert <gert.c...@gmail.com> wrote:
> OK, you win, but only if you can show me WSGI code that can tell a
> request to go to another process in case there are no more threads, with
> some sort of ENV transfer of the session id :P

Create a new thread. I'm sure transferring the ENV would take more resources.

If you're out of threads per process on the OS level, I'm guessing you
have bigger problems than your choice of scaling techniques.

Graham Dumpleton

May 19, 2009, 8:37:05 PM
to mod...@googlegroups.com
2009/5/20 Michael Schurter <michael....@gmail.com>:

The number of threads in this case is the fixed number specified by the
'threads' option to WSGIDaemonProcess. So you are not exhausting an
OS-level limit, just the limit on the number of concurrent requests you
allowed each daemon process.

So, it is just a matter of properly sizing the number of threads per
process to begin with, to cope with the expected number of concurrent
requests. What this needs to be will depend on various factors,
including how long requests take to be processed.
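One rough way to put numbers on that sizing is Little's law (my addition, not something from this thread): the average number of requests in flight equals the arrival rate times the average time each request takes. `threads_needed` and its `headroom` factor are hypothetical. For example, 20 requests per second averaging 0.25 seconds each gives about 5 concurrent requests, or 8 threads with 50% headroom. In the sticky setup that load would be divided across the five process groups, assuming sessions are spread evenly.

```python
import math


def threads_needed(requests_per_second, avg_request_seconds, headroom=1.5):
    """Estimate a 'threads' value for one daemon process.

    Little's law: average concurrency = arrival rate x time per request.
    headroom is a hypothetical safety factor for bursts.
    """
    concurrency = requests_per_second * avg_request_seconds
    return max(1, math.ceil(concurrency * headroom))
```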

Most of the time even a few threads will be sufficient for the load most
people's applications get. People get carried away with trying to
prematurely architect a system for some perfect storm of traffic. If
they seriously are going to get huge traffic volumes, there are lots
of other techniques they should be looking at. I dare not mention them
though, as then gert will believe he has to use them, and I would have
to endure more and more questions, which I don't have time for at the
moment. :-)

Graham

gert

May 19, 2009, 8:51:27 PM
to modwsgi
On May 20, 2:12 am, Michael Schurter <michael.schur...@gmail.com>
wrote:
I agree; still, I would prefer to find solutions to share memory
between processes rather than to make sure the same one gets picked.

Graham Dumpleton

May 19, 2009, 8:56:12 PM
to mod...@googlegroups.com
2009/5/20 gert <gert.c...@gmail.com>:

BTW, have you integrated use of memcached already into your system?

Graham

Michael Schurter

May 19, 2009, 8:59:15 PM
to mod...@googlegroups.com

Ah, mea culpa. Been working in CherryPy for too long, which maintains
its own thread pool.

gert

May 19, 2009, 9:00:19 PM
to modwsgi


On May 20, 2:37 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:
> 2009/5/20 Michael Schurter <michael.schur...@gmail.com>:
I just want to add: concurrent requests over a time span of 60 minutes,
or more depending on whether the session id gets extended :P

Graham Dumpleton

May 19, 2009, 9:07:11 PM
to mod...@googlegroups.com

CherryPy still, I believe, has its own internal limit on the number of
threads in that thread pool, so it will reach that maximum before it
reaches the OS limit. Thus, the only real difference is that CherryPy
will try to shut down threads if it thinks it has too many idle, in
order to free up the per-thread stack memory. Obviously it will create
them again if demand picks up. The thread pool size in Apache/mod_wsgi
is static.

Overall I am not sure the transient amount of memory saved from
reclaiming idle threads is really worth it.

Graham

gert

May 19, 2009, 9:14:57 PM
to modwsgi
On May 20, 2:56 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:
> 2009/5/20 gert <gert.cuyk...@gmail.com>:
>
Can memcached store big query results as efficiently?

Graham Dumpleton

May 19, 2009, 9:21:40 PM
to mod...@googlegroups.com
2009/5/20 gert <gert.c...@gmail.com>:

Improving performance isn't all about the big queries, especially if
they are infrequent. Eliminating the very very frequent small queries,
which always yield the same result over time, is just as important.
Getting the most out of memcached is about choosing the most
appropriate thing to put in it.

Anyway, your answer suggests you haven't tried to use memcached yet.

Graham

gert

May 19, 2009, 9:39:17 PM
to modwsgi
On May 20, 3:21 am, Graham Dumpleton <graham.dumple...@gmail.com>
No, I haven't tried it yet, because it should be built into a good
database engine anyway, one that also figures out how to optimize the
queries for me. My brain is not big enough to make complex WSGI
infrastructures :-)

Graham Dumpleton

May 19, 2009, 9:50:40 PM
to mod...@googlegroups.com
2009/5/20 gert <gert.c...@gmail.com>:

One of the main points of memcached is that what you cache is up to
you. Therefore you could cache database results which have had post
processing done on them, or even prerendered HTML pages or snippets of
HTML pages. That way you are also saving on the cost of that
processing and rendering.

For example, for the home page of a very busy site holding news stories
or blog post summaries, where the information doesn't have to be exactly
up to date, you may generate the HTML snippet from a database query and
then cache it in memcached so all processes can get it. That entry would
be set to time out after 5 minutes, at which point it would be
recalculated.

For a well-trafficked site, you therefore avoid having every visit to the
home page, which is likely to be a large percentage of traffic, hit the
database for everything every time.
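The home page idea could look roughly like this. `cached_snippet` and the `home:summaries` key are hypothetical; `client` is anything with the `get(key)` / `set(key, val, time=...)` shape of python-memcached's `Client`, where `time` is the expiry in seconds. `MemoryClient` is only an in-process stand-in so the sketch runs without a memcached server; unlike memcached it is not shared between processes.

```python
import time as _time


class MemoryClient:
    """Hypothetical stand-in with the get/set shape of python-memcached."""

    def __init__(self):
        self._data = {}  # key -> (value, absolute expiry, or 0 for never)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires and _time.time() >= expires:
            del self._data[key]          # expired: behave like a cache miss
            return None
        return value

    def set(self, key, val, time=0):
        expires = _time.time() + time if time else 0
        self._data[key] = (val, expires)
        return True


def cached_snippet(client, key, render, ttl=300):
    """Return cached HTML for key, re-rendering at most once per ttl seconds."""
    html = client.get(key)
    if html is None:
        html = render()                  # e.g. query the DB and build the HTML
        client.set(key, html, time=ttl)  # the cache drops it after ttl seconds
    return html
```

With a real server you would pass something like `memcache.Client(["127.0.0.1:11211"])` instead (address assumed). Each process then renders the snippet at most once per expiry window, and usually not at all if another process already refreshed it.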

Graham

Michael Schurter

May 20, 2009, 1:20:36 AM
to mod...@googlegroups.com
On Tue, May 19, 2009 at 6:07 PM, Graham Dumpleton

Yeah, I didn't mean to suggest it as a feature. The ability to "kill"
daemon processes after a certain length of idle time seems to fit the
use case of: not letting low traffic apps use memory when idle.

People who are looking to auto-scale a site that gets suddenly slammed
with traffic are probably better off focusing on pre-modwsgi caching
(or pre-Apache proxying/caching) than expecting modwsgi to spawn
threads and then kill them when demand ebbs.

At any rate, I'll shut up now as I was out of my league to begin with. :-)

Graham Dumpleton

May 20, 2009, 1:28:31 AM
to mod...@googlegroups.com

Wasn't taking it that way. My final comment was just my own general
observation and not a reason as to why I wouldn't implement it. There
are other reasons I wouldn't do it, the main one being the complexity of
supporting a dynamically resizing thread pool and the need to ensure
one properly cleans up cached thread state objects and thread local
data.

> The ability to "kill"
> daemon processes after a certain length of idle time seems to fit the
> use case of: not letting low traffic apps use memory when idle.
>
> People who are looking to auto-scale a site that gets suddenly slammed
> with traffic are probably better off focusing on pre-modwsgi caching
> (or pre-Apache proxying/caching) than expecting modwsgi to spawn
> threads and then kill them when demand ebbs.
>
> At any rate, I'll shut up now as I was out of my league to begin with.  :-)

Don't shut up. I don't know everything and sometimes these discussions
do throw up things I wasn't aware of. I'll then go off and research
those things, and in some cases they may give me ideas for how to do
stuff better in mod_wsgi.

Graham

Damjan

May 20, 2009, 11:55:51 AM
to modwsgi

> >> Overall I am not sure the transient amount of memory saved from
> >> reclaiming idle threads is really worth it.
>
> > Yeah, I didn't mean to suggest it as a feature.
>
> Wasn't taking it that way. My final comment was just my own general
> observation and not a reason as to why I wouldn't implement it. There
> are other reasons I wouldn't do it, the main one being the complexity of
> supporting a dynamically resizing thread pool and the need to ensure
> one properly cleans up cached thread state objects and thread local
> data.

Doesn't Apache already have something like that that you could reuse?

Alec Shaner

May 20, 2009, 12:46:57 PM
to mod...@googlegroups.com
On Tue, May 19, 2009 at 9:39 PM, gert <gert.c...@gmail.com> wrote:
> > > > BTW, have you integrated use of memcached already into your system?
> > >
> > > Can memcached store big query results as efficiently?
> >
> > Improving performance isn't all about the big queries, especially if
> > they are infrequent. Eliminating the very very frequent small queries,
> > which always yield the same result over time, is just as important.
> > Getting the most out of memcached is about choosing the most
> > appropriate thing to put in it.
> >
> > Anyway, your answer suggests you haven't tried to use memcached yet.
>
> No, I haven't tried it yet, because it should be built into a good
> database engine anyway, one that also figures out how to optimize the
> queries for me. My brain is not big enough to make complex WSGI
> infrastructures :-)

Gert,

I just wanted to point out that recently memcached really did solve a problem for me that wasn't necessarily about performance, but rather about architecture. I make heavy use of AJAX in my app and there are cases where subsequent requests rely on data generated and cached from a previous request. Before using memcached I had to switch mod_wsgi to daemon mode (which is probably a good thing anyway), and restrict it to one process with multiple threads so I could cache in Python. With memcached I can now use multiple processes in mod_wsgi. And if you're worried about complexity, memcached was up and running in no time. Because the python-memcached module that you can use to interface with memcached uses the pickle module, it's very simple to just dump your entire object there. And it's also extremely handy that memcached lets you set an expire time on the data, so if the requester never comes back to get that data, it can just gracefully purge after a time.

I also use memcached to store a relatively large data set that only changes once per day (~20000 rows x ~15 columns), and the time it takes to retrieve it from cache is about 0.5 seconds compared to about 5 seconds to read from a PostgreSQL database. However, I must clarify that I've made very little attempt to optimize that database.
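The hand-off between AJAX requests described above can be sketched as follows. This is a guess at the shape, not Alec's code: `stash` and `claim` are hypothetical names, the `ajax:` key prefix and 600-second expiry are arbitrary, and `client` is assumed to be a python-memcached `Client` (its `set(key, val, time=...)`, `get(key)` and `delete(key)` methods). The browser receives the token from the first request and sends it with the next one, which any daemon process can then serve.

```python
import uuid


def stash(client, data, ttl=600):
    """Store data for a follow-up request and return a token for the browser."""
    token = uuid.uuid4().hex
    # python-memcached pickles values that are not plain strings or ints,
    # so any picklable object works; ttl lets abandoned entries expire.
    client.set("ajax:" + token, data, time=ttl)
    return token


def claim(client, token):
    """Fetch stashed data in whichever process handles the next request."""
    data = client.get("ajax:" + token)
    if data is not None:
        # Design choice (not from the thread): one-shot use, so drop it now.
        client.delete("ajax:" + token)
    return data
```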

Graham Dumpleton

May 20, 2009, 7:35:09 PM
to mod...@googlegroups.com
2009/5/21 Damjan <gda...@gmail.com>:

No.

For embedded mode, none of the MPMs do that AFAIK, and if they did I
would be having lots of problems with mod_wsgi 3.0. I'll have to
double-check.

For daemon mode I implement the thread pool, so it is purely my call how it is done.

Graham

Damjan

May 21, 2009, 7:52:58 PM
to modwsgi
> > Doesn't Apache already have something like that that you could reuse?
>
> No.

I was thinking about something like these Thread Pool routines in APR-UTIL:
http://apr.apache.org/docs/apr-util/1.3/group___a_p_r___util___t_p.html

I thought you were using those.

But I know very little C and even less about apr/apr-util, so maybe
those are just unusable crap.

Graham Dumpleton

May 21, 2009, 7:58:07 PM
to mod...@googlegroups.com
2009/5/22 Damjan <gda...@gmail.com>:

Hmmm, didn't even know about those.

They were only added in apr-util 1.3, which mostly rules them out, as I
still have to deal with systems which have much older apr-util than
that. I.e., Apache 2.0 still uses apr-util 0.9.

That doesn't mean I can't see how it is implemented in those routines
and do something similar if I needed to.

Thanks for pointing that one out.

Graham

Gloria

May 25, 2009, 11:54:39 AM
to modwsgi
I am trying this redirect method, and I never get the same session
id twice.
I see the process cookie, I am even sent to the same WSGI process
twice, according to this cookie. But no matter what, I get a different
session id in the session cookie.
Do I possibly have some SSL setting conflicting with this method?
Not sure what other info you'd need to help me diagnose this, but just
ask and I'll include it.
Does mod_wsgi have debug levels?
Thanks in advance,
Gloria

On May 19, 12:03 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:

Gloria

May 28, 2009, 1:22:33 PM
to modwsgi
Never mind, this turned out to be a CherryPy bug!
http://www.cherrypy.org/changeset/2296
Thanks for your help,
Gloria