Note that this just shows the basic concept. One could get more
elaborate by using a 'prg' RewriteMap, so that a separate program
determines which process should be used rather than it being chosen
at random. You might, for example, choose the process which currently
has the fewest requests passed through to it for existing sessions.
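As a rough sketch of what such a 'prg' RewriteMap program might look like. Apache writes one lookup key per line on stdin and expects one answer per line on stdout, and the output must be unbuffered. The least-loaded selection here keeps only an in-process tally, so it is illustrative rather than a production implementation, which would need to consult real shared state:

```python
#!/usr/bin/env python
# Hypothetical 'prg' RewriteMap handler. Apache sends a lookup key per
# line on stdin; we must answer one line per key on stdout, flushed
# immediately or Apache will hang waiting for the response.
import sys

GROUPS = ["sticky01", "sticky02", "sticky03", "sticky04", "sticky05"]

def select_group(counts):
    # Pick the process group with the fewest requests handed out so
    # far. 'counts' is just an in-memory tally here; a real version
    # would track active sessions in some shared store.
    return min(GROUPS, key=lambda name: counts[name])

def main():
    counts = dict.fromkeys(GROUPS, 0)
    for line in sys.stdin:
        group = select_group(counts)
        counts[group] += 1
        sys.stdout.write(group + "\n")
        sys.stdout.flush()

if __name__ == "__main__":
    main()
```

The corresponding directive would then be along the lines of 'RewriteMap sticky prg:/path/to/selector.py' in place of the 'rnd' map shown below.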
If anyone wants to discuss it, by all means go ahead. My expectation
at the moment is that only those who know what session stickiness is
all about will do so. In other words, I don't necessarily want to
have to explain what it is. If you don't know, do a Google search on
'session affinity' or similar.
# Define multiple process groups each with a single multithreaded process.
WSGIDaemonProcess sticky01 processes=1 threads=10 display-name=%{GROUP}
WSGIDaemonProcess sticky02 processes=1 threads=10 display-name=%{GROUP}
WSGIDaemonProcess sticky03 processes=1 threads=10 display-name=%{GROUP}
WSGIDaemonProcess sticky04 processes=1 threads=10 display-name=%{GROUP}
WSGIDaemonProcess sticky05 processes=1 threads=10 display-name=%{GROUP}
# Mount our WSGI application at the sub URL of '/sticky'.
WSGIScriptAlias /sticky /Users/grahamd/Sites/sticky/site.wsgi
# Specify a rewrite map which randomly selects one of the process groups.
# The contents of the file should be:
#
# processes sticky01|sticky02|sticky03|sticky04|sticky05
#
# That is, an entry for each of the named process groups in appropriate
# format for a random rewrite map.
RewriteMap sticky rnd:/Users/grahamd/Sites/sticky/processes.txt
<Directory /Users/grahamd/Sites/sticky>
# Lots of rewrite magic to follow.
RewriteEngine On
# Extract the name of the process group from cookie sent in the request.
RewriteCond %{HTTP_COOKIE} process=([^;]+)
RewriteRule . - [E=PROCESS:%1]
# Validate the name of the process group and if not valid then set it
# to one of the process groups randomly.
RewriteCond %{ENV:PROCESS} !^sticky0[12345]$
RewriteRule . - [E=PROCESS:${sticky:processes}]
# Set the cookie so that the lifetime of the cookie is always pushed out
# while the session is active. The cookie will expire after the defined
# number of minutes of inactivity and stickiness will be lost. Using a
# stickiness timeout of 60 minutes.
RewriteRule . - [CO=process:%{ENV:PROCESS}:%{HTTP_HOST}:60:/sticky]
# For the WSGI application, indicate that the process group should be
# selected based on the value of the cookie, or as set randomly above
# where appropriate. As an extra security measure, even though we
# validate the name above, limit which process groups can be selected
# in case there are others configured for the same server.
WSGIRestrictProcess sticky01 sticky02 sticky03 sticky04 sticky05
WSGIProcessGroup %{ENV:PROCESS}
</Directory>
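For completeness, a minimal sketch of what the 'site.wsgi' mounted above might contain in order to verify that stickiness is working. mod_wsgi exposes the name of the handling process group in the WSGI environ as 'mod_wsgi.process_group':

```python
# Minimal WSGI application that echoes which daemon process group
# handled the request, so you can watch the 'process' cookie pin
# requests to one of sticky01..sticky05.
def application(environ, start_response):
    group = environ.get('mod_wsgi.process_group', 'unknown')
    body = ('Handled by process group: %s\n' % group).encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```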
Graham
First stupid response. Did you do any research into what sticky
sessions or session affinity are intended to achieve? Can you explain
what the benefit is of having subsequent requests from the same user
go back to the same process?
> Second stupid question:
> I fail to see how this can be used to cluster more then one apache
> server ?
Second stupid response. It is not intended to cluster Apache servers.
> And final question:
> Bubble drawings please :)
OoOoOoOoO vs O O O O O
:-)
Graham
Keeping session data in memory, and subsequently making it a lot
faster, is one thing you could do. Yes, that would bind you to that
architecture, but there are other things that could be done which
wouldn't.
Start thinking about caching in general. Imagine that a database query
was done which was specific to a user, but the data set was large
enough that it couldn't easily be returned in one page. On a
subsequent request to get more of the results, if it can go back to
the same process, it may be able to benefit from any caching of
database results implemented by a database client or ORM layer and
not actually have to resubmit the query. If the request could have
gone to another process, then there is no choice but to do the query
again.
So, there can be benefits in the area of transient caching of data to
avoid database queries. This can improve performance, but it may also
avoid having multiple copies of the same transient cached information
kept in multiple processes for the same multipart user request,
thereby reducing overall memory usage.
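That kind of transient per-process caching can be sketched as follows. All names here ('run_query', 'PAGE_SIZE') are illustrative only; the point is that a follow-up page request which sticks to the same process slices the cached rows instead of re-running the query:

```python
import time

# Per-process transient cache for paginated query results. The first
# page request runs the query and keeps the full result set in this
# process; later page requests from the same user, routed to the same
# process by stickiness, are served from memory.

PAGE_SIZE = 10
CACHE_TTL = 300  # seconds before a cached result set is discarded

_cache = {}  # (user_id, query) -> (timestamp, rows)

def fetch_page(user_id, query, page, run_query):
    key = (user_id, query)
    entry = _cache.get(key)
    if entry is None or time.time() - entry[0] > CACHE_TTL:
        rows = run_query(query)      # hit the database once
        _cache[key] = (time.time(), rows)
    else:
        rows = entry[1]              # served from this process's memory
    start = page * PAGE_SIZE
    return rows[start:start + PAGE_SIZE]
```

If the second page request had landed on a different process, its empty cache would force the query to run again, which is exactly the cost stickiness avoids.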
> 2) You can create bottle necks for some users because they are on the
> same process of the one user who is doing this select * from GIGANTIC
That would only be an issue if using single threaded daemon processes
and not multithreaded daemon processes.
> 3) Makes clustering apache servers impossible
Not everyone uses clusters of Apache servers. Where they do, they use
something like Pound or similar in front, which implements session
stickiness in a dedicated proxy/load balancer. Even with such a thing
in front, it is probably entirely reasonable to create two-tier
stickiness. That is, Pound gets you to the correct Apache instance,
and the example I gave gets you to the actual process.
Graham
> Create a new thread. I'm sure transferring the ENV would take more resources.
> If you're out of threads per process on the OS level, I'm guessing you
> have bigger problems than your choice of scaling techniques.
The number of threads in this case is the fixed number specified for
'threads' option to WSGIDaemonProcess. So, not exhausting OS level
limit, just the limit on number of concurrent requests you allowed
each daemon process.
So, it is just a matter of properly sizing the number of threads per
process to begin with to cope with expected number of concurrent
requests. What this may need to be will depend on various factors,
including how long requests take to be processed.
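One back-of-envelope way to size the 'threads' option is Little's law: concurrent requests ≈ request rate × mean request time. The traffic figures in the example are made-up inputs, purely for illustration:

```python
# Rough sizing for WSGIDaemonProcess 'threads' via Little's law.
# headroom multiplies the steady-state concurrency to absorb bursts.
def threads_needed(requests_per_second, mean_request_seconds, headroom=2.0):
    concurrent = requests_per_second * mean_request_seconds
    return max(1, int(concurrent * headroom + 0.999))  # round up, min 1

# e.g. 20 req/s at 100ms each is ~2 concurrent requests; with 2x
# headroom, threads=4 would cover it.
```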
Most of the time even a few threads will be sufficient for the load
most people's applications get. People get carried away with trying to
prematurely architect a system for some perfect storm of traffic. If
they seriously are going to get huge traffic volumes, there are lots
of other techniques they should be looking at. I dare not mention them
though, as then gert will believe he has to use them, and I would have
to endure more and more questions, and I don't have time at the moment.
:-)
Graham
BTW, have you integrated use of memcached already into your system?
Graham
> Ah mea culpa. Been working in CherryPy for too long which maintains
> its own threadpool.
CherryPy, I believe, still has its own internal limit on the number of
threads in that thread pool, so it will reach that maximum before it
reaches the OS limit. Thus, the only real difference is that CherryPy
will try to shut down threads if it thinks it has too many idle, in
order to free up the per-thread stack memory. Obviously it will create
them again if demand picks up. The thread pool size in Apache/mod_wsgi
is static.
Overall I am not sure the transient amount of memory saved from
reclaiming idle threads is really worth it.
Graham
Improving performance isn't all about the big queries, especially if
they are infrequent. Eliminating the very very frequent small queries,
which always yield the same result over time, is just as important.
Getting the most out of memcached is about choosing the most
appropriate thing to put in it.
Anyway, your answer suggests you haven't tried to use memcached yet.
Graham
One of the main points of memcached is that what you cache is up to
you. Therefore you could cache database results which have had post
processing done on them, or even prerendered HTML pages or snippets of
HTML pages. That way you are also saving on the cost of that
processing and rendering.
For example, for the home page of a very busy site holding news
stories or blog post summaries, where the information doesn't have to
be exactly up to date, you may generate the HTML snippet from the
database query and then cache it in memcached so all processes can get
it. That would be set to time out after 5 minutes, at which point it
would be recalculated again.
For a well trafficked site, you have therefore avoided every visit to
the home page, which is likely to be a large percentage of traffic,
hitting the database for everything every time.
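The pattern above can be sketched against a python-memcached style client interface (get, and set with a 'time' expiry). 'render_stories' stands in for whatever runs the database query and renders the HTML, and is purely illustrative:

```python
# Cache a prerendered home-page HTML snippet in memcached so that
# every daemon process shares the same copy and the database is hit
# at most once per expiry interval.
CACHE_KEY = 'homepage-snippet'
CACHE_SECONDS = 5 * 60  # recalculate at most every five minutes

def homepage_snippet(cache, render_stories):
    html = cache.get(CACHE_KEY)
    if html is None:
        html = render_stories()             # the expensive query + render
        cache.set(CACHE_KEY, html, time=CACHE_SECONDS)
    return html
```

Because memcached is shared across processes, every daemon process sees the same cached snippet, so no session stickiness is needed for this particular case.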
Graham
Yeah, I didn't mean to suggest it as a feature. The ability to "kill"
daemon processes after a certain length of idle time seems to fit the
use case of: not letting low traffic apps use memory when idle.
People who are looking to auto-scale a site that gets suddenly slammed
with traffic are probably better off focusing on pre-modwsgi caching
(or pre-Apache proxying/caching) than expecting modwsgi to spawn
threads and then kill them when demand ebbs.
At any rate, I'll shutup now as I was out of my league to begin with. :-)
Wasn't taking it that way. My final comment was just my own general
observation and not a reason as to why I wouldn't implement it. There
are other reasons I wouldn't do it, the main one being the complexity
of supporting a dynamically resizing thread pool and the need to
ensure one properly cleans up cached thread state objects and
thread-local data.
> The ability to "kill"
> daemon processes after a certain length of idle time seems to fit the
> use case of: not letting low traffic apps use memory when idle.
>
> People who are looking to auto-scale a site that gets suddenly slammed
> with traffic are probably better off focusing on pre-modwsgi caching
> (or pre-Apache proxying/caching) than expecting modwsgi to spawn
> threads and then kill them when demand ebbs.
>
> At any rate, I'll shutup now as I was out of my league to begin with. :-)
Don't shut up. I don't know everything, and sometimes these
discussions do throw up things I wasn't aware of. I'll then go off and
research those things, and in some cases it may give me ideas for how
to do stuff better in mod_wsgi.
Graham
> >> BTW, have you integrated use of memcached already into your system?
>
> > Can memcached store big query results as efficient ?
>
> Improving performance isn't all about the big queries, especially if
> they are infrequent. Eliminating the very very frequent small queries,
> which always yield the same result over time, is just as important.
> Getting the most out of memcached is about choosing the most
> appropriate thing to put in it.
>
> Anyway, your answer suggests you haven't tried to use memcached yet.
No, I did not try it yet, because it should be built into a good
database engine anyway that also figures out how to optimize the
queries for me. My brain is not big enough to make complex wsgi
infrastructures :-)
No.
For embedded mode, none of the MPMs do that AFAIK, and if they did
there would be lots of problems with mod_wsgi 3.0. I'll have to double
check.
For daemon mode I implement the thread pool, so it is purely my call
how it is done.
Hmmm, didn't even know about those.
Only added in apr-util 1.3, which mostly rules it out, as I still have
to deal with systems with a much older apr-util than that. Ie., Apache
2.0 still uses apr-util 0.9.
That doesn't mean I can't see how it is implemented in those routines
and do something similar if I needed to.
Thanks for pointing that one out.
Graham