old sessions...

achipa

Sep 23, 2008, 6:01:47 AM
to web2py Web Framework
Is there a way of expiring/deleting old sessions at the level of
web2py itself? I'm deleting them from a cron task right now, but I
feel web2py should also take them into account. On busy sites the
session dir grows really quickly, and daily purges are not elegant at
all. Sorry if this has been brought up earlier; in that case, a
pointer would be welcome...

voltron

Sep 23, 2008, 6:52:40 AM
to web2py Web Framework
Hi Achipa,

I asked about this in the past too. I have thought about using the
Python library "sched" to do some automated tasks, not only clearing
out sessions, but I really have not gotten to it for lack of time.
We could team up on this if you like.

achipa

Sep 23, 2008, 10:47:34 AM
to web2py Web Framework
I'm interested, as I can see this causing trouble in the long run. I
have mixed emotions about sched, as it would require careful planning
and workarounds to be usable in all the environments web2py works in
(standalone, WSGI, FastCGI, multiple instances running
simultaneously under these, etc.)... Still thinking about alternatives.

voltron

Sep 23, 2008, 10:57:18 AM
to web2py Web Framework
Please keep me posted; I am very interested in this.

Massimo Di Pierro

Sep 23, 2008, 11:10:43 AM
to web...@googlegroups.com
There is a script for this in the web2py/scripts folder, and it is
discussed in the manual.
I will also add this functionality to t2.

Massimo

achipa

Sep 23, 2008, 2:33:40 PM
to web2py Web Framework
The potential problem I see with this is that, if it's a daemon, it might
be inappropriate to run on some web hosts (who starts it, and when? What
if Apache is the parent process? What about hosts with script
timeouts?). Minor issues are that it takes up resources even
while sleeping, and that it could become redundant if you're launching it
from web2py itself (imagine you have a busy site and use 50 web2py
processes under WSGI/FastCGI; if each launches its own sessions2trash,
it could get nasty...)
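A common way to keep many web2py processes from each running their own cleaner is an exclusive lock file: whichever process creates it first does the sweep, and the others skip. The following is a minimal sketch in modern Python, not web2py code; the helper name `run_once_across_processes` and the lock-file path are hypothetical. It relies on `os.O_CREAT | os.O_EXCL`, which makes file creation atomic on local file systems.

```python
import os

def run_once_across_processes(lock_path, job):
    """Run job() only if no other process currently holds lock_path.

    Returns True if this process ran the job, False if it skipped it.
    O_CREAT | O_EXCL makes creation atomic: exactly one caller wins.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another process is already cleaning; skip this round
    try:
        job()
        return True
    finally:
        os.close(fd)
        os.remove(lock_path)  # release the lock for the next round
```

A process that crashes mid-sweep leaves the lock file behind, so a fuller version would also treat a sufficiently old lock file as stale.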

Massimo Di Pierro

Sep 23, 2008, 2:38:19 PM
to web...@googlegroups.com
It does not have to be a daemon. All we need is a function (which could be
called explicitly or by the constructor of t2) that simply does:

def clean(request, expiration=3600):
    import os, stat, time
    path = os.path.join(request.folder, 'sessions')
    for file in os.listdir(path):
        filename = os.path.join(path, file)
        # delete session files not modified in the last `expiration` seconds
        if time.time() - os.stat(filename)[stat.ST_MTIME] > expiration:
            os.unlink(filename)

Massimo

achipa

Sep 24, 2008, 8:10:56 AM
to web2py Web Framework
If I have hundreds of active sessions, I don't want all those checks
running all the time, so I'd suggest using a last-check marker. Whenever
a cleanup is done, a marker file in the session dir is touched.
After that, for X minutes no login or explicit call will actually sift
through all the files until the marker is declared stale (i.e. its
mtime + expiration is in the past), at which point the first call to
encounter it does the actual cleaning again and touches the marker.
This way only one file is stat'ed on each call, regardless of how
many active sessions you have.
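The marker scheme described above can be sketched as follows. This is an illustrative Python 3 version, not the patch posted later in the thread; the function name `maybe_clean_sessions` is hypothetical, and the marker name `last_cleaned_at` is borrowed from that patch. The `now` parameter exists only to make the behaviour testable.

```python
import os
import time

MARKER = "last_cleaned_at"

def maybe_clean_sessions(session_dir, expiration=3600, now=None):
    """Delete session files older than `expiration` seconds, but only when
    the marker file is itself older than `expiration`; otherwise do nothing.

    Returns the number of files deleted (0 when the marker is still fresh).
    """
    now = time.time() if now is None else now
    marker = os.path.join(session_dir, MARKER)
    try:
        last = os.stat(marker).st_mtime
    except OSError:
        last = 0  # no marker yet: treat as never cleaned
    if now - last <= expiration:
        return 0  # cheap path: a single stat() call, no directory scan
    deleted = 0
    for name in os.listdir(session_dir):
        if name == MARKER:
            continue
        path = os.path.join(session_dir, name)
        try:
            if now - os.stat(path).st_mtime > expiration:
                os.unlink(path)
                deleted += 1
        except OSError:
            pass  # vanished or in use; it will be retried next round
    # Touch the marker so the next full scan waits another `expiration` seconds.
    with open(marker, "a"):
        pass
    os.utime(marker, (now, now))
    return deleted
```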

A similar mechanism can also be used for 'soft' cron tasks (those that
don't need to run at an exact moment; put another way, running on the
first page load after the scheduled time is good enough).
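The same trick generalizes to a 'soft' cron hook that runs an arbitrary task on the first call after an interval has passed. A hedged sketch, assuming one marker file per task; `soft_cron` is a hypothetical name, not a web2py API. Touching the marker before running the task narrows, but does not eliminate, the window in which two concurrent requests both decide to run it.

```python
import os
import time

def soft_cron(marker_path, interval, task, now=None):
    """Run task() on the first call after `interval` seconds have passed
    since the marker file was last touched. Returns True if it ran."""
    now = time.time() if now is None else now
    try:
        last = os.path.getmtime(marker_path)
    except OSError:
        last = 0  # never run before
    if now - last < interval:
        return False
    # Touch the marker *before* running, so requests arriving while the
    # task is still busy see a fresh marker and skip it.
    with open(marker_path, "a"):
        pass
    os.utime(marker_path, (now, now))
    task()
    return True
```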

mdipierro

Sep 24, 2008, 8:44:03 AM
to web2py Web Framework
OK, send me the code.

achipa

Sep 25, 2008, 5:27:49 AM
to web2py Web Framework
Take 1. This should probably be extended to handle DB-based sessions,
but I have no experience with those, so I'd welcome a functionally
identical patch for that part.

Index: gluon/globals.py
===================================================================
--- gluon/globals.py (revision 301)
+++ gluon/globals.py (working copy)
@@ -124,7 +124,7 @@
     """
     defines the session object and the default values of its members (None)
     """
-    def connect(self,request,response,db=None,tablename='web2py_session',masterapp=None):
+    def connect(self,request,response,db=None,tablename='web2py_session',masterapp=None,expiration=3600):
         self._unlock(response)
         if not masterapp: masterapp=request.application
         response.session_id_name='session_id_%s'%masterapp
@@ -147,6 +147,7 @@
                 response.session_id='%s.%s' % (request.client,uuid.uuid4())
                 response.session_filename=os.path.join(up(request.folder),masterapp,'sessions',response.session_id)
                 response.session_new=True
+                self._remove_expired(request,expiration)
         else:
             table=db.define_table(tablename+'_'+masterapp,
                 db.Field('locked','boolean',default=False),
@@ -172,7 +173,26 @@
             response.session_id='%s:%s' % (record_id,unique_key)
         response.cookies[response.session_id_name]=response.session_id
         response.cookies[response.session_id_name]['path']="/"
+        response.cookies[response.session_id_name]['expires']=time.strftime('%a, %d %b %Y %H:%M:%S GMT',time.gmtime(time.time()+expiration))
         if self.flash: response.flash, self.flash=self.flash, None
+    def _remove_expired(self, request, expiration):
+        path=os.path.join(request.folder,'sessions')
+        now=time.time()
+        marker=os.path.join(path,'last_cleaned_at')
+        try:
+            t=os.stat(marker)[stat.ST_MTIME]
+        except OSError:
+            t=0
+        if now - t > expiration:
+            for file in os.listdir(path):
+                filename = os.path.join(path,file)
+                t = os.stat(filename)[stat.ST_MTIME]
+                if now - t > expiration and file.count(".") > 3:
+                    os.unlink(filename)
+            mfile=open(marker,'wb')
+            mfile.close()
+
+
     def secure(self):
         self._secure=True
     def forget(self):

achipa

Sep 30, 2008, 11:35:09 AM
to web2py Web Framework
Bump... Any comments?

mdipierro

Sep 30, 2008, 12:23:32 PM
to web2py Web Framework
What if we add this to T2 instead?

Massimo

Timothy Farrell

Sep 30, 2008, 12:26:09 PM
to web...@googlegroups.com
I would rather this be in web2py.

achipa

Sep 30, 2008, 3:10:37 PM
to web2py Web Framework
Feels more like core web2py to me, as it deals with sessions as the
browser sees them, not users or any module-related stuff
that you would associate with T2.

If you feel uneasy about it, we can always make it an optional
parameter (like the port number); if set to 0, the behaviour would not
differ in any way from the current one. However, I would strongly
suggest making this the default behaviour, as from a security standpoint
you certainly do NOT want your sessions to last forever by default (as
they do now).

mdipierro

Sep 30, 2008, 4:01:31 PM
to web2py Web Framework
My concern is that if a lot of sessions are created in a short
time, all clients will be very slow until those sessions expire,
since they have to check all sessions. It seems to me session deletion
has to be done by a daemon, not by the threads that are serving the
users.

There are also issues with deleting sessions that may be temporarily
locked by another thread (it is opened, locked, but not yet updated).
If this is done in another thread and inside a try ... except, it will be
safe; otherwise there may be issues.
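One way to make the sweep safe against a session that a request currently holds locked is to try a non-blocking lock first and skip the file if that fails. A Unix-only sketch using `fcntl.flock`; it assumes the request side also locks session files with `flock`-style locks, and `unlink_if_unlocked` is a hypothetical name.

```python
import fcntl
import os

def unlink_if_unlocked(path):
    """Delete a session file only if nobody else holds a lock on it.

    Unix-only sketch (fcntl). Returns True if the file was deleted.
    """
    try:
        f = open(path, "rb+")
    except OSError:
        return False  # already gone or unreadable
    try:
        # LOCK_NB: fail immediately instead of waiting if the file is locked.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return False  # a request is using it right now; skip this round
    try:
        os.unlink(path)
        return True
    finally:
        f.close()  # closing the file also releases our lock
```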

I need to think about it some more.

Massimo

Timothy Farrell

Sep 30, 2008, 4:18:09 PM
to web...@googlegroups.com
The best solution is for the web server to have a utility thread to take
care of things like this. In the case of web2py, this falls to the
CherryPy wsgiserver component. However, this functionality wouldn't be
available when using web2py in CGI mode. Massimo, how do you feel
about including features that only work in some configurations? Let's
not slow down client threads for this.
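A utility thread of the kind described above might look like this in a long-running server process. It is a sketch only: `start_cleanup_thread` is a hypothetical helper, it does nothing for CGI deployments (the process exits after each request), and with several worker processes each one would start its own thread.

```python
import logging
import threading

def start_cleanup_thread(cleanup, interval):
    """Call cleanup() every `interval` seconds on a background daemon thread.

    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        # Event.wait(interval) is an interruptible sleep: it returns True
        # as soon as stop is set, False when the interval simply elapses.
        while not stop.wait(interval):
            try:
                cleanup()
            except Exception:
                logging.exception("session cleanup failed")  # keep looping

    threading.Thread(target=loop, name="session-cleanup", daemon=True).start()
    return stop
```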

I love making things maintenance-proof, but it's not too much work to set up a
script that cron calls. I didn't know about the web2py-included script
before I wrote this one (which I still use because it's not a daemon).
I call it via Windows' built-in scheduler. Feel free to customize/use/sell.

----------------------
#!/usr/bin/env python
""" Cleans up yesterday's sessions. """
import os
import sys
from glob import glob
from datetime import datetime, time

print("Cleaning up old sessions:")
olddir = os.getcwd()
web2pydir = "c:/web2py/"
appsdir = os.path.join(web2pydir, 'applications')
# Pass 'nukeem' as the first argument to actually delete; otherwise dry run.
doit = len(sys.argv) > 1 and sys.argv[1] == 'nukeem'
# Cut-off time: sessions last modified after this time are kept.
# In this case, it is 12:00am today.
cutofftime = datetime.combine(datetime.now(), time(0))
os.chdir(appsdir)
#applications = glob("*")
applications = ['init', 'Formstation']
for app in applications:
    sessiondir = os.path.join(appsdir, app, 'sessions')
    sessions = glob(sessiondir + "/*")
    for session in sessions:
        # st_mtime is the last-modified time of the session file
        accessedtime = datetime.fromtimestamp(os.stat(session).st_mtime)
        if accessedtime < cutofftime:
            if doit:
                print("deleting %s from %s" % (os.path.basename(session), app))
                os.unlink(session)
            else:
                print("Would delete %s from %s" % (os.path.basename(session), app))
os.chdir(olddir)

achipa

Sep 30, 2008, 4:45:26 PM
to web2py Web Framework
You kind of lost me there... The change I posted deals with this
exact problem, just as described above. Regardless of how many
sessions get created or are active, the whole directory will be
checked only once every [expiration] seconds.

As for deleting locked sessions: that's why I set the expiration
cookie. Once the cookie has expired, the client can no longer
request that session, so web2py won't keep it locked, and it's deletable.
If you have locking issues, being a daemon won't help you; if you're
stuck, you're stuck. Then something is wrong with the locking logic,
and that root issue is what needs to get fixed.
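For reference, a session cookie that the browser drops after `expiration` seconds can be built with Python 3's standard `http.cookies` module, which renders an integer `expires` as an absolute date in GMT. This is a sketch outside web2py; `session_cookie` is a hypothetical helper, and sending `Max-Age` alongside `expires` is a design choice, not something the thread specifies.

```python
from http.cookies import SimpleCookie

def session_cookie(name, session_id, expiration=3600):
    """Return a Set-Cookie value for a session id that the browser
    discards `expiration` seconds from now."""
    cookie = SimpleCookie()
    cookie[name] = session_id
    cookie[name]["path"] = "/"
    # An int 'expires' is rendered as an absolute GMT date `expiration`
    # seconds in the future; 'max-age' is the relative equivalent.
    cookie[name]["expires"] = expiration
    cookie[name]["max-age"] = expiration
    return cookie[name].OutputString()
```

Usage: `session_cookie("session_id_init", "127.0.0.1.abc-123")` yields a string with the value, an `expires=... GMT` date, `Max-Age=3600` and `Path=/`.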

Also, daemons are a problem, as most of the simpler hosting packages do
not support them (even access to cron is hit-or-miss with most).

mdipierro

Sep 30, 2008, 4:55:18 PM
to web2py Web Framework
I see... you only check once in a while for cleanup. So in principle
only some requests may see a slowdown.
Let me think about this some more.

Massimo

Timothy Farrell

Sep 30, 2008, 4:59:50 PM
to web...@googlegroups.com
Help me out here. I (obviously) haven't studied the code. Is there any
way that a single request could be significantly slowed by this
process? Here's what I'm thinking. I have a sessions directory with
19,000 sessions in it by day's end. These would all expire at midnight
(call them Cinderella sessions). Under your proposed system, would the
burden of cleaning out 19k sessions fall on the first poor sap to request
the login page? I'm thinking of worst-case/perfect-storm scenarios.

I would only endorse this system if it could also be limited to processing
a certain number of files per run. That way the 19k would get expired over
the course of a few cleaning runs (webpage requests). Granted, the
expiration flag would then be necessary to avoid accessing any expired
sessions that haven't been deleted yet.
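A per-run cap on the number of files is straightforward to sketch: stop after `max_files` deletions and leave the rest for the next trigger. The helper name `clean_some` and the default of 500 are arbitrary assumptions, not values from the thread.

```python
import os
import time

def clean_some(session_dir, expiration, max_files=500, now=None):
    """Delete at most `max_files` expired session files per call.

    Returns the number deleted; a large backlog is worked off over
    several calls instead of landing on a single request.
    """
    now = time.time() if now is None else now
    deleted = 0
    with os.scandir(session_dir) as entries:
        for entry in entries:
            if deleted >= max_files:
                break  # leave the rest for the next trigger
            try:
                if now - entry.stat().st_mtime > expiration:
                    os.unlink(entry.path)
                    deleted += 1
            except OSError:
                pass  # vanished or not a plain file; skip it
    return deleted
```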

-timbo

achipa

Sep 30, 2008, 5:32:03 PM
to web2py Web Framework
Take into account that on a busy site you would have the same slowdown
while the cron task is deleting the sessions. In fact, I came up with
this solution precisely because I wanted to avoid the file IO spikes that
cronjobs caused (not to mention that file operations tended to get
slow when sessions piled up). If you have heavy-use applications,
the cronjob needs to run more often, which means manual fine-tuning. If
you have many applications, the cronjobs might start at the same time,
causing spikes, so again manual intervention is needed.

Note that deletions are only triggered when a new session is created. This is
intentional. Whether you are typing in an address or coming through a
redirect, all of the maintenance will happen BEFORE you even see the
first page, which is the point where users are most willing to wait.

Now, as for the 'slowdowns'. I did some tests of my own, and on a very
average desktop I can clean hundreds of sessions in one second (the
biggest limiting factor is actually the type of file system), so it's no
big deal unless you want to run Slashdot-scale applications.

If you're really, really concerned about speed, the deletion itself
could be spun off into a separate thread so it would not even slow down
the actual request (it wouldn't run completely in parallel because of the
GIL, but file IO _is_ one of the things threads are good for in
Python). However, by then you are in the domain of thousands of
simultaneous users, and you have most likely already bottlenecked your
database.
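Spinning the deletion off the request thread, while making sure only one sweep runs at a time within a process, could be sketched as below. `clean_in_background` and the module-level lock are hypothetical; across multiple processes a file-based lock would still be needed.

```python
import threading

_cleanup_lock = threading.Lock()  # hypothetical per-process guard

def clean_in_background(cleanup):
    """Start cleanup() on a separate thread unless one is already running.

    Returns the Thread started, or None if a sweep was already in progress.
    The calling request thread returns immediately either way.
    """
    if not _cleanup_lock.acquire(blocking=False):
        return None

    def run():
        try:
            cleanup()
        finally:
            _cleanup_lock.release()  # a plain Lock may be released by any thread

    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t
```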

achipa

Sep 30, 2008, 5:42:51 PM
to web2py Web Framework
Timothy, you can specify as a parameter how often the session dir gets
cleaned up. In the code example I sent this is one hour, but you can
change this parameter to whatever value you want. Every 10 minutes,
every 24 hours, never (=use your own cron scripts), it's all up to
you.

As for processing only a number of files... Mixed emotions on that
one, as different file systems/machines react differently to this.
However, it could be an optional parameter to bail out, why not. And I
just got an idea that will hopefully make even Massimo favor this
solution: set a *time* limit. That way we can guarantee we will never
take up, say, more than a second, no matter what.
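A time limit instead of a file count could look like this: check a monotonic clock between files and bail out once the budget is spent. A hedged sketch; `clean_with_budget` and the one-second default are assumptions, and a single slow `unlink` can still overshoot the budget slightly because the check happens between files.

```python
import os
import time

def clean_with_budget(session_dir, expiration, budget=1.0, clock=time.monotonic):
    """Delete expired session files until `budget` seconds have been spent;
    whatever is left waits for the next trigger.

    Returns (deleted, finished); finished is False if time ran out.
    """
    deadline = clock() + budget
    now = time.time()
    deleted = 0
    with os.scandir(session_dir) as entries:
        for entry in entries:
            if clock() >= deadline:
                return deleted, False  # out of time; resume next request
            try:
                if now - entry.stat().st_mtime > expiration:
                    os.unlink(entry.path)
                    deleted += 1
            except OSError:
                pass
    return deleted, True
```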

yarko

Sep 30, 2008, 7:31:28 PM
to web2py Web Framework
In another thread, I made a big deal about what we call a "plugin" and
what we call a web2py-module (different from your Python modules), and
asked what defines a service.

Well, here is something calling for a service!

All the "complaints" about this kind of cleanup seem like they would
go away if this triggered a non-blocking call to a service; is that
about right?

Time to define web2py services?

mdipierro

Sep 30, 2008, 7:38:13 PM
to web2py Web Framework
Right now you can do (on Unix):

nohup python web2py.py -S yourapp -R scripts/sessions2trash.py &

Notice that sessions2trash is already in web2py. By running it with
nohup (on Unix), it runs as a service. What do you think is missing or
needs to be changed?

Massimo

achipa

unread,
Sep 30, 2008, 7:50:49 PM9/30/08
to web2py Web Framework
OK, I'm with you, but I'm still not sure how that would solve my main
concerns with something that runs largely asynchronously with user
activity (regardless of what we call it):

a) available under all web2py execution modes (esp with regard to
parent process specifics if they exist)
b) being able to concurrently run multiple web2py instances
c) multiplatform
d) non-invasive with regard to system services
e) runs on most webhosting platforms

Seems to me some of these points will have to give :(

achipa

Oct 1, 2008, 9:18:46 AM
to web2py Web Framework
Massimo, just out of curiosity, how would you feel if we moved this a
bit higher and later... say somewhere around the wsgibase stuff, and
have it run AFTER the request has been processed? That way
multiplatform support/compatibility is kept, but the impact on users would
be even smaller (obviously the delay between the two requests would
still be there on single-process installs, but then again, you really
don't want one process serving thousands of simultaneous users, do
you? :)
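In WSGI terms, "run it AFTER the request" can be expressed as middleware that wraps the response body and runs the task from the body's `close()` method, which PEP 3333 requires servers to call once the response is finished. A sketch only: `AfterResponse` is a hypothetical name, not part of web2py's `wsgibase`, and on a single-threaded server the worker is still busy while the task runs.

```python
class AfterResponse:
    """WSGI middleware that runs `task` after the wrapped app's response
    has been fully sent (when the server calls close() on the body)."""

    def __init__(self, app, task):
        self.app = app
        self.task = task

    def __call__(self, environ, start_response):
        body = self.app(environ, start_response)
        return _ClosingIterator(body, self.task)


class _ClosingIterator:
    def __init__(self, body, task):
        self._body = body
        self._task = task

    def __iter__(self):
        return iter(self._body)

    def close(self):
        # Servers must call close() when the response is done (PEP 3333),
        # so the task runs after the client has received its page.
        try:
            if hasattr(self._body, "close"):
                self._body.close()
        finally:
            self._task()
```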