I'm wondering if someone could advise me on how to do certain periodic
background tasks with django? For instance, if I needed to retrieve a
list of RSS feeds daily to check for updates how would I do that?
Is there a way to do this within the Django framework itself, rather
than with an OS-level solution like cron jobs on Linux? I'm developing
on Windows and would love it if the solution were OS-independent.
Cheers,
Harish
Ah. I've always done this kind of stuff with cron - that's certainly
the gold standard for this kind of problem on Linux/Unix and
something that's well supported by Django (since Python scripts can
import and use Django models).
Hopefully someone who has actually solved this will chip in, but from
scanning around the web it seems that the equivalent in the Windows
world is "Scheduled Tasks". There's a thread here that might be
useful to you:
http://weblogs.asp.net/pmarcucci/archive/2003/10/20/32662.aspx
Cheers,
Simon
As has been mentioned, using cron (or cron-like) to schedule running a
Python script is a great solution... but there are other ways that
might work for you too.
1) Within your application, write a "view" that doesn't necessarily
view anything, but instead performs all of the work you need for this
repetitive task. Then, still using cron (or an equivalent scheduler),
have some machine somewhere in the world (or on your internal LAN
with appropriate ACLs and firewalling) request the specific URL that
executes that "view" to do the repetitive tasks.
or
2) Also within your application, write a global function that (a)
immediately checks to see if it's the first time it's invoked each day
(or whatever interval), and then, if it is the first time, (b) does
all of the repetitive tasks... This translates to: the first click
within the desired interval automatically performs the pending
repetitive tasks. Note: this is not always a good solution (but can
sometimes be fine), since you might have tasks that need to be done
every interval even if no one clicks your site, or the tasks run a
long time, etc. This can be combined with #1 above in that the URL in
#1 above would only need to be some regular URL into your application
since ANY click to your application will do the repetitive task once
each interval.
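
A minimal sketch of the check in option 2, using only the standard
library. The names IntervalGate and maybe_run are made up for
illustration and are not part of Django. It also assumes a single
server process: the "last run" time lives in memory, so each process
would keep its own copy and the time is lost on restart.

```python
import threading
import time


class IntervalGate:
    """Run a task at most once per interval, triggered by ordinary requests.

    Hypothetical helper for illustration; not a Django API.
    """

    def __init__(self, task, interval_seconds, clock=time.time):
        self._task = task
        self._interval = interval_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_run = None        # None means "never run yet"
        self._lock = threading.Lock()

    def maybe_run(self):
        """Run the task if the interval has elapsed. Return True if it ran."""
        with self._lock:
            now = self._clock()
            if self._last_run is not None and now - self._last_run < self._interval:
                return False
            # Record the time before running so concurrent requests skip it.
            self._last_run = now
        # The task runs inside the calling request, so a slow task
        # delays that one response.
        self._task()
        return True
```

A view (or any code that runs on every click) would call something
like gate.maybe_run() before doing its normal work, with gate created
once at module level.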
--
Glenn Tenney CISSP CISM
I've found a better solution with signals and the dispatcher. The
essence is that you raise a signal every time someone views the
relevant page, for instance, a "Recent Flickr Photos" page. In the
relevant models.py file, you'll have attached a function to be called
when that signal is raised. That function should check when it was
last run; if that was more than X minutes ago, it calls the relevant
Flickr synchronization code, or whatever else you want it to do.
I'm using that method on a current site and it works very well. On
top of that, it doesn't rely on OS-specific tools to work.
Maybe that would be a nice contrib app: a "fake" cron using the above
method.
Just a thought, anyways.
-Tyson
What's wrong with OS-specific tools anyway? If you are developing on
Windows and deploying to Unix then you can use something like Cygwin to
get cron in there.
When I code in Java I use the excellent Quartz scheduler, but then I
have to use my home-grown monitoring tools to ensure that it's working.
When I code in Python I can use cron and let the sysadmin deal with
failed jobs; he has to watch a bunch of other crons anyway.
But although it is easier to deploy django in a single self-contained
app I would look to the OS to give you what it can. It may save you
some headaches later.
There are 2 types of scheduled processes though - ones based on clock
time (like running the credit card transactions at 11:59) and ones
based on application performance (like cleaning the cache or sessions).
For timely events (like billing) I would stick to the most reliable and
monitorable scheduling tool that you have. For other apps the signals
trick or other 'in-process' systems will work. I recently helped a
friend debug a PHP app where his in-process RSS scraper had a subtle bug
that was bringing the server to its knees.
I was just having this debate with the Java folk and I have seen it as
almost a cultural thing. I *like* the fact that Python sits close to
the OS, and I *like* the fact that Java is completely hosted in its
own library-rich VM. Then again when I develop in MS I *like* Visual
Studio and I *like* MSSQL. (When doing Python or Java I am free to
dislike MSSQL though.)
As the previous email said:
Just a thought, anyways.
-Aaron
Not a good idea if the process takes any significant amount of time to
run (such as, for example, retrieving things off the Internet). It will
block the response back to the user.
Malcolm
You could, though, have a page that sends an XMLHttpRequest to a
different view, which can do the processing. Then the user won't
experience any blocking at all.
Jay P.
Anyways I looked at the kronos scheduler (pointed out by Canen above)
being used in TurboGears, and it is closer to what I wanted.
So currently the way I'm using this is to put kronos.py into the Django
utils directory and import it from there. In a particular view, I start
a threaded scheduler if it hasn't been started yet and assign the
repetitive task to it. This way the view's response is instantaneous,
even the first time, because the task runs in a separate thread. This
seems to work for me!
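
For anyone who wants to see the shape of this, here is a rough sketch
of the "start it once from a view" idea. It uses only the standard
library threading module as a stand-in, not the kronos API, and the
names ensure_scheduler_started and stop_scheduler are made up for
illustration. It assumes one server process; with several processes
each one would start its own thread.

```python
import logging
import threading

logger = logging.getLogger(__name__)

_lock = threading.Lock()
_thread = None
_stop = threading.Event()


def _loop(task, interval_seconds):
    # Event.wait() returns False on timeout and True once stop is requested,
    # so the task first runs one full interval after the thread starts.
    while not _stop.wait(interval_seconds):
        try:
            task()
        except Exception:
            # Keep the loop alive if a single run fails.
            logger.exception("periodic task failed")


def ensure_scheduler_started(task, interval_seconds):
    """Start the background thread on first call; later calls do nothing.

    Return True if this call started the thread.
    """
    global _thread
    with _lock:
        if _thread is not None and _thread.is_alive():
            return False
        _stop.clear()
        # daemon=True so the thread does not keep the process from exiting.
        _thread = threading.Thread(
            target=_loop, args=(task, interval_seconds), daemon=True
        )
        _thread.start()
        return True


def stop_scheduler(timeout=None):
    """Ask the background thread to exit and wait for it."""
    _stop.set()
    if _thread is not None:
        _thread.join(timeout)
```

The view would call ensure_scheduler_started(my_task, 24 * 60 * 60)
near the top and then return its response as usual; my_task is a
placeholder for whatever function does the feed checking.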
Anyways I think it will be useful if Django has a scheduler like this
one included by default like TurboGears (actually it does not seem to
require much effort; just have to reuse the code in kronos.py).
Cheers,
Harish