Creating a background batch processing thread

Richard Clark

Jun 15, 2006, 6:56:11 AM
to TurboGears
Basically a short writeup of some interesting stuff I've had to do for
a project I've been working on recently. If people find this
interesting, I'll see if I can post some actual code listings.

Creating a background batch processing thread.


Our objective here is to have a single thread running in the background
that performs operations that may run longer than we might want to have
the user sit around for. There are a couple of possible areas for this:
certain types of credit card processing, large batch operations,
interfaces with (potentially unavailable) external APIs, etc.

The general pattern works like this:

First up, create a module to place your batch processing stuff in. I'll
call mine "processor" (processor.py in your standard TurboGears app
dir).

Within this, we need three things: the thread function
itself, a singleton to allow us to identify and share resources
with the main thread, and a startup function to make it easy to
start the thread up.

First up, the singleton. This is what ties everything together and
makes it possible to provide notification to the thread.

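import logging
import threading

log = logging.getLogger(__name__)

class Singleton(object):
    def __init__(self):
        self.event = threading.Event()
        self.thread = None
        self.event.set()  # start out set, so the first wait() returns at once

singleton = Singleton()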

Let's start with a really stupid processor:

def processingThread():
    """ Thread to process domains """
    while True:
        # Wait up to 30 seconds for a new event
        singleton.event.wait(30)
        log.debug("Doing processing")

And finally, the startup:

def start():
    if singleton.thread and singleton.thread.isAlive():
        # Bail out if we're already running
        return

    log.debug("Starting processing")
    singleton.thread = threading.Thread(target=processingThread)
    # Make sure the thread shuts down when the main program does
    singleton.thread.setDaemon(True)
    singleton.thread.start()
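
For anyone reading this on a current Python: Thread.isAlive() was removed in Python 3.9 and setDaemon() is deprecated in favour of the daemon attribute. A minimal sketch of the same start-once guard with the current spellings might look like the following. The Worker holder class, the _loop placeholder and the 0.05 second timeout are made up for illustration and are not part of the recipe above.

```python
import threading


class Worker(object):
    """Hypothetical stand-in for the Singleton holder in the post."""

    def __init__(self):
        self.event = threading.Event()
        self.thread = None


worker = Worker()


def _loop():
    # Placeholder loop: wake on the event or after a short timeout.
    while True:
        worker.event.wait(0.05)


def start():
    """Start the background thread unless it is already running."""
    if worker.thread is not None and worker.thread.is_alive():
        return False  # already running, nothing to do
    # daemon=True: the thread will not keep the interpreter alive at exit
    worker.thread = threading.Thread(target=_loop, daemon=True)
    worker.thread.start()
    return True
```

Calling start() a second time is a no-op here, which is the property the guard is meant to provide.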


Ok, that's almost it really, we just need to tie it into our program.
First up, in controllers.py we want to start() it on startup:

import processor
import turbogears

def myStartup():
    processor.start()

turbogears.startup.call_on_startup.append(myStartup)

Adding it to call_on_startup means we'll get called whenever TurboGears
starts up (duh).

Now what we have is a processing thread being launched that will wait
up to 30 seconds for a new event, then try processing anyway. At the
moment it doesn't do anything, so let's modify the thread a bit:

def processingThread():
    """ Thread to change creditcards from waiting to paid """
    # Assumes hub and Creditcard are imported from your model module
    # This part is important, it gives us a new connection for this
    # thread, otherwise your thread will get messy db access
    from sqlobject.util.threadinglocal import local as threading_local
    hub.threadingLocal = threading_local()

    while True:
        singleton.event.wait(300)
        log.debug("Processing creditcards")
        toCharge = Creditcard.selectBy(status="waiting")
        for creditcard in toCharge:
            hub.begin() # Begin transaction
            log.debug("Processing creditcard %d" % creditcard.id)
            creditcard.status = "paid"
            hub.commit()

Ok! Now we have a (kinda) useful thing. Once every 5 minutes it'll
process the creditcard table looking for CC's that are waiting to be
charged, and it'll mark them paid.

But wait, 5 minutes is annoying. That's what that event thing is for.
The 5 minutes is a safety mark, our cleanup point.

Any time you've added new creditcards in there as waiting, simply do:

import processor
processor.singleton.event.set()

This will set the event, and cause the wait(300) to break out early.
That way it'll most likely be processed immediately, but worst case
it'll happen in 5 minutes.
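
To see the early wake-up in isolation, here is a small self-contained sketch that uses nothing but threading.Event. The waiter function, the result dict and the 5 second timeout are invented for the demonstration and stand in for the processing thread and its wait(300).

```python
import threading
import time

event = threading.Event()
result = {}


def waiter():
    started = time.monotonic()
    # wait() returns True if the event was set, False if it timed out
    result["signalled"] = event.wait(5)
    result["elapsed"] = time.monotonic() - started


t = threading.Thread(target=waiter)
t.start()
time.sleep(0.1)   # give the waiter a moment to block
event.set()       # what a controller would do after queueing new work
t.join()
```

The waiter should come back well before its 5 second timeout, with result["signalled"] telling it whether it was woken by a signal or by the clock.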

Even better, with this mechanism (rather than, say, using scheduler),
you don't run the risk of having the processor execute simultaneously.
If it takes longer than 5 minutes to process the creditcards, it simply
won't get back around to waiting until it has finished. When it does, if
there are any new events, it'll go check again.
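
A rough sketch of why batches cannot overlap: there is only one worker, and any number of set() calls made while it is busy collapse into a single pending flag. Everything named here (pending, processed, passes, the stop event and the sleep durations) is hypothetical scaffolding for the demo, and the loop clears the event after each wait so that one signal triggers one pass.

```python
import threading
import time

event = threading.Event()
lock = threading.Lock()
pending = []               # stand-in for rows with status "waiting"
processed = []
passes = []                # one entry per non-empty batch the worker ran
stop = threading.Event()   # only here so the demo can end


def worker():
    while not stop.is_set():
        event.wait(0.5)
        event.clear()      # reset before reading, so later set() calls are not lost
        with lock:
            batch = pending[:]
            del pending[:]
        if batch:
            time.sleep(0.2)            # pretend the batch is slow
            processed.extend(batch)
            passes.append(len(batch))


t = threading.Thread(target=worker)
t.start()

with lock:
    pending.append(1)
event.set()
time.sleep(0.05)           # the worker is normally busy with item 1 by now
for item in (2, 3, 4):     # three signals arrive mid-batch
    with lock:
        pending.append(item)
    event.set()

time.sleep(0.6)
stop.set()
event.set()                # wake the worker so it notices the stop flag
t.join()
```

On a typical run the three signals that arrive mid-batch are handled in one follow-up pass, and at no point are two batches in flight at once.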

Other uses include things like a thread which maintains a jabber
connection and lets you log messages into a conference (done that, good
fun :) multiplayer logs with comments!).

Lee McFadden

Jun 15, 2006, 8:25:13 AM
to turbo...@googlegroups.com
Very tasty Richard. I think I may use this in Arkivo to control the
IRC bot entirely via TurboGears.

Lee


--
Lee McFadden

blog: http://www.splee.co.uk
work: http://fireflisystems.com

Kevin Dangoor

Jun 15, 2006, 10:13:45 AM
to turbo...@googlegroups.com
This is a nice recipe. I'll add it to the docs, if that's okay with you.


--
Kevin Dangoor
TurboGears / Zesty News

email: k...@blazingthings.com
company: http://www.BlazingThings.com
blog: http://www.BlueSkyOnMars.com

Eric Larson

Jun 15, 2006, 2:30:43 PM
to turbo...@googlegroups.com
+1 on the addition to the docs. Great post!

Richard Clark

Jun 15, 2006, 7:51:56 PM
to TurboGears
Be my guest :)

Jorge Vargas

Jun 16, 2006, 10:27:29 AM
to turbo...@googlegroups.com
On 6/15/06, Richard Clark <richar...@gmail.com> wrote:

> Basically a short writeup of some interesting stuff I've had to do for
> a project I've been working on recently. If people find this
> interesting, I'll see if I can post some actual code listings.
>
> Creating a background batch processing thread.

this is nice thanks

> Even better, with this mechanism (rather than, say, using scheduler),
> you don't run the risk of having the processor execute simultaneously,
> if it takes longer than 5 minutes to process the creditcards, it simply
> won't get back around to waiting. When it does, if there are any new
> events, it'll go check again.

I had exactly that problem but my time frame was much smaller. Have you tried this under more intensive use, like once every 5 seconds?

> Other uses include things like a thread which maintains a jabber
> connection and lets you log messages into a conference (done that, good
> fun :) multiplayer logs with comments!).

yes lots of interesting stuff can be done with this and the scheduler module.

Richard Clark

Jun 16, 2006, 7:53:46 PM
to TurboGears
> I had exactly that problem but my time frame was much smaller. Have you tried
> this under more intensive use, like once every 5 seconds?

This should work no problem. Reducing the timeout value will have only
a linear impact on your resource usage, i.e. 5 seconds will do twice as
many checks as 10 seconds. But you will never run into a problem where
the thread "restarts" over itself or anything like that. It processes
everything it can, one batch at a time.

The timeout is not actually strictly necessary at all. Assuming you do:

processor.singleton.event.set()

Whenever you have something for it to process, it should happen
immediately. You could set it to wait indefinitely and it'll still
process whenever you have data. I set a timeout because I'm paranoid,
not because it's necessary.
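
A small sketch of what the timeout does and does not buy you, again with plain threading.Event. The wait_for_work helper and its return strings are made up for illustration. With timeout=None the wait blocks until someone calls set(); with a timeout it also wakes up on its own for a periodic sweep, which only matters if a producer forgot to signal.

```python
import threading

event = threading.Event()


def wait_for_work(timeout=None):
    """Block until signalled; with a timeout, also wake for a periodic sweep."""
    signalled = event.wait(timeout)   # timeout=None blocks until set()
    event.clear()
    return "signalled" if signalled else "timeout sweep"


# A producer forgot to call set(): only the timeout gets us moving again.
missed = wait_for_work(timeout=0.1)

# A producer did call set(): the wait returns at once, timeout or not.
event.set()
prompt = wait_for_work(timeout=None)
```

So the timeout is a safety net for missed signals rather than something the signalling path depends on.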

bon...@gmail.com

Jun 16, 2006, 8:23:54 PM
to TurboGears

Richard Clark wrote:
> The timeout is not actually strictly necessary at all. Assuming you do:
>
> processor.singleton.event.set()
>
> Whenever you have something for it to process, it should happen
> immediately. You could set it to wait indefinitely and it'll still
> process whenever you have data. I set a timeout because I'm paranoid,
> not because it's necessary.

Very nice, it reminds me of the job queue/batch stuff I once worked on
on those big irons.

Andrew Grover

Jul 4, 2006, 7:30:00 AM
to turbo...@googlegroups.com
On 6/16/06, Richard Clark <richar...@gmail.com> wrote:
> The timeout is not actually strictly necessary at all. Assuming you do:
>
> processor.singleton.event.set()
>
> Whenever you have something for it to process, it should happen
> immediately. You could set it to wait indefinitely and it'll still
> process whenever you have data. I set a timeout because I'm paranoid,
> not because it's necessary.

I implemented your simple example and I got my periodic function being
called insanely fast :) I don't see any mention of Event being
auto-clearing so I think a call to clear() is needed somewhere.

Regards -- Andy

Richard Clark

Jul 4, 2006, 7:40:24 PM
to TurboGears
Ah yes, you're right, you should clear it immediately after the wait().
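
Putting that fix into the simple processor from the first post, a sketch of the corrected loop could look like this. The max_passes parameter and the passes list are hypothetical additions so the loop can be stopped and observed; the logger name is likewise an assumption.

```python
import logging
import threading

log = logging.getLogger("processor")


class Singleton(object):
    def __init__(self):
        self.event = threading.Event()
        self.thread = None
        self.event.set()   # as in the original: run one pass right away


singleton = Singleton()
passes = []   # hypothetical counter so the effect is visible


def processingThread(max_passes=None):
    """ Same loop as before, with the event cleared after each wait """
    while max_passes is None or len(passes) < max_passes:
        singleton.event.wait(30)
        singleton.event.clear()   # without this, wait() keeps returning at once
        log.debug("Doing processing")
        passes.append(1)
```

With the clear() in place, the loop runs one pass per signal (or per timeout) instead of spinning.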
