How to implement background processes elegantly

Kimmo

May 20, 2011, 7:56:44 AM
to web2py-users
Hi,

I am using web2py as a frontend, and I have a single script
("myscript.py") that does all the extra processing of the uploaded
files (unzipping zip file contents into tables, deleting other info,
etc.). This script runs four threads, and the problem is that
these threads access the same "DAL object->connection->cursor"
simultaneously, which understandably produces "Commands out of sync"
errors.

In web2py, the model is executed every time a controller function is
called, and a new DAL db object is created. If I could reproduce this
behaviour in "myscript.py" so that every thread gets its own "new db
object", the problem described above would disappear. This might not
even be possible, so my main question is: what would be the best way
to solve this issue, from a design perspective and/or in practice?


One possible(?) way is to launch four scripts instead of one (moving
every thread into its own script). This way every script gets its own
"new db object". The problem is that I need these scripts
to communicate with each other, and the easiest way to do that is to
keep them in a single script.

The script is executed alongside web2py as described in the manual: "python
web2py.py -S app -M -N -R applications/app/private/myscript.py"

Web2py: Version 1.95.1
DB: PostgreSQL 8.3 with psycopg 2.4.1 (threading level 2, i.e. threads
can share a single db connection)


I'll gladly provide any additional information if needed!

Thanks in advance!

Ross Peoples

May 20, 2011, 9:40:48 AM
to web...@googlegroups.com
If I am reading this properly, you want to create a new instance of db for each thread? I have never tried this before, but could you do something like this for each thread?

db_thread = DAL(db._uri)
for k, v in db.items():
    db_thread[k] = v

The idea is to use the existing db's URI string to make a new DAL instance, then copy the table information from db to db_thread. Again, I don't know if this works, but it might point in the right direction unless someone else has a more elegant solution.


Iceberg

May 23, 2011, 4:23:23 AM
to web2py-users
Or go in another direction. A DB is generally not designed to handle
concurrency very well. So maybe Kimmo can have his script handle
concurrency by gathering data into a queue, and then use a fifth thread
to dump the data into the db.

Regards,
Iceberg
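
The queue-plus-writer idea above could look roughly like the sketch below. It is only an illustration under assumptions: it is written for Python 3 (on Python 2 the module is named Queue), it uses the standard sqlite3 module as a stand-in for the real database, and the names producer, writer, run and the results table are hypothetical. What it tries to show is that only the writer thread ever holds a connection.

```python
import queue
import sqlite3
import threading

STOP = object()  # sentinel that tells the writer thread to finish


def producer(worker_id, out_queue, count):
    # Worker threads never touch the database; they only enqueue rows.
    for value in range(count):
        out_queue.put((worker_id, value))


def writer(in_queue, db_path, summary):
    # The only thread that opens and uses a database connection.
    # sqlite3 is a stand-in here (assumption), not the PostgreSQL setup
    # described in the thread.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS results (worker INTEGER, value INTEGER)")
    while True:
        item = in_queue.get()
        if item is STOP:
            break
        conn.execute("INSERT INTO results VALUES (?, ?)", item)
    conn.commit()
    summary["rows"] = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
    conn.close()


def run(workers=4, count=5, db_path=":memory:"):
    q = queue.Queue()
    summary = {}
    writer_thread = threading.Thread(target=writer, args=(q, db_path, summary))
    writer_thread.start()
    producers = [threading.Thread(target=producer, args=(n, q, count))
                 for n in range(workers)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    q.put(STOP)  # every producer has finished, so nothing follows the sentinel
    writer_thread.join()
    return summary["rows"]
```

Calling run() starts four producers plus the single writer and returns the number of rows the writer stored. The four worker threads can still talk to each other inside one script, which was the original requirement.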

Michele Comitini

May 23, 2011, 4:58:28 AM
to web...@googlegroups.com
I suggest using the multiprocessing module
http://docs.python.org/library/multiprocessing.html
and avoiding threads. It is standard in Python 2.6+, and you can also
find backports to 2.5.
Its API is similar to that of the threading module.

mic


Nite

May 23, 2011, 6:10:40 PM
to web2py-users
I'll second the use of the multiprocessing library. I started out
using cron to start up a couple independent processes, but quickly ran
into issues depending on the complexity of what I was trying to do. In
the end I settled on the mp library which has worked out well.

pbreit

May 23, 2011, 6:43:11 PM
to web...@googlegroups.com
Any hints on how to do this with Web2py?

Massimo Di Pierro

May 23, 2011, 10:38:59 PM
to web2py-users
do what?

pbreit

May 23, 2011, 10:47:21 PM
to web...@googlegroups.com
Use the multiprocessing library (to implement background processes).

Massimo Di Pierro

May 23, 2011, 11:26:02 PM
to web2py-users
Write a normal Python program using multiprocessing...

Within each process, do:

from gluon.shell import env
globals().update(env('appname',import_models=True))

Now you have your own db, request, response, etc.

I did not try it, but it should work fine.
There may be a path issue, since this expects to find applications in
the current working folder.

Massimo
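
Put together, the suggestion above might be sketched as follows. This is untested against a real web2py installation and rests on assumptions: the app is called 'app', its models define a variable named db, and the helper names load_web2py_env, worker and main are made up for illustration. Only gluon.shell.env and its import_models argument come from the message above.

```python
import os
from multiprocessing import Process, Queue


def load_web2py_env(appname):
    # Imported inside the child so that each process builds its own
    # environment (db, request, response, ...) from the app's models.
    from gluon.shell import env
    return env(appname, import_models=True)


def worker(appname, results, load_env=load_web2py_env):
    environment = load_env(appname)
    db = environment['db']  # assumption: the app's models define `db`
    # Placeholder for real work: report which tables this process sees.
    results.put((os.getpid(), sorted(db.tables)))
    # After real inserts or updates, call db.commit() explicitly; outside
    # a web request web2py does not commit for you.


def main(appname='app', nworkers=4):
    results = Queue()
    procs = [Process(target=worker, args=(appname, results))
             for _ in range(nworkers)]
    for p in procs:
        p.start()
    # Read one result per worker before joining the processes.
    collected = [results.get() for _ in procs]
    for p in procs:
        p.join()
    return collected
```

Calling main('app') from the web2py root folder (so that applications/ is found) would start four processes, each with its own db object, and the Queue gives them a way to pass data back. The load_env parameter exists only so that the worker can be exercised without web2py installed.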

Nite

May 24, 2011, 6:36:46 AM
to web2py-users
Here's an abstraction from my code. There may be better ways to do it,
but it works.

Let's say you wanted to start a miner to watch the system log...

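import datetime
from subprocess import Popen, PIPE
from multiprocessing import Process, Queue

def doSomething(q):
    # Follow the system log and filter it for 'foo'
    p1 = Popen(['tail', '-f', '/var/log/messages'], stdout=PIPE)
    p2 = Popen(['grep', '-i', 'foo'], stdin=p1.stdout, stdout=PIPE)
    # Report the pid of the tail process back to the parent
    q.put(dict(pid=p1.pid))

def startDoSomething():
    q = Queue()
    p = Process(target=doSomething, args=(q,))
    p.start()
    # Record the pid so that the process can be stopped later
    # (db comes from the web2py environment)
    db.settings.insert(pid=q.get()['pid'],
                       timestamp=datetime.datetime.now())
    db.commit()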

The reason I did it this way is that I need to control the start/stop
of this secondary process. There will only be one instance of it
running at any given time and special permissions are needed to access
the control functions.

--Nite

(Note, this example doesn't really make sense since we aren't doing
anything with the stdout from the grep command, but illustrates the
point of how to start a secondary process which was the purpose)

Kimmo

May 24, 2011, 2:02:50 AM
to web2py-users

Hi,

For anyone else who might be wondering about this issue,
I tried Ross Peoples' solution:

db_thread = DAL(db._uri)
for k, v in db.items():
    db_thread[k] = v

It did not work properly (it did feel a bit hacky anyway).

One pretty bad solution to this issue is to wrap every single db
query, commit, etc. in a threading.Lock().
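
For reference, the locking workaround might look like the sketch below. It is an illustration only: sqlite3 stands in for the shared connection, and the names db_lock, record, work and run as well as the log table are hypothetical. It also shows why the approach is unattractive, since all database work ends up serialized behind one lock.

```python
import sqlite3
import threading

db_lock = threading.Lock()

# sqlite3 stands in for the shared connection (assumption). With
# check_same_thread=False, several threads may use the same connection,
# so the lock is what keeps their statements from interleaving.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE log (worker INTEGER, value INTEGER)")


def record(worker_id, value):
    # Hold the lock for the whole insert-plus-commit sequence.
    with db_lock:
        conn.execute("INSERT INTO log VALUES (?, ?)", (worker_id, value))
        conn.commit()


def work(worker_id, count):
    for value in range(count):
        record(worker_id, value)


def run(workers=4, count=10):
    threads = [threading.Thread(target=work, args=(n, count))
               for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with db_lock:
        return conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
```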

I will, however, look into multiprocessing and the advice from
Massimo.

Thanks to everybody that helped with this issue!

Kimmo


Massimo Di Pierro

May 24, 2011, 9:21:42 AM
to web2py-users
What is this supposed to do? Because I do not think it does it
anyway. :-)