[web2py] creating background process with multiprocessing spawns new instance of web2py


amoygard

May 19, 2010, 3:14:00 PM5/19/10
to web2py-users
Hi,

I'm pretty new to web2py and web application frameworks. I'm trying to
create a new background process in controller to handle incoming ajax
data from a user. I'm using the module multiprocessing for this.
However, when I start the new process, a new instance of web2py server
is started? I'm assuming this has something to do with forking (which
I don't know much about). Is there an easy workaround? Or should I
preferably do this some other way, for instance with threads?

Code is as follows (controller calls function 'someinit' in a module
in module-folder):

def tekstpr(mstring, conn):
    # Runs in the child process: print the message, then reply over the pipe.
    print(mstring)
    conn.send("hello")

def someinit(userid):
    from multiprocessing import Pipe, Process
    parent_conn, child_conn = Pipe()
    p = Process(target=tekstpr, args=("I'm alive!", child_conn))
    session.procpipe = parent_conn
    p.start()
    # Wait up to 300 seconds for the child to answer.
    session.procpipe.poll(300)
    return session.procpipe.recv()

Yarko Tymciurak

May 19, 2010, 4:00:45 PM5/19/10
to web2py-users
On May 19, 2:14 pm, amoygard <amoyg...@gmail.com> wrote:
> Hi,
>
> I'm pretty new to web2py and web application frameworks. I'm trying to
> create a new background process in controller to handle incoming ajax
> data from a user.

You are trying to do too much: remember: the web is "stateless" ---
when _anything_ comes in from a client that gets directed to your URL,
(this is just an example flow, so you can get the general idea):

- somewhere, a DNS server replaces the URI name with a destination
address;
- the destination server (let's say apache) tries to sort out, and
direct to the correct app (in this case, web2py);
- web2py's main() parses the path part of the URI to decide how to
route internally
- if it finds a matching app/controller, main() will set up the
execution environment to prepare the call:
- i.e., your models are "executed" so that the table definitions,
etc. are all in the environment, and then your controller file
- the function in your controller is called with this execution
environment (this is the thread that you are trying to make, and
should not bother with)

The controller function does its stuff with the request, and returns
(to main()) the response - sometimes nothing, sometimes a dict of
stuff;
main() takes the return values from the controller, and processes them
appropriately, most often getting the view, parsing the template,
creating the resulting view, and sending it back to the client.

For ajax calls, you can see an example at
http://www.web2py.com/book/default/section/10/3, with an explanation of
what it does.

The general point: you call a controller with a request (even in the
case of an ajax call);
The response goes back to the caller.

Lots of stuff happens for you - you don't need to write the entire web
server underpinnings; just worry about the logic you want to
implement so server and client can converse / exchange information.

Hope this helps.

Regards,
- Yarko

amoygard

May 19, 2010, 6:41:04 PM5/19/10
to web2py-users
Thanks for the answer - I was aware that I don't have to do this to
handle ajax requests in general. The application I'm building needs to
send and receive a sequence of messages from the client in a specific
order, so I thought it would be easier to handle it in one thread/
process. It is however probably better to do it statelessly as you
say, so I'll probably have to rewrite the code somewhat.

Out of curiosity though, what is the right way to start a subprocess
in web2py?

Yarko Tymciurak

May 19, 2010, 7:18:56 PM5/19/10
to web2py-users
On May 19, 5:41 pm, amoygard <amoyg...@gmail.com> wrote:
> Thanks for the answer - I was aware that I don't have to do this to
> handle ajax requests in general. The application I'm building needs to
> send and receive a sequence of messages from the client in a specific
> order, so I thought it would be easier to handle it in one thread/
> process. It is however probably better to do it statelessly as you
> say, so I'll probably have to rewrite the code somewhat.
>
> Out of curiosity though, what is the right way to start a subprocess
> in web2py?


In general, you don't - unless you have a long-running process that is
divorced from http requests coming in from the network (for long
running processes, there is a sort of "cron" facility).

In a hosting environment, you have apache/wsgi (for example) running a
wsgi-thread that is web2py - that (main and the stuff in gluon) is your
long-running process (er, thread). To restart web2py, with wsgi, you
would do what is normal (touch a file) to cause apache to re-start
that wsgi thread. Other than from something like a terminal on your
server, you do not do this (it would not be uncommon for this to run
year-long without a restart).

Within web2py, you have a number of threads: db connection pools, and
application threads; again, these respond to requests, and are
spawned off by web2py (not you) - Massimo, or someone who has dug into
this recently could comment on how that works - but frankly, that is
not important.

Your web app _is_ running in a stateless environment: web requests.
You have a way to "capture" some state on your own: sessions.
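
A minimal sketch of that idea, assuming a plain dict stands in for web2py's session object; the message names and the handle_message helper are hypothetical. Each request checks and advances a step counter kept in the session, so no long-lived process is needed to enforce the message order.

```python
# Hypothetical protocol: the client must send these messages in this order.
EXPECTED_ORDER = ["hello", "data", "bye"]


def handle_message(session, message):
    """Handle one request; `session` is any dict-like per-user store."""
    step = session.get("step", 0)
    if step >= len(EXPECTED_ORDER):
        return "error: conversation already finished"
    if message != EXPECTED_ORDER[step]:
        return "error: expected %r, got %r" % (EXPECTED_ORDER[step], message)
    session["step"] = step + 1  # remember progress for the next request
    return "ok: step %d accepted" % (step + 1)


# Each call below stands for a separate ajax request from the same user.
session = {}
replies = [handle_message(session, m) for m in ["hello", "bye", "data", "bye"]]
```

The second request is rejected because it arrives out of order, and the state survives between calls only through the session store.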

I only know of one framework that uses the tasklet facility of
stackless, and that's nagare --- it seems to use a tasklet to wait
for a return so that you seem to generate a view inline in your
code. In web2py, your (for example) default/index() controller is
typically called multiple times, and decides (though it's mostly
masked from you) through logic if it's an initial request, or a
response --- i.e., when you call something with a form, there is a
check by means of form.accepts() to see if the response variables are
populated, and (somewhat behind the scenes) all the stacks of
appropriate validators are run. This is why you see things like
form.accepts() success or fail conditions, and fall thru if neither
(i.e. not a response yet).

So - in general, you do not start subprocesses - with the exception of
cron. See http://www.web2py.com/book/default/section/4/17

- Yarko

Candid

May 20, 2010, 12:03:21 PM5/20/10
to web2py-users
Here is how I did it:

# in controller action
import subprocess
subprocess.Popen(r"python applications/{0}/modules/run.py".format(request.application), shell=True)
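
A variant sketch, not the exact code above: passing the command as a list avoids shell=True, and sys.executable reuses the interpreter that is already running. The inline -c script is only a stand-in for run.py, which is not shown in this thread.

```python
import subprocess
import sys

# Stand-in for applications/<app>/modules/run.py (not shown in the thread).
child_code = "print('background job started')"

# A list of arguments is passed directly to the OS, with no shell parsing.
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
)

# A controller would normally return without waiting; we wait here only
# so the example can capture the child's output.
output, _ = proc.communicate()
result = output.decode().strip()
```

In a real controller you would probably drop the communicate() call so the request returns immediately while the child keeps running.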

amoygard

May 20, 2010, 12:58:30 PM5/20/10
to web2py-users
Thanks for the helpful answer!


Yarko Tymciurak

May 20, 2010, 1:59:53 PM5/20/10
to web2py-users
You might also look at how services are setup: http://www.web2py.com/book/default/section/9/2

Yarko Tymciurak

May 20, 2010, 2:12:20 PM5/20/10
to web2py-users

On May 19, 6:18 pm, Yarko Tymciurak <resultsinsoftw...@gmail.com>
wrote:
> On May 19, 5:41 pm, amoygard <amoyg...@gmail.com> wrote:
>
....
>
> So - in general, you do not start subprocesses - with the exception of
> cron. See http://www.web2py.com/book/default/section/4/17

I might better have said you do not _want_ to be starting subprocesses:
if you did this routinely, you would pay the cost (compute time, memory,
etc.) every time. This inefficiency of spawning subprocesses is why
stackless was created - and (among other things) used in a very
busy online game. A lot of thought went into avoiding the costs of
spawning subprocesses.

If you haven't heard of it, stackless is an implementation of python
that does not use the traditional "C" stack for local variables,
etc. Among other things, it has added "tasklets" to the language, so
you can create and schedule tasks - without the overhead of doing so
in your operating system. There is a lot of discussion of its benefits
and efficiency. Although some discussion questions the approach and
proposes alternatives, one thing is clear: the motivation to stay away
from creating threads / subprocesses, and the costs involved. It might
be interesting to read up on it.
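
The idea of scheduling tasks inside the interpreter rather than in the operating system can be sketched with ordinary generators. This is not the Stackless API; it is a plain-Python analogy, and worker and run_round_robin are hypothetical names.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append("%s:%d" % (name, i))
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them finish."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        queue.append(task)  # not finished; schedule it again


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
```

The two tasks interleave without any OS thread or process being created; switching between them costs only a function call.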

- Yarko

Magnitus

May 21, 2010, 4:33:38 AM5/21/10
to web2py-users
But if you create "tasks" without doing it at the OS level, doesn't
that mean that you won't really be able to take full advantage of
multi-processor hardware (since the OS handles the hardware and if the
OS doesn't know about it, it won't be able to do the required
optimizations with the hardware)?

Maybe I've done C/C++ for too long and am trying to micro-manage too
much, but a solution I like to use for the problem of creating/
tearing down process threads is just to pre-create a limited number of
them (optimised for the number of CPUs you have) and recycle them to
do various tasks as needed.

Of course, that works best when you give your threads/processes longer
tasks to perform in parallel (else, the extra cost of managing it will
probably outweigh the benefits of running it in parallel).
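
A rough sketch of that pre-create-and-recycle pattern using only the standard library. The start_workers helper and the squaring task are hypothetical, and threads are used here only to keep the example short.

```python
import os
import queue
import threading


def start_workers(task_queue, results, count):
    """Create `count` threads once; each one is reused for many tasks."""
    def loop():
        while True:
            item = task_queue.get()
            if item is None:  # sentinel: shut this worker down
                break
            func, arg = item
            results.append(func(arg))  # list.append is atomic in CPython
    threads = [threading.Thread(target=loop) for _ in range(count)]
    for t in threads:
        t.start()
    return threads


task_queue = queue.Queue()
results = []
worker_count = os.cpu_count() or 1  # one worker per CPU
workers = start_workers(task_queue, results, worker_count)

for n in range(10):
    task_queue.put((lambda x: x * x, n))
for _ in workers:
    task_queue.put(None)  # one sentinel per worker
for t in workers:
    t.join()
```

With CPython threads this recycles workers but does not by itself run Python code on several cores at once; multiprocessing.Pool applies the same pattern with processes.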


Yarko Tymciurak

May 21, 2010, 5:00:58 AM5/21/10
to web2py-users
On May 21, 3:33 am, Magnitus <eric_vallee2...@yahoo.ca> wrote:
> But if you create "tasks" without doing it at the OS level, doesn't
> that means that you won't really be able to take full advantage of
> multi-processor hardware (since the OS handles the hardware and if the
> OS doesn't know about it, it won't be able to do the required
> optimizations with the hardware)?

With the GIL, python itself does not utilize multiple processors, so
web2py is bound to one processor. The only effect of multi-core is that
the o/s itself can "leave" a core to the main python task, e.g. it can
grab an alternate core. Other than that, you're running on one core
regardless, unless you fire multiple instances of python interpreters,
in which case you are really only going to communicate thru services
anyway.

See some of the discussion at http://bugs.python.org/issue7946,
http://stackoverflow.com/questions/990102/python-global-interpreter-lock-gil-workaround-on-multi-core-systems-using-tasks

... and so forth...

>
> Maybe I've done C/C++ for too long and am trying to micro-manage too
> much, but a solution to I like to use for the problem of creating/
> tearing down process threads is just to pre-create a limited number of
> them (optimised for the number of CPUs you have) and recycle them to
> do various tasks as needed.

Well - since you don't have that with python, you run the risk of I/O
blocking .... which is why really lightweight
tasklets are so desirable (CCP Games runs http://en.wikipedia.org/wiki/Eve_Online
with many tens of thousands of simultaneous users, if I recall
correctly, and maintains stackless for this purpose).

>
> Of course, that works best when you give your threads/processes longer
> tasks to perform in parallel (else, the extra cost of managing it will
> probably outweight the benefits of running it in parallel).

There is much to cover in this - and I suppose reason to be happy that
python traditionally hasn't run multi-core.
See, for example, the discussions at:
http://stackoverflow.com/questions/203912/does-python-support-multiprocessor-multicore-programming

and http://docs.python.org/library/multiprocessing.html

Lots to read! ;-)

- Yarko

Graham Dumpleton

May 21, 2010, 6:06:51 AM5/21/10
to web2py-users


On May 21, 7:00 pm, Yarko Tymciurak <resultsinsoftw...@gmail.com>
wrote:
> Lots to read! ;-)

Also read:

http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html
http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modwsgi.html

BTW, your prior descriptions about how web2py works under mod_wsgi
aren't overly accurate. You said:

"""
In a hosting environment, you have apache/wsgi (for example) running a
wsgi-thread that is web2py - that (main and the stuff in gluon) is your
long-running process (er, thread). To restart web2py, with wsgi, you
would do what is normal (touch a file) to cause apache to re-start
that wsgi thread.

Within web2py, you have a number of threads: db connection pools, and
application threads; again, these respond to requests, and are
spawned off by web2py (not you)
"""

When run under Apache/mod_wsgi there is not a thread that is dedicated
to web2py and web2py doesn't have its own threads to respond to
requests.

In each Apache or mod_wsgi daemon process, depending on whether you
are using embedded mode or daemon mode, there is a pool of threads.
These are C threads, not Python threads and the thread pool is managed
by Apache or mod_wsgi as appropriate.

How a connection is accepted depends on Apache MPM or mod_wsgi mode
being used, but ultimately one of the threads in the thread pool
processes the request, all still in C code. For embedded mode the
request may not even be for the WSGI application but be for a static
file or other dynamic application such as PHP. If daemon mode, or if
target of request was the WSGI application, only then does Python GIL
get acquired and the thread tries to call into the WSGI application as
an external thread calling into the embedded Python interpreter.

At this point the WSGI application may not have even been loaded, so
the first request to find that has to load the WSGI script file which
may in turn load web2py. In this case web2py doesn't do anything
special. That is, it doesn't go creating its own thread pool and it
actually must return immediately once it is loaded and initialised.
Once it returns, the thread calls into the WSGI application entry
point and web2py handles the request. Any response is thence passed
back through Apache with the GIL being released at each point where
this occurs. When the complete request is done, the GIL is completely
released and the thread becomes inactive again pending a further request.

If other requests occur at the same time, they could also call into
web2py. The only choke point is the initial loading of the WSGI script,
as obviously you only want to allow one thread to do that.

So, web2py doesn't have its own request threads and all calls come in
from external threads managed by Apache or mod_wsgi.
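
To illustrate the point with a toy sketch (this is not mod_wsgi or web2py code): the WSGI application is just a callable, and it runs in whichever thread the server uses to call it. The three threads named server-N below are a stand-in for the Apache/mod_wsgi thread pool, and simulate_request is a hypothetical helper.

```python
import threading
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """WSGI entry point: runs in whichever server thread calls it."""
    body = ("handled by " + threading.current_thread().name).encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def simulate_request(responses):
    """Stand-in for one server-managed thread handling one request."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal WSGI environ
    statuses = []
    chunks = application(environ, lambda status, headers: statuses.append(status))
    responses.append((statuses[0], b"".join(chunks)))


responses = []
threads = [
    threading.Thread(target=simulate_request, args=(responses,), name="server-%d" % i)
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The application never creates a thread of its own; each response simply reports the name of the external thread that called it.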

Graham

Magnitus

May 21, 2010, 6:14:31 AM5/21/10
to web2py-users
Now that you mention it, I recall reading in the Python/C API that
Python wasn't really thread-safe and that Python objects shouldn't be
accessed from multiple C threads (they recommended using the Python
threading API which was exposed in the Python/C API instead, but that
didn't interest me as its not true multi-threading).

My solution to this was make my C++ code C++ only and then glue parts
that I wanted to expose to Python so that Python calls on the C++
compiled code, but not the reverse.

Hence, I probably bypassed a large part of the problems discussed
above by not delving deeply into Python's threading API.


Graham Dumpleton

May 21, 2010, 6:32:46 AM5/21/10
to web2py-users


On May 21, 8:14 pm, Magnitus <eric_vallee2...@yahoo.ca> wrote:
> Now that you mention it, I recall reading in the Python/C API that
> Python wasn't really thread-safe and that Python objects shouldn't be
> accessed from multiple C threads (they recommended using the Python
> threading API which was exposed in the Python/C API instead, but that
> didn't interest me as its not true multi-threading).

Python is thread safe so long as you abide by the contract of
acquiring the Python GIL. If you are going to ignore that required
contract, then obviously it will break.

This is the norm for any Python system in as much as you need to
acquire a lock when accessing a shared resource. In the Python case
there so happens to be one single global lock around any C API
accesses, whereas in a custom C/C++ application which is
multithreaded, you might have more fine grained locking around
specific data structures or parts of the API.

> My solution to this was make my C++ code C++ only and then glue parts
> that I wanted to expose to Python so that Python calls on the C++
> compiled code, but not the reverse.

Which is exactly what many C extension modules do. That is, they move
data between Python and C worlds so that they can then release the GIL
and process the data. That is why they can still make use of multi core/
processor systems. This is also why Apache/mod_wsgi isn't constrained by
the Python GIL: everything Apache itself does doesn't use the
Python GIL and can be properly parallelised.
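
One standard-library case of this, offered as a sketch: CPython's hashlib documentation says the GIL is released while hashing data larger than 2047 bytes, so threads hashing big buffers can overlap. The buffer sizes and the digest helper are arbitrary choices for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Four 1 MiB buffers; large enough that hashlib releases the GIL.
buffers = [bytes([i]) * (1024 * 1024) for i in range(4)]


def digest(data):
    # The hashing itself runs in C code, outside the GIL for large inputs.
    return hashlib.sha256(data).hexdigest()


sequential = [digest(b) for b in buffers]

# Threads can overlap here because the heavy work happens outside the GIL.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, buffers))
```

The results are identical either way; whether the threaded version is actually faster depends on the machine and the Python build.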

Graham


Yarko Tymciurak

May 21, 2010, 9:13:50 PM5/21/10
to web2py-users
On May 21, 5:06 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:
> On May 21, 7:00 pm, Yarko Tymciurak <resultsinsoftw...@gmail.com>
> wrote:

> Also read:
>
>  http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html
>  http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modws...
Thank you, Graham -

Your clarification is clear, and much appreciated!

Regards,
- Yarko

Magnitus

May 21, 2010, 10:22:02 PM5/21/10
to web2py-users
Yeah, thanks for the clarification about GIL, that was awesome (I read
many a textbook that was not as well written).

Made me realise that you can do some calls to Python's C API from
multiple C threads, but you should do so seldom as it's more
expensive in terms of interruptions in the parallelism (given that
there is a single lock on the entire API).
