mod_wsgi + multiprocessing

Ed Summers

May 2, 2011, 9:37:02 AM
to modwsgi
Hi all,

I asked this over on web-sig [1] earlier today, but am asking here
since it looks to be mod_wsgi-specific...

I've been trying to use multiprocessing [2] with mod_wsgi and have
noticed what appears to be deadlocking behavior with both Django and
web.py. I created a minimal example with web.py to demonstrate [3].

If you have mod_wsgi and web.py available, and put something like
this in your Apache config:

WSGIScriptAlias /multiprocessing /home/ed/wsgi_multiprocessing.py
AddType text/html .py

then visit:

http://localhost/multiprocessing

and compare with:

http://localhost/multiprocessing?multiprocessing=1

you should see the second URL hang.

Going forward I'm most likely going to move this functionality to an
asynchronous queue (Celery, etc.), but I was wondering if
multiprocessing + mod_wsgi was generally known to be something to
avoid, or if it was even forbidden somehow.

Any assistance you can provide would be welcome.

//Ed


[1] http://mail.python.org/pipermail/web-sig/2011-May/005065.html
[2] http://docs.python.org/library/multiprocessing.html
[3] https://gist.github.com/951570

Graham Dumpleton

May 2, 2011, 7:55:38 PM
to mod...@googlegroups.com
Using the multiprocessing module within mod_wsgi is a really bad idea.
This is because mod_wsgi is an embedded system, where Apache and
mod_wsgi manage the processes. Once you start using the multiprocessing
module, which does its own process management, it can interfere with
the operation of Apache/mod_wsgi in unexpected ways.

For example, taking your example and changing it not to be dependent
on web.py I get:

import multiprocessing
import os

def x(y):
    print os.getpid(), 'x', y
    return y

def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    print 'create pool'
    pool = multiprocessing.Pool(processes=1)
    print 'map call'
    result = pool.map(x, [1])
    print os.getpid(), 'doit', result

    return [output]

If I fire off a request to this, it appears to work correctly,
returning the hello world string and logging the appropriate messages.

[Tue May 03 09:40:36 2011] [info] [client 127.0.0.1] mod_wsgi
(pid=32752, process='hello-1',
application='hello-1.example.com|/mptest.wsgi'): Loading WSGI script
'/Library/WebServer/Sites/hello-1/htdocs/mptest.wsgi'.
[Tue May 03 09:40:36 2011] [error] create pool
[Tue May 03 09:40:36 2011] [error] map call
[Tue May 03 09:40:36 2011] [error] 32753 x 1
[Tue May 03 09:40:36 2011] [error] 32752 doit [1]

However, the process then appears to receive a signal from somewhere
causing it to shutdown:

[Tue May 03 09:40:36 2011] [info] mod_wsgi (pid=32752): Shutdown
requested 'hello-1'.
[Tue May 03 09:40:41 2011] [info] mod_wsgi (pid=32752): Aborting
process 'hello-1'.

The multiprocessing module does issue signals, so it may be the source of this.

One thought was that this may be occurring when the pool is destroyed
at the end of the function call, so I moved the creation of pool to
module scope.

import multiprocessing
import os

print 'create pool'
pool = multiprocessing.Pool(processes=1)

def x(y):
    print os.getpid(), 'x', y
    return y

def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    print 'map call'
    result = pool.map(x, [1])
    print os.getpid(), 'doit', result

    return [output]

This version, though, will not even run:

[Tue May 03 09:47:31 2011] [info] [client 127.0.0.1] mod_wsgi
(pid=32893, process='hello-1',
application='hello-1.example.com|/mptest.wsgi'): Loading WSGI script
'/Library/WebServer/Sites/hello-1/htdocs/mptest.wsgi'.
[Tue May 03 09:47:31 2011] [error] create pool
[Tue May 03 09:47:31 2011] [error] map call
[Tue May 03 09:47:31 2011] [error] Process PoolWorker-1:
[Tue May 03 09:47:31 2011] [error] Traceback (most recent call last):
[Tue May 03 09:47:31 2011] [error] File
"/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py",
line 231, in _bootstrap
[Tue May 03 09:47:31 2011] [error] self.run()
[Tue May 03 09:47:31 2011] [error] File
"/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/process.py",
line 88, in run
[Tue May 03 09:47:31 2011] [error] self._target(*self._args, **self._kwargs)
[Tue May 03 09:47:31 2011] [error] File
"/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py",
line 57, in worker
[Tue May 03 09:47:31 2011] [error] task = get()
[Tue May 03 09:47:31 2011] [error] File
"/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/queues.py",
line 339, in get
[Tue May 03 09:47:31 2011] [error] return recv()
[Tue May 03 09:47:31 2011] [error] AttributeError: 'module' object has
no attribute 'x'

The browser also then hangs at that point.

Part of the issue here may be that WSGI script files are not really
standard Python modules, in that the basename of the WSGI script file
doesn't match a module in sys.modules. If the multiprocessing module
relies on import magic to find the original code to execute in the
subprocess, that isn't going to work.

Specifically, this may be related to:

http://code.google.com/p/modwsgi/wiki/IssuesWithPickleModule
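For illustration, here is a minimal Python 3 sketch of why name-based lookup matters. pickle does not serialize a function's code. It records the function's module name and qualified name, and the receiving process looks them up again. The module name `fake_wsgi_script` is hypothetical and stands in for a loaded WSGI script file. Deleting `x` simulates a receiving process whose copy of that module lacks the function.

```python
import pickle
import sys
import types

def demo():
    # Hypothetical module standing in for a loaded WSGI script file.
    mod = types.ModuleType("fake_wsgi_script")
    exec("def x(y):\n    return y\n", mod.__dict__)
    sys.modules[mod.__name__] = mod
    try:
        # Only the names "fake_wsgi_script" and "x" are stored, not the code.
        data = pickle.dumps(mod.x)
        # Simulate a receiving process whose copy of the module has no x.
        del mod.x
        try:
            pickle.loads(data)
        except AttributeError as exc:
            return data, exc
        return data, None
    finally:
        del sys.modules[mod.__name__]

if __name__ == "__main__":
    data, err = demo()
    print(err)
```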

If I attempt to move x() into being a nested function as:

import multiprocessing
import os

print 'create pool'
pool = multiprocessing.Pool(processes=1)

def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    def x(y):
        print os.getpid(), 'x', y
        return y

    print 'map call'
    result = pool.map(x, [1])
    print os.getpid(), 'doit', result

    return [output]

Then one does get pickle errors, albeit for a different reason:

[Tue May 03 09:52:59 2011] [info] [client 127.0.0.1] mod_wsgi
(pid=33010, process='hello-1',
application='hello-1.example.com|/mptest.wsgi'): Loading WSGI script
'/Library/WebServer/Sites/hello-1/htdocs/mptest.wsgi'.
[Tue May 03 09:52:59 2011] [error] create pool
[Tue May 03 09:52:59 2011] [error] map call
[Tue May 03 09:52:59 2011] [error] Exception in thread Thread-1:
[Tue May 03 09:52:59 2011] [error] Traceback (most recent call last):
[Tue May 03 09:52:59 2011] [error] File
"/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py",
line 522, in __bootstrap_inner
[Tue May 03 09:52:59 2011] [error] self.run()
[Tue May 03 09:52:59 2011] [error] File
"/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py",
line 477, in run
[Tue May 03 09:52:59 2011] [error] self.__target(*self.__args,
**self.__kwargs)
[Tue May 03 09:52:59 2011] [error] File
"/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/multiprocessing/pool.py",
line 225, in _handle_tasks
[Tue May 03 09:52:59 2011] [error] put(task)
[Tue May 03 09:52:59 2011] [error] PicklingError: Can't pickle <type
'function'>: attribute lookup __builtin__.function failed

So it is doing pickling in some form, which isn't going to work for
stuff in a WSGI script file.

If you really want to pursue this, then I suggest you move this code
outside of the WSGI script file and put it in a standard module on the
Python module search path you have set up for your application.
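Here is a small Python 3 sketch of the nested-function case, and of why moving code into a standard module helps. A function defined inside another function has no importable module-level name, so pickle refuses it. A function from an importable module round-trips. The standard library's `os.path.basename` is used here only as a convenient example. The names `application` and `try_pickle` are illustrative. The exact exception raised for the nested function varies between Python versions, so the sketch catches `Exception`.

```python
import os.path
import pickle

def application():
    # x is a local object: its qualified name is 'application.<locals>.x',
    # which cannot be found again by importing a module.
    def x(y):
        return y
    return x

def try_pickle(func):
    """Return 'ok' if func survives a pickle round trip, else the error type."""
    try:
        restored = pickle.loads(pickle.dumps(func))
    except Exception as exc:  # AttributeError or PicklingError, by version
        return type(exc).__name__
    return "ok" if restored is func else "mismatch"

if __name__ == "__main__":
    print(try_pickle(application()))     # an error type name
    print(try_pickle(os.path.basename))  # ok
```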

Overall though, I would recommend against using multiprocessing module
from inside of mod_wsgi.

Graham


Ed Summers

May 3, 2011, 12:17:05 AM
to modwsgi
Thanks very much for digging into this in such detail, and for the
general advice. I figured it was a bad idea to be using
multiprocessing in the mod_wsgi environment, so I appreciate the
confirmation.

//Ed


Devesh Aggrawal

Feb 22, 2018, 6:36:30 PM
to modwsgi
So, if I want some shared data that is accessible to all the Apache processes and threads, can I use multiprocessing's Value to store those variables, and acquire and release locks on them accordingly?

Graham Dumpleton

Feb 25, 2018, 9:49:56 PM
to mod...@googlegroups.com
As I understand it, the Value and Array classes rely on the object being created in a process from which the child processes that need it are then forked. There is no point in the Apache parent process where you could create the object so that it is shared across either the Apache child worker processes or the mod_wsgi daemon processes.
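Here is a minimal Python 3 sketch of the pattern Graham describes. It assumes a POSIX system where the `fork` start method is available. The `Value` is created in the parent and is shared only with the worker processes forked from it afterwards. The function names and counts are illustrative. As Graham notes, under Apache/mod_wsgi there is no point in the Apache parent process where an equivalent object could be created.

```python
import multiprocessing as mp

def bump(counter, times):
    # Each increment takes the lock that comes with the synchronized Value.
    for _ in range(times):
        with counter.get_lock():
            counter.value += 1

def count_with_shared_value(workers=4, times=1000):
    ctx = mp.get_context("fork")   # assumption: POSIX fork start method
    counter = ctx.Value("i", 0)    # created in the parent, before forking
    procs = [ctx.Process(target=bump, args=(counter, times))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == "__main__":
    print(count_with_shared_value())  # 4000
```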

Graham


jazz...@gmail.com

Aug 8, 2022, 11:04:01 PM
to modwsgi
Hi, 

I'm trying to speed up my python program using multiprocessing since some of it can be concurrent.

I am using Rocky Linux, Apache and mod_wsgi. I've been using this setup for years with no problems, but without multiprocessing...

What I have been doing all along is invoking my program from the main WSGI/Flask script like this:

import subprocess

result = subprocess.run(['python3', 'MainPgm.py'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
stdout_data = result.stdout

So I'm using the subprocess module.

My question is: is it safe to add multiprocessing inside my "MainPgm"?
My tests today worked fine, and I see that this is frowned upon, but I also noticed:

"If you really want to pursue this, then suggest you move this code
outside of the WSGI script file and put it in a standard module on the
Python module search path you have set up for application."

^^ which seems to indicate it might work.

Thanks.

Graham Dumpleton

Aug 8, 2022, 11:11:17 PM
to mod...@googlegroups.com
Using the subprocess module alone may work okay; it really depends on what the program is doing. For simple stuff it is probably fine. The danger is where the subprocess being run has unusual requirements around signals, because it inherits the signal mask from the Apache parent process. For example, this causes certain Java applications to not work properly when executed via the subprocess module from a mod_wsgi process. Something about Java garbage collection (from memory) requires setting its own signal handlers, but those signals are blocked, so the handlers never execute and Java gets stuck.
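Here is a minimal Python 3 sketch of the inheritance Graham describes. It assumes Linux or another POSIX system with `signal.pthread_sigmask`. The parent blocks `SIGUSR1`, standing in hypothetically for whatever mask a mod_wsgi process inherits from Apache. A child started with `subprocess.run` then reports that the signal is still blocked. The second child shows one possible workaround, but only for a child program you control: it clears its own mask at startup. That workaround is an assumption about your program, not something mod_wsgi does for you.

```python
import json
import signal
import subprocess
import sys

# Child program that prints its blocked signals as a JSON list of numbers.
# SIG_BLOCK with an empty set changes nothing and returns the current mask.
REPORT_MASK = (
    "import json, signal; "
    "print(json.dumps(sorted(int(s) for s in "
    "signal.pthread_sigmask(signal.SIG_BLOCK, []))))"
)

# Variant that first clears whatever mask it inherited.
CLEAR_THEN_REPORT = (
    "import signal; signal.pthread_sigmask(signal.SIG_SETMASK, []); " + REPORT_MASK
)

def child_blocked_signals(code):
    out = subprocess.run([sys.executable, "-c", code],
                         stdout=subprocess.PIPE, text=True, check=True).stdout
    return set(json.loads(out))

def demo():
    # Block SIGUSR1 in this thread; children started from it inherit the mask.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    try:
        inherited = child_blocked_signals(REPORT_MASK)
        cleared = child_blocked_signals(CLEAR_THEN_REPORT)
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)  # restore
    return inherited, cleared

if __name__ == "__main__":
    inherited, cleared = demo()
    print(int(signal.SIGUSR1) in inherited, int(signal.SIGUSR1) in cleared)
```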

So you would really just need to try and see. For more complicated stuff, you would be better off delegating stuff to a backend task management system such as Celery.

Graham


jazz...@gmail.com

Aug 9, 2022, 1:09:53 AM
to modwsgi

It's all in Python... It just reads a little from a database, does minor updates, then runs some read-only AI models, with no network I/O. When I ran experiments it started up and used the pipes fine, with no problems I could see, and I ran two calls concurrently.

Thanks Graham!

Graham Dumpleton

Aug 9, 2022, 1:15:44 AM
to mod...@googlegroups.com
Just be mindful of what will happen if a database operation takes a long time and holds some sort of lock. More requests may come into the web application, and if every one of these creates a subprocess that then gets stuck waiting on the first, memory usage for the system as a whole could spike.

This is the benefit of using a task queuing system: it can queue up requests and give you a point of control over how many run concurrently.

Also ensure that you wait on the subprocesses where necessary and collect their exit status. If you don't, they can become zombie processes, which, although dead, still take up an entry in the kernel process table. Letting the number of zombie processes grow indefinitely is not a good idea.
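Here is a minimal Python 3 sketch of reaping background children. It assumes they were started with `subprocess.Popen` and not waited on immediately. `reap_finished` and `demo` are hypothetical helpers. `Popen.poll()` does not block and returns `None` while the child is running. Once the child has exited, `poll()` collects its exit status, which removes the zombie entry. `subprocess.run` always waits for the child, so it does not leave zombies behind.

```python
import subprocess
import sys
import time

def reap_finished(procs):
    """Reap exited children from a list of Popen objects; return {pid: status}."""
    finished = {}
    for proc in list(procs):
        status = proc.poll()  # non-blocking; reaps the child once it has exited
        if status is not None:
            finished[proc.pid] = status
            procs.remove(proc)
    return finished

def demo():
    # Start a child that exits with status 3 and is not waited on directly.
    procs = [subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])]
    deadline = time.monotonic() + 10
    while procs and time.monotonic() < deadline:
        done = reap_finished(procs)
        if done:
            return list(done.values())
        time.sleep(0.05)
    return None

if __name__ == "__main__":
    print(demo())  # [3]
```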

Anyway, just look out for issues like that.

Graham

jazz...@gmail.com

Aug 9, 2022, 6:55:59 PM
to modwsgi
These are great points. Thanks Graham!!!

I did run some experiments, and I do have a database lock in place (a SELECT ... FOR UPDATE in MySQL seems to act as a lock through the pymysql connector), so, as you note, requests could pile up. However, the subprocess is not invoked by my main WSGI Python program (the one with the api.add_resource-type statements) until getting a unique key from MySQL succeeds. So I don't think I will pile up a mass of subprocesses, at least.

I just noticed my code does not presently check the return status properly after the subprocess completes. So yes, I would need to be doing that. Good point, check exit status.
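Applied to the `subprocess.run` call earlier in the thread, a minimal sketch of checking the exit status might look like the block below. `run_checked` is a hypothetical helper. `check=True` makes `subprocess.run` raise `CalledProcessError` on a non-zero exit, and capturing stderr gives the error something useful to report. Using `stdin=subprocess.DEVNULL` instead of a pipe assumes the program reads nothing from stdin.

```python
import subprocess
import sys

def run_checked(args):
    """Run args and return its stdout; raise RuntimeError on a non-zero exit."""
    try:
        result = subprocess.run(args,
                                stdin=subprocess.DEVNULL,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                check=True)  # raises if returncode != 0
    except subprocess.CalledProcessError as exc:
        detail = exc.stderr.decode(errors="replace").strip()
        raise RuntimeError(
            f"{args!r} exited with status {exc.returncode}: {detail}") from exc
    return result.stdout

if __name__ == "__main__":
    # e.g. run_checked(['python3', 'MainPgm.py']) in the setup described above
    print(run_checked([sys.executable, "-c", "print('hello')"]))
```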

Could you please recommend some reading on how to properly configure a queuing system?

Mike

Graham Dumpleton

Aug 9, 2022, 9:28:21 PM
to mod...@googlegroups.com
All I can do is point you at Celery and Redis Queue. I'm not sure what other task queuing systems there are for Python these days.


Graham
