I am building some computational web services using soaplib. This creates a WSGI application.
However, since some of these services are computationally intensive, and may be long running, I was looking for a way to use multiple processes. I thought about using multiprocessing.Process manually in the service, but I was a bit worried about how that might interact with a threaded server (I was hoping the thread serving that request could just wait until the child is finished). Also it would be good to keep the services as simple as possible so it's easier for people to write them.
I have at the moment the following WSGI structure: TransLogger(URLMap(URLParser(soaplib objects))) although presumably, due to the beauty of WSGI, this shouldn't matter.
As I've found with all web-related Python stuff, I'm overwhelmed by the choice and number of alternatives. I've so far been using cherrypy and ajp-wsgi for my testing, but am aware of Spawning, twisted etc. What would be the simplest [quickest to set up and fewest details of the server required - ideally with a simple example] and most reliable [this will eventually be 'in production' as part of a large scientific project] way to host this sort of WSGI with a process-per-request style?
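For reference, the "do it by hand" idea from the second paragraph might look roughly like the sketch below. It is only an illustration under assumptions: `heavy_computation`, the `?n=` query parameter and the port are made up, and a real soaplib service would sit where the placeholder function is. The thread serving the request simply blocks on the pipe until the child process has finished.

```python
import multiprocessing
from wsgiref.simple_server import make_server


def heavy_computation(n):
    """Hypothetical stand-in for a CPU-bound service call."""
    return sum(i * i for i in range(n))


def _child(n, conn):
    # Runs in the child process and sends the result back over the pipe.
    conn.send(heavy_computation(n))
    conn.close()


def application(environ, start_response):
    # Read a hypothetical ?n=... parameter; fall back to a small job.
    query = environ.get("QUERY_STRING", "")
    n = int(query.partition("=")[2] or 1000)

    # duplex=False returns (receiving end, sending end).
    parent_conn, child_conn = multiprocessing.Pipe(duplex=False)
    proc = multiprocessing.Process(target=_child, args=(n, child_conn))
    proc.start()
    child_conn.close()            # parent keeps only the receiving end
    result = parent_conn.recv()   # the serving thread waits here
    proc.join()

    body = str(result).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(host="localhost", port=8000):
    # Development server only; any threaded WSGI server could host this.
    make_server(host, port, application).serve_forever()
```

This keeps the computation out of the server's interpreter, but every service author has to repeat the process handling, which is the complexity the question is trying to avoid.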
We've used forked fastcgi (flup) with success as that decouples the wsgi process (in our case django) from the main server (in our case apache). Our reasons for doing that were to allow the backend to use modern pythons without having to upgrade the server (which is required if using say mod_python). The wsgi process runs as an ordinary user which eases some tasks.
A disadvantage of our scheme is that long running processes may cause problems, e.g. timeouts. In practice, since there are no guarantees for how long an http connection will hold up (because of proxies etc.), we decided to work around this problem. Basically, long running jobs go into a task queue on the server, and the response is used to reconnect to the long running job periodically for status querying, results etc. -- Robin Becker
On Feb 11, 12:10 pm, Robin Becker <ro...@reportlab.com> wrote:
> We've used forked fastcgi (flup) with success as that decouples the wsgi process (in our case django) from the main server (in our case apache). Our reasons for doing that were to allow the backend to use modern pythons without having to upgrade the server (which is required if using say mod_python). The wsgi process runs as an ordinary user which eases some tasks.
Yes - I've done something very similar with ajp-wsgi (from the author of flup; and which incidentally performs very well and works really nicely) to go from apache -> wsgi. But the issue I'm asking about here is to have multiple WSGI processes - i.e. to allow concurrent execution of more than one web service at a time (since these are long running computational soap web services). ajp-wsgi embeds a single python interpreter so multiple running services would be affected by the GIL - I imagine flup is similar (a single process on the python side).
So I'm not worried about decoupling from the web server - I'm happy to use a pure python server (which I guess is easier to set up) - but I want the web server to dispatch requests to different processes running the wsgi app. I've looked at Spawning, but couldn't get it to work and it seems a little bit 'beta' for my taste (doesn't exit cleanly, leaves worker processes running etc.)
On Feb 11, 1:28 pm, Robin <robi...@gmail.com> wrote:
> On Feb 11, 12:10 pm, Robin Becker <ro...@reportlab.com> wrote:
> > We've used forked fastcgi (flup) with success as that decouples the wsgi process (in our case django) from the main server (in our case apache). Our reasons for doing that were to allow the backend to use modern pythons without having to upgrade the server (which is required if using say mod_python). The wsgi process runs as an ordinary user which eases some tasks.
I'm sorry - I originally missed the word 'forked' and hence the whole point of your message, I think.
I looked at flup before but had forgotten about the forked version. Having revisited it I think the forked version does keep a process pool so each request is processed by a separate process, which is exactly what I wanted.
Robin wrote:
> I'm sorry - I originally missed the word 'forked' and hence the whole point of your message, I think.
> I looked at flup before but had forgotten about the forked version. Having revisited it I think the forked version does keep a process pool so each request is processed by a separate process, which is exactly what I wanted.
You can have that with mod_wsgi & daemon mode as well, with presumably less setup hassle.
> Yes - I've done something very similar with ajp-wsgi (from the author of flup; and which incidentally performs very well and works really nicely) to go from apache -> wsgi. But the issue I'm asking about here is to have multiple WSGI processes - i.e. to allow concurrent execution of more than one web service at a time (since these are long running computational soap web services). ajp-wsgi embeds a single python interpreter so multiple running services would be affected by the GIL - I imagine flup is similar (a single process on the python side).
> So I'm not worried about decoupling from the web server - I'm happy to use a pure python server (which I guess is easier to set up) - but I want the web server to dispatch requests to different processes running the wsgi app. I've looked at Spawning, but couldn't get it to work and it seems a little bit 'beta' for my taste (doesn't exit cleanly, leaves worker processes running etc.)
Well, the flup server for FastCGI supports forking if the server is declared as an external process in apache. Then the top level of the flup process handles each request and passes it off to a forked worker. I cannot recall exactly, but I believe that apache mod_fastcgi does the right thing when it comes to internally declared fastcgi handlers. For apache at least I think the threading issues are handled properly.
I think the preforkserver.py code handles all the threading issues for you (assuming it's not win32). -- Robin Becker
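A minimal sketch of the forked flup setup, in case it helps. The `application` below is a placeholder for the real soaplib stack, and the host and port are assumptions. `flup.server.fcgi_fork` is the forking FastCGI server module; `flup.server.ajp_fork` is the AJP counterpart and is used the same way. Whether a given flup release runs on your Python version is something to check before relying on this.

```python
def application(environ, start_response):
    """Tiny placeholder WSGI app; the real one would be the soaplib stack."""
    body = b"hello from a forked worker\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve_fastcgi(host="127.0.0.1", port=9000):
    # Imported lazily so this module loads even where flup is not installed.
    from flup.server.fcgi_fork import WSGIServer
    # Requests are served by a pool of pre-forked child processes, so each
    # one runs in its own interpreter and does not share a GIL with others.
    WSGIServer(application, bindAddress=(host, port)).run()


def serve_ajp(host="127.0.0.1", port=8009):
    # Same idea over AJP, for use behind Apache's mod_proxy_ajp.
    from flup.server.ajp_fork import WSGIServer
    WSGIServer(application, bindAddress=(host, port)).run()
```

Calling `serve_fastcgi()` or `serve_ajp()` blocks and runs the server; the web server in front is then pointed at the chosen address.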
> You can have that with mod_wsgi & daemon mode as well, with presumably less setup hassle.
Another option that works well on Unix and even Windows is SCGI which deals with the forking and piping of data for you:
On Feb 11, 3:46 pm, Robin Becker <ro...@reportlab.com> wrote:
> well the flup server for fast cgi supports forking if the server is declared as an external process in apache. Then the top level of the flup process handles each request and passes it off to a forked worker. I cannot recall exactly, but I believe that apache mod_fastcgi does the right thing when it comes to internally declared fastcgi handlers. For apache at least I think the threading issues are handled properly.
> I think the preforkserver.py code handles all the threading issues for you (assuming it's not win32).
Thanks - I think if I go the flup route I would use AJP though - since it's very easy to set up with apache (1 proxy line) and mod_proxy_ajp comes as standard. And then everything is very much separated from the apache process.
Robin wrote:
> Thanks - I think if I go the flup route I would use AJP though - since it's very easy to set up with apache (1 proxy line) and mod_proxy_ajp comes as standard. And then everything is very much separated from the apache process.
.......
that's right and very easy to control. The only problem I recall is that the socket needs to be made readable by www. You can do that with a sudo chown or by setting up the mask at the ajp server start. -- Robin Becker
In this sort of situation one wouldn't normally do the work in the main web server, but have a separate long running daemon process embedding a mini web server that understands XML-RPC. The main web server would then make XML-RPC requests against the backend daemon process, which would use threading and/or queueing to handle the requests.
If the work is indeed long running, the backend process would normally just acknowledge the request and not wait. The web page would return and it would be up to the user to then somehow occasionally poll the web server, manually or by AJAX, to see how progress is going. That is, further XML-RPC requests from the main server to the backend daemon process asking about progress.
I don't believe the suggestions about fastcgi/scgi/ajp/flup or mod_wsgi are really appropriate, as you don't want this done in web server processes: you are then at the mercy of web server processes being killed or dying when part way through something. Some of these systems will do this if requests take too long. Thus it is better to offload the real work to another process.
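To make the shape of this concrete, here is a rough standard-library sketch of such a backend daemon. The function names (`submit`, `status`, `start_backend`, `wait_for`) and the sum-of-squares "work" are purely illustrative assumptions; it uses one thread per job for brevity, where a real daemon would more likely use a queue or separate processes.

```python
import threading
import time
import uuid
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

_jobs = {}                 # job id -> {"state": ..., "result": ...}
_lock = threading.Lock()


def _run(job_id, n):
    result = sum(i * i for i in range(n))   # stand-in for the real work
    with _lock:
        _jobs[job_id] = {"state": "done", "result": result}


def submit(n):
    """Acknowledge immediately; the work continues in the background."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"state": "running", "result": 0}
    threading.Thread(target=_run, args=(job_id, n), daemon=True).start()
    return job_id


def status(job_id):
    """Report progress for a job id handed out by submit()."""
    with _lock:
        return dict(_jobs.get(job_id, {"state": "unknown", "result": 0}))


def start_backend(host="127.0.0.1", port=0):
    """Start the XML-RPC listener in a thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(submit)
    server.register_function(status)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def wait_for(url, job_id, attempts=200, delay=0.01):
    """What the front-end web server would do on each progress request."""
    proxy = ServerProxy(url)
    info = proxy.status(job_id)
    for _ in range(attempts):
        if info["state"] != "running":
            break
        time.sleep(delay)
        info = proxy.status(job_id)
    return info
```

The front end calls `submit` once, returns the job id to the client, and answers later progress requests by calling `status`. Results here are plain integers because XML-RPC only carries simple types, and its integers are limited to 32 bits.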
2009/2/12 alex goretoy <aleksandr.gore...@gmail.com>:
> GAE (Google App Engine) uses WSGI for webapps. You don't have the overhead of managing a server and all its services this way as well. Just manage dns entries. Although, there are limitations depending on your project needs of what libs you need to use.
GAE is not suitable as they kill off any requests that take more than a set time. That time isn't that long, so can't support long running requests.
Thanks - in this case I am constrained to use SOAP (I am providing SOAP services using soaplib so they run as a WSGI app). I chose soaplib because it seems the simplest way to get soap services running in Python (I was hoping to get this set up quickly). So I am not really able to get into anything more complex as you suggest... I have my nice easy WSGI app soap service, I would just like it to run in a process pool to avoid the GIL. Turns out I can do that with apache+mod_wsgi and daemon mode, or flup forked server (I would probably use ajp - so flup is in a separate process to apache and listens on some local port, and apache proxies to that using the ajp protocol). I'm not sure which one is best... for now I'm continuing to just develop on cherrypy on my own machine.
I suspect I will use ajp forked flup, since that only requires mod_proxy and mod_proxy_ajp which I understand come with standard apache and the system administrators will probably be happier with.
On Feb 11, 9:32 pm, Graham Dumpleton <graham.dumple...@gmail.com> wrote:
> 2009/2/12 alex goretoy <aleksandr.gore...@gmail.com>:
> > GAE (Google App Engine) uses WSGI for webapps. You don't have the overhead of managing a server and all its services this way as well. Just manage dns entries. Although, there are limitations depending on your project needs of what libs you need to use.
> GAE is not suitable as they kill off any requests that take more than a set time. That time isn't that long, so can't support long running requests.
GAE is definitely not suitable in this case... The servers are provided and maintained as part of a large scientific project for which I am providing just a few services... Other groups are running services in other platforms on tomcat through soaplab/instantsoap - but I was hoping to use native python services since I thought it would be easier.
> Thanks - in this case I am constrained to use SOAP (I am providing SOAP services using soaplib so they run as a WSGI app). I chose soaplib because it seems the simplest way to get soap services running in Python (I was hoping to get this set up quickly).
> So I am not really able to get into anything more complex as you suggest... I have my nice easy WSGI app soap service, I would just like it to run in a process pool to avoid the GIL.
You can still use SOAP; you don't have to use XML-RPC. Both are, after all, just interprocess communication mechanisms.
> Turns out I can do that with apache+mod_wsgi and daemon mode, or flup forked server (I would probably use ajp - so flup is in a separate process to apache and listens on some local port, and apache proxies to that using the ajp protocol). I'm not sure which one is best... for now I'm continuing to just develop on cherrypy on my own machine.
In mod_wsgi daemon mode the application is still in a distinct process. The only difference is that Apache is acting as the process supervisor and you do not have to install a separate system such as supervisord or monit to start up the process and ensure it is restarted if it crashes, as Apache/mod_wsgi will do that for you. You also don't need flup when using mod_wsgi as it provides everything.
> I suspect I will use ajp forked flup, since that only requires mod_proxy and mod_proxy_ajp which I understand come with standard apache and the system administrators will probably be happier with.
The Apache/mod_wsgi approach actually has fewer dependencies. For it you only need Apache+mod_wsgi. For AJP you need Apache+flup+monit or supervisord. It just depends on which dependencies you think are easier to configure and manage. :-)
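As a rough illustration of the daemon mode setup, a WSGI script file could look like the sketch below. The file name, process group name, URL prefix, path and process count are all made-up values; `WSGIDaemonProcess`, `WSGIProcessGroup` and `WSGIScriptAlias` are the mod_wsgi directives involved, shown here as comments since they belong in the Apache configuration rather than in Python.

```python
# services.wsgi (hypothetical file name)
#
# Apache configuration, with assumed names and paths:
#   WSGIDaemonProcess soapservices processes=4 threads=1
#   WSGIProcessGroup soapservices
#   WSGIScriptAlias /services /path/to/services.wsgi
import os


def application(environ, start_response):
    # mod_wsgi looks for a callable named "application" in the script file.
    # The real soaplib WSGI stack would be assigned to this name instead.
    body = ("served by pid %d\n" % os.getpid()).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

With several daemon processes and one thread each, concurrent requests should show different pids in the response, which is a quick way to confirm that they are not sharing an interpreter.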
> If the work is indeed long running, the backend process would normally just acknowledge the request and not wait. The web page would return and it would be up to the user to then somehow occasionally poll the web server, manually or by AJAX, to see how progress is going. That is, further XML-RPC requests from the main server to the backend daemon process asking about progress.
....... this is exactly what we do with the long runners. The wsgi (django in our case) process can work out how long the process is likely to take and either responds directly or offloads the job to an xmlrpc server. In the latter case it responds with a page containing a token allowing access to the queue server; the page refreshes periodically to determine job status etc. When the job finishes, the refresh request returns the job result and stops looping. In our case we don't need to worry about people abandoning the job, since the results are cached and may be of use to others (a typical case is producing a brochure containing details of all resources in a country or large city). To avoid overload, the xmlrpc server is only allowed to run 3 active threads from its queue. -- Robin Becker
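A small sketch of the queue side of such a scheme, using only the standard library. This is not the code described above, just one way the same ideas (a token per job, cached results shared between identical requests, at most 3 active workers) could be put together; `JobQueue` and its method names are invented for the example.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobQueue:
    """Token-based job queue with a bounded number of active workers."""

    def __init__(self, work, max_active=3):
        self._work = work                      # callable doing the real job
        self._pool = ThreadPoolExecutor(max_workers=max_active)
        self._lock = threading.Lock()
        self._by_key = {}                      # job parameters -> token
        self._futures = {}                     # token -> Future

    def submit(self, key):
        """Return a token; identical requests share one job and result."""
        with self._lock:
            token = self._by_key.get(key)
            if token is None:
                token = uuid.uuid4().hex
                self._by_key[key] = token
                self._futures[token] = self._pool.submit(self._work, key)
            return token

    def poll(self, token):
        """Return ("pending", None), ("done", result) or ("unknown", None)."""
        with self._lock:
            future = self._futures.get(token)
        if future is None:
            return ("unknown", None)
        if not future.done():
            return ("pending", None)
        # Re-raises here if the job itself failed with an exception.
        return ("done", future.result())
```

The web process would call `submit` with the job parameters, embed the token in the page, and call `poll` on each refresh until it gets a "done" answer. Jobs beyond the third wait in the executor's internal queue until a worker is free.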