Pylons, HTTP 201 Accepted, Task Queues and Background threads

kmw

Mar 3, 2009, 3:57:39 PM

to pylons-discuss

Hi everyone,

I'm trying to find some docs or perhaps old discussions about
implementing a task queue within a pylons application. The scenario
I'm trying to support involves a request coming into the app server to
perform an action which takes a long time to complete, such as
rebuilding an index or updating a value across hundreds of thousands
of objects.

My thought was to create a processing thread when the app is loaded.
When I get the request I add the item to a synchronized queue (which
the processing thread blocks on) and return a HTTP 201 Accepted to the
client. The processing thread picks up tasks from the queue and they
are completed in the order received. The 201 response also has an
additional Location header to poll the status of the task.
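A minimal sketch of this design in Python 3, using only the standard library: queue.Queue is the synchronized queue and a single threading.Thread is the processing thread. TaskQueue, submit(), status() and shutdown() are hypothetical names, not a Pylons API. The integer ID that submit() returns is what the status URL in the response's Location header would be built from.

```python
import itertools
import queue
import threading


class TaskQueue:
    """Run submitted callables one at a time, in arrival order, on one background thread."""

    def __init__(self):
        self._queue = queue.Queue()      # thread-safe FIFO; the worker blocks on get()
        self._status = {}                # task id -> "queued" | "running" | "done" | "failed"
        self._ids = itertools.count(1)
        self._lock = threading.Lock()    # guards _status and _ids
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, func, *args):
        """Enqueue a task and return its id at once (what the request handler calls)."""
        with self._lock:
            task_id = next(self._ids)
            self._status[task_id] = "queued"
            self._queue.put((task_id, func, args))   # unbounded, so this never blocks
        return task_id

    def status(self, task_id):
        """Return the task's state, or None for an unknown id."""
        with self._lock:
            return self._status.get(task_id)

    def shutdown(self):
        """Let already-queued tasks finish, then stop the worker thread."""
        self._queue.put(None)            # sentinel goes behind any pending tasks
        self._thread.join()

    def _set(self, task_id, state):
        with self._lock:
            self._status[task_id] = state

    def _run(self):
        while True:
            item = self._queue.get()     # blocks until a task (or the sentinel) arrives
            if item is None:
                break
            task_id, func, args = item
            self._set(task_id, "running")
            try:
                func(*args)
            except Exception:
                self._set(task_id, "failed")
            else:
                self._set(task_id, "done")
```

Because the thread is a daemon, it is killed mid-task if the interpreter exits without anyone calling shutdown(); closing that lifecycle gap is what the rest of this thread is about.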

The remaining question was how to create and manage the processing
thread. I've read a couple of threads on this subject and hunted
around Google a bit, and found a couple of options:
- http://groups.google.com/group/pylons-discuss/browse_thread/thread/e30fb912ca79b000/7cc1d4a6b1d9919d?lnk=gst&q=background#7cc1d4a6b1d9919d
- http://groups.google.com/group/pylons-discuss/browse_thread/thread/3e9dfda05af50634/bc914b96e2b96a1b?lnk=gst&q=background#bc914b96e2b96a1b

Now I'm leaning towards creating a process using the Python
multiprocessing module, which has a thread-like interface but sidesteps
the GIL and Pylons' thread management. However, I didn't find any
information about how to manage the process lifecycle and let it
shut down gracefully when the server is stopped.
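One way to get a graceful stop with multiprocessing is a sentinel on the task queue: the child drains any queued work, sees None, and returns, and the parent falls back to terminate() only if that takes too long. This is a sketch under stated assumptions: it uses the "fork" start method (Unix only), and worker_main, ProcessWorker and the doubling "job" are illustrative stand-ins.

```python
import multiprocessing


def worker_main(tasks, results):
    """Child-process loop: handle tasks until the None sentinel arrives."""
    for task_id, payload in iter(tasks.get, None):   # tasks.get() blocks while idle
        results.put((task_id, payload * 2))          # stand-in for the real long job


class ProcessWorker:
    def __init__(self):
        # Assumes a Unix-like OS: "fork" lets worker_main run without being
        # importable. Under "spawn" (the Windows default) it must live in a module.
        ctx = multiprocessing.get_context("fork")
        self.tasks = ctx.Queue()
        self.results = ctx.Queue()
        self.proc = ctx.Process(target=worker_main, args=(self.tasks, self.results))
        self.proc.start()

    def submit(self, task_id, payload):
        self.tasks.put((task_id, payload))

    def shutdown(self, grace_seconds=10.0):
        """Ask the child to finish queued work and exit; kill it if it overruns.

        Read everything from self.results first: a child blocked on flushing a
        full results pipe cannot exit, and join() would then wait out the timeout.
        """
        self.tasks.put(None)
        self.proc.join(grace_seconds)
        if self.proc.is_alive():
            self.proc.terminate()        # sends SIGTERM on Unix
            self.proc.join()
        return self.proc.exitcode        # 0 on a clean exit, negative if killed by a signal
```

Something still has to call shutdown() when the server stops; this sketch only makes that call safe and bounded.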

I'd appreciate feedback on this approach and any pointers to resources
that will allow me to hook into the app lifecycle and manage my
subprocess as well. Hopefully I can get a working recipe out of this
and put it all together in the Pylons Cookbook for future reference.

Thanks.

chris van

Mar 3, 2009, 5:42:29 PM

to pylons-...@googlegroups.com

G'day,

I know nothing about how to solve your problem, though it does sound like something that interests me and that I would like to do myself. I currently have a controller feeding data into a shared-memory circular buffer, from which an entirely separate application reads and executes requests in its own time. I return a 400 OK to the client as soon as the controller has finished validating the request and storing it in the buffer; the client must then poll the server for updates on the status of the request once it is in the buffer. Returning 201 Accepted sounds like a better solution; could you please let me know how successful you are in implementing it under Pylons and how it works for you?

Thanks a lot,

Chris Van Schaijik

> Date: Tue, 3 Mar 2009 12:57:39 -0800
> Subject: Pylons, HTTP 201 Accepted, Task Queues and Background threads
> From: kochh...@gmail.com
> To: pylons-...@googlegroups.com

Kochhar

Mar 3, 2009, 6:26:50 PM

to pylons-...@googlegroups.com

> G'day,
>
> I know nothing about how to solve your problem, though it does sound
> like something that interests me and which I would like to do myself. I
> currently have a controller feeding data into a shared-memory circular
> buffer, from which an entirely separate application reads and executes
> in its own time. I return a 400 OK to the client as soon as the
> controller is finished validating the request and stores it in the
> buffer; the client must then poll the server to get updates on the
> status of the request once it is in the buffer. Returning 201 Accepted
> sounds like a better solution, could you please let me know how
> successful you are in implementing it under pylons and how it works for you?
>
> Thanks a lot,

Chris,

Two questions. What happens to the external application when the Pylons
application is shut down? Does it empty the buffer and block for new input, or do
you have a way to shut down the external application from Pylons? Secondly, do
clients poll the Pylons server or the external application for status?

I'm guessing you meant you return 200 OK, not 400. Whether you return a
200 or a 202 status is mostly an aesthetic choice, though 202 is a clearer
indication of the result. If you want to change the response from 200 to
202, you can manipulate pylons.response to set its status_code to 202. Hope
this helps you out.
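For reference, the same idea at the plain WSGI level (which Pylons sits on), rather than through pylons.response. This hypothetical endpoint answers a POST with 202 Accepted, puts the status-monitor URL both in the body and in a Location header as proposed at the start of the thread, and serves the monitor itself on GET. The in-memory _tasks dict and the /tasks URLs are assumptions for illustration.

```python
import itertools
import json
from wsgiref.util import application_uri

_ids = itertools.count(1)
_tasks = {}   # hypothetical in-memory registry: task id -> state string


def _json(start_response, status, payload, extra_headers=()):
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers + list(extra_headers))
    return [body]


def app(environ, start_response):
    """POST /tasks accepts work; GET /tasks/<id> is the status monitor."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "")
    if method == "POST" and path == "/tasks":
        task_id = next(_ids)
        _tasks[task_id] = "queued"       # a real app would hand the job to a worker here
        # RFC 2616 asks for an absolute URI in Location, so build one from the request.
        monitor = application_uri(environ).rstrip("/") + "/tasks/%d" % task_id
        return _json(start_response, "202 Accepted",
                     {"id": task_id, "state": "queued", "monitor": monitor},
                     [("Location", monitor)])
    if method == "GET" and path.startswith("/tasks/"):
        suffix = path[len("/tasks/"):]
        if suffix.isdigit() and int(suffix) in _tasks:
            task_id = int(suffix)
            return _json(start_response, "200 OK", {"id": task_id, "state": _tasks[task_id]})
    return _json(start_response, "404 Not Found", {"error": "not found"})
```

RFC 2616 only defines Location for 201 and 3xx responses, so the "monitor" field in the body is the part the specification actually asks for with 202; the header is a convenience.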

Kochhar

chris van

Mar 3, 2009, 9:27:32 PM

to pylons-...@googlegroups.com

> Date: Tue, 3 Mar 2009 15:26:50 -0800
> From: kochh...@gmail.com
> To: pylons-...@googlegroups.com
> Subject: Re: Pylons, HTTP 201 Accepted, Task Queues and Background threads

G'day Kochhar,

In my ideal implementation, neither the Pylons application nor the external application will ever be shut down ;-) In reality, if the external application is still running, it will empty the buffer and wait for new input. If the external application is shut down while Pylons is still running, the buffer will fill as requests are received and then reject new input once it is full. Most of the time, if one application is running, the other should be too (or will be shortly). I guess this differs somewhat from the original message, where the goal is to shut down the background process when the server is stopped. It wouldn't be wrong for my system to behave that way, but I prefer to keep the two separate (which is why the external application neither starts nor is started by Pylons).
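The "fill up, then reject" policy Chris describes can be sketched with a bounded in-process queue standing in for his shared-memory ring buffer (a swapped-in technique, not his implementation). Answering a full buffer with 503 Service Unavailable is an assumption; the thread does not say which status he returns in that case.

```python
import queue

# Hypothetical capacity. Chris's real buffer is a shared-memory ring buffer
# read by a separate program; an in-process bounded queue stands in for it here.
BUFFER = queue.Queue(maxsize=100)


def accept(request_data, buffer=BUFFER):
    """Try to buffer a validated request and return the WSGI status to send."""
    try:
        buffer.put_nowait(request_data)   # never blocks the request thread
    except queue.Full:
        return "503 Service Unavailable"  # consumer down or behind: reject new input
    return "202 Accepted"
```

Once the consumer takes an item off the buffer, the next request is accepted again, which matches the "empty the buffer and wait for new input" behaviour described above.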

You got me on the 200/400; I've coded a lot more 400 responses, using the abort command, for new input requests that turn out to be syntactically invalid. 202 Accepted is the code I would like to be returning, though I am not 100% sure what to make of this recommendation in the HTTP/1.1 specification (RFC 2616): "The entity returned with this response SHOULD include an indication of the request's current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled." I can give a time estimate easily enough (average processing time per input × number of inputs in the buffer), but I can't see this being of much practical use. What is meant by a "pointer to a status monitor"? The original message in this thread mentions "The 201 response also has an additional Location header to poll the status of the task", which I take to refer to the same thing; it just means including a Location field in the HTTP header, as described in http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30

Clients poll the pylons server which returns information about the state of the buffer and (with limited accuracy) the state of the external application. Reading more about this 202 response code, it looks like I should provide a little more detail in the status monitor about the progress of the specific request that generated the 202 message in relation to other items in the buffer.
Any suggestions about what format the status monitor should be in? At the moment it is just going to be an XHTML page with fields identifying the number of items in the buffer, position in the buffer and expected time in seconds till execution (assuming the external application is running properly).
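One possible shape for such a status monitor, sketched as JSON rather than XHTML (an assumption aimed at programmatic clients; the same fields could go into the XHTML page). It reports the fields listed above, with the estimate computed as the number of items ahead of this request × the average processing time, which ignores how far along the currently running item is. status_document and pending_ids are hypothetical names.

```python
import json


def status_document(task_id, pending_ids, avg_seconds):
    """Build a JSON status-monitor body for one buffered request.

    pending_ids: ids still waiting in the buffer, oldest first (hypothetical view).
    avg_seconds: measured average processing time per item.
    """
    if task_id not in pending_ids:
        # Already running, finished, or unknown; a real monitor would say which.
        return json.dumps({"id": task_id, "state": "not in buffer"})
    ahead = pending_ids.index(task_id)            # items that will run before this one
    return json.dumps({
        "id": task_id,
        "state": "queued",
        "queue_length": len(pending_ids),
        "position": ahead + 1,                    # 1 = next in line
        "eta_seconds": round(ahead * avg_seconds, 1),
    })
```

For example, request 7 in a buffer holding [5, 6, 7, 8] with a 2.5 s average is at position 3, with an estimate of 2 × 2.5 = 5.0 seconds.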

This is my first real web application, and since I am trying to control a physical device, it is a little different from the standard database-backed web applications most documentation is geared to. Still, I'm trying to make it as standards-compliant and RESTful as I can manage, while struggling to comprehend the somewhat esoteric concepts involved.

Chris.

Ian Bicking

Mar 4, 2009, 1:30:54 AM

to pylons-...@googlegroups.com

If you are thinking about user-visible long-running tasks, maybe take
a look at http://pythonpaste.org/waitforit/. It seems like you are
thinking more about APIs, but it's at least similar.

FYI, I think there's actually an HTTP header to indicate when the
client should poll next.
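A header that fits this description is Retry-After. RFC 2616 defines it for 503 and 3xx responses, so sending it from a status monitor is a convention rather than a requirement. A client-side polling loop that honors it might look like this; fetch is a hypothetical callable (for example, a thin wrapper around an HTTP client) returning the status code, a header dict, and the decoded body.

```python
import time


def poll_until_done(fetch, url, default_delay=5.0, max_polls=100, sleep=time.sleep):
    """Poll a status monitor until the task reports "done" or "failed".

    fetch(url) is a hypothetical callable returning (status_code, headers, body),
    with headers as a dict and body as a decoded dict. Only the delay-seconds form
    of Retry-After is handled; the HTTP-date form falls back to default_delay.
    """
    for _ in range(max_polls):
        _status, headers, body = fetch(url)
        if body.get("state") in ("done", "failed"):
            return body
        retry_after = headers.get("Retry-After", "").strip()
        delay = float(retry_after) if retry_after.isdigit() else default_delay
        sleep(delay)               # wait as long as the server asked before polling again
    raise TimeoutError("task still pending after %d polls" % max_polls)
```

Passing sleep as a parameter is only there so the loop can be exercised without real waiting.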

--
Ian Bicking | http://blog.ianbicking.org

Chris Moos

Mar 4, 2009, 2:06:28 AM

to pylons-discuss

I wrote a quick blog post about how I do worker threads:
http://chrismoos.com/2009/03/04/pylons-worker-threads/

Lawrence Oluyede

Mar 4, 2009, 3:18:33 AM

to pylons-...@googlegroups.com

On Tue, Mar 3, 2009 at 9:57 PM, kmw <kochh...@gmail.com> wrote:
> When I get the request I add the item to a synchronized queue (which
> the processing thread blocks on) and return a HTTP 201 Accepted to the
> client. The processing thread picks up tasks from the queue and they
> are completed in the order received. The 201 response also has an
> additional Location header to poll the status of the task.

Just to clarify a thing: 201 is CREATED, 202 is ACCEPTED.
Don't send 201 instead of 202 because they have different meanings:
<http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html>
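Python's standard library carries the RFC reason phrases, which makes the distinction easy to check when building a WSGI status line. status_line is a hypothetical helper.

```python
from http import HTTPStatus


def status_line(code):
    """Build a WSGI status string such as '202 Accepted' from a numeric code."""
    return "%d %s" % (code, HTTPStatus(code).phrase)


for code in (200, 201, 202):
    print(status_line(code))   # 201 is "Created", 202 is "Accepted"
```

So status_line(202) gives '202 Accepted', while status_line(201) gives '201 Created'.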

About the matter at hand: we spawn processes in that kind of
situation, not threads.
If you have scalability issues, you can think about using the AMQP
protocol with an implementation such as RabbitMQ.
I'm +1 on using 202 ACCEPTED :-)

--
Lawrence, http://oluyede.org - http://twitter.com/lawrenceoluyede
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair

Kochhar

Mar 4, 2009, 5:56:23 PM

to pylons-...@googlegroups.com

Lawrence Oluyede wrote:
> On Tue, Mar 3, 2009 at 9:57 PM, kmw <kochh...@gmail.com> wrote:
>> When I get the request I add the item to a synchronized queue (which
>> the processing thread blocks on) and return a HTTP 201 Accepted to the
>> client. The processing thread picks up tasks from the queue and they
>> are completed in the order received. The 201 response also has an
>> additional Location header to poll the status of the task.
>
> Just to clarify a thing: 201 is CREATED, 202 is ACCEPTED.
> Don't send 201 instead of 202 because they have different meanings:
> <http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html>
>
> About the matter at hand: we spawn processes in that kind of
> situation, not threads.
> If you have scalability issues, you can think about using the AMQP
> protocol with an implementation such as RabbitMQ.
> I'm +1 on using 202 ACCEPTED :-)
>

I meant to say 202 Accepted not 201, that's what you get for trying to remember
HTTP status codes.

I am also leaning towards using processes instead of threads (using the
multiprocessing module), but to play devil's advocate for a moment, why do you
prefer processes to threads? Secondly, and this is pertinent when spawning
processes, how do you hook into the pylons shutdown process to get the external
process to stop?

Cheers,
- Kochhar

Kochhar

Mar 4, 2009, 6:02:46 PM

to pylons-...@googlegroups.com

Thanks Ian, that's pretty useful to know about. It doesn't fit my case because
I'm working on server-side APIs, but I'll take a look at the implementation to
see if I can glean some ideas.

- Kochhar

Lawrence Oluyede

Mar 4, 2009, 6:22:57 PM

to pylons-...@googlegroups.com

On Wed, Mar 4, 2009 at 11:56 PM, Kochhar <kochh...@gmail.com> wrote:
> I am also leaning towards using processes instead of threads (using the
> multiprocessing module), but to play devil's advocate for a moment, why do you
> prefer processes to threads?

The answer is kind of easy. I do not have computations that would
benefit more from a threading model than from a processing model. Our
Apache frontend uses multiple processes and our async computations are
done in processes. We do not need to share anything, and if we do, we
just copy the data to the process. With an API such as pyprocessing's
or subprocess's, that is fairly easy.
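A sketch of the "copy the data to the process" style with subprocess: the job is serialized to JSON, written to a fresh Python process's stdin, and the reply is read back from stdout, so nothing is shared. The inline WORKER_SOURCE program and run_in_subprocess are illustrative; in practice the worker would be its own script.

```python
import json
import subprocess
import sys

# Hypothetical worker program; in a real deployment this would be a separate script.
WORKER_SOURCE = r"""
import json, sys
job = json.load(sys.stdin)                      # the child gets its own copy of the data
json.dump({"total": sum(job["values"])}, sys.stdout)
"""


def run_in_subprocess(job, timeout=60):
    """Send job to a fresh Python process as JSON and return its JSON reply."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps(job),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,                             # raise CalledProcessError on failure
    )
    return json.loads(completed.stdout)
```

This call blocks until the child exits, so in a web app it would run on a worker thread rather than directly in the request handler.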

As for why I generally don't like threading with shared state, there's
a great resource which I suggest reading from cover to cover:
<http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.html>

It's also useful if you want to know the pros and cons of the usual way
threads are used. There's nothing intrinsically bad about threads; it's
the common style of threading some environments taught us that is bad.

By the way, you can combine the two techniques if you have to.

> Secondly, and this is pertinent when spawning
> processes, how do you hook into the pylons shutdown process to get the external
> process to stop?

Not sure how to respond to this. Pylons is a framework; it can't be
turned on or shut down. You can start and stop the web server, and how
to sync that with some "daemon processes" largely depends on the
operating system, and so on.

If you have worker processes, they are usually created on demand. If
you need some kind of process pool, the processes are killed by the
pool, and so on.
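With a multiprocessing.Pool, the pool owns its worker processes' lifecycle: close() followed by join() lets them finish and exit, while terminate() kills them. A sketch assuming the Unix "fork" start method; run_pool is a hypothetical helper and the builtin sum stands in for real work on each chunk.

```python
import multiprocessing


def run_pool(chunks, work=sum, processes=2):
    """Apply work to each chunk in a pool of worker processes, then shut the pool down.

    work must be picklable (a module-level function or builtin); sum is a stand-in.
    """
    # Assumes a Unix-like OS for the "fork" start method.
    pool = multiprocessing.get_context("fork").Pool(processes=processes)
    try:
        results = pool.map(work, chunks)   # blocks until every chunk is processed
        pool.close()                       # graceful: workers exit once the queue is empty
    except BaseException:
        pool.terminate()                   # error or interrupt: stop workers right away
        raise
    finally:
        pool.join()                        # reap the worker processes on either path
    return results
```

The workers exist only for the duration of the call, which is the "created on demand" pattern; nothing is left running for the web server to clean up.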

Take a look at: <http://pyprocessing.berlios.de/>

If you use Python 2.6, use the builtin module "multiprocessing";
otherwise you can use the backport of that:
<http://pypi.python.org/pypi/multiprocessing/>

I'm not entirely sure of the compatibility issues between the standard
library version of the package and the original one.

--
Lawrence Oluyede
[eng] http://oluyede.org - http://twitter.com/lawrenceoluyede
[ita] http://neropercaso.it - http://twitter.com/rhymes

Kochhar

Mar 5, 2009, 2:39:57 PM

to pylons-...@googlegroups.com

Lawrence Oluyede wrote:
> On Wed, Mar 4, 2009 at 11:56 PM, Kochhar <kochh...@gmail.com> wrote:
>> I am also leaning towards using processes instead of threads (using the
>> multiprocessing module), but to play devil's advocate for a moment, why do you
>> prefer processes to threads?
>
> The answer is kind of easy. I do not have computations that would
> benefit more from a threading model than from a processing model. Our
> Apache frontend uses multiple processes and our async computations are
> done in processes. We do not need to share anything, and if we do, we
> just copy the data to the process. With an API such as pyprocessing's
> or subprocess's, that is fairly easy.
>
> As for why I generally don't like threading with shared state, there's
> a great resource which I suggest reading from cover to cover:
> <http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.html>
>
> It's also useful if you want to know the pros and cons of the usual way
> threads are used. There's nothing intrinsically bad about threads; it's
> the common style of threading some environments taught us that is bad.
>
> By the way, you can combine the two techniques if you have to.

I lean towards using processes and skipping shared memory entirely as well.
However, there's no reason that multi-threaded applications can't use
message-passing. This might not be the forum for a processes vs. threads
discussion, though.

>> Secondly, and this is pertinent when spawning
>> processes, how do you hook into the pylons shutdown process to get the external
>> process to stop?
>
> Not sure how to respond to this. Pylons is a framework; it can't be
> turned on or shut down. You can start and stop the web server, and how
> to sync that with some "daemon processes" largely depends on the
> operating system, and so on.

Sorry, I tend to say Pylons when I mean the Paste HTTP server. I'm still green on
the terminology.

> If you have worker processes, they are usually created on demand. If
> you need some kind of process pool, the processes are killed by the
> pool, and so on.
>
> Take a look at: <http://pyprocessing.berlios.de/>
>
> If you use Python 2.6, use the builtin module "multiprocessing";
> otherwise you can use the backport of that:
> <http://pypi.python.org/pypi/multiprocessing/>
>
> I'm not entirely sure of the compatibility issues between the standard
> library version of the package and the original one.

I've successfully used multiprocessing before in other apps, and I'm fine
spawning worker processes inside of pylons. I just need to be able to shut them
down when pylons^H^H^H^H^H^H the web server shuts down. I guess I can just write
the process PIDs to the filesystem and use supervisord to send signals, though
I would really like that logic to be part of the pylons application.
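The PID-file route could look roughly like this on Unix: the worker records its PID and turns SIGTERM (supervisord's default stop signal) into a flag that its loop checks between tasks, so it can finish the current item before exiting. write_pidfile, install_sigterm_handler and stop_requested are hypothetical names.

```python
import os
import signal
import threading

stop_requested = threading.Event()   # checked by the worker loop between tasks


def write_pidfile(path):
    """Record this process's PID so an external tool can signal it."""
    with open(path, "w") as f:
        f.write("%d\n" % os.getpid())


def install_sigterm_handler():
    """Make SIGTERM request a clean stop instead of killing the process outright."""
    def on_sigterm(signum, frame):
        stop_requested.set()
    signal.signal(signal.SIGTERM, on_sigterm)   # must run in the main thread
```

The worker loop would then be something like "while not stop_requested.is_set(): handle_next_task()", with handle_next_task standing in for the real work.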

I was hoping to find the converse of config/environment.py:load_environment
which I could hook into to unload worker processes.
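As far as this thread establishes, there is no unload counterpart to load_environment. One workaround is Python's atexit module: register workers as load_environment starts them, and stop them all when the interpreter exits normally. The registry below is a hypothetical sketch. Note that atexit handlers do not run if the process is killed by a signal Python does not handle (including SIGTERM with its default handler) or exits via os._exit().

```python
import atexit
import logging
import threading

log = logging.getLogger(__name__)
_workers = []              # hypothetical registry: objects with a shutdown() method
_lock = threading.Lock()
_hooked = False


def register_worker(worker):
    """Call from load_environment() right after starting a worker."""
    global _hooked
    with _lock:
        _workers.append(worker)
        if not _hooked:
            atexit.register(shutdown_workers)   # runs at normal interpreter exit
            _hooked = True
    return worker


def shutdown_workers():
    """Stop registered workers, newest first; calling it twice is harmless."""
    with _lock:
        workers = _workers[::-1]
        _workers.clear()
    for worker in workers:
        try:
            worker.shutdown()
        except Exception:
            log.exception("failed to stop %r", worker)   # keep stopping the others
```

Stopping in reverse start order means a worker started later (and possibly depending on an earlier one) is stopped first.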
