Using multiprocessing for offloading CPU-heavy tasks


ams.fwd

May 13, 2016, 2:14:33 PM
to gevent: coroutine-based Python network library
Hi All.

I am planning on using multiprocessing to spawn off CPU-intensive and somewhat long-running tasks from a gevent+Flask based web server and stream the output back to the HTTP client. I do not intend to exchange any data other than reading stdout. I have previously rolled my own implementation using subprocess, but this time around I need to use more cores.

On the surface it seems reasonably straightforward with no gotchas, but I wanted to find out whether anybody else has used multiprocessing to spawn off processes and read stdout/stderr, and run into any issues.
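For concreteness, here is a minimal stdlib sketch of the pattern being described (the worker and its progress format are illustrative, not from this thread): a child process streams lines back over a multiprocessing pipe and the parent reads until a sentinel. Note that Connection.recv() is a blocking OS call, so under gevent it would stall the event loop unless handled cooperatively — which is exactly the caveat discussed in the replies.

```python
import multiprocessing as mp

def cpu_task(conn, n):
    """Hypothetical CPU-heavy worker: streams progress lines over a pipe."""
    total = 0
    for i in range(n):
        total += i * i
        conn.send("step %d: total=%d" % (i, total))
    conn.send(None)  # sentinel: no more output
    conn.close()

def run_and_stream(n):
    """Spawn the worker and collect its output line by line."""
    # The fork start method keeps this example self-contained; with spawn
    # (e.g. on Windows) you would need the usual __main__ guard.
    ctx = mp.get_context("fork")
    reader, writer = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=cpu_task, args=(writer, n))
    proc.start()
    writer.close()  # parent keeps only the read end
    lines = []
    while True:
        msg = reader.recv()  # blocking call -- would block a gevent hub
        if msg is None:
            break
        lines.append(msg)
    proc.join()
    return lines

print(run_and_stream(3))
```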

Thanks.
AM

Jonathan Kamens

May 13, 2016, 2:16:02 PM
to gev...@googlegroups.com
I recommend that you use gipc for this.
--
You received this message because you are subscribed to the Google Groups "gevent: coroutine-based Python network library" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gevent+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matt Billenstein

May 13, 2016, 2:36:44 PM
to gev...@googlegroups.com
Use a queue -- Celery+Redis perhaps.

What happens when the Flask app needs to restart and background tasks aren't
complete? Do you wait on them? Does it matter if you prematurely terminate
them? Having idempotent tasks and using an external queue is a much better
solution IMHO.

m


--
Matt Billenstein
ma...@vazor.com
http://www.vazor.com/

AM

May 13, 2016, 2:38:52 PM
to gev...@googlegroups.com
On 05/13/2016 11:15 AM, Jonathan Kamens wrote:
> I recommend that you use gipc <https://pypi.python.org/pypi/gipc/0.6.0>
> for this.

I actually did, and gipc does appear to handle a lot more edge cases and
subtleties than mp+gevent. However, because I was not planning on any
actual IPC other than reading stdout, I was not certain I needed it at
the moment.
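For the stdout-only case, gevent's own cooperative subprocess module may already be enough (a sketch; the command is illustrative): gevent.subprocess is a drop-in for the stdlib module whose pipes yield to the hub instead of blocking it, and the CPU work still lands on another core because it runs in a separate process.

```python
import sys
from gevent import subprocess  # cooperative drop-in for the stdlib module

def stream_stdout(argv):
    """Read a child's stdout line by line without blocking the gevent hub."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            universal_newlines=True)
    lines = []
    for line in proc.stdout:
        lines.append(line.rstrip("\n"))
    proc.wait()
    return lines

print(stream_stdout([sys.executable, "-c", "print('a'); print('b')"]))
```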

I will take a look again.

Thanks.
AM



AM

May 13, 2016, 2:48:55 PM
to gev...@googlegroups.com
I totally agree about the external queue solution.

The tasks are fortunately completely independent of the actual web
server and manage their own state.

The web server is merely a triggering mechanism (usually triggered via
Jenkins jobs) and a convenience if you want to see some output. Killing
the web servers has no bearing on the tasks themselves, as state and logs
can be recovered at any time.

We do have a Celery+SQS solution for another class of tasks, but the
management overhead has been higher than we need for this particular
situation, and this approach appears to be much easier to test and maintain.

I think I have a reasonably good understanding of the caveats; I just
wanted to poll the community to see whether anybody had run into issues
that I have overlooked.

Thanks.
AM