gevent and multiprocessing


Jan Persson

Dec 26, 2010, 9:04:14 AM
to gev...@googlegroups.com
Hello all,

Before I delve deeper into this on my own, I'm curious whether anyone has
experience with letting one process run gevent (single-threaded) and then
farming work out to subprocesses started with the multiprocessing module?
My application is not a traditional web server, and part of its legacy
protocol is quite CPU-intensive (there is some cryptography involved), so
I'm just trying to make it a little bit more scalable.

I've done some experimentation, but when I try to send messages over a
shared multiprocessing.Queue it blocks. I have tried both with
monkey.patch_all() and with no monkey patching.
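For reference, the plain multiprocessing.Queue round trip being described works as expected outside of gevent. A minimal sketch (helper names are made up; the fork start method is assumed, so POSIX only):

```python
import multiprocessing

def _worker(q_in, q_out):
    task = q_in.get()          # blocks on a POSIX semaphore internally
    q_out.put(task * 2)

def queue_round_trip(value):
    """Send one task to a child process and read back the result."""
    ctx = multiprocessing.get_context("fork")  # POSIX-only start method
    q_in, q_out = ctx.Queue(), ctx.Queue()
    p = ctx.Process(target=_worker, args=(q_in, q_out))
    p.start()
    q_in.put(value)
    result = q_out.get()       # this blocking get is what stalls under gevent
    p.join()
    return result
```

The semaphore-backed waits inside Queue.get()/put() are opaque to gevent's event loop, which is why the same pattern hangs once the hub is involved.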

Is this supposed to work or is there more to it?

Since all socket operations are non-blocking, I could of course start the
subprocesses on my own, open a socket, and implement some kind of protocol
for inter-process RPC, but the multiprocessing module is already there and
it sorts out all the quirks on the different platforms I need to support.

Regards,
Jan Persson

Alex Dong

Dec 26, 2010, 9:49:48 PM
to gev...@googlegroups.com
Jan, we have similar challenges here at trunk.ly. Once we get a link, we need to do some quite computationally intensive things, including constructing a search index.
Here is how we're doing it:

We use redis to connect the pipelines together. The gevent components crawl the web and put the HTML into the queue. Then other multiprocessing workers pick up the tasks from the queue, using redis' BLPOP/SUBSCRIBE, and do their own job. We have EC2 scripts to automatically start an instance and let it join the processing army with little impact on the gevent-based crawlers.

We've scaled this structure out to more than 10 EC2 instances doing the processing. It's been working great for us, and it also limits the scope in which gevent is applied.

HTH,
Alex

Equand

Dec 27, 2010, 2:54:47 PM
to gevent: coroutine-based Python network library
There is a non-blocking stdin/stdout solution; just google
"gevent non-blocking stdin".
You might also implement the same thing with multiprocessing.
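A minimal stdlib-only sketch of the non-blocking-fd idea (shown on a pipe rather than stdin so it is self-contained; under gevent one would typically wrap the fd instead, e.g. with gevent.fileobject):

```python
import os

def try_read(fd, n=1024):
    """Read without blocking: returns bytes, or None if nothing is ready."""
    os.set_blocking(fd, False)
    try:
        return os.read(fd, n)
    except BlockingIOError:
        return None

r, w = os.pipe()
empty = try_read(r)   # nothing written yet, so this returns None
os.write(w, b"hello")
data = try_read(r)    # now there is data to read
```

For real stdin, the same try_read(0) call applies; gevent's cooperative file objects do essentially this plus registering the fd with the event loop.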

Jan Persson

Jan 6, 2011, 8:39:54 AM
to gev...@googlegroups.com
Thanks for all the answers.

I've done some more research on my own and I can definitely say that
making multiprocessing cooperative is a major undertaking. It is most
likely doable, but it would sure help to have some intimate knowledge
of both multiprocessing and gevent, and I'm sorry to say that I'm
not up to this task.

As Alex Dong points out, for server-side applications a message
queueing scheme is probably the way to go right now. But that will not
help me, since the product I'm building is an off-the-shelf
application and we cannot force the customers to install an MQ on
every single machine just to avoid the GIL.

Regards
//Jan Persson

--
Jan Persson - Esentus Technology AB - www.esentus.com - +46 702 854132 (mobile)

Matt Billenstein

Jan 7, 2011, 9:31:01 PM
to gev...@googlegroups.com
Maybe I'm misunderstanding your problem, but if it's a single-producer,
multiple-consumer problem, couldn't you just use gevent in the producer
to create a network server that the clients can connect to for fetching
input to process? This server would take requests, stick them into a
queue, and then provide another socket or HTTP interface for worker
processes to fetch input from.

The consumers can then be standalone processes which are spawned
independently from the producer process...
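A rough stdlib sketch of this producer/worker shape (names are hypothetical; a threaded TCP server stands in for what would be a gevent StreamServer in the producer process):

```python
import queue
import socket
import socketserver
import threading

# Work produced by the (would-be gevent) producer side
tasks = queue.Queue()

class TaskHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Hand one queued task to whichever worker connects
        self.request.sendall(tasks.get())

def serve_tasks():
    """Start the producer's task server on an ephemeral port."""
    srv = socketserver.ThreadingTCPServer(("127.0.0.1", 0), TaskHandler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv

def fetch_task(port):
    """What a standalone worker process would do to pull one task."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        return s.recv(1024)
```

The workers only speak TCP to the producer, so they can be spawned any way you like, including on other machines.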

m


--
Matt Billenstein
ma...@vazor.com
http://www.vazor.com/

Travis Cline

Jan 12, 2011, 6:18:26 PM
to gevent: coroutine-based Python network library

This thread got my wheels turning a bit and I tossed together a few
small examples:

multiproc chat server https://gist.github.com/777085
multiproc echo server https://gist.github.com/776364

It appears that the multiprocessing Event and Queue implementations
block on a semaphore in C code, which is a bit of a dead end. The
Pipe implementation's use of the socket API, also C code, appears
incompatible with gevent's.

I fell back to just using another StreamServer and communicating with
a socket from the children to the parent process.
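That fallback can be sketched roughly like this with the standard library (hypothetical names; plain blocking sockets stand in for the gevent StreamServer side, and the fork start method is assumed, so POSIX only):

```python
import multiprocessing
import socket

def _child(port, payload):
    # Child process reports its result over a plain TCP connection;
    # in the gevent version the parent side would be a StreamServer
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(payload)

def collect_from_child(payload):
    """Parent listens on a socket; one forked child connects and reports."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    ctx = multiprocessing.get_context("fork")
    p = ctx.Process(target=_child, args=(port, payload))
    p.start()
    conn, _ = listener.accept()
    data = conn.recv(1024)
    conn.close()
    listener.close()
    p.join()
    return data
```

Because the parent only ever touches sockets, gevent can schedule around the children without caring how they block internally.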

Might be useful.


Travis

Creotiv

Jan 13, 2011, 10:16:19 AM
to gevent: coroutine-based Python network library
I use MongoDB to create message queues between gevent forked processes
(just use the findAndModify db method, which is atomic). The
implementation is very simple, and the pymongo driver works very well
with gevent.

nh2

Jan 23, 2011, 10:23:15 AM
to gevent: coroutine-based Python network library, travis...@gmail.com

Thanks for your examples.

Some days ago I put together example controllers showing how to stream
content using Pylons/TurboGears with gevent and gunicorn. It is here:
https://github.com/nh2/eventstreamexamples/blob/master/eventstreamexamples/controllers/geventeventstream.py#L47

I just realized that this creates a socket leak: whenever a client
refreshes, a new socket is opened and the old one is stuck in the
CLOSE_WAIT state until the server is killed; the open-file ulimit is
quickly exceeded and the server stops responding.

I guess this is a result of the gevent.Queue q.get() I use there:
the controller is probably stuck there, waiting for the queue to be
filled, not noticing that the client quit.

Do you have a hint on how to resolve that issue?

Does gevent.Queue have the same problems you describe for
multiprocessing.Queue?

Thanks
Niklas

Niklas Hambüchen

Jan 26, 2011, 5:27:29 PM
to gev...@googlegroups.com
I came up with two solutions for this:

1. Give Queue.get() a timeout and flush an empty message or a timed "ping"
response. If the client has disconnected, the write will fail and no further
element will be requested from the stream iterator.

2. Give Queue.get() a timeout and check the socket with select. Return
from iterator if the connection was closed. Adapted from
http://bytes.com/topic/python/answers/40278-detecting-shutdown-remote-socket-endpoint

Do you consider these good solutions?


1:

from gevent.queue import Empty

@expose()
def eventstream(self):
    response.headers['Content-type'] = 'text/event-stream'
    response.charset = ""

    q = get_queue_to_stream()

    def stream():
        while True:
            try:
                # Wait up to 1s for a new message
                msg = q.get(True, 1)
                yield "data: %s\n\n" % msg
            except Empty:
                # No new message for 1s; this write fails if the
                # client is gone, which ends the iterator
                yield "data: %s\n\n" % "ping"

    return stream()

2:
import select

from gevent.queue import Empty

@expose()
def eventstream(self):
    response.headers['Content-type'] = 'text/event-stream'
    response.charset = ""

    # Get the socket somehow, e.g. from the request environment
    sock = request.environ['gunicorn.sock']

    q = get_queue_to_stream()

    def stream():
        while True:
            try:
                # Wait up to 1s for a new message
                msg = q.get(True, 1)
                yield "data: %s\n\n" % msg
            except Empty:
                # Peer closed (readable with EOF) or error: stop streaming
                r, w, e = select.select([sock], [], [sock], 0)
                if r or e:
                    return

    return stream()

Jan Persson

Jan 27, 2011, 6:21:07 AM
to gev...@googlegroups.com
Thanks for providing these examples, but as far as I can see, this
will not work on Windows, which is one of the platforms I need to
support.

Correct me if I'm wrong, but aren't your examples relying on
multiprocessing using fork to create processes and inheriting file
descriptors?

Regards
//Jan Persson

Travis Cline

Jan 27, 2011, 8:46:46 AM
to gev...@googlegroups.com

Yes, standard pre-forking; look up "accept before fork".
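A minimal sketch of that pattern (bind and listen in the parent, accept in the forked children; names are made up and the fork start method is assumed, so POSIX only):

```python
import multiprocessing
import socket

def _worker(listener):
    # Each worker accepts directly on the shared listening socket:
    # "accept before fork" means bind/listen in the parent,
    # accept in the children
    conn, _ = listener.accept()
    conn.sendall(b"echo:" + conn.recv(1024))
    conn.close()

def demo(n_workers=2):
    """Fork workers sharing one listening socket, serve one client each."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    port = listener.getsockname()[1]
    ctx = multiprocessing.get_context("fork")  # fd inheritance needs fork
    workers = [ctx.Process(target=_worker, args=(listener,))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    replies = []
    for i in range(n_workers):
        with socket.create_connection(("127.0.0.1", port)) as c:
            c.sendall(b"%d" % i)
            replies.append(c.recv(1024))
    for w in workers:
        w.join()
    listener.close()
    return sorted(replies)
```

The kernel distributes incoming connections among the accepting children, so no extra IPC is needed just to share the listening port.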

Travis
