ThreadPool.spawn is too slow under greenlet.


Fei Li

Jan 7, 2015, 2:57:26 AM
to gev...@googlegroups.com
Here is my code:

import time
import contextlib
import gevent
from gevent.threadpool import ThreadPool


pool = ThreadPool(1)


@contextlib.contextmanager
def timer(label):

    bt = time.time()
    yield
    print '%s %.2f ms' % (label, 1000 * (time.time() - bt))


def _eat_cpu():
    time.sleep(1)


def handle(i):

    with timer('pool.spawn: %s' % i):
        r = pool.spawn(_eat_cpu)

    return r.get()


def test1():

    for i in xrange(10000):
        gevent.spawn(handle, i)


def test2():

    for i in xrange(10000):
        handle(i)


test1()
# test2()

gevent.wait()


result of test1:
pool.spawn: 0 0.20 ms
pool.spawn: 1 0.10 ms
pool.spawn: 2 1000.98 ms
pool.spawn: 3 2003.38 ms
pool.spawn: 4 3007.69 ms
pool.spawn: 5 4011.77 ms
pool.spawn: 6 5012.74 ms
pool.spawn: 7 6015.41 ms
pool.spawn: 8 7016.91 ms
pool.spawn: 9 8018.87 ms
pool.spawn: 10 9023.03 ms

result of test2:
pool.spawn: 0 0.74 ms
pool.spawn: 1 0.08 ms
pool.spawn: 2 0.08 ms
pool.spawn: 3 0.08 ms
pool.spawn: 4 0.12 ms
pool.spawn: 5 0.07 ms
pool.spawn: 6 0.11 ms
pool.spawn: 7 0.07 ms
pool.spawn: 8 0.14 ms
pool.spawn: 9 0.10 ms
pool.spawn: 10 0.07 ms

I found the lines below in gevent/threadpool.py, which seem to explain the behavior. When I removed line 161, the test ran as I expected.

160             # rawlink() must be the last call
161             result.rawlink(lambda *args: self._semaphore.release())
162             # XXX this _semaphore.release() is competing for order with get()
163             # XXX this is not good, just make ThreadResult release the semaphore before doing anything else

I would like to know why _semaphore.release() competes for order with get(), and whether it is safe to remove that line.

Also, how can I make a pool's spawn behave like a queue's put_nowait?

Thank you in advance for any help.

Denis Bilenko

Jan 7, 2015, 5:03:15 AM
to gev...@googlegroups.com
> pool = ThreadPool(1)

Your pool has size 1, so it can only run one task at a time. Try
changing 1 to 10, or 1000.

Fei Li

Jan 7, 2015, 7:13:58 AM
to gev...@googlegroups.com
I set size=1 intentionally, to make the result more obvious.

My point is that there is a semaphore in ThreadPool which makes the pool process tasks one by one when the pool is full.
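
The effect of a semaphore like this can be reproduced without gevent. The sketch below is only an analogy built on plain threads, not gevent's implementation: `BlockingPool` and `measure_spawn_waits` are hypothetical names introduced for illustration. It assumes a pool whose `spawn` acquires a slot before starting the task, and it records how long each of several simultaneous callers spends inside `spawn`.

```python
import threading
import time


class BlockingPool(object):
    """Hypothetical stand-in for a pool whose spawn() waits for a free slot."""

    def __init__(self, size):
        self._slots = threading.Semaphore(size)

    def spawn(self, func):
        # Blocks the caller while every slot is busy.
        self._slots.acquire()

        def run():
            try:
                func()
            finally:
                # Free the slot so the next waiting caller can proceed.
                self._slots.release()

        worker = threading.Thread(target=run)
        worker.start()
        return worker


def measure_spawn_waits(callers=4, task_seconds=0.1):
    """Return the sorted times (in seconds) that callers spent in spawn()."""
    pool = BlockingPool(1)
    lock = threading.Lock()
    waits = []
    workers = []

    def caller():
        began = time.time()
        worker = pool.spawn(lambda: time.sleep(task_seconds))
        waited = time.time() - began
        with lock:
            waits.append(waited)
            workers.append(worker)

    threads = [threading.Thread(target=caller) for _ in range(callers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    for worker in workers:
        worker.join()
    return sorted(waits)


if __name__ == "__main__":
    for n, waited in enumerate(measure_spawn_waits()):
        print('caller %d waited %.0f ms in spawn' % (n, 1000 * waited))
```

With one slot, each extra caller should wait roughly one more task duration than the previous one, which matches the growing numbers in the test1 output.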

Fei Li

Jan 7, 2015, 8:59:25 AM
to gev...@googlegroups.com
From the results I pasted in my first post, we can see that the time each task takes grows larger and larger. This is very bad for an online system.

For example:

Suppose the pool can process 10 requests per second, and 15 requests per second arrive. We should drop 5 requests per second so that the remaining 10 (about 66% of the total) are processed properly, because requests need to be processed in time.

The original threadpool processes every submitted task, which makes new tasks wait longer and longer.
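
A minimal sketch of the drop-when-full behavior described above, again with plain threads rather than gevent. `SheddingPool`, `spawn_nowait`, `PoolFull` and `offer` are hypothetical names, not part of gevent's API. The idea is to make a non-blocking attempt to take a slot and to reject the task immediately if none is free, much as a queue's put_nowait raises when the queue is full.

```python
import threading
import time


class PoolFull(Exception):
    """Raised when every slot is busy (hypothetical, mirrors Queue.Full)."""


class SheddingPool(object):
    """Hypothetical pool that rejects work instead of queueing it."""

    def __init__(self, size):
        self._slots = threading.Semaphore(size)

    def spawn_nowait(self, func):
        # Non-blocking acquire: returns False at once if no slot is free.
        if not self._slots.acquire(False):
            raise PoolFull()

        def run():
            try:
                func()
            finally:
                self._slots.release()

        worker = threading.Thread(target=run)
        worker.start()
        return worker


def offer(pool, requests, task_seconds=0.2):
    """Offer `requests` tasks back to back; return (accepted, dropped)."""
    workers = []
    dropped = 0
    for _ in range(requests):
        try:
            workers.append(pool.spawn_nowait(lambda: time.sleep(task_seconds)))
        except PoolFull:
            dropped += 1
    for worker in workers:
        worker.join()
    return len(workers), dropped


if __name__ == "__main__":
    accepted, dropped = offer(SheddingPool(2), 3)
    print('accepted %d, dropped %d' % (accepted, dropped))
```

Accepted tasks start without queueing behind older ones, so their latency stays bounded. The caller decides what to do with a rejected request, for example returning an error to the client.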

Catstyle Lee

Jan 7, 2015, 3:41:45 PM
to gev...@googlegroups.com
Your `timer` is misused; it should be:

with timer('pool.spawn: %s'  % i):
    r = pool.spawn(_eat_cpu)
    return r.get()

Or you can try adding some logging, as below:

@contextlib.contextmanager
def timer(label):
    bt = time.time()
    # show which greenlet is entering the timed block
    print gevent.getcurrent()
    yield
    print '%s %.2f ms' % (label, 1000 * (time.time() - bt))

Then you will see something useful.
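
To show what each placement of the timer measures, here is a small sketch that uses the standard library's `ThreadPoolExecutor` in place of gevent's ThreadPool. This is an assumption made so the example runs without gevent, and `submit` there does not block the way `pool.spawn` can. The `timings` dict, `task` and `compare` are hypothetical names.

```python
import contextlib
import time
from concurrent.futures import ThreadPoolExecutor

timings = {}  # label -> elapsed seconds, filled in by timer()


@contextlib.contextmanager
def timer(label):
    began = time.time()
    try:
        yield
    finally:
        timings[label] = time.time() - began


def task():
    time.sleep(0.1)
    return 'done'


def compare():
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Timer around submission only: measures the hand-off, not the task.
        with timer('submit only'):
            future = pool.submit(task)
        future.result()

        # Timer around submission and result: measures the whole round trip.
        with timer('submit and result'):
            result = pool.submit(task).result()
    return result


if __name__ == "__main__":
    compare()
    for label in ('submit only', 'submit and result'):
        print('%s %.2f ms' % (label, 1000 * timings[label]))
```

The first figure covers only the time needed to hand the task to the pool. The second also includes the task's own run time.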
--
BR,
/Catstyle_Lee