gevent faster, even with regular requests library

Rex East

Jan 20, 2019, 8:54:48 AM
to gevent: coroutine-based Python network library
Hi, I'm reading the intro documentation for gevent, which says "Using the standard socket module inside greenlets makes gevent rather pointless". However, I wrote a little performance test using requests and found a big advantage for gevent. I also added grequests to the comparison. With the code below, test_sync_gevent and test_async_gevent both run in about 4 seconds consistently, whereas test_sync takes 20 seconds. Am I missing something?

#### SCRIPT
import gevent, grequests, requests, time

# url = 'http://www.httpbin.org/get'
url = 'https://news.ycombinator.com/'
N_ITER = 20


def compare():
    results = []
    print(N_ITER, 'iterations')
    for fxn in [test_sync_gevent, test_async_gevent, test_sync]:
        start = time.time()
        results.append(fxn())
        elapsed = time.time() - start
        print(round(elapsed, 2), fxn)

    # ensure that all functions got the same data
    # (may fail if the page content changes between fetches)
    for r in results:
        assert r == results[0]


def sync_request():
    # plain requests.get: blocking unless the socket module is cooperative
    return requests.get(url).text


def async_request():
    # grequests: requests running on top of gevent
    async_req = grequests.get(url)
    async_req.send()
    return async_req.response.text


def test_sync():
    return [sync_request() for _ in range(N_ITER)]

def test_sync_gevent():
    jobs = [gevent.spawn(sync_request) for _ in range(N_ITER)]
    gevent.joinall(jobs)
    return [job.value for job in jobs]

def test_async_gevent():
    jobs = [gevent.spawn(async_request) for _ in range(N_ITER)]
    gevent.joinall(jobs)
    return [job.value for job in jobs]


compare()

#### END SCRIPT


Jason Madden

Jan 20, 2019, 9:08:00 AM
to gev...@googlegroups.com


> On Jan 20, 2019, at 07:37, Rex East <rex.eas...@gmail.com> wrote:
>
> Hi, I'm reading the intro documentation for gevent, which says "Using the standard socket module inside greenlets makes gevent rather pointless". However, I wrote a little performance test using requests and found a big advantage for gevent. I also added grequests to the comparison. With the code below, test_sync_gevent and test_async_gevent both run in about 4 seconds consistently, whereas test_sync takes 20 seconds. Am I missing something?

"Using the standard socket module" means just that: 'import socket' from the standard library (without monkey-patching). 'import socket' will produce objects that block the event loop (are non-cooperative) and hence don't allow greenlets to run concurrently. 'from gevent import socket' (or monkey-patching) will, on the other hand, produce cooperative objects that allow greenlet concurrency.

That said, I don't think you're measuring what you think you're measuring here. It happens that grequests *implicitly monkey-patches* the process when it is imported (https://github.com/kennethreitz/grequests/blob/master/grequests.py#L21).
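At the time of writing, that linked line is essentially:

from gevent import monkey as curious_george
curious_george.patch_all(thread=False, select=False)

so merely importing grequests patches the socket module (among other things) for the entire process.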

Thus after that 'import socket' is equivalent to 'from gevent import socket' and 'requests.get()' is equivalent to 'grequests.get()' --- everything is cooperative with gevent. So what you're measuring here in all cases is simply the difference of running things concurrently vs sequentially. You're not measuring using standard (non-cooperative) sockets inside greenlets; you'd have to drop the import of grequests to do that.

If I do that on my system (remove 'import grequests', simply use 'requests'), I find all three take 20s as expected (all three are being run sequentially). If I then add 'from gevent import monkey; monkey.patch_all()' to the beginning of the script and run again, I see the same behaviour you report; 'test_sync_gevent' and 'test_async_gevent' both run in ~4s (because they run concurrently) while 'test_sync' still takes 20s (running sequentially).
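That is, the change is just these lines at the very top of the script, before anything else imports socket:

from gevent import monkey
monkey.patch_all()  # do this before other imports so everything sees the cooperative versions

import gevent, requests, time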

~Jason

Rex East

Jan 20, 2019, 10:59:25 AM
to gevent: coroutine-based Python network library
Hi Jason,

Thank you so much! Now I understand my mistake.

I am running a Django/gunicorn site. On each request, I need to make 5-15 server-side calls to external HTTP APIs. For responsiveness, I want them to be parallelized. I don't want to monkey-patch socket globally because I don't want to disrupt other modules that rely on socket (such as the webserver itself). Currently I'm using aiohttp, but I find it spreads too many async/awaits throughout the code (especially as I add more layers/wrappers around the HTTP calls). I just found out about geventhttpclient, which doesn't seem to use monkey-patching, but in my benchmark it takes just as long as the requests.get version. Am I doing something wrong? Or is there some alternative I should use? Thank you!

#### SCRIPT
import gevent, requests, time
import gevent.pool
from geventhttpclient import HTTPClient

# url = 'http://www.httpbin.org/get'
url = 'https://news.ycombinator.com/'
N_ITER = 15


def compare():
    print(N_ITER, 'iterations')
    for fxn in [test_async_gevent, test_sync]:
        start = time.time()
        print(fxn())
        elapsed = time.time() - start
        print(round(elapsed, 2), fxn)

def sync_request():
    return requests.get(url).text


http = HTTPClient.from_url(url)


def async_request():
    resp = http.get(url)
    assert resp.status_code == 200
    return resp.read()


def test_sync():
    return [sync_request() for _ in range(N_ITER)]


def test_async_gevent():
    # I also tried this with a gevent.pool.Pool
    jobs = [gevent.spawn(async_request) for _ in range(N_ITER)]
    gevent.joinall(jobs)
    return [job.value for job in jobs]


compare()
#### END SCRIPT
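
The pool variant mentioned in the comment above looked roughly like this (a sketch from memory, not the exact code):

pool = gevent.pool.Pool(N_ITER)

def test_pool_gevent():
    # Pool.map fans async_request out across up to N_ITER greenlets
    return pool.map(lambda _: async_request(), range(N_ITER))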

Jason Madden

Jan 20, 2019, 12:01:04 PM
to gev...@googlegroups.com


> On Jan 20, 2019, at 09:45, Rex East <rex.eas...@gmail.com> wrote:
>
> I am running a Django/gunicorn site. On each request, I need to make 5-15 server-side calls to external HTTP APIs. For responsiveness, I want them to be parallelized. I don't want to monkey-patch socket globally because I don't want to disrupt other modules that rely on socket (such as the webserver itself). Currently I'm using aiohttp, but I find it spreads too many async/awaits throughout the code (especially as I add more layers/wrappers around the HTTP calls). I just found out about geventhttpclient, which doesn't seem to use monkey-patching, but in my benchmark it takes just as long as the requests.get version. Am I doing something wrong? Or is there some alternative I should use? Thank you!
>

I'm not familiar with `geventhttpclient` (I usually just monkey-patch the process), but a quick look at it shows that `HTTPClient.from_url` creates an object whose `ConnectionPool` uses a lock to enforce a concurrency of 1. So it's also being used sequentially. I tried passing the `concurrency=` argument to that method, and I also tried creating the client inside the request function instead of globally, both as ways to get around the lock. Concurrency did go up, but I also started getting 503 responses from the server, so I'm not sure what went wrong.
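Roughly what I tried, for reference (a sketch; check geventhttpclient's documentation before relying on the exact keyword):

from geventhttpclient import HTTPClient

# concurrency= sizes the client's connection pool, allowing that many
# requests to be in flight at once instead of the default of 1
http = HTTPClient.from_url('https://news.ycombinator.com/', concurrency=15)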

Rex East

Jan 20, 2019, 3:32:45 PM
to gevent: coroutine-based Python network library
OK thank you! I got it working now :)