
Can greenlets be used for bursty IO-bound tasks?


Prithvi Bhargava

Aug 30, 2024, 3:04:58 PM
to gevent: coroutine-based Python network library
I have an endpoint on a Flask app running on Gunicorn that receives requests, and they can be bursty: up to 500 requests at nearly the same instant.

Each request then spawns 4 greenlets, and each greenlet performs a curl against a remote server. The curl has a 125 ms timeout, so all 4 greenlets should be joined within that time. What I'm seeing is that most requests complete promptly, but a handful of greenlets get stuck waiting to spawn for up to 4 s. This causes timeouts on the requesters and generally a lot of errors.
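
Roughly, the handler does this (a simplified sketch; names, URLs, and helpers are illustrative, not my actual code):

    import gevent
    import pycurl
    from io import BytesIO

    def fetch(url):
        buf = BytesIO()
        c = pycurl.Curl()
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.WRITEDATA, buf)
        c.setopt(pycurl.TIMEOUT_MS, 125)   # the 125 ms cap mentioned above
        c.perform()
        c.close()
        return buf.getvalue()

    def handle_request(urls):              # called from the Flask view
        jobs = [gevent.spawn(fetch, u) for u in urls]   # 4 urls per request
        gevent.joinall(jobs, timeout=0.125)
        return [j.value for j in jobs]     # value is None if a job timed out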

Are greenlets the right thing to use here?

Thanks

Aleksandar Kordic

Aug 31, 2024, 9:56:45 AM
to gev...@googlegroups.com
The curl part is for sure blocking. Try replacing it with a pure-Python implementation.

To test whether this is the bottleneck:
- create a Flask endpoint that does gevent.sleep(2) before returning a string response
- spawn 10 greenlets requesting the new endpoint via curl
The total time for all 10 requests should be under 9 sec. If it is 20 sec, the curl operations are running sequentially instead of in parallel.
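
Something like this (a rough sketch; the port and URL are illustrative):

    import time
    import gevent
    import pycurl
    from io import BytesIO

    # Server side, run separately under gevent workers:
    #   @app.route("/slow")
    #   def slow():
    #       gevent.sleep(2)    # cooperative 2 s delay
    #       return "ok"

    def fetch(url):
        buf = BytesIO()
        c = pycurl.Curl()
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.WRITEDATA, buf)
        c.perform()            # if this blocks the hub, greenlets serialize
        c.close()

    start = time.time()
    jobs = [gevent.spawn(fetch, "http://127.0.0.1:5000/slow") for _ in range(10)]
    gevent.joinall(jobs)
    print("total: %.1f s" % (time.time() - start))
    # ~2 s means the fetches overlapped; ~20 s means they ran one after another.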

My guess is that curl is built around a threading model. If true, it is easier to adapt a pure-Python library to greenlets than to modify an external C library to work with coroutines.


Kevin Tewouda

Sep 1, 2024, 2:20:11 AM
to gevent: coroutine-based Python network library
Hello everyone,
Another option would be to call curl in a gevent subprocess. But yeah, like Aleksandar, I think it is preferable to use a monkey-patched pure-Python library such as requests or httpx.
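
Rough sketches of both ideas (the URL and timeout are illustrative):

    # Option 1: run curl in a gevent-aware subprocess. gevent.subprocess
    # cooperates with the hub, so the child processes run concurrently.
    import gevent
    from gevent import subprocess

    def curl(url):
        # -s silences progress output; --max-time caps the request at 0.25 s
        return subprocess.check_output(["curl", "-s", "--max-time", "0.25", url])

    jobs = [gevent.spawn(curl, "http://example.com/") for _ in range(4)]
    gevent.joinall(jobs)

    # Option 2: monkey-patch the stdlib, then use a pure-Python client such
    # as requests; its socket I/O then yields to the hub cooperatively.
    # from gevent import monkey
    # monkey.patch_all()     # must run before requests is imported
    # import requests
    # body = requests.get("http://example.com/", timeout=0.25).text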

Prithvi Bhargava

Sep 1, 2024, 8:53:45 AM
to gev...@googlegroups.com
I'm using pycurl and have the timeouts set to 250 ms. Through logging I can confirm that all 4 greenlets exit in under 250 ms; in reality they finish in under 100 ms.

So if I spawn 4 greenlets and each runs a curl function that blocks, does that mean the greenlets effectively run sequentially?

I tried using the Python requests library, but it was taking over 250 ms just to establish a connection. Is there any other approach I can use?

Thanks!


Matt Billenstein

Sep 1, 2024, 1:57:35 PM
to gev...@googlegroups.com
I don't think pycurl cooperates with the event loop. Use either the built-in
urllib.request [1] or the requests library.

As for stalls using requests, you may need to think about the DNS resolver.
Are all the requests to the same host? You might be getting blocked on DNS; I'm
not exactly sure how to go about debugging this. You could try making the
requests to each host's IP address directly (setting the Host: header yourself)
and seeing if it stalls less.
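
Something like this (hostname and path are illustrative):

    import socket
    import requests

    host = "api.example.com"                  # illustrative
    ip = socket.gethostbyname(host)           # one resolution, ahead of time

    resp = requests.get(
        "http://%s/health" % ip,              # connect to the IP directly
        headers={"Host": host},               # but present the real hostname
        timeout=0.25,
    )

Note this only works cleanly over plain HTTP; with HTTPS the certificate is
validated against the hostname, so the trick gets messier.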

And 500 x 4 = 2000 TCP sockets: you might just be flooding the network
interface, saturating the CPU, queuing excessively in the event loop, etc. You
probably need to consider whether this is a good architecture for the problem.
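
One cheap mitigation is to cap how many fetches are in flight at once with a
gevent.pool.Pool; a rough sketch with illustrative numbers:

    from gevent import monkey
    monkey.patch_all()                    # patch before importing requests

    import gevent
    from gevent.pool import Pool
    import requests

    pool = Pool(200)                      # at most 200 concurrent fetches

    def fetch(url):
        return requests.get(url, timeout=0.25)

    urls = ["http://example.com/"] * 2000     # stand-in for the 500 x 4 burst
    jobs = [pool.spawn(fetch, u) for u in urls]
    gevent.joinall(jobs)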

Possibly do the requests asynchronously and cache the responses so you can
return them directly instead of fetching in real time? Or spread the load over
multiple processes: if you need to consistently make 2000 requests, have your
process make N requests to N local processes and have each of those make
2000/N requests to the final destinations. Play with values for N to see what
works best.

thx

m


1. Example: https://github.com/mattbillenstein/pingthing/blob/main/pingthing.py#L86

--
Matt Billenstein
ma...@vazor.com
https://vazor.com

Kevin Tewouda

Sep 2, 2024, 1:20:02 AM
to gevent: coroutine-based Python network library
You can also try geventhttpclient if you wish.
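
Something like this (the URL is illustrative):

    import gevent
    from geventhttpclient import HTTPClient

    # One client with a small connection pool, shared across greenlets.
    client = HTTPClient.from_url("http://example.com/", concurrency=10)

    def fetch(path):
        response = client.get(path)
        body = response.read()       # read the body before reusing the socket
        return response.status_code, body

    jobs = [gevent.spawn(fetch, "/") for _ in range(4)]
    gevent.joinall(jobs)
    client.close()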

Aleksandar Kordic

Sep 2, 2024, 2:45:10 AM
to gev...@googlegroups.com
> I'm using pycurl and have the timeouts set to 250 ms. Through logging I can confirm that all 4 greenlets exit in under 250 ms; in reality they finish in under 100 ms.
>
> So if I spawn 4 greenlets and each runs a curl function that blocks, does that mean the greenlets effectively run sequentially?

It seems you may have left out the gevent.sleep(2) portion on the server side in your test. That would make sequential and parallel requests show roughly the same timings.

The purpose of that sleep is to create a scenario where the server-side processing time is significantly longer than the network transport time for each request. This simulates real-world conditions where the server might be performing intensive tasks, such as database queries or complex calculations. With the delay in place, any sequential behavior on the client side becomes obvious: 10 parallel requests against a 2 s endpoint finish in about 2 s total, while 10 sequential ones take about 10 x 2 s = 20 s.