AutoPool


cymrow

Jun 2, 2015, 4:06:39 AM
to gev...@googlegroups.com
I have a distributed network of gevent workers doing a wide variety of jobs involving large numbers of requests, sometimes combined with some CPU work. One of the things I'm most interested in is the response time of those requests, so I want the measurements to be as accurate as possible while still pushing gevent's event loop as hard as possible. I can, of course, run tests for each new type of job I create to find the ideal limit for my worker pools, but I figure there must be a good way to have the pool adjust itself automatically.

I'm not entirely sure if this makes sense for how gevent works internally, but I'm sure someone has had this idea before. My approach so far has been to use either gevent.sleep() or gevent.idle(), and measure the amount of time it actually takes to return to determine if I can safely scale up. My thinking was that an .idle() call, for example, would wait until all available IO had been processed before returning, so I would know if some incoming response was taking longer than it should to be processed. In small-scale tests, this actually seems to work pretty well. But once I start to include multiple IO calls within a greenlet, it starts to break down and the pool quickly goes over the limit of what it can actually handle. In other words, .sleep() and .idle() return quickly, but my other requests begin to time out.
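A minimal sketch of the timing probe described above, assuming gevent is installed; the 10 ms sleep and 50 ms tolerance are illustrative values, not tuned ones:

```python
import time

import gevent

def loop_lag(requested=0.01):
    """Measure how far the event loop overshoots a requested sleep.

    If the hub is busy draining IO callbacks, the sleep returns
    noticeably later than asked for, and that overshoot is the lag.
    """
    start = time.monotonic()
    gevent.sleep(requested)
    return (time.monotonic() - start) - requested

# Heuristic from the post: only let the pool grow while lag stays small.
can_grow = loop_lag() < 0.05
```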

Can anyone point me in a better direction, or explain to me why this won't work? I'd love to just drop a worker pool on any machine and know that it will maximize resource usage automatically.

Thanks,
miguel

Matt Billenstein

Jun 2, 2015, 1:47:56 PM
to gev...@googlegroups.com
What do you mean by pool: a greenlet pool or a process pool?

m
> --
> You received this message because you are subscribed to the Google Groups
> "gevent: coroutine-based Python network library" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to gevent+un...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.


--
Matt Billenstein
ma...@vazor.com
http://www.vazor.com/

cymrow

Jun 2, 2015, 2:14:45 PM
to gev...@googlegroups.com
Greenlet pool. I run a worker process with a greenlet pool for each CPU core.

Matt Billenstein

Jun 2, 2015, 2:32:31 PM
to gev...@googlegroups.com
On Tue, Jun 02, 2015 at 11:14:45AM -0700, cymrow wrote:
> Greenlet pool. I run a worker process with a greenlet pool for each CPU
> core.

Okay, cool. I've done the same thing, and I didn't use a greenlet pool at all;
I basically just spawned from the accept loop for each request. I log response
time and CPU usage in Graphite to track scaling metrics and so forth.

But, maybe more to your question, you could look at the event loop directly and
see how many greenlets are ready to run but are waiting. That would give you a
better idea than sleep/idle of how hard you're pushing the process, and whether
it's getting too many requests. I'd expect a largish backlog to correlate
roughly with high CPU utilization.

I unfortunately don't know enough about the gevent internals to say how to do
that or if this is even possible...
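For what it's worth, gevent's libev loop does expose a count of callbacks waiting to run; here's a hedged sketch (the `pendingcnt` attribute is an assumption about the libev backend and may be absent under other backends, hence the `getattr` fallback):

```python
import gevent

def pending_callbacks():
    """Return the number of loop callbacks waiting to run, or None if
    the running event-loop backend doesn't expose that counter."""
    loop = gevent.get_hub().loop
    return getattr(loop, "pendingcnt", None)

backlog = pending_callbacks()
```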

It may also be interesting to look at the listening socket's backlog, although
it seems you can only do that through netstat.

m

Shaun Lindsay

Jun 2, 2015, 4:18:44 PM
to gev...@googlegroups.com
I wouldn't use a greenlet pool for this at all.  When you have a combination of CPU- and IO-bound tasks, the burstiness and composition of the load will make tuning a pool-based approach very difficult.  The simplest approach is to not use a pool and just monitor response time and CPU load, adding more servers when your 95th-percentile response time goes above whatever threshold is acceptable or your load average goes over the number of CPUs (likely those two events will happen at the same time).

When you have a heterogeneous mixture of tasks running, there's no safe pool size that will guarantee fast, consistent response times.  If you have a pool with two greenlets and get two requests, one a CPU-bound task taking 2 seconds and the other an IO-bound task taking 5 milliseconds, you'll likely see both take 2 seconds, since the CPU-bound task won't yield until it finishes.

For more response-time consistency, you could consider dispatching the CPU-heavy portions of the tasks to a separate pool of machines.  For those, there's not much point in using gevent; better to just use a couple of dedicated processes per CPU.  IO-bound tasks are a perfect fit for a standard greenlet-per-request model, no pool required.

Another approach is to try to divide up the cpu-heavy portions of a task (adding a time.sleep(0) to force the greenlet to yield), allowing other greenlets to do their IO-related stuff without waiting for the entire CPU-bound section to complete.  This isn't a particularly robust approach and generally requires profiling to figure out what code sections are really causing problems, but might be enough to smooth out the response time distribution.
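As an illustration of that chunking idea (the slice size is arbitrary; under monkey-patching, `time.sleep(0)` behaves like the `gevent.sleep(0)` used here):

```python
import gevent

def sum_of_squares_chunked(data, chunk=1000):
    """CPU-bound work split into slices, yielding to the hub between
    slices so IO-bound greenlets can run in the gaps."""
    total = 0
    for i in range(0, len(data), chunk):
        total += sum(x * x for x in data[i:i + chunk])
        gevent.sleep(0)  # cooperative yield back to the event loop
    return total

result = gevent.spawn(sum_of_squares_chunked, range(10000)).get()
```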

--

cymrow

Jun 3, 2015, 12:09:11 AM
to gev...@googlegroups.com
First, I should clarify that I'm not running a web server, which I guess is what most gevent users are doing. This is more like a scanner, so my workers aren't receiving requests, they're making them. In my case there isn't really a backlog to worry about. My problem is when I send out 1k requests and they all respond within a 0.5 s window: I want to know the response time of each one as precisely as I'm able. If I cap a greenlet pool at 100, there are no problems. But if I just spawn off all 1k requests at once, I'll see response times of 10 s when the server is actually responding in 0.01 s.

At this point I'm just talking about single-request, pure-IO greenlets. In that case, the sleep/idle technique actually does seem to keep the pool from taking on more responses than it can handle. If I make multiple requests per greenlet and add a little CPU work (maybe 0.02 s), it breaks down, and I start seeing far too many greenlets running and high response times. I'm not sure yet whether it's the CPU or the extra IO; I need to do some more testing.
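A sketch of the capped-pool setup described above, with `gevent.sleep` standing in for the real outbound request (the 100-greenlet cap and the 10 ms "server" are the post's example numbers, not recommendations):

```python
import time

import gevent
from gevent.pool import Pool

def probe(i):
    """Stand-in for one outbound request; the sleep models a server
    that answers in roughly 10 ms."""
    start = time.monotonic()
    gevent.sleep(0.01)
    return time.monotonic() - start

# Cap concurrency so the measured times stay honest: Pool.imap_unordered
# never has more than 100 probes in flight at once.
pool = Pool(100)
times = list(pool.imap_unordered(probe, range(1000)))
```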

miguel

cymrow

Jun 3, 2015, 12:34:56 AM
to gev...@googlegroups.com
shaun,

I can't necessarily apply a threshold to my response times because there is likely to be a wide range of times that will be acceptable or expected for any given context. For example, if I were to make requests to a large range of IPs that have temporarily gone away, the high response times/timeouts would not mean my event loop is at max capacity.

I have seen pool adjusters based on CPU load (in Celery, I think). I think I discounted that option because I couldn't find a reliable and efficient way to measure CPU load at the time. I can see it working, though, so I'll revisit it.
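One stdlib-only way to sketch such a load-based adjuster, using the 1-minute load average per core (the thresholds, step, and bounds are made-up defaults, and `os.getloadavg` is Unix-only):

```python
import os

def recommended_pool_size(current, low=0.7, high=0.9, step=10,
                          floor=10, ceiling=1000):
    """Suggest a new greenlet-pool size from CPU load per core:
    grow while the machine is underused, shrink when it saturates."""
    load_per_core = os.getloadavg()[0] / (os.cpu_count() or 1)
    if load_per_core < low:
        return min(ceiling, current + step)
    if load_per_core > high:
        return max(floor, current - step)
    return current
```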

I'm well aware of the effects of mixing CPU with IO under gevent. The response-time-sensitive jobs my workers do have minimal CPU components, so that's not really my concern. I have long considered adding a pool of worker processes for CPU work to the pipeline for some of the other jobs that do need it, however.

I have also thought a lot about adding pure-IO, one-spawn-per-request worker processes, but I'm not sure the cost of all the serialization involved would be worth it. I do plan to test that option at some point.

I abandoned the idea of mixing sleeps into CPU work long ago. It's just far too messy.

Matt Billenstein

Jun 3, 2015, 1:27:53 AM
to gev...@googlegroups.com
I think practically speaking I've usually limited the pool size in this case by
running a few experiments and seeing where the performance levels off (50-100
usually) in terms of req/s.

If you're running the backends you're making requests to, then maybe there is
something to be said about making the pool size some multiple of the number of
backends -- and if you own the backends, you could do fancier things like
sticking the response time in the response from the server so you could compare
that to what you're seeing in the client process... That is if latency isn't a
big part of the equation.

m


Shaun Lindsay

Jun 3, 2015, 3:32:33 AM
to gev...@googlegroups.com
Ah, I see.  My suggestions are less relevant if you're measuring requests you're making rather than requests you're handling.  The way gevent is structured, there's likely no reliable way to distinguish a slowly responding destination from an overloaded local system, especially for large bursts of requests.  Doing thousands of simultaneous requests will always induce some amount of timing skew.  The only way I can see around that would be writing this in C and managing the epoll loop yourself, tracking when response file descriptors become readable but handling the responses on a separate thread.  I'll have to think about this a bit more; it seems like there should be a way to measure this reliably at the greenlet level.

cymrow

Jun 3, 2015, 9:30:49 AM
to gev...@googlegroups.com
No, I have no control over the remote end. I have considered making periodic requests to a server I do control and using those response times to throttle my pool; maybe even requests to a socket listening within the same process. It's not ideal, but it could work.

cymrow

Jun 3, 2015, 9:35:42 AM
to gev...@googlegroups.com
That's exactly the sort of thing I was hoping to find. What I essentially want is to force the event loop to process all IO that's waiting to be processed; if it can do that within a set time threshold, I know I can add another greenlet to the pool. I was hoping idle() would do that, but it seems it sometimes returns even when there is still IO ready for processing.