First off, thanks Sven (and anyone else who contributed) for your work on mod_gearman. I’m testing out v1 now as a DNX replacement for a high-volume Nagios implementation and so far I have to say I’m impressed.
As part of that testing I’ve been tweaking and observing performance for the jobs queue, check latency and the rate at which new workers spawn. What I think I’m seeing is that workers tend to spin up somewhat slowly in response to an increase in the jobs waiting queue.
For example, in some of my tests I had the min # of workers set to 20 and the max # set to 100 on two dedicated mod_gearman worker instances, with a third worker running on the Nagios server itself set to a min of 1 and a max of 20. The queue was being processed quickly, with a service check latency of around 0.2 seconds (as reported by Thruk) and essentially no jobs waiting, and the number of workers on each instance stayed well below the configured maximum.
Then I shut down one of the dedicated worker instances. The jobs waiting queue quickly grew, as expected, to around 400. But the number of workers on the remaining instances increased only slowly, and peaked at around 60 (as gauged by debug log entries) on the remaining dedicated instance, even though the jobs waiting queue stayed above 0 for most of that time and the average latency kept increasing.
It seems that the ideal would be for the number of workers to quickly increase as long as there are jobs waiting, until that instance hits the configured max # of workers and/or the queue is fully processed. But given the seeming (and possibly intentional) delay in spawning new workers, should I therefore increase the minimum number of workers if I want max performance? I’m currently testing with a min # of 50 and a max of 100. But is there any potential harm (aside from tying up system resources) to maintaining a high minimum number of workers on a system?
Again, many thanks for all the work that’s gone into this. We’re excited to try it out.
Cheers,
Erik Larkin
Increasing and maintaining the worker population is a tricky topic, and there are two parameters to tune besides min-/max-worker.
The idle-timeout can be used to decrease the population when there are no jobs or only a few. It should make no difference in
your case. The second one is the max-jobs setting, which controls the number of jobs after which a worker exits. The default is
1000 jobs per worker, which is probably too low for your setup.
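A minimal sketch of the two exit conditions described above, written as a toy Python model rather than mod_gearman's actual C code. The names `WorkerState`, `should_exit` and `busy_lifetime_seconds` are hypothetical, the 30-second idle timeout and 0.5-second check duration are illustrative values only, and the sketch assumes the min-worker floor is enforced separately by the parent process.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    jobs_done: int        # jobs this worker has finished since it was forked
    idle_seconds: float   # seconds since this worker last received a job


def should_exit(state: WorkerState, max_jobs: int, idle_timeout: float) -> bool:
    """Hypothetical exit rule: recycle after max_jobs jobs, or leave when idle too long."""
    if state.jobs_done >= max_jobs:
        return True  # max-jobs reached: the worker exits and a new spawn must replace it
    if state.idle_seconds >= idle_timeout:
        return True  # idle-timeout reached: shrinks the pool when there is little work
    return False


def busy_lifetime_seconds(max_jobs: int, check_seconds: float) -> float:
    """How long a constantly busy worker lives before max-jobs recycles it,
    assuming it runs checks back to back and each takes check_seconds."""
    return max_jobs * check_seconds


# A worker that has just finished its 1000th job exits even though it is busy.
print(should_exit(WorkerState(jobs_done=1000, idle_seconds=0.0), max_jobs=1000, idle_timeout=30))
# With the default max-jobs of 1000 and 0.5 s checks, a busy worker lives about 500 s.
print(busy_lifetime_seconds(1000, 0.5))
```

Under steady load the idle-timeout branch never fires, so in this model max-jobs alone decides how often busy workers are recycled; raising it lengthens each worker's lifetime.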
Mod-Gearman spawns about 1 worker per second as long as there are jobs waiting. So when your workers complete 1000 jobs per second, the
total number of workers stays the same: with max-jobs at 1000, about one worker exits every second while one new worker spawns. The spawn
rate is hard coded at the moment. Maybe I will add a parameter for it if there is demand.
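The equilibrium described above can be written as a simple rate balance. This is a back-of-the-envelope Python sketch, not mod_gearman code: it assumes a fixed spawn rate while jobs are waiting and that each busy worker exits after exactly max-jobs jobs, so workers exit at `jobs_per_second / max_jobs` per second. The function names are hypothetical.

```python
def net_workers_per_second(spawn_rate: float, jobs_per_second: float, max_jobs: int) -> float:
    """Net change in worker count per second while jobs are waiting (toy model)."""
    exits_per_second = jobs_per_second / max_jobs  # workers hitting max-jobs each second
    return spawn_rate - exits_per_second


def stall_throughput(spawn_rate: float, max_jobs: int) -> float:
    """Completed jobs per second at which spawning only replaces exiting workers."""
    return spawn_rate * max_jobs


print(net_workers_per_second(1, 1000, 1000))   # 0.0: one spawn and one exit per second
print(net_workers_per_second(1, 1000, 10000))  # about 0.9: a higher max-jobs lets the pool grow
print(stall_throughput(1, 1000))               # 1000
```

In this model, raising max-jobs lowers the exit rate, so the same 1 worker/s spawn rate produces real growth instead of merely replacing recycled workers.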
I don't think there is a perfect default setup for everyone, so just play around with these parameters to see what fits best into
your environment.
thanks for using mod-gearman,
Sven
On Tue, Mar 01, 2011 at 09:04:56PM -0300, Deives Michellis wrote:
> It would be nice to have that worker increase magic number as a config
> option. We have bursts of up to 600 checks in one second. Going from an
> average of 20 to a burst like that, one worker per second, will be a serious
> bottleneck.
Should be easy to add this option.
One big advantage of mod-gearman is that it normalizes these bursts. Usually it does
not make sense to start 600 workers when there is such a burst. Most of the time it is
enough to start far fewer workers.
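To see why a modest pool is usually enough, here is a rough Python estimate of how long a fixed number of workers needs to work off a burst. It assumes, hypothetically, that every check takes the same `check_seconds` and that each worker runs one check at a time; real check durations vary, so the numbers are illustrative only, and `drain_seconds` is a made-up helper name.

```python
import math


def drain_seconds(burst: int, workers: int, check_seconds: float) -> float:
    """Time for a fixed pool to finish a burst, processed in rounds of `workers` checks."""
    rounds = math.ceil(burst / workers)  # each round runs up to `workers` checks in parallel
    return rounds * check_seconds


# 600 queued checks, each assumed to take 0.5 s
for workers in (20, 60, 100, 600):
    print(workers, drain_seconds(600, workers, 0.5))
```

Going from 100 to 600 workers saves only 2.5 seconds in this model (3.0 s versus 0.5 s), which is the normalizing effect: the queue absorbs the burst and a moderate pool works it off shortly afterwards.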
But anyway, you are right, it should be a config option...
Sven
Next version (release planned for the next few days) will have a "spawn-rate" option for that.
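A toy simulation of what a configurable spawn rate could change during a burst like the 600-check example above. Everything here is an assumption for illustration: one-second time steps, each worker completing one check per second, and new workers added at `spawn_rate` per second (capped at the maximum) only while jobs remain queued. The function name `seconds_to_drain` is hypothetical, and this is not how mod_gearman schedules forks internally.

```python
def seconds_to_drain(burst: int, start_workers: int, max_workers: int,
                     spawn_rate: int, jobs_per_worker_per_second: int = 1) -> int:
    """Simulate second by second until a burst of queued checks is processed."""
    queue = burst
    workers = start_workers
    seconds = 0
    while queue > 0:
        # every worker finishes its checks for this second
        queue -= min(queue, workers * jobs_per_worker_per_second)
        seconds += 1
        if queue > 0:
            # jobs still waiting: spawn more workers, but never exceed the maximum
            workers = min(max_workers, workers + spawn_rate)
    return seconds


print(seconds_to_drain(600, start_workers=20, max_workers=100, spawn_rate=1))   # 21
print(seconds_to_drain(600, start_workers=20, max_workers=100, spawn_rate=10))  # 10
```

In this model the faster spawn rate roughly halves the time to clear the burst without ever exceeding the configured maximum number of workers.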
Sven
Seconded, thanks Sven. Very much looking forward to trying the new option.
From: mod_g...@googlegroups.com [mailto:mod_g...@googlegroups.com] On Behalf Of Deives Michellis
Sent: Thursday, March 03, 2011 5:26 AM
To: mod_g...@googlegroups.com
Cc: Sven Nierlein
Subject: Re: [mod_gearman] Performance tuning for large instances
Thank you! Very much appreciated!