load balancing among multiple servers - python


Yang Peng

May 7, 2013, 1:01:10 PM
to gea...@googlegroups.com
Hi all,

I searched previous posts and found that the C implementation cannot handle load balancing among multiple servers well; basically, jobs are dispatched based on the registration order of workers, in a FIFO manner. Is that right?

Does the Python implementation have the same problem?
I want to set up two servers and have jobs dispatched to workers on the different servers with load balancing.

Thanks,
Yang

Brian Moon

May 7, 2013, 2:36:53 PM
to gea...@googlegroups.com, Yang Peng
This is not entirely true. In actuality, the server sends a wake-up to
the workers in the order they registered. It is up to them to answer and
take the job. The first one to answer gets it. So, it is not per server;
it is per process connected to the gearmand. If you have your
workers spawn sanely, it is likely this is a non-issue in production.
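
That wake-up-and-race behavior can be sketched in plain Python. This is an illustrative simulation, not gearmand's actual code: the daemon sends a NOOP to sleeping workers in registration order, and whichever worker answers first (modeled here as the lowest simulated response time) grabs the job.

```python
import random

def dispatch(job, workers):
    """Simulate gearmand handing one job to a pool of workers."""
    for w in workers:               # wake-up (NOOP) goes out in registration order
        w["sleeping"] = False
    # Whichever process answers first wins the job; in practice that
    # depends on scheduling and network timing, not registration order.
    winner = min(workers, key=lambda w: w["response_ms"])
    winner["jobs"].append(job)
    return winner

workers = [{"name": "w%d" % i, "sleeping": True,
            "response_ms": random.random(), "jobs": []} for i in range(4)]
print(dispatch("job-1", workers)["name"], "took the job")
```

The point of the sketch is that the winner is decided by response time, not by position in the registration list.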

The question is, why do you feel the need to micromanage the job
dispatching? If you don't have enough work for two servers, it seems
silly to worry about load balancing.

Brian.
--------
http://brian.moonspot.net/
> --
> You received this message because you are subscribed to the Google
> Groups "Gearman" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to gearman+u...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Yang Peng

May 7, 2013, 2:49:21 PM
to gea...@googlegroups.com, Yang Peng
Yeah, I agree with your points.

My problem is that I do have a lot of work for the server to do: each job takes a long time to complete (the jobs are both CPU- and IO-bound), and a single server cannot handle all jobs from multiple clients in parallel, even though the server itself is powerful.
The more jobs a server takes, the longer it takes to finish them.

So I need two or more servers to handle all job requests from clients, and I want jobs to be balanced across the servers so that all jobs finish quickly and at roughly the same time.
If jobs cannot be balanced across servers, one server may be fully loaded while the other sits at very low CPU usage.


Yang

Shane Harter

May 7, 2013, 2:58:39 PM
to gea...@googlegroups.com
You could try having a worker fail a job if its machine's load factor is too high; gearmand would then retry it. You may find this creates too much artificial scarcity and is inefficient, but it would be a pretty simple thing to test. If I were you, I'd code up a prototype implementation in the time it takes to drink a cup of coffee, then test against it for the rest of the day and see how it performs.
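
A minimal sketch of that idea, assuming a worker library where raising in the task callback marks the job as failed (the threshold and helper names here are illustrative, not a real API): check the 1-minute load average before starting work, and reject the job so gearmand hands it out again, hopefully to a less-busy machine.

```python
import os

LOAD_FACTOR_LIMIT = 1.5  # per-core load above which we refuse work; tune by testing

def overloaded(load_1min, ncpus, limit=LOAD_FACTOR_LIMIT):
    """True if the per-core 1-minute load average exceeds the limit."""
    return load_1min / ncpus > limit

def guarded_handler(job, do_real_work):
    load_1min, _, _ = os.getloadavg()   # Unix-only
    if overloaded(load_1min, os.cpu_count()):
        # Raising here is assumed to be translated into a job failure by
        # the worker library, prompting gearmand to retry the job.
        raise RuntimeError("load too high, rejecting job for retry")
    return do_real_work(job)
```

Whether a failed job is actually requeued depends on how the client submitted it, so this is worth verifying against your setup before relying on it.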

Brian Moon

May 7, 2013, 3:02:54 PM
to gea...@googlegroups.com, Yang Peng
Are you seeing this in production or are you afraid it is going to
happen? Usually people are afraid this is going to happen and never
actually see it.

I would also say that if your service degrades, the server does not in
fact have enough resources to do that many jobs.

Brian.
--------
http://brian.moonspot.net/

Yang Peng

May 7, 2013, 4:40:46 PM
to gea...@googlegroups.com, Yang Peng
Yeah, unfortunately, I saw this in production, or I should say validation.

I set up two gearman servers on two machines, ran 80 worker instances on each machine, and had all 160 workers register with both gearman servers.
So, basically, either gearman server can schedule jobs to any of the 160 workers.

I created 80 clients, had them send requests to both servers, and monitored each server to see how it responded.
I ran this test more than 10 times, and I always saw unbalanced job dispatching; the distribution was roughly 30% / 70% across the two servers.

Gearman does solve most of my problems, and that's the primary reason I'm inclined to use it in production, but I really cannot find a clear description or document of how it achieves 'load balancing'.
To me, CPU load balancing is a difficult problem to tackle in itself, and I want Gearman to help me address it; I'd rather not design the algorithm and code it up myself.

Any tips, tricks, or tweaks that help achieve better 'load balancing' would be appreciated!

Thanks,
Yang

Brian Moon

May 7, 2013, 6:17:06 PM
to gea...@googlegroups.com, Yang Peng
I have my workers run for an hour max and then restart. I also randomize
the order in which they add jobs to their "can do" list. All these
things help to keep my workers balanced across the gearmand instances,
and it is never a problem.
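
Those two habits can be sketched like this; `register_task()` and `work_one_job()` are stand-ins for whatever your worker library provides, not a real API. Shuffling the task list varies each process's "can do" registration order, and the hour cap means the process exits and re-registers in a fresh position in each gearmand's worker list.

```python
import random
import time

MAX_LIFETIME = 3600  # run for an hour max, then let a supervisor respawn us

def registration_order(tasks, shuffle=random.shuffle):
    """Return the task names in a randomized 'can do' registration order."""
    tasks = list(tasks)
    shuffle(tasks)
    return tasks

def run_worker(register_task, work_one_job, tasks, clock=time.monotonic):
    for name in registration_order(tasks):
        register_task(name)               # stand-in for the library call
    deadline = clock() + MAX_LIFETIME
    while clock() < deadline:
        work_one_job()                    # blocks until one job is handled
    # falling out of the loop lets a supervisor (e.g. supervisord) respawn us
```

The worker relies on an external supervisor to restart it, which keeps the exit path trivial.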

I really think that after a week of running, this will be a non-issue
for you. Things will settle down and balance out naturally.

Also, if you have 80 jobs, why start 160 workers? Tests like this are
set up to fail. They are too perfect.

Brian.
--------
http://brian.moonspot.net/

On 5/7/13 3:40 PM, Yang Peng wrote:
> Yeah, unfortunately, I saw this in production, or I should say validation.

Brian Aker

May 10, 2013, 11:21:26 PM
to gea...@googlegroups.com, Yang Peng
Hi,

On May 7, 2013, at 3:17 PM, Brian Moon <br...@moonspot.net> wrote:

I have my workers run for an hour max and then restart. 

So why are you doing this?

Cheers,
-Brian

Brian Moon

May 11, 2013, 5:00:35 PM
to gea...@googlegroups.com, Brian Aker, Yang Peng
>> I have my workers run for an hour max and then restart.
>
> So why are you doing this?

There are a few reasons.

1. PHP has historically had memory issues in long-running scripts. This
is not as true now as it once was, however.

2. We have generated files on disk that contain PHP arrays of data. They
change slowly. Having my workers restart every hour helps them pick up
these changed files without worry.

3. The example in this thread is another. Having my workers stagger a
restart over a few minutes (they don't die until after they have run a
job) keeps their order in the daemon's worker list nice and randomized.
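
Reason 3 can be sketched as follows, with assumed timings: each worker process picks its own jittered deadline at startup, and only checks it between jobs, so a running job is never interrupted and restarts spread out over a few minutes instead of happening in lockstep.

```python
import random

BASE_LIFETIME = 3600   # roughly an hour per worker process
JITTER = 300           # spread restarts over an extra five minutes

def pick_deadline(start, rand=random.uniform):
    """Choose this process's restart time, jittered so workers stagger."""
    return start + BASE_LIFETIME + rand(0, JITTER)

def should_exit(now, deadline):
    # Called after a job finishes, never mid-job, so no work is lost.
    return now >= deadline
```

The jitter is what keeps re-registration order randomized: if every worker restarted at exactly the same moment, they would all rejoin the daemon's list in nearly the same order again.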

Brian.

Brian Aker

May 14, 2013, 4:22:42 AM
to Brian Moon, gea...@googlegroups.com, Yang Peng
Thanks for the insight.