Load Balancing


bcarey

Dec 21, 2009, 4:24:47 PM
to gearman
Hi all,

Is it possible to put gearmand instances behind a load balancer in
order to distribute the requests being sent to them? I'm unsure how
the job server interacts with the clients to send the responses back
to them from the workers. Are jobs "self-contained", meaning they
know how to get back to the originating client, or is that handled by
the gearmand process based on where the connection came from (for
instance, by IP address)?

Thanks for any insight you can provide!

Brian

Brian Moon

Dec 21, 2009, 5:14:06 PM
to gea...@googlegroups.com, bcarey
We have 3 instances behind an F5 Networks BIG-IP. Works like a charm.
We only submit background jobs, with no status checking and no
foreground jobs. If you submitted jobs and then wanted to check their
status later, you would need to get back to the same server you sent
the job to. There is no easy way to do that.

Brian.
--------
http://brian.moonspot.net/

bcarey

Dec 21, 2009, 6:03:39 PM
to gearman
That is all we need to do at the current time, and we also use
BIG-IPs. Do you happen to have any specific configuration options that
seem to work best when you set up the pool/nodes? For instance, what
check do you use for the health monitor? Also, do you happen to have
a favorite load balancing method?

Thanks Brian.

- Brian

Abe Hassan

Dec 21, 2009, 6:13:29 PM
to gea...@googlegroups.com
In our experience it's a lot better to just let the client take care of it. We configure the application to know about all the gearman servers, and then the client will choose among the servers.

The client behaves properly when a server is down, so there's generally little need to have a BigIP to be able to take nodes out of rotation. Also, if you use uniq keys for your jobs, your clients will hash all jobs with the same uniq to the same server (so they're coalesced down into one job). And it's one less piece of machinery between client and server.

(Though admittedly in my mind the only drawback of this approach is that your list of gearman servers isn't centralized, but configuration management takes care of that for us.)
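
A minimal sketch of that client-side selection, in Python purely for illustration. This is not the Gearman::Client code: the CRC32 hash, the `pick_server` name and the `is_alive` callback are all assumptions. The idea is only that the same uniq always maps to the same starting server, and that the client walks on to the next server when that one is down.

```python
import zlib


def pick_server(servers, uniq, is_alive=lambda server: True):
    """Map a uniq key to one server, walking forward past dead ones."""
    if not servers:
        raise ValueError("no job servers configured")
    # Stable hash: the same uniq always yields the same starting index.
    start = zlib.crc32(uniq.encode("utf-8")) % len(servers)
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        # is_alive is a hypothetical probe supplied by the caller.
        if is_alive(candidate):
            return candidate
    raise RuntimeError("no job server is reachable")
```

Because every client computes the same index for the same uniq, duplicate jobs meet on one server, which is what allows them to be coalesced there.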

-- Abe

Brian Moon

Dec 21, 2009, 6:35:01 PM
to gea...@googlegroups.com, bcarey
We use fast_tcp for the TCP profile and just a TCP health check, and we
use the Observed load balancing method unless we have a reason not to.
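
For anyone unsure what a plain TCP health check amounts to, here is a rough Python sketch. It is not BIG-IP configuration; the `tcp_check` name is made up, and port 4730 (the gearmand default) is an assumption about the deployment.

```python
import socket


def tcp_check(host, port=4730, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # Connect and immediately close; nothing is sent to the server.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat the node as down.
        return False
```

A check like this only shows that the port accepts connections, not that the server is actually handing jobs to workers.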

Brian.
--------
http://brian.moonspot.net/

Brian Moon

Dec 21, 2009, 6:36:27 PM
to gea...@googlegroups.com, Abe Hassan
> The client behaves properly when a server is down, so there's
> generally little need to have a BigIP to be able to take nodes out of
> rotation. Also, if you use uniq keys for your jobs, your clients will
> hash all jobs with the same uniq to the same server (so they're
> coalesced down into one job). And it's one less piece of machinery
> between client and server.

This is important to keep in mind: if you submit a lot of duplicate
jobs, a load balancer will spread them across servers instead of letting
one server coalesce them. You could probably use an iRule to do this,
but who wants to write that Tcl script?

Brian.

Brian Moon

Dec 21, 2009, 6:38:19 PM
to gea...@googlegroups.com, Abe Hassan
On 12/21/09 5:36 PM, Brian Moon wrote:
>> The client behaves properly when a server is down, so there's
>> generally little need to have a BigIP to be able to take nodes out of
>> rotation. Also, if you use uniq keys for your jobs, your clients will
>> hash all jobs with the same uniq to the same server (so they're
>> coalesced down into one job). And it's one less piece of machinery
>> between client and server.

Oh, Abe, you assume too much about all the libs out there. =) I should
fix that.

/**
 * Get a connection to a Gearman server
 *
 * @return resource A connection to a Gearman server
 */
protected function getConnection()
{
    // Picks one of the open connections at random; the job's uniq is not consulted.
    return $this->conn[array_rand($this->conn)];
}
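
To show what random selection like the above means for duplicate jobs, here is a small Python sketch. It is illustrative only and not part of any Gearman library; the server names and the `random_pick` helper are hypothetical.

```python
import random


def random_pick(servers, rng=random):
    """Choose a server at random, ignoring the job's uniq key."""
    return rng.choice(servers)


servers = ["gm1", "gm2", "gm3"]
rng = random.Random(0)  # seeded so repeated runs behave the same
# Submit the "same" job 100 times and record where it lands.
seen = {random_pick(servers, rng) for _ in range(100)}
print(sorted(seen))
```

The same job ends up on more than one server, so no single server sees all the duplicates and can merge them.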


Brian.

Abe Hassan

Dec 21, 2009, 6:39:20 PM
to Brian Moon, gea...@googlegroups.com

On Dec 21, 2009, at 3:38 PM, Brian Moon wrote:

> On 12/21/09 5:36 PM, Brian Moon wrote:
>>> The client behaves properly when a server is down, so there's
>>> generally little need to have a BigIP to be able to take nodes out of
>>> rotation. Also, if you use uniq keys for your jobs, your clients will
>>> hash all jobs with the same uniq to the same server (so they're
>>> coalesced down into one job). And it's one less piece of machinery
>>> between client and server.
>
> Oh, Abe, you assume too much about all the libs out there. =) I should fix that.

Oh no. Yeah everything I said applies to the Perl client, Gearman::Client. :)

BUSTARRET, Jean-francois

Dec 22, 2009, 11:38:23 AM
to gea...@googlegroups.com

On our side, we added an HTTP test (GET /test.html, served by an Apache instance running on the server) to the standard TCP check.

This makes it easy to enable/disable a server for maintenance purposes.
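
Roughly, the combined check behaves like the Python sketch below. It is not our load balancer configuration: the function names and the default ports (4730 for gearmand, 80 for Apache) are assumptions. A node stays in rotation only while both checks pass, so removing test.html takes it out without stopping gearmand.

```python
import http.client
import socket


def tcp_ok(host, port, timeout=2.0):
    """True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_ok(host, port=80, path="/test.html", timeout=2.0):
    """True only when GET path answers with HTTP 200."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def node_in_rotation(host, gearman_port=4730, http_port=80):
    """Both checks must pass for the node to receive traffic."""
    return tcp_ok(host, gearman_port) and http_ok(host, http_port)
```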

Jean-François
http://www.wat.tv

-----Original Message-----
From: gea...@googlegroups.com [mailto:gea...@googlegroups.com] On behalf of bcarey
Sent: Tuesday, December 22, 2009 00:04
To: gearman
Subject: Re: Load Balancing

mjy

Jan 6, 2010, 8:32:12 AM
to gearman
On Dec 22 2009, 12:13 am, Abe Hassan <ahas...@sixapart.com> wrote:
> In our experience it's a lot better to just let the client take care of it. We configure the application to know about all the gearman servers, and then the client will choose among the servers.
>
> The client behaves properly when a server is down, so there's generally little need to have a BigIP to be able to take nodes out of rotation. Also, if you use uniq keys for your jobs, your clients will hash all jobs with the same uniq to the same server (so they're coalesced down into one job). And it's one less piece of machinery between client and server.

Could you provide a few more details about this solution? Gearman::XS
+ libgearman at least do not seem to handle a failed gearmand (of
several) as I would have expected. If run_tasks() returns
GEARMAN_TIMEOUT (while completing a few jobs successfully), calling it
again causes all of the outstanding (due to the timeout/dead server)
jobs to be sent to the dead gearmand again (i.e. none are reassigned).
If the outstanding tasks are added again manually before the second
run_tasks(), some of them still get assigned to the dead server, i.e.
it seems that the client library ignores the fact that one gearmand is
dead and it doesn't even provide information about which server is
dead.

So I can see no cheap solution for getting all tasks completed with a
dead gearmand. I would probably have to set up different client
instances with 1 gearmand each just to detect which one is dead (e.g.
with echo()), then remove that one and resubmit the outstanding tasks,
with a maximum total latency of my timeout value times the number of
servers, which is bad. If I just use multiple Gearman::XS::Client
instances with 1 gearmand each and assign the tasks between them
manually, I can detect the dead server immediately, but that somewhat
defeats the purpose of being able to set multiple gearmands for one
client instance.
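
To make the manual variant concrete, here is a rough sketch of the bookkeeping in Python. It does not use the Gearman::XS API: `reassign_outstanding` is a made-up helper and `is_alive` stands in for whatever probe is available, such as an echo request sent through a single-server client.

```python
def reassign_outstanding(assignments, is_alive):
    """Move tasks queued for dead servers onto the live ones, round-robin.

    assignments: dict mapping server name -> list of outstanding tasks.
    is_alive: callable(server) -> bool, a hypothetical liveness probe.
    """
    live = [server for server in assignments if is_alive(server)]
    if not live:
        raise RuntimeError("no job server is reachable")
    # Live servers keep what they already have.
    result = {server: list(assignments[server]) for server in live}
    # Everything that was queued on a dead server needs a new home.
    orphaned = [task
                for server in assignments if server not in result
                for task in assignments[server]]
    for i, task in enumerate(orphaned):
        result[live[i % len(live)]].append(task)
    return result
```

Each server is probed once per call, so the added latency is bounded by the probe timeout times the number of servers, which is the cost described above.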

If those are the only solutions using the client libraries, it's
probably better to use something like wackamole to let another
gearmand handle the request going to the dead gearmand's IP (then
retrying with just run_tasks() should do the job), so I'd really like
to see how your client handles this situation.

-mjy

Abe Hassan

Jan 6, 2010, 12:04:37 PM
to gea...@googlegroups.com

On Jan 6, 2010, at 5:32 AM, mjy wrote:

> On Dec 22 2009, 12:13 am, Abe Hassan <ahas...@sixapart.com> wrote:
>> In our experience it's a lot better to just let the client take care of it. We configure the application to know about all the gearman servers, and then the client will choose among the servers.
>>
>> The client behaves properly when a server is down, so there's generally little need to have a BigIP to be able to take nodes out of rotation. Also, if you use uniq keys for your jobs, your clients will hash all jobs with the same uniq to the same server (so they're coalesced down into one job). And it's one less piece of machinery between client and server.
>
> Could you provide a few more details about this solution? Gearman::XS
> + libgearman at least do not seem to handle a failed gearmand (of
> several) as I would have expected. If run_tasks() returns
> GEARMAN_TIMEOUT (while completing a few jobs successfully), calling it
> again causes all of the outstanding (due to the timeout/dead server)
> jobs to be sent to the dead gearmand again (i.e. none are reassigned).
> If the outstanding tasks are added again manually before the second
> run_tasks(), some of them still get assigned to the dead server, i.e.
> it seems that the client library ignores the fact that one gearmand is
> dead and it doesn't even provide information about which server is
> dead.

Brian pointed out in an earlier reply that at least part of what I was referring to was specific to the Perl Gearman::Client, and it looks like the retry behavior might be specific to that client too. It just retries another server if it's not able to connect to one. Unfortunately, I don't actually know how other clients behave in that situation.

mjy

Jan 7, 2010, 7:26:19 AM
to gearman
On Jan 6, 6:04 pm, Abe Hassan <ahas...@sixapart.com> wrote:
> Brian pointed out in an earlier reply that at least part of what I was referring to was specific to the perl Gearman::Client, it looks like the retry behavior might be specific to that client too. It just retries another server if it's not able to connect to one. Unfortunately, I don't actually know how other clients behave in that situation though.

I guess this behaviour is completely implementation-dependent then, and
we cannot really rely on it, as we want to keep the flexibility to
switch at any time should we encounter problems with one client
library. We've recently switched from the Perl Gearman::Client to the
XS variant for newer projects because the former had some strange
error handling (google for "INITIAL SUBMIT FAILED" to see what I mean;
it has been fixed in r451, but that is still not on CPAN) and because
the C gearmand has brought a huge performance boost and we'd like to
stick to its corresponding client library. If the retry behaviour is so
much better, we might switch back though (although I vaguely recall
getting all tasks submitted twice with 2 gearmands using older Perl
clients). Sorry for hijacking this thread and thanks for the info. ;-)

-mjy
