[erlang-questions] Load balancing/multiplexing rpc calls amongst Erlang Nodes

102 views
Skip to first unread message

Joshua Muzaaya

unread,
Oct 22, 2012, 3:43:21 AM10/22/12
to erlang-q...@erlang.org
You can view this question in a better readable version here: http://stackoverflow.com/q/12998345/431620
Here goes:

Building from this question, imagine an application with N Erlang Web Servers, and N/2 Mnesia Database Nodes. The set up is such that the Web Servers, each, runs on its own hardware server (say HP DL385), and each Mnesia Instance, runs on its own hardware Server as well.

Web Servers make rpc:call/4 calls to the back end (the Mnesia DB Servers). The Data is all replicated across all the Mnesia instances. Now, you want to have the calls being made to the Database servers, MULTIPLEXED, more precisely ( by TIME), on each Web Server, so that some kind of LOAD BALANCING is attained.

If Web Server A makes a connection to Mnesia Instance 3, it cannot make the next connection to the same Instance. All Database Nodes need to be kept busy and not having any one of them idle while the others are working. The Load balancing Algorithm should not be random, but should be aimed at balancing the load on the Database Servers.

Qn 1: Come up with your load balancing strategy, in such a situation. Also, please show with some sample illustrative code, how you would implement this strategy.

Qn 2: If a Mnesia Instance goes down, how would your load balance Algorithm adapt to the changes in the cluster ?

Qn 3: Is there any Erlang library aimed at load balancing of Erlang Servers working within the same system, and calling each other via rpc:call/4 ?

Thank you all.



muzaaya_joshua

unread,
Oct 22, 2012, 3:51:48 AM10/22/12
to erlang-q...@erlang.org
Building from this question ( http://stackoverflow.com/q/5339329/431620 ),
imagine an application with N Erlang Web Servers, and N/2 Mnesia Database
Nodes. The set up is such that the Web Servers, each, runs on its own
hardware server (say HP DL385), and each Mnesia Instance, runs on its own
hardware Server as well.

Web Servers make rpc:call/4 calls to the back end (the Mnesia DB Servers).
The Data is all replicated across all the Mnesia instances. Now, you want to
have the calls being made to the Database servers, MULTIPLEXED, more
precisely ( by TIME), on each Web Server, so that some kind of LOAD
BALANCING is attained.

If Web Server A makes a connection to Mnesia Instance 3, it cannot make the
next connection to the same Instance. All Database Nodes need to be kept
busy and not having any one of them idle while the others are working. The
Load balancing Algorithm should not be random, but should be aimed at
balancing the load on the Database Servers.

Qn 1: Come up with your load balancing strategy, in such a situation. Also,
please show with some sample illustrative code, how you would implement this
strategy.

Qn 2: If a Mnesia Instance goes down, how would your load balance Algorithm
adapt to the changes in the cluster ?

Qn 3: Is there any Erlang library aimed at load balancing of Erlang Servers
working within the same system, and calling each other via rpc:call/4 ?



--
View this message in context: http://erlang.2086793.n4.nabble.com/Load-balancing-multiplexing-rpc-calls-amongst-Erlang-Nodes-tp4655158.html
Sent from the Erlang Questions mailing list archive at Nabble.com.
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Paul Peregud

unread,
Oct 22, 2012, 5:22:08 PM10/22/12
to muzaaya_joshua, erlang-q...@erlang.org
May you specify why load balancing should be based on time and can not
be random? Have you implemented random load balancing? Has it proved
to be insufficient?
--
Best regards,
Paul Peregud
+48602112091

Joshua Muzaaya

unread,
Oct 23, 2012, 1:22:25 AM10/23/12
to Yogish Baliga, erlang-q...@erlang.org
Yes,i tries using the random method, but because requests are so frequently many, you find the erlang random generator skewing results making around 50/60 requests hitting one Node while others are waiting. Another thing is that, i am not using gen_servers at the Web Server layer. I am using yaws web server and for each connection, yaws spawns a process, this process communicates with Mnesia Nodes to query for data. But the connections are so many and i wanted to scale the application horizontally, adding more web servers and more mnesia Nodes. I came to think of a load balancing middle ware, abstracting my processes from knowing where the call has hit ( i.e on what mnesia node the call has hit). This middle ware ensures that requests are load balanced across my Mnesia DBs.

That is the background of the problem. Its a real-time Web Notification system, plugged into a major intranet Management System. However, clients are many, and yaws is sustaining 30,000 concurrent connections at low peaks. I am a software engineer in one of the telecommunications companies in Africa. I keep running into a few memory problems on single node yaws server, so i need ti add more web servers to assist. Also, mnesia sometime will get *** Too many DB Tables ** when requests are too many and too frequent. I changed everything to use dirty operations and when i by-passed the transaction manager, things improved a bit.

I need fellow erlangers to think of a load balancing algorithm in such a situation. Do you think a process dictionary like GPROC would be so useful ? i was kinda thinking about it last night, but i wonder how i would apply it in this case.

Having one gen_server to decide where the request may go, might alos slow down the application as all requests will have to go through that gen_server. 



On Tue, Oct 23, 2012 at 2:53 AM, Yogish Baliga <bal...@gmail.com> wrote:
One option is to run proxy gen_server on each Mnesia box and register these gen_server pids with pg2. Now you can do load balancing on pg2 processes based on message queue length as described here


When I used this method in my last project in Erlang, it gave better result than normal round robin. Under very low load, all requests were redirected to single Mnesia instance.


-- baliga



--
*Muzaaya Joshua
Systems Engineer
+256774115170*
*"Through it all, i have learned to trust in Jesus. To depend upon His Word"
*



Paul Peregud

unread,
Oct 23, 2012, 3:45:34 AM10/23/12
to Joshua Muzaaya, Yogish Baliga, erlang-q...@erlang.org
> find the erlang random generator skewing results making around 50/60 requests hitting one Node while others are waiting

This is unusual. I've seen PRNG from random module skewing the results, but never to such extent. Please check if it is properly seeded (random:seed/0 seeds with predefined constant).

If seeding is done properly, then you may want to consider switching to crypto:rand_uniform/2. It's a bit slower, but it produces random numbers of quality better then enough for purpose of load balancing.

Joshua Muzaaya

unread,
Oct 23, 2012, 5:52:45 AM10/23/12
to Paul Peregud, Yogish Baliga, erlang-q...@erlang.org
Thank you so much. let me try that. But is it not possible to have a non-random method ? one that is intelligent and fair on all the nodes

Paul Peregud

unread,
Oct 23, 2012, 7:44:48 AM10/23/12
to Joshua Muzaaya, Yogish Baliga, erlang-q...@erlang.org
I believe that it may be hard. Some possible problems here:

1) you need a way to keep system clock of webcores synchronized

2) you need a way to avoid any "waves of load" that may appear because of slight desynchronization of system clocks, or random events on webcores.

3) you need a way to manage information about number of db cores (number of time slots)

I prefer to rely on Law of large numbers if possible.

Geoff Cant

unread,
Oct 23, 2012, 7:41:37 PM10/23/12
to Joshua Muzaaya, Yogish Baliga, erlang-q...@erlang.org
So one cluster load balancing scheme that worked reasonably well for me at ngmoco was based on gossip with estimation.

Each node has some kind of metric for how loaded it is (in my case number of a certain kind of process running, in your case could be RPCs being evaluated), and a reasonable estimation for how much a remote job request will affect that metric (in my case spawning a new process remotely would increase the process count by one :).

Each node maintains a table of
{node(), LoadMetric, BroadcastTimestamp}.

* It updates its own entry continually (the local metric should be pretty accurate), and broadcasts its metric to other nodes every broadcast period (5s).
* Each node receiving a timely load broadcast (they contain timestamps, discard any that are more than 1 gossip period old) overwrites its load table entry with the load metric and broadcast timestamp (i.e. every 5s, the running estimate of remote load will be brought up to date)
* Every broadcast period (every time you're about to broadcast), scan the load table and delete entries with a timestamp older than 3*broadcast period. -- Ignore nodes that are probably faulty/down
* Each time a node routes/transfers load/sends an RPC to a remote node, it bumps the local load table entry for the remote node -- this is the estimation step.

Now you have a local ets table of size M rows (cluster cardinality) which you can read and make your load balancing decision on. I generally went for lowest loaded node if there is more than one entry and the local node otherwise.

The local estimation step is important if you have a deterministic load balancing function.


You can also extend/modify this implementation to add a graceful-cluster-exit scheme for a node by adding an administrative mechanism to stop a node broadcasting its own load. The other nodes will stop transferring load to it and load on it will eventually finish up.

I'm sorry I don't have an implementation for you to use as I don't have permission to release the code. It's not super-hard to write however.

Cheers,
-Geoff

On 2012-10-23, at 02:52 , Joshua Muzaaya <josh...@gmail.com> wrote:

> Thank you so much. let me try that. But is it not possible to have a
> non-random method ? one that is intelligent and fair on all the nodes
>
> <http://www.linkedin.com/pub/muzaaya-joshua/39/2ba/202>
> Designed with WiseStamp -
> <http://r1.wisestamp.com/r/landing?u=cf16262215eb8784&v=3.11.21&t=1350985854695&promo=10&dest=http%3A%2F%2Fwww.wisestamp.com%2Femail-install%3Futm_source%3Dextension%26utm_medium%3Demail%26utm_campaign%3Dpromo_10>Get
> yours<http://r1.wisestamp.com/r/landing?u=cf16262215eb8784&v=3.11.21&t=1350985854695&promo=10&dest=http%3A%2F%2Fwww.wisestamp.com%2Femail-install%3Futm_source%3Dextension%26utm_medium%3Demail%26utm_campaign%3Dpromo_10>
>>> <http://www.linkedin.com/pub/muzaaya-joshua/39/2ba/202>
>>> Designed with WiseStamp -
>>> <http://r1.wisestamp.com/r/landing?u=cf16262215eb8784&v=3.11.21&t=1350969121119&promo=10&dest=http%3A%2F%2Fwww.wisestamp.com%2Femail-install%3Futm_source%3Dextension%26utm_medium%3Demail%26utm_campaign%3Dpromo_10>Get
>>> yours<http://r1.wisestamp.com/r/landing?u=cf16262215eb8784&v=3.11.21&t=1350969121119&promo=10&dest=http%3A%2F%2Fwww.wisestamp.com%2Femail-install%3Futm_source%3Dextension%26utm_medium%3Demail%26utm_campaign%3Dpromo_10>

Scott Lystig Fritchie

unread,
Oct 24, 2012, 12:33:38 AM10/24/12
to Joshua Muzaaya, erlang-q...@erlang.org
Hi, Joshua. One note of caution about the rpc service, or more
specifically about the 'rex' gen_server that implements the server side
of the client/server remote execution service.

The 'rex' process is a single OTP gen_server and thus can use at most a
single CPU core's worth of computation resource. Experience with using
rpc/rex with Riak has shown that certain workloads can easily overwhelm
the rex server's ability to quickly execute requests from remote
clients. When possible, I'd recommend avoiding using the 'rpc' module
and bypass 'rex' by spawn()/spawn_monitor()/spawn_link()'ing worker
processes directly on a remote node.

-Scott

Joshua Muzaaya

unread,
Oct 24, 2012, 1:51:39 AM10/24/12
to Geoff Cant, Yogish Baliga, erlang-q...@erlang.org
Thank you so much Geoff Cant.   This is great. let me think along this line and try to build something like this.

Joshua Muzaaya

unread,
Oct 24, 2012, 2:24:51 AM10/24/12
to Scott Lystig Fritchie, erlang-q...@erlang.org
indeed, this so helpful of you. You have saved me alot of problems. I was experiencing the same and thought that i was doing things the wrong way. Thank you so much.
On Wed, Oct 24, 2012 at 7:33 AM, Scott Lystig Fritchie <frit...@snookles.com> wrote:
Hi, Joshua.  One note of caution about the rpc service, or more
specifically about the 'rex' gen_server that implements the server side
of the client/server remote execution service.

The 'rex' process is a single OTP gen_server and thus can use at most a
single CPU core's worth of computation resource.  Experience with using
rpc/rex with Riak has shown that certain workloads can easily overwhelm
the rex server's ability to quickly execute requests from remote
clients.  When possible, I'd recommend avoiding using the 'rpc' module
and bypass 'rex' by spawn()/spawn_monitor()/spawn_link()'ing worker
processes directly on a remote node.

-Scott



Bengt Kleberg

unread,
Oct 24, 2012, 3:07:26 AM10/24/12
to erlang-q...@erlang.org
Greetings,

My use case is to start a gen_server on the remote node. I need the Pid
returned in {ok, Pid} from gen_server:start/3 (actually a start_link/3
would be preferable).
Will erlang:spawn/4 return that Pid, as does rpc:call/4, or another pid?


bengt

Scott Lystig Fritchie

unread,
Oct 26, 2012, 2:50:39 AM10/26/12
to bengt....@ericsson.com, erlang-q...@erlang.org
Bengt Kleberg <bengt....@ericsson.com> wrote:

bk> My use case is to start a gen_server on the remote node. I need the
bk> Pid returned in {ok, Pid} from gen_server:start/3 (actually a
bk> start_link/3 would be preferable). Will erlang:spawn/4 return that
bk> Pid, as does rpc:call/4, or another pid?

Hi, Bengt. The public API of gen_server doesn't mention such a service,
as you've noticed. After poking around the internals of R15B02's
gen.erl, there doesn't appear to be a private method, either.

But I think you could steal a little bit of private code from gen:do_spawn()
then modify it to use a version of proc_lib that will do a spawn & link
on a remote node, e.g., any of the spawn() or spawn_link() flavors where
a Node argument is the first argument.

Igor Karymov

unread,
Oct 26, 2012, 5:13:56 PM10/26/12
to erlang-pr...@googlegroups.com, erlang-q...@erlang.org
module pool may be useful for your case:
Reply all
Reply to author
Forward
0 new messages