Fault Tolerant DRb?

Kirk Haines

Jul 1, 2005, 12:25:09 PM

Just pondering different things this morning, and my mind came back to
something I've thought about now and again.

Assume you are using a DRb service for....something. It doesn't matter what.
The case is the same whether one is accessing an array via DRb or a Rinda
Ring. Is there some reasonably easy way of making a service work in a fault
tolerant way? That is, one could have two processes on two different
machines both offering the same service. If one process dies, the data is
still present on the other, and the clients of that service can continue
operating without data loss?

Kirk Haines

Ara.T.Howard

Jul 1, 2005, 12:40:37 PM

i've done tons of ha (high availability) setups before for stateful and
stateless machines. suffice it to say it is almost un-imaginably complex.
consider:

* how do you tell if one machine is down vs. the network just being slow?
for instance on our machines monthly backups might make any machine seem
dead (can't ping) for 20 minutes or more. typically this is solved via a
serial cable between nodes to ping on using real-time priorities.

* if you have the data on both machines and it can EVER be written to
(modified) how do you bring the data back in sync when a machine has died
but is now back up?
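
The first point can be illustrated with a minimal timeout-based failure detector. This is only a sketch: the HeartbeatMonitor class and its method names are hypothetical and are not part of DRb or linux-ha. It shows why a timeout can only ever report "suspected", since a node that is busy with backups for 20 minutes is indistinguishable from a dead one.

```ruby
# Minimal timeout-based failure detector (illustrative only; the class
# and method names are hypothetical, not part of DRb or linux-ha).
class HeartbeatMonitor
  def initialize(timeout)
    @timeout = timeout   # seconds of silence before a peer is suspected
    @last_seen = {}      # peer name => time of its last heartbeat
  end

  # Record that a heartbeat arrived from +peer+ at time +now+.
  def beat(peer, now = Time.now)
    @last_seen[peer] = now
  end

  # True when +peer+ has been silent for longer than the timeout, or has
  # never been heard from. This means "suspected", not "dead": a slow
  # network or an overloaded machine looks exactly the same.
  def suspected?(peer, now = Time.now)
    last = @last_seen[peer]
    last.nil? || (now - last) > @timeout
  end
end

monitor = HeartbeatMonitor.new(5)
start = Time.at(0)
monitor.beat("node-b", start)
monitor.suspected?("node-b", start + 3)     # heard from it recently
monitor.suspected?("node-b", start + 1200)  # silent for 20 minutes, maybe only busy
```

Raising the timeout reduces false alarms but delays failover, which is one reason a dedicated heartbeat link is the usual answer.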

these problems are solved - but it's still amazingly hard to get right. check
out the linux-ha project (google it).

depending on your needs you may be able to code something simple that's 'good
enough' but you'll need some sort of distributed transaction capability and
the easiest way to get that is via a real rdbms like postgresql. however, once
you have that set up it's silly to use drb unless your data is terrible to
model within the relational model.

feel free to contact me offline if you want to set up an ha box(es).

hth.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================

Shashank Date

Jul 1, 2005, 12:45:43 PM

Hi Kirk,

I have written something like this a long time back to build fault tolerant database clusters.
It became pretty messy pretty quick (of course I was not as proficient in Ruby back then ;-)).
So I have some questions:

--- Kirk Haines <kha...@enigo.com> wrote:
> Assume you are using a DRb service for....something. It doesn't matter what.
> The case is the same whether one is accessing an array via DRb or a Rinda
> Ring. Is there some reasonably easy way of making a service work in a fault
> tolerant way? That is, one could have two processes on two different
> machines both offering the same service. If one process dies, the data is
> still present on the other,

^^^^^^^^^^^^^^^^^^^^^^^^^^
How do you propose to ensure that? Is it on a shared file system (like NFS)?
If true, then take a look at Ara's rq package:

http://www.codeforpeople.com/lib/ruby/rq/rq-2.3.0/TUTORIAL

If false, then think of some "easy" way of replication.

> and the clients of that service can continue
> operating without data loss?

I had to worry about how the clients who were in the middle of a request would know that the
service is no longer available.
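
On the client side, DRb reports an unreachable server by raising DRb::DRbConnError, so a client can at least detect the failure and try another replica. The following is a rough sketch under stated assumptions: FailoverProxy is a hypothetical wrapper, not part of DRb, and the URIs in the usage comment are made up. It does nothing to keep the replicas' data in sync.

```ruby
require 'drb/drb'

# Hypothetical wrapper (not part of DRb): forwards each call to the first
# replica that answers, moving on when DRb reports a connection failure.
class FailoverProxy
  # Build a proxy from a list of druby:// URIs.
  def self.for_uris(uris)
    new(uris.map { |uri| DRbObject.new_with_uri(uri) })
  end

  def initialize(replicas)
    @replicas = replicas
  end

  def method_missing(name, *args, &block)
    last_error = nil
    @replicas.each do |replica|
      begin
        return replica.__send__(name, *args, &block)
      rescue DRb::DRbConnError => e
        last_error = e   # this replica is unreachable; try the next one
      end
    end
    raise(last_error || DRb::DRbConnError.new("no replicas configured"))
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

# Usage, assuming two servers exist at these made-up addresses:
#   queue = FailoverProxy.for_uris(%w[druby://host-a:9000 druby://host-b:9000])
#   queue.push("job")
```

A request that fails midway may already have been applied on the first replica, so blindly retrying is only safe for calls that can be repeated without harm.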

>
> Kirk Haines
>

-- shanko

Kirk Haines

Jul 1, 2005, 2:23:46 PM

On Friday 01 July 2005 10:40 am, Ara.T.Howard wrote:

> depending on your needs you may be able to code something simple that's 'good
> enough' but you'll need some sort of distributed transaction capability and
> the easiest way to get that is via a real rdbms like postgresql. however,
> once you have that set up it's silly to use drb unless your data is
> terrible to model within the relational model.

LOL. All valid points. You never know, though. Sometimes when one asks for
something magical and unlikely, someone else pipes up and delivers. It was
worth a shot. Thanks Ara (and Shashank) for the comments.

Kirk Haines

gwt...@mac.com

Jul 1, 2005, 3:05:04 PM

On Jul 1, 2005, at 12:25 PM, Kirk Haines wrote:
> Assume you are using a DRb service for....something. It doesn't matter what.
> The case is the same whether one is accessing an array via DRb or a Rinda
> Ring. Is there some reasonably easy way of making a service work in a fault
> tolerant way?

You might want to take a look at some of the software and ideas at
http://www.cse.cuhk.edu.hk/~xychen/GroupCS/gcs.htm

This page has a great summary of toolkits that implement "process group
communication" or "virtual synchrony". A variety of toolkits have evolved and
been released in various forms. While I don't know of any ruby implementation
or wrapper for these ideas/software it would be a great project.

The goal of process group communication is to send a series of messages to a
named group of recipients and ensure that every member of the group receives
the messages in a globally consistent order in the presence of communication
and/or hardware failures. From this foundation you can build a variety of
fault tolerant systems.
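
To make the ordering guarantee concrete, here is a toy sketch in Ruby. It is not how any particular toolkit works: the Sequencer and GroupMember classes are hypothetical, everything runs in one process, and the single sequencer stands in for the distributed ordering protocol a real toolkit would supply. It only shows that members which receive the same stamped messages in different arrival orders still deliver them in the same order.

```ruby
# Illustrative only: a single in-process sequencer stands in for the
# ordering protocol that a real group communication toolkit provides.
class Sequencer
  def initialize
    @next_seq = 0
  end

  # Stamp a message with the next global sequence number.
  def stamp(payload)
    seq = @next_seq
    @next_seq += 1
    [seq, payload]
  end
end

class GroupMember
  attr_reader :delivered

  def initialize
    @expected = 0     # next sequence number to deliver
    @pending = {}     # seq => payload, received but not yet deliverable
    @delivered = []   # payloads in delivery order
  end

  # Accept a stamped message in any arrival order; deliver in sequence order.
  def receive(seq, payload)
    return if seq < @expected || @pending.key?(seq)  # ignore duplicates
    @pending[seq] = payload
    while @pending.key?(@expected)
      @delivered << @pending.delete(@expected)
      @expected += 1
    end
  end
end

sequencer = Sequencer.new
messages = %w[a b c].map { |m| sequencer.stamp(m) }

first = GroupMember.new
second = GroupMember.new
messages.each { |seq, m| first.receive(seq, m) }           # arrives in order
messages.reverse.each { |seq, m| second.receive(seq, m) }  # arrives reversed

first.delivered == second.delivered  # both delivered a, b, c in that order
```

The hard parts that this sketch leaves out are exactly the ones the toolkits address: agreeing on the order without a single point of failure, and tracking group membership as members crash and rejoin.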

Gary Wright

Ara.T.Howard

Jul 1, 2005, 3:46:20 PM

http://raa.ruby-lang.org/project/rb_spread/

cheers.

Ara.T.Howard

Jul 1, 2005, 5:00:34 PM

On Fri, 1 Jul 2005, Gary Wright wrote:

>
> On Jul 1, 2005, at 3:46 PM, Ara.T.Howard wrote:
>>>
>>
>> http://raa.ruby-lang.org/project/rb_spread/
>
> Cool! After I posted my link I found the main Spread
> site and have been reading about it for the last hour or so.
>
> Now I have something to play with!

i think i may have a patched version of this around... seems like there was a
little buggette or two in it... let me know if you can't get it working and
i'll look for it.