Clustering and zones

Ceri Davies

Aug 25, 2006, 5:00:48 AM

Before I start chasing rainbows, does anyone have experience of running
either Veritas or Sun cluster services with resource groups in zones?

Part of me wonders whether it might be easier to just make the zone a
resource group and fail the whole zone over when required, but could
that actually work?

Thanks for any tales of woe/joy with this!

Ceri
--
That must be wonderful! I don't understand it at all.
-- Moliere

Frank Fegert

Aug 25, 2006, 6:42:02 AM

On 2006-08-25, Ceri Davies <ceri_...@submonkey.net> wrote:
> Before I start chasing rainbows, does anyone have experience of running
> either Veritas or Sun cluster services with resource groups in zones?
>
> Part of me wonders whether it might be easier to just make the zone a
> resource group and fail the whole zone over when required, but could
> that actually work?
>
> Thanks for any tales of woe/joy with this!

Yes, it works. The setup is rather painless, just follow
the walkthrough in the "Sun Cluster Data Service for Solaris
Containers Guide" available here:
http://docs.sun.com/app/docs/coll/1124.4

Still, you can't avoid writing some kind of failure detection
logic that tells SC a failure has occurred and that it needs
to fail over the zone. Plus, the failover time might increase
because a whole zone has to be stopped and started, not just
a single service or a couple of services.
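
As a rough illustration of what such detection logic might look like, here
is a minimal sketch in Python. It is not Sun Cluster's probe API: the
function name and the threshold of three consecutive failures are
assumptions made for the example. The idea is only that a single failed
probe should not trigger a whole-zone failover, but a run of them should.

```python
from typing import Iterable


def should_fail_over(probe_results: Iterable[bool], threshold: int = 3) -> bool:
    """Return True once `threshold` consecutive probes have failed.

    `probe_results` is a sequence of health-check outcomes (True = healthy).
    Both the name and the default threshold are hypothetical.
    """
    consecutive_failures = 0
    for healthy in probe_results:
        # A healthy probe resets the counter; a failed one extends the run.
        consecutive_failures = 0 if healthy else consecutive_failures + 1
        if consecutive_failures >= threshold:
            return True
    return False
```

A higher threshold avoids failing over on a transient glitch, at the cost
of a slower reaction to a real outage, which matters more when the unit of
failover is an entire zone.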

Regards,

Frank

Ceri Davies

Aug 25, 2006, 7:37:08 AM

On 2006-08-25, Frank Fegert <fra.no...@gmx.de> wrote:
> On 2006-08-25, Ceri Davies <ceri_...@submonkey.net> wrote:
>> Before I start chasing rainbows, does anyone have experience of running
>> either Veritas or Sun cluster services with resource groups in zones?
>>
>> Part of me wonders whether it might be easier to just make the zone a
>> resource group and fail the whole zone over when required, but could
>> that actually work?
>>
>> Thanks for any tales of woe/joy with this!
>
> Yes, it works. The setup is rather painless, just follow
> the walkthrough in the "Sun Cluster Data Service for Solaris
> Containers Guide" available here:
> http://docs.sun.com/app/docs/coll/1124.4

Thanks, that sounds promising!

> Still, you can't avoid writing some kind of failure detection
> logic that tells SC a failure has occurred and that it needs
> to fail over the zone. Plus, the failover time might increase
> because a whole zone has to be stopped and started, not just
> a single service or a couple of services.

I had worried about that. Perhaps it would be possible to run the agents
in the global zone, and just fail them to a different zone in
the usual way.

Thanks again for your input here, it's encouraging to know that I won't
be wasting my time looking into this.

Frank Fegert

Aug 25, 2006, 7:58:56 AM

On 2006-08-25, Ceri Davies <ceri_...@submonkey.net> wrote:
> On 2006-08-25, Frank Fegert <fra.no...@gmx.de> wrote:
>> Still, you can't avoid writing some kind of failure detection
>> logic that tells SC a failure has occurred and that it needs
>> to fail over the zone. Plus, the failover time might increase
>> because a whole zone has to be stopped and started, not just
>> a single service or a couple of services.
>
> I had worried about that. Perhaps it would be possible to run the agents
> in the global zone, and just fail them to a different zone in
> the usual way.

Are you talking about services/processes without an HA
agent available, or ones with an HA agent available? As
far as I know, you can currently only fail over whole
zones, not single services/processes within zones. If
you want single-service failover, AFAIK this is only
supported in the global zone. But make sure to check
the docs ;-)

Regards,

Frank

Ceri Davies

Aug 25, 2006, 8:50:08 AM

On 2006-08-25, Frank Fegert <fra.no...@gmx.de> wrote:
> On 2006-08-25, Ceri Davies <ceri_...@submonkey.net> wrote:
>> On 2006-08-25, Frank Fegert <fra.no...@gmx.de> wrote:
>>> Still, you can't avoid writing some kind of failure detection
>>> logic that tells SC a failure has occurred and that it needs
>>> to fail over the zone. Plus, the failover time might increase
>>> because a whole zone has to be stopped and started, not just
>>> a single service or a couple of services.
>>
>> I had worried about that. Perhaps it would be possible to run the agents
>> in the global zone, and just fail them to a different zone in
>> the usual way.
>
> Are you talking about services/processes without an HA
> agent available, or ones with an HA agent available? As
> far as I know, you can currently only fail over whole
> zones, not single services/processes within zones.

Here's a concrete example of what I'm thinking of. Our current setup is
with the Veritas cluster suite, so if the terminology differs or clashes
with SC, then I'm using the Veritas terminology. :)

We currently have 3 application stacks, each with an Oracle database and
an application server. Currently these sit on 6 machines, in 3 clusters
which each have 2 resource groups named foo_ora and foo_app. I want to
reduce the number of machines if possible.

The database consolidation is pretty straightforward. For the
application servers, I'd like to reduce this down to a zone per
application server, on two physical machines. I'm currently wondering
if I can run three zones on each host, and have the services fail over
between those zones, just as they could if the failover group was spread
between two physical machines, i.e.:

+-- Host 1 ---+              +--- Host 2 --+
|             |              |             |
| +-zone1--+  |              |  +--zone2-+ |
| |  appA<-+--+--failover----+--+->appA  | |
| +--------+  |              |  +--------+ |
|             |              |             |
| +-zone3--+  |              |  +--zone4-+ |
| |  appB<-+--+--failover----+--+->appB  | |
| +--------+  |              |  +--------+ |
|             |              |             |
| +-zone5--+  |              |  +--zone6-+ |
| |  appC<-+--+--failover----+--+->appC  | |
| +--------+  |              |  +--------+ |
|             |              |             |
+-------------+              +-------------+

i.e., all six zones are up all the time, but the application resource
groups fail over. It sounds like you are saying below that this isn't
supported, but I wonder whether running the agents in the global zone
is supported, and whether it would work (we'd probably need to do this
for the disk volumes to work anyway). For an Oracle agent, a simple
'select 1 from dual' or similar could probably be made to work, and some
of our other agents are pretty stupid too (does the process ID in this
file exist?), but I guess we'd run into trouble with anything more
sophisticated.
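
For what it's worth, the "does the process ID in this file exist?" style
of check can be sketched roughly as below. This is an illustration in
Python rather than a real Veritas or SC agent; the function names are
made up, and where the probe runs (global zone or inside the zone) is
left open, since that is exactly the unresolved question here.

```python
import os


def process_alive(pid: int) -> bool:
    """Check for a process by sending signal 0, which delivers nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


def probe_pidfile(path: str) -> bool:
    """Return True if `path` holds the PID of a currently running process."""
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed PID file
    if pid <= 0:
        return False  # non-positive PIDs address process groups, not a process
    return process_alive(pid)
```

A check like this only says that some process with that PID exists, not
that the application is actually answering requests, which is why it
counts as one of the "stupid" agents.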

> If
> you want single-service failover, AFAIK this is only
> supported in the global zone. But make sure to check
> the docs ;-)

Will do. Perhaps I'm overcomplicating this anyway, and throwing "zlogin
zone1 " in front of all the application resource startups will work!
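
A small sketch of that "prefix with zlogin" idea, in Python for
illustration only. It merely builds the argument list; the helper name
and the start-script path are hypothetical, and whether a cluster agent
tolerates its start command being wrapped this way is untested.

```python
import shlex
from typing import List, Sequence


def in_zone(zone: str, command: Sequence[str]) -> List[str]:
    """Prefix `command` so it would run inside `zone` via zlogin.

    Follows the documented form `zlogin zonename utility [argument...]`.
    """
    if not zone:
        raise ValueError("zone name must not be empty")
    return ["zlogin", zone, *command]


# Hypothetical start script for appA; the path is an assumption.
argv = in_zone("zone1", ["/opt/appA/bin/start", "--foreground"])
print(" ".join(shlex.quote(arg) for arg in argv))
```

The resulting list could be handed to something like `subprocess.run`
from the global zone, which is where zlogin has to be invoked.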

Thanks again.

Frank Fegert

Aug 25, 2006, 3:15:57 PM

Ah, ASCII art says it all ;-) There is apparently a way to
do the above with a "multiple-master" zone setup in SC:

"You can configure Sun Cluster HA for Solaris Containers as a failover
service or a multiple-masters service. You cannot configure Sun Cluster
HA for Solaris Containers as a scalable service."
(http://docs.sun.com/app/docs/doc/819-2664/6n4uhp5gm?a=view)

The drawback is that you'll have to take care of load-balanced
or failed-over access yourself:

"The difference between scalable and multiple masters configuration is
only in the way, the clients access the cluster nodes. In a scalable
configuration, they access the shared address. Otherwise the clients
access the physical hostnames."
(http://docs.sun.com/app/docs/doc/819-1085/6n3ffttap?q=multiple+masters&a=view)

A scalable RG (which is not yet supported with zones) would,
by contrast, relieve you of that pain. The amount of pain
depends largely on the braindeadness of your clients and
app servers ;-) If you're already using load balancers in
front of your apps/servers, I'd say it'd be minimal.

Regards,

Frank

Ceri Davies

Aug 27, 2006, 5:44:10 PM

On 2006-08-25, Frank Fegert <fra.no...@gmx.de> wrote:

> A scalable RG (which is not yet supported with zones) would,
> by contrast, relieve you of that pain. The amount of pain
> depends largely on the braindeadness of your clients and
> app servers ;-) If you're already using load balancers in
> front of your apps/servers, I'd say it'd be minimal.

Thanks again for your insight and help, Frank, you've been very helpful.
Watch this space for more dumb questions in the near future! ;-)