I have two mutually aware OSWs.
I am in the process of building CMs for them.
I need the CMs to be able to run rp_pool_[rem|add]server. I
would prefer the primary OSW run it on both and the
companion run it on itself if the primary is down.
Any ideas?
Thanks.
In OpenSwitch 15, you can call those procedures
directly using the CM APIs:
cm_rp_pool_remserver()
http://infocenter.sybase.com/help/topic/com.sybase.help.openswitch_15.0.oswref/html/oswref/CHDDJDHA.htm
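For example, an ASE-failure handler in the CM might do something
like this. Treat it as a sketch only: the argument list shown here
just mirrors the pool name / server name arguments of the registered
procedure itself, and the reference page above has the real prototype.

/* Sketch only: "cm" is a hypothetical handle for the CM's connection
 * to one OpenSwitch, and the argument list of cm_rp_pool_remserver()
 * is assumed, not the documented prototype (see the page above).
 * POOL_A and ASE1 are placeholder names. */
if (cm_rp_pool_remserver(cm, "POOL_A", "ASE1") != CS_SUCCEED)
{
    /* removal failed on this OpenSwitch; log it and decide
     * whether to retry or escalate */
}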
Thanks,
Neal
Apr 18 18:31:26 2007: NORS2_UAT_OSW: WARN: spid 28:
rp_switch: Received a request from a coordination module
that was not the current coordination module. Ignoring
request.
Apr 18 18:31:26 2007: NORS2_UAT_OSW: WARN: spid 28:
coord_set_srv: Received a request from a coordination module
that was not the current coordination module. Ignoring
request.
Apr 18 18:31:26 2007: 16331 - Recursive Calls to
srv__seterr() in spid: 28. Thread terminated.
So I guess what I'll have to do is connect to the companion
via Open Client... Thoughts?
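Something along these lines is what I had in mind: a rough ct-lib
sketch that runs rp_pool_remserver against each OpenSwitch over a
plain Open Client connection. The login, pool and server names (and
NORS1_UAT_OSW as the first OpenSwitch) are placeholders, and error
handling is trimmed to keep it short.

#include <stdio.h>
#include <ctpublic.h>

/*
 * Rough sketch: remove a server from a pool on one OpenSwitch by
 * executing the rp_pool_remserver registered procedure over an
 * Open Client connection. osw_name must be an interfaces-file entry.
 */
static int pool_remserver(CS_CONTEXT *ctx, char *osw_name,
                          char *pool, char *server)
{
    CS_CONNECTION *conn;
    CS_COMMAND    *cmd;
    CS_CHAR        query[256];
    CS_INT         restype;

    ct_con_alloc(ctx, &conn);
    ct_con_props(conn, CS_SET, CS_USERNAME, "cm_user", CS_NULLTERM, NULL);
    ct_con_props(conn, CS_SET, CS_PASSWORD, "cm_pwd", CS_NULLTERM, NULL);
    if (ct_connect(conn, osw_name, CS_NULLTERM) != CS_SUCCEED)
    {
        fprintf(stderr, "cannot connect to %s\n", osw_name);
        return -1;
    }

    /* rp_pool_remserver pool_name, server_name */
    sprintf(query, "rp_pool_remserver %s, %s", pool, server);
    ct_cmd_alloc(conn, &cmd);
    ct_command(cmd, CS_LANG_CMD, query, CS_NULLTERM, CS_UNUSED);
    ct_send(cmd);
    while (ct_results(cmd, &restype) == CS_SUCCEED)
        ;                       /* drain result sets */

    ct_cmd_drop(cmd);
    ct_close(conn, CS_UNUSED);
    ct_con_drop(conn);
    return 0;
}

int main(void)
{
    CS_CONTEXT *ctx;

    cs_ctx_alloc(CS_VERSION_100, &ctx);
    ct_init(ctx, CS_VERSION_100);

    /* run the same command on both mutually aware OpenSwitch servers */
    pool_remserver(ctx, "NORS1_UAT_OSW", "POOL_A", "ASE1");
    pool_remserver(ctx, "NORS2_UAT_OSW", "POOL_A", "ASE1");

    ct_exit(ctx, CS_UNUSED);
    cs_ctx_drop(ctx);
    return 0;
}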
Ayo.
I think the CM/OpenSwitch design in an environment where you have
more than one CM connecting to the OpenSwitch (also known as
concurrent CMs) is such that the OpenSwitch is only supposed to
accept commands from the "active" CM and ignore anything from
the "standby" CMs.
I think they designed it this way to avoid what they call "split
brain" decisions. A "split brain" decision could come about if
you have more than one decision maker telling OpenSwitch what it
should do.
For example, the primary CM is on "server A" and the standby CM is on "server B".
For some reason (say, a network failure), the standby CM on "server B" is unable to
reach ASE1, but the primary CM on "server A" is still able to reach ASE1. So now
you have a split brain decision: the primary CM thinks everything
is OK and the standby thinks that ASE1 is unreachable.
If you have one CM telling OpenSwitch to mark the server down and the
other telling OpenSwitch to mark it up, you have a "split brain".
Not knowing your server/network architecture, I don't know if this
is something you could run into. Generally, we recommend that the
CM be installed on the same server as OpenSwitch. We also recommend
that the standby OpenSwitch/CM are on the same LAN (if you had the
primary OpenSwitch in Florida and the standby in New York, the chances
of a network failure between the two are greater than if they were
on a LAN).
So I am not sure I answered your question (I'm not really sure I
understand your environment and what you are trying to accomplish).
Thanks,
Neal
I want to introduce a CM into that mix, mainly because during
our tests the CFG_STORAGE DBs were not updated between
tests. One test included shutting down one of the OSWs (the
application was supposed to connect to the companion; this
happened as planned, except that when the OSW was restarted it
got its data from the CFG_STORAGE DB, which had not been
updated yet).
The environment I set up to test the CMs is similar to prod except
most components are in HQ:
1 ASE, 1 REP, 1 OSW, 1 CM in HQ (simulating HQ)
1 OSW, 1 CM in HQ (simulating DR)
1 ASE in DR (simulating DR)
The OSWs are companion / mutually aware in both environments.
Each CM listens to both OSWs - I start the first and wait for
some time before starting the second. I verified this by
tailing the logs during login attempts through both OSWs, and
indeed one of the CMs handles all connections until I
shut down one OSW; then the CMs split, each handling one OSW.
During an ASE FAIL event, the CM execs rp_pool_remserver on
only one OSW, resulting in the "split brain" that you
discussed. When the ASE comes back UP, it is correctly marked
as UP in both OSWs, but on one it is moved to second
in the list. The other OSW marks it as up and sends
subsequent logins to it.
My questions:
1. How can I speed up the updates to the CFG_STORAGE DBs?
2. How can I get the callback for asefail to exec against both
OSWs?
Thank you.
Ayo.
In this case, you need not call this registered procedure or take any action. The mutually aware servers take care of the split brain
decision by changing the status of the ASE servers appropriately.
The CFG_STORAGE cluster tables are an internal feature used by mutually aware OpenSwitch servers. The OpenSwitch server takes care of
updating these cluster tables whenever required to keep the configurations in sync. For this reason, there is no handle available for
customer use of the CFG_STORAGE cluster tables.