
Quorum strategy


Marc Van Dyck

May 22, 2019, 4:45:03 AM
Our production environment consists of three clusters of two members
each, with a quorum disk. Each member has one vote, as does the quorum
disk, so expected votes is 3 and the quorum is 2.
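
To make the arithmetic explicit, here is a minimal Python sketch of the
quorum rule as I understand it from the cluster documentation (the voter
names are just labels):

# Quorum is derived from the sum of the votes:
# (EXPECTED_VOTES + 2) // 2, using integer division.
def quorum(expected_votes):
    return (expected_votes + 2) // 2

votes = {"primary": 1, "secondary": 1, "quorum_disk": 1}
print(quorum(sum(votes.values())))   # 3 expected votes -> quorum is 2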

Now, in our environment, one of the two cluster members is more
important than the other, because there are application parts that
can run on that one only. We call it the primary member; the other
is the secondary member.

In case of system failure, if it is the secondary member that fails, we
can just switch the applications that ran on it to the primary. The
only price to pay is a heavier load on the primary.

If the primary member fails and can't be restarted, we shut down the
secondary too and restart it as primary. All data are on SAN storage,
including the system disk, so it is just a matter of shutting down,
changing the boot flags, and restarting.

Before you start on that: yes, I know this is bad, that applications
should not behave like that, but that's how it is and I can't change
it. What I can do is minimize the occurrences of that situation.

So, in case the cluster connection manager has to make a choice
between the two members (for example, if the two nodes can't talk to
each other anymore), I'd like to make sure that it is always the
primary member that survives.

I found on the 'net an old presentation explaining how the connection
manager works. In such cases, it seems that it first tries to
maximize the number of surviving nodes, and if that is not enough to
make a choice, it then tries to maximize the number of remaining
votes.

So, say that I give

- 2 votes on the primary member
- 2 votes on the quorum disk
- 1 vote on the secondary member

Expected votes is 5, quorum is 3.

If I lose the

- primary member, I keep 2 + 1 votes => quorum kept, cluster survives
- secondary member, I keep 2 + 2 votes => quorum kept, cluster survives
- quorum disk, I keep 2 + 1 votes => quorum kept, cluster survives

So this would work at least as well as the existing 1/1/1 votes config
that I have now.
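
Checked with the same quorum rule in Python (again just a sketch):

votes = {"primary": 2, "quorum_disk": 2, "secondary": 1}
expected = sum(votes.values())             # 5
q = (expected + 2) // 2                    # quorum is 3
for lost in votes:
    # remaining votes after losing one voter, vs. the quorum
    print(lost, expected - votes[lost] >= q)   # True in all three cases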

But if the primary and secondary members lose contact with each other,
one of the two will be ejected, and as primary + quorum disk = 4 votes
and secondary + quorum disk = 3 votes, the primary member should always
be kept, and the secondary one ejected.
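
And a sketch of the selection rule as that presentation describes it;
this paraphrases the rule, it is not the actual connection manager code:

# Each candidate subcluster: (set of member nodes, votes incl. quorum disk).
candidates = [
    ({"primary"},   2 + 2),   # primary plus the quorum disk: 4 votes
    ({"secondary"}, 1 + 2),   # secondary plus the quorum disk: 3 votes
]
# First maximize surviving nodes, then remaining votes.
best = max(candidates, key=lambda c: (len(c[0]), c[1]))
print(best[0])                # {'primary'} -- node counts tie, votes decide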

First question: did I get that right?
Second question: did I miss anything?

Many thanks,
Marc.

--
Marc Van Dyck

John E. Malmberg

May 22, 2019, 9:13:21 AM
On 5/22/2019 3:44 AM, Marc Van Dyck wrote:
> Our production environment consists of three clusters of two members
> each, with a quorum disk. Each member has one vote, as does the quorum
> disk, so expected votes is 3 and the quorum is 2.
<snip>
>
> So, say that I give
>
> - 2 votes on the primary member
> - 2 votes on the quorum disk
> - 1 vote on the secondary member

The quorum disk is only needed if you need the secondary to survive for
a period of time after the primary system fails, such as long enough for
an orderly shutdown or for running other diagnostics.

If you do not give equal votes to the two systems and quorum disk, when
the primary system goes down, the secondary system will hang until
either the primary rejoins the cluster, or you adjust the number of
votes on the secondary via the console.

If this hang of the secondary is acceptable, and your recovery is to
crash it instead of doing an orderly shutdown, then your quorum disk is
not doing anything useful in that configuration. Essentially you have
the same thing as 1 vote on the primary member, 0 votes on the secondary
member, and no quorum disk.
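
Run through the usual quorum arithmetic, quorum = (EXPECTED_VOTES + 2) // 2,
that minimal configuration looks like this (a sketch in Python):

votes = {"primary": 1, "secondary": 0}   # no quorum disk
q = (sum(votes.values()) + 2) // 2       # expected 1 -> quorum 1
# Primary alone: 1 vote >= 1, it keeps running.
# Secondary alone: 0 votes < 1, it hangs until the votes are adjusted.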

Regards,
-John
wb8...@qsl.net_work

Hans Bachner

May 22, 2019, 12:21:42 PM
> [snip]

With the vote distribution described by Marc, imho the secondary system
will happily continue to run as long as it has access to the quorum disk
even if the primary node goes down.
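
A quick Python check with Marc's proposed votes bears that out:

votes = {"primary": 2, "quorum_disk": 2, "secondary": 1}
q = (sum(votes.values()) + 2) // 2                       # quorum = 3
print(votes["secondary"] + votes["quorum_disk"] >= q)    # True: 3 >= 3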

Hans.

Stephen Hoffman

May 22, 2019, 1:08:44 PM
On 2019-05-22 08:44:59 +0000, Marc Van Dyck said:

> First question : did I get that right ?

Clustering does have a concept of a primary for the connection manager,
but does not otherwise present that to the app-level processing.

Given that you can't update the code, and can't write some code to
automatically perform the app transition from secondary to primary, or,
for instance, to scan for the lowest-addressed host when more than one
host is present in the cluster...

The whole reason the quorum disk was implemented was to avoid the need
for a primary-secondary configuration. And that primary-secondary
configuration is what you want here.

Give one vote to the cluster member host that you're referring to as
the primary. Remove the quorum disk. Give no votes to the secondary.

The secondary then operates as little more than a warm standby here.
Yes, it can be configured to run some apps.

Since you reboot to promote the secondary to primary, that reboot can
be done with an added vote, and only when the primary can be left
hard-down.

That could well be one boot root to boot as secondary and another to
boot as primary, or a manual process that tweaks the relevant system
parameters.
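
As a sketch, the two boot roots might differ only in these SYSGEN
parameters; the parameter names (VOTES, EXPECTED_VOTES) are real, the
values are just this suggestion:

# Parameter sets for the two roles, per the suggestion above.
secondary_root = {"VOTES": 0, "EXPECTED_VOTES": 1}
primary_root   = {"VOTES": 1, "EXPECTED_VOTES": 1}
# Booted from primary_root the host makes quorum on its own
# ((1 + 2) // 2 == 1); booted from secondary_root it cannot, by design.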

Downside: if the failed primary reboots and rejoins the party, you'll
have two hosts that both want to be "primary". Much like adding code
for the transition, it's also possible to code some checks at startup
to detect and block a second host from becoming primary.
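
Purely as an illustration of such a startup check; on OpenVMS the
natural mechanism would be a cluster-wide lock via the distributed lock
manager rather than this hypothetical marker file on the shared SAN:

import os, sys

PRIMARY_MARKER = "/shared_san/primary_role.marker"   # hypothetical path

def claim_primary_role():
    try:
        # O_EXCL makes the creation atomic: only one host wins the claim.
        fd = os.open(PRIMARY_MARKER, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

if not claim_primary_role():
    sys.exit("another host already claims the primary role")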



--
Pure Personal Opinion | HoffmanLabs LLC

Hans Bachner

May 22, 2019, 4:18:45 PM
Stephen Hoffman wrote on 22.05.2019 at 19:08:
[...]
> Give one vote to the cluster member host that you're referring to as
> the primary. Remove the quorum disk. Give no votes to the secondary.
>
> The secondary then operates as little more than a warm standby here.
> Yes, it can be configured to run some apps.
[...]

This is not what he wants:

Marc Van Dyck wrote on 22.05.2019 at 10:44:
> So, in case the cluster connection manager has to make a choice
> between the two members (for example, if the two nodes can't talk to
> each other anymore), I'd like to make sure that it is always the
> primary member that survives.

I believe Marc also mentioned somewhere (can't find that right now...)
that the secondary node could/should be used for some
analysis/troubleshooting tasks before it is rebooted. This would not
work if only the primary node had a vote. It could not even be cleanly
shut down before rebooting it as the "new primary".

Hans.

Stephen Hoffman

May 22, 2019, 5:35:02 PM
On 2019-05-22 20:18:44 +0000, Hans Bachner said:

> I believe Marc also mentioned somewhere (can't find that right now...)
> that the secondary node could/should be used for some
> analysis/troubleshooting tasks before it is rebooted.

What I'm referring to as the secondary can do that.

If the secondary is wedged while the primary is down, the secondary can
be manually selected to continue its processing using the IPC handler.

Or rebooting the secondary as primary, or rebooting in a degraded or
restricted configuration pending promotion to primary.

That IPC processing could conceivably be automated using outboard
keep-alive processing, and a whole lot of testing.

> This would not work if only the primary node had a vote.

Sure it would.

> It could not even be cleanly shut down before rebooting it as the "new primary".

I stated that the reboot would require parameter changes to adjust the votes.

Would I want to design apps and a cluster configuration to work this way? No.

App changes would provide better results.

But the no-changes requirement precludes other approaches.

There is no good way to solve this within the constraints.




I'd like to believe we can still have isolated systems running static
configurations forever and with no changes. A very few folks actually
do manage that, too. Most of us are increasingly running with
less-than-fully-isolated and increasingly connected servers, many of
these configurations with little or no control over peer servers and
even less over clients. In recent years, software is no more "done"
than lawns are "mowed". We've done remote software upgrades on Mars,
after all. https://www.nasa.gov/jpl/msl/mars-rover-curiosity-20131220/

Marc Van Dyck

May 23, 2019, 3:04:20 AM
Hans Bachner explained:
Which is indeed what we want to achieve. It's better for us to shut down
the secondary in an orderly fashion, at a time of our choosing, rather
than have it crash abruptly.

--
Marc Van Dyck

Marc Van Dyck

May 23, 2019, 3:08:09 AM
Stephen Hoffman formulated the question:
We don't want the secondary to crash or hang if the primary fails.
First, because each application abnormally interrupted requires some
work to restart it, and second, because it is always easier to
troubleshoot if we still have a VMS instance alive. So giving zero
votes to the secondary is not viewed as a good option. If it was that
simple, it would be implemented already.

--
Marc Van Dyck

Marc Van Dyck

May 23, 2019, 3:10:17 AM
On 22/05/2019, Hans Bachner supposed:
Yes indeed. And we'd also let the applications running on the secondary
stop gracefully, with no recovery work necessary before restart. We
have enough work with those that crashed with the primary already.

--
Marc Van Dyck

Marc Van Dyck

May 23, 2019, 3:17:13 AM
>
> Would I want to design apps and a cluster configuration to work this way?
> No.
>

Me neither. The design of those apps is 30 years old; I wasn't even
working here when that was done. The stability of the OpenVMS ecosystem
is such that this design never needed to change.

> App changes would provide better results.
>

Totally agree with you there, but it's not me who decides. Basically,
the applications contain the node name hardcoded all over the place,
and changing that is expected to take months of work. Limited
development workforce, limited deep OpenVMS skills (harder and harder
to find; if we ever quit OpenVMS, this is likely to be the main reason),
and always more urgent functional needs...

> But the no-changes requirements precludes other approaches.
>
> There is no good way to solve this within the constraints.
>

--
Marc Van Dyck