Advice on extending CTDB to support multiple NIC interfaces per node

Kevin Osborn

Mar 26, 2014, 11:49:00 AM

Hi,

We are thinking of extending CTDB to support multiple public NIC interfaces per node and we would like to ask your advice on the correct approach. I will describe what necessitates this feature as well as the areas we expect to have to modify. We would love some advice from the experts, especially if our approach is leading us off into dangerous territory. Thanks in advance for any help you can offer.

Why do we need multiple NIC interfaces per node?
We are working on passing the VMware iSCSI certification test suite and there are some tests that expect at least two physical routes to the same iSCSI target. Each iSCSI target must be hosted by just one node in our architecture. So we will need to add another Ethernet interface to the node to facilitate the VMware failover scheme and pass the certification test.

What failover behavior would we expect?
Each node would host two Ethernet interfaces and two public IP addresses allocated by CTDB. Failure of a single interface would result in the failed IP address being moved to the other interface on the same node. Failure of the entire node would move both IP addresses to some other node, but both would be served up by the same node. The two IP addresses would always be hosted by a single node.

How would we have to change CTDB?
We see the following areas that will need to change to support this new feature.
Failover:
1. Introduce a new tunable that would activate this mode, say MULTI_INTERFACE_PER_NODE. This tunable would activate several new code paths, including a new failover mechanism and a new IP allocation scheme.
2. We would not fail an entire node when a single interface goes down. This means that the failover logic needs to be changed from node based to interface based. (This looks like it could get complicated!)
3. Add a new ip_alloc_multi_interface() to the ctdb_takeover_run_core() function.
4. We would add a new configuration file that would list the valid IP address tuples that any node can host. This file would be saved on the cluster file system.
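
A hypothetical sketch of such a tuples file (the name, comments, and format here are assumptions for illustration only, not an existing CTDB convention):

# ip_tuples (hypothetical): each line lists the public IP addresses
# that must always be hosted together on a single node
10.11.12.1/24 10.11.13.1/24
10.11.12.2/24 10.11.13.2/24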

I have made this description as brief as possible to sketch out our intentions. We are open to other approaches too. Please feel free to ask detailed questions by contacting me directly.

Thanks again for your help,

-Kevin

Michael Adam

Mar 26, 2014, 4:57:27 PM

Hi Kevin,

If I'm not misreading your description, the feature you are requesting
already exists in ctdb.

You can list multiple interfaces after an IP address in
the public_addresses file, like this:

10.11.12.1/24 eth1,eth2,eth3

In that case, local failover will be done
if possible. If an interface goes down,
but others are still up, the node will not
be UNHEALTHY but PARTIALLY ONLINE.

Cheers - Michael

Kevin Osborn

Mar 26, 2014, 5:40:34 PM

Hi Michael,

Thanks for your help. I tried using the interface list after each IP, but both IP addresses on a node are failed over whenever any single interface goes down on a node. I would have expected the IP from the down interface to have moved to the "up" interface on the same node. Instead all IPs are moved off of that node.

Here is my public_addresses file (same for every node)
192.168.51.161/22 bond1,bond2
192.168.51.162/22 bond1,bond2
192.168.51.163/22 bond1,bond2
192.168.51.171/22 bond2,bond1
192.168.51.172/22 bond2,bond1
192.168.51.173/22 bond2,bond1

Here is the ctdb status from before unplugging a cable from one of the NICs on a node:
Number of nodes:3
pnn:0 192.0.2.59 OK (THIS NODE)
pnn:1 192.0.2.216 OK
pnn:2 192.0.2.24 OK
Generation:1168736351
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:0

And the output from ctdb ip:
Public IPs on node 0
192.168.51.161 2
192.168.51.162 0
192.168.51.163 1
192.168.51.171 1
192.168.51.172 2
192.168.51.173 0


Now the same commands after bringing an interface down on this node:
Number of nodes:3
pnn:0 192.0.2.59 UNHEALTHY (THIS NODE)
pnn:1 192.0.2.216 OK
pnn:2 192.0.2.24 OK
Generation:100668884
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:0

Public IPs on node 0
192.168.51.161 2
192.168.51.162 2
192.168.51.163 1
192.168.51.171 1
192.168.51.172 2
192.168.51.173 1

ctdb ifaces
Interfaces on node 0
name:bond2 link:down references:0
name:bond1 link:up references:0

ctdb listvars shows these settings:
DeterministicIPs = 0
LCP2PublicIPs = 0

Thanks again,

-Kevin

Martin Schwenke

Mar 26, 2014, 7:29:39 PM

Hi Kevin,

On Wed, 26 Mar 2014 21:40:34 +0000, Kevin Osborn
<kos...@overlandstorage.com> wrote:

> Thanks for your help. I tried using the interface list after each IP, but both IP addresses on a node are failed over whenever any single interface goes down on a node. I would have expected the IP from the down interface to have moved to the "up" interface on the same node. Instead all IPs are moved off of that node.

Please try setting:

CTDB_PARTIALLY_ONLINE_INTERFACES=yes

in your CTDB configuration.
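
For context, this variable is normally set in the ctdbd options file read at service start; the exact path below is distribution-specific and given only as an assumption:

# e.g. /etc/sysconfig/ctdb (RHEL-style) or /etc/default/ctdb (Debian-style)
CTDB_PARTIALLY_ONLINE_INTERFACES=yes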

peace & happiness,
martin

Kevin Osborn

Mar 27, 2014, 10:51:36 AM

Thanks Martin, that did the trick. Now the IP address just moves from one interface to the other when one link goes down.

Hosting the two IP addresses on the same node is required to allow us to provide two different paths to the same iSCSI targets. This means that these two IP addresses must stay together if the node fails entirely. Because of this we still plan to add a new tunable (but we will probably call it IP_ADDR_TUPLES instead of MULTI_INTERFACE_PER_NODE). We will also add a new ip_alloc_ip_tuple_interface() to the ctdb_takeover_run_core() function. And finally, we would add a new configuration file that would list the valid IP address tuples that any node can host. This file would be saved on the cluster file system.

Thanks again for your help,

-Kevin

-----Original Message-----
From: Martin Schwenke [mailto:mar...@meltin.net]
Sent: Wednesday, March 26, 2014 4:30 PM
To: Kevin Osborn
Cc: Michael Adam; samba-t...@lists.samba.org
Subject: Re: Advice on extending CTDB to support multiple NIC interfaces per node

Martin Schwenke

Mar 27, 2014, 7:52:33 PM

Hi Kevin,

On Thu, 27 Mar 2014 14:51:36 +0000, Kevin Osborn
<kos...@overlandstorage.com> wrote:

> Thanks Martin, that did the trick. Now the IP address just moves
> from one interface to the other when one link goes down.

Cool.

> Hosting the two IP addresses on the same node is required to allow
> us to provide two different paths to the same iSCSI targets. This
> means that these two IP addresses must stay together if the node
> fails entirely. Because of this we still plan to add a new tunable
> (but we will probably call it IP_ADDR_TUPLES instead of
> MULTI_INTERFACE_PER_NODE). We will also add a new
> ip_alloc_ip_tuple_interface() to the ctdb_takeover_run_core()
> function. And finally, we would add a new configuration file that
> would list the valid IP address tuples that any node can host. This
> file would be saved on the cluster file system.

Not that I can think of a better way of doing what you're saying, but
this sounds like it might be difficult. The main issues are:

* Public IP address configuration can be heterogeneous across nodes in
the cluster. The current IP allocation framework collects a list of
IPs that can be hosted from each node.

So you might need to extend the information that is collected from
all nodes. We have long-term plans to do this (e.g. include allowed
interfaces for each IP address). It could be that different IP
allocation algorithms would collect different information.

Both the main daemon and the recovery daemon have knowledge of the
public IP address configuration, so this makes the task even more
complex. In the future we're hoping to break the public IP address
handling out into a separate daemon, which would do IP address
allocation and consistency checking. The main daemon and the
recovery daemon would then have no knowledge of IP addresses. At
that point adding more information to share between nodes would
become easier. It would also become easier to simply replace the
whole public IP daemon with one that makes different assumptions.
We're a couple of steps away from doing this... perhaps some time
this year.

* I would encourage you not to save configuration in the cluster
filesystem. When cluster filesystem performance hits its limits
then CTDB would be unable to reload the configuration. Also, if you
assume a common file then you lose the ability to have heterogeneous
IP configuration across nodes. At a minimum, can you please make the
location of the file configurable?

In fact, how is this for a crazy but mostly backward compatible
hack? Extend the current public addresses file to allow multiple IP
addresses:

ip-address iface-list [ip-address iface-list ...]
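
A concrete (purely illustrative) instance of that format, pairing two public IPs that must always move together, might be:

10.11.12.1/24 eth1,eth2 10.11.13.1/24 eth2,eth1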

You could extend the ctdb_vnn structure to link to a list of "slave"
ctdb_vnn structures, which would contain all but the first IP from
each line. The IP allocation stuff could just (continue to) work
with the primary IP, since the other IPs have to follow
(i.e. whenever the primary is taken/released then all the slaves
would follow - that could be implemented in the main daemon). To
keep certain functions simple (e.g. killtcp/tickle handling) the
slave ctdb_vnn's could also be in the main list but could be tagged
as slaves (so that IP allocation ignores them). There are some
potential wrinkles in the public IP consistency checking that the
recovery daemon does...
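
A rough, self-contained sketch of the data-structure idea (this is not the actual struct ctdb_vnn from the CTDB source; the trimmed-down struct, field names and helper below are invented purely to illustrate the primary/slave linkage):

#include <stddef.h>

/* Illustrative stand-in for CTDB's per-public-IP structure. */
struct vnn {
        struct vnn *prev, *next;   /* main list of all public IPs, slaves included */
        const char *addr;          /* public IP address (string, for brevity) */
        const char **ifaces;       /* allowed interfaces, in preference order */
        struct vnn *slaves;        /* head of this primary's list of slave IPs */
        struct vnn *next_slave;    /* links slaves of the same primary */
        struct vnn *master;        /* NULL for a primary; otherwise its primary */
};

/* IP allocation would skip entries for which this returns non-zero;
 * takeover/release of a primary would then walk primary->slaves and
 * apply the same action to each, keeping the tuple on one node. */
static int vnn_is_slave(const struct vnn *vnn)
{
        return vnn->master != NULL;
}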

I clearly need to think this through more but it might work. :-)

peace & happiness,
martin
