Can't get NIC Bonding with active-backup working


Shaun

Jan 23, 2012, 7:20:02 AM

Hi all,

Have tried to get NIC Bonding working as per wiki.debian.org/Bonding.

Each NIC is connected to a different switch for redundancy rather than
bandwidth purposes (insulate against a switch failure). I'm using the
active-backup mode for HA failover.

output from cat /etc/network/interfaces

auto bond0
iface bond0 inet static
address 192.168.166.164
netmask 255.255.255.240
network 192.168.166.160
gateway 192.168.166.161
slaves eth0 eth1
bond_mode active-backup
bond_miimon 100
bond_downdelay 200
bond_updelay 200
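
For these slaves/bond_* lines to take effect the ifenslave helper package
needs to be installed; on squeeze I believe that is the following, but
please double-check against the wiki page:

apt-get install ifenslave-2.6   # assumed package name for squeeze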

output from cat /proc/net/bonding/bond0 as follows:

Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: b8:ab:6f:92:eb:c3

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: b8:ab:6f:92:eb:c4
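
One way to watch the failover as it happens is to keep an eye on the
active slave from the console, e.g. (assuming the bonding sysfs entries
are present on this kernel):

watch -n1 cat /proc/net/bonding/bond0
cat /sys/class/net/bond0/bonding/active_slave   # shows which slave is currently active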

If I then pull a cable (or use ifconfig eth0 down) I get the following
in the syslog:

Jan 23 11:21:50 host-1 kernel: [55852.565975] bonding: bond0: link
status down for active interface eth0, disabling it in 200 ms.
Jan 23 11:21:51 host-1 kernel: [55852.761549] bonding: bond0: link
status definitely down for interface eth0, disabling it
Jan 23 11:21:51 host-1 kernel: [55852.761555] bonding: bond0: making
interface eth1 the new active one.

All looks good... but... ping from host-1 produces Destination host
unreachable (with the icmp errors coming from the IP of the bond0 device
itself). And my remote ssh session dies. Good job I have KVM access :)

So it's not working. This setup seems so simple I can't see where
anything could be wrong, so I'm starting to suspect a problem with the
switch. Maybe the switch(es) are being too clever? But then again maybe
I've done something wrong.

What can I do to find out what's going on? I'm using Squeeze (current
point release) and Kernel 2.6.32-5-amd64.



Camaleón

Jan 30, 2012, 12:30:03 PM

On Mon, 23 Jan 2012 11:29:07 +0000, Shaun wrote:

> Have tried to get NIC Bonding working as per wiki.debian.org/Bonding.
>
> Each NIC is connected to a different switch for redundancy rather than
> bandwidth purposes (insulate against a switch failure). I'm using the
> active-backup mode for HA failover.
>
> output from cat /etc/network/interfaces

(...)

I have the same setup as yours in my lenny servers but with a difference:
both cards are connected to the same physical switch in the same VLAN.

> If I then pull a cable (or use ifconfig eth0 down) I get the following
> in the syslog:
>
> Jan 23 11:21:50 host-1 kernel: [55852.565975] bonding: bond0: link status down for active interface eth0, disabling it in 200 ms.
> Jan 23 11:21:51 host-1 kernel: [55852.761549] bonding: bond0: link status definitely down for interface eth0, disabling it
> Jan 23 11:21:51 host-1 kernel: [55852.761555] bonding: bond0: making interface eth1 the new active one.
>
> All looks good... but... ping from host-1 produces Destination host
> unreachable (with the icmp errors coming from the IP of the bond0 device
> itself). And my remote ssh session dies. Good job I have KVM access :)
>
> So it's not working. This setup seems so simple I can't see where
> anything could be wrong, so I'm starting to suspect a problem with the
> switch. Maybe the switch(es) are being too clever? But then again maybe
> I've done something wrong.
>
> What can I do to find out what's going on? I'm using Squeeze (current
> point release) and Kernel 2.6.32-5-amd64.

Mmm... have you tried the other way round? Disconnect eth1 and see if it
works.

Another thing I would test is with no bonding setup at all: configure
both ethernet cards separately and try to ping from each of them, i.e.:

ping -c 3 -I eth0 google.com
ping -c 3 -I eth1 google.com

Just to rule out a hardware or routing issue.
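
If ethtool is available it may also be worth confirming that both ports
actually report link before bonding them (just a sketch, assuming the
tool is installed):

ethtool eth0 | grep "Link detected"
ethtool eth1 | grep "Link detected"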

Greetings,

--
Camaleón



toor

Jan 31, 2012, 9:20:01 AM

Hi

My first guess would be an ARP issue. Have you tried flushing the ARP tables on the switches?
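
On the host side something like this might help to inspect the ARP cache
and re-announce the bond's address after a failover (a sketch, assuming
iputils arping is installed; 192.168.166.164 is the bond0 address from
the posted config):

ip neigh show dev bond0                    # inspect the local ARP cache
ip neigh flush dev bond0                   # drop any stale entries
arping -c 3 -U -I bond0 192.168.166.164    # unsolicited ARP so peers relearn the MAC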

Shaun

Feb 1, 2012, 6:10:01 AM

Sadly this box is a physical server in a data centre and the underlying
infrastructure is not mine, though the admin is quite happy to try some
things for me to get it working. But I'm not sure what to try next!



Shaun

Feb 1, 2012, 7:00:03 AM

On 30/01/2012 17:22, Camaleón wrote:
> On Mon, 23 Jan 2012 11:29:07 +0000, Shaun wrote:
>>
>> output from cat /etc/network/interfaces
>
> (...)
>
> I have the same setup as yours in my lenny servers but with a difference:
> both cards are connected to the same physical switch in the same VLAN.
> Mmm... have you tried the other way round? Disconnect eth1 and see if it
> works.

I've tried that by setting the active interface from the command line
with ifenslave -c. It didn't make a difference the other way around.

> Another thing I would test is with no bonding setup at all: configure
> both ethernet cards separately and try to ping from each of them, i.e.:

Yes, I tried this just to eliminate the cable/NIC/switch. Both work fine
when configured independently with different IPs.

Also, when I do arp -a (via remote console access, iDRAC-like
technology) I see the ARP table is empty following failover. It has
entries prior to failover.
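
As I understand it the bonding driver is supposed to send gratuitous
ARPs on the new active slave when it fails over; if this kernel exposes
the knob it might be worth bumping the count (I'm not sure the squeeze
driver honours it, so treat this as a guess):

cat /sys/class/net/bond0/bonding/num_grat_arp        # current value, usually 1
echo 5 > /sys/class/net/bond0/bonding/num_grat_arp   # send a few more on failover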

I have to log in via the remote console and restart networking to get it
up again. I can only think it's something to do with the switch, but I'm
not sure how to prove this. I could also get my hosting company to
connect both NICs to the same switch and try testing again, but I would
lose the switch redundancy that way :(

Is there anything further I can do to debug or eliminate things? Sadly I
don't have access to any of the preceding or following hops to run
tcpdump or anything like that.
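
One thing I can still do locally (assuming tcpdump is installed on the
box itself) is watch ARP and ICMP on the bond during a failover:

tcpdump -eni bond0 arp or icmp

That should at least show whether requests go out on eth1 after the
switch-over and whether anything comes back.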

Shaun



Camaleón

Feb 1, 2012, 11:50:02 AM

Before doing more tests I would first ask the hosting company whether
this option (channel bonding) is currently available; my guess is that
they may be using "something" (a special hardware configuration or
specific software/routing setup) that is preventing this from working
the way you want. Having no control over the hardware is always a
problem ;-(

Greetings,

--
Camaleón



Shaun

Feb 1, 2012, 12:30:01 PM

On 01/02/2012 16:25, Camaleón wrote:
> option (channel bonding) is currently available; my guess is that they
> may be using "something" (a special hardware configuration or specific
> software/routing setup) that is preventing this from working the way you
> want.


...Indeed they were ;-)

Working now. Yay!

