Kernel drops UDP datagrams between interface and process

phil-new...@ipal.net

Apr 14, 2010, 8:15:31 PM

And no ... there are no iptables rules at all. But this is weird because
it only does it on ONE of TWO interfaces.

Both eth0 and eth1 are configured with the same set of IP addresses:

eth0 172.30.16.5/16
eth0:0 172.30.0.7/16
eth0:1 172.30.0.13/16
eth0:2 172.30.0.48/16

eth1 172.30.16.5/16
eth1:0 172.30.0.7/16
eth1:1 172.30.0.13/16
eth1:2 172.30.0.48/16

There are also some IPv6 addresses configured in addition to link-local.
The MAC addresses are:

eth0 00:15:17:da:e0:28
eth1 00:15:17:da:e0:29

BIND is listening on 127.0.0.1 and 172.30.0.7. NSD is listening on
172.30.0.13. Both BIND and NSD are listening to port 53, UDP and TCP.
A firewall translates a public IP address to 172.30.0.13.

Whether I go through eth0 or eth1, of course, depends on which MAC
address I get at ARP query time. Or I can set an ARP entry in the
cache to force it for tests. I can do this on both the machine I run
tests from, as well as the firewall.

So, when the ARP table has the MAC address for eth0, all works fine.
DNS queries get answered as expected.

When the ARP table has the MAC address for eth1, it fails. There is no
DNS answer and dig eventually times out. This affects both my test
machine (Ubuntu) and the firewall (Sonicwall) alike (testing via the
firewall is done from an outside server I reach via SSH running Debian).

So on the server (Ubuntu, server edition), which I get access to via an
IPv6 address so that the SSH connection is not affected by this issue,
I run tcpdump on eth1 and I see the packets actually arriving on eth1.
But there are no outgoing answers. The tcpdump command I did was:

tcpdump -nnl -i eth1 port 53

So next I strace the name server processes to see what is going on.
They never get the UDP datagrams. None of the processes or threads
wake up for this traffic (but sometimes they do for timed events and
such ... and they also do when I switch the test over to eth0).

If I change the /etc/network/interfaces file to comment out automatically
starting up eth0, so only eth1 starts up, and then reboot, after it comes
back up now eth1 is working just fine. Since it is down, eth0 does not
work. I manually bring up eth0, but it does not work. So with this, the
working vs. non-working roles are reversed.

There are no iptables rules in effect.

Maybe they are using SO_BINDTODEVICE as a socket option when binding?
I grabbed the source code to NSD. There is no instance of SO_BINDTODEVICE
in the source code at all in all versions from 3.0.0 to 3.2.5 (Ubuntu has
version 3.2.2).
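
A check like the one described above can be repeated over any unpacked source tree. The following is only a minimal sketch: the function name `files_mentioning` and the idea of pointing it at an unpacked NSD directory are illustrative assumptions, not anything taken from NSD itself.

```python
import os


def files_mentioning(root, needle="SO_BINDTODEVICE"):
    """Return sorted relative paths of files under root that contain needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary files be scanned without raising
                with open(path, "r", errors="ignore") as handle:
                    if needle in handle.read():
                        hits.append(os.path.relpath(path, root))
            except OSError:
                continue  # unreadable file, skip it
    return sorted(hits)
```

An empty list for a given tree means the option name does not appear anywhere in it, which matches what was found for NSD here.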

Any other ideas why these packets are getting lost on one of the interfaces?

--
-----------------------------------------------------------------------------
| Phil Howard KA9WGN | http://linuxhomepage.com/ http://ham.org/ |
| (first name) at ipal.net | http://phil.ipal.org/ http://ka9wgn.ham.org/ |
-----------------------------------------------------------------------------

David Schwartz

Apr 15, 2010, 1:11:51 AM

On Apr 14, 5:15 pm, phil-news-nos...@ipal.net wrote:

> Both eth0 and eth1 are configured with the same set of IP addresses:
>
> eth0 172.30.16.5/16
> eth0:0 172.30.0.7/16
> eth0:1 172.30.0.13/16
> eth0:2 172.30.0.48/16
>
> eth1 172.30.16.5/16
> eth1:0 172.30.0.7/16
> eth1:1 172.30.0.13/16
> eth1:2 172.30.0.48/16

If there is some way this could possibly make sense, I can't think of
it. It would help if you explained what these interfaces were
connected to so that we have some hope of understanding what you're
trying to do. Why would you assign the same IP address to the same
host two different ways?

DS

phil-new...@ipal.net

Apr 15, 2010, 9:38:55 AM

Both are connected to different physical ports of the same VLAN on currently
the same physical switch. Later, they will be connected to two different
physical switches carrying the same VLAN. The purpose is NOT for bonding.
Instead, the purpose is merely for backup redundancy. If one cable fails,
or in the future one switch fails, there will be a fallback path in and out.
The fallback isn't expected to be instant; it would happen once ARP entries
expire and new entries are acquired with (hopefully) the MAC address that
is still working. As long as both paths are working, it should not matter
which path is used.

Monitoring both eth0 and eth1 side by side, I see both get all ARP requests.
Since those are broadcasts, that should be expected. Only one is answered.
I don't have a problem with only one answering, in principle. The problem
is that sometimes, the ARP answer is for the MAC address of the interface that
has the issue that arriving packets are not being delivered to the listening
process. In all cases, one of the interfaces works and the other does not,
in terms of getting the UDP datagrams to the listening processes. I can see
this happening by having one host with a static ARP entry for one MAC address
and another host with a static ARP entry for another MAC address, each sending
the DNS queries. The packets from BOTH always arrive on the interface and
are shown by tcpdump. But only for ONE do the UDP datagrams get delivered
to wake up the listening DNS server.

This actually worked many years ago, the last time I was doing redundant
ethernet connections. But, I believe that was a 2.4 kernel back then.
But who knows what is really causing this issue. It may be the kernel (as
in a regressive bug). It may be a changed default setting. It may be the
library diddling with sockets. I'm running out of speculative ideas, so
that is why I'm posting to ask about this.

David Schwartz

Apr 15, 2010, 3:58:27 PM

On Apr 15, 6:38 am, phil-news-nos...@ipal.net wrote:

> Both are connected to different physical ports of the same VLAN on currently
> the same physical switch. Later, they will be connected to two different
> physical switches carrying the same VLAN. The purpose is NOT for bonding.
> Instead, the purpose is merely for backup redundancy. If one cable fails,
> or in the future one switch fails, there will be a fallback path in and out.
> The fallback isn't expected to be instant; it would happen once ARP entries
> expire and new entries are acquired with (hopefully) the MAC address that
> is still working. As long as both paths are working, it should not matter
> which path is used.

That's not how you do that. You create one virtual interface, assign
the IP address to the virtual interface, and then add both physical
interfaces (with no IP addresses assigned to them) to the virtual
interface.

Your configuration makes no sense, since it assigns the same IP
address to the same machine through two different physical interfaces.

http://www.cyberciti.biz/howto/question/static/linux-ethernet-bonding-driver-howto.php
http://www.howtoforge.com/network_card_bonding_centos

The bonding driver supports active backup mode.

DS

phil-new...@ipal.net

Apr 15, 2010, 5:22:11 PM

On Thu, 15 Apr 2010 12:58:27 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:
| On Apr 15, 6:38 am, phil-news-nos...@ipal.net wrote:
|
|> Both are connected to different physical ports of the same VLAN on currently
|> the same physical switch. Later, they will be connected to two different
|> physical switches carrying the same VLAN. The purpose is NOT for bonding.
|> Instead, the purpose is merely for backup redundancy. If one cable fails,
|> or in the future one switch fails, there will be a fallback path in and out.
|> The fallback isn't expected to be instant; it would happen once ARP entries
|> expire and new entries are acquired with (hopefully) the MAC address that
|> is still working. As long as both paths are working, it should not matter
|> which path is used.
|
| That's not how you do that. You create one virtual interface, assign
| the IP address to the virtual interface, and then add both physical
| interfaces (with no IP addresses assigned to them) to the virtual
| interface.

That's bonding. I don't need bonding. I'd prefer not to have bonding.
I just want a plain interface to just work. And I want the other one to
work, too. Simple as that.

| Your configuration makes no sense, since it assigns the same IP
| address to the same machine through two different physical interfaces.

I originally did it with ONE IP address. Advice in another forum said
it needed to have the IP on both because if the first interface went
down all the way, then the IP address would not be part of the system's
set of IP addresses.

See ${kernelsource}/Documentation/networking/ip-sysctl.txt about 750 lines
down under the "arp_filter" description, for "0 - (default)". It reads:

0 - (default) The kernel can respond to arp requests with addresses
from other interfaces. This may seem wrong but it usually makes
sense, because it increases the chance of successful communication.
IP addresses are owned by the complete host on Linux, not by
particular interfaces. Only for more complex setups like load-
balancing, does this behaviour cause problems.

As long as eth0 is up and has the IP address, I should be able to use eth1
for that IP address. That much doesn't even work as documented because
the arriving packets (they do arrive, tcpdump shows this) on that interface
just get discarded somewhere in the network stack before they ever get to
the process.

Using the same IP address on both interfaces is just assurance that the IP
address is always "... owned by the complete host on Linux". If eth0 goes
completely down and is the only interface with that IP address, then I can
see why that IP address would no longer be part of the "complete host" as
there is no active interface with it. So having the same IP address on
both interfaces avoids this issue.

There is no excuse that I see documented anywhere for packets arriving on
either interface to not be processed by the network stack and delivered to
the process listening for them (as UDP datagrams in this case, but TCP has
the same problem). There is complete symmetry of configuration: Both
interfaces have all the same IP addresses except for special cases based
on the MAC/EUI link layer address. Routes are the same for both:

=============================================================================
fermat/root/x0 /root 2# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
172.30.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
172.30.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth1
0.0.0.0         172.30.0.2      0.0.0.0         UG    1      0        0 eth1
0.0.0.0         172.30.0.2      0.0.0.0         UG    1      0        0 eth0
fermat/root/x0 /root 3#
=============================================================================

Here I run tcpdump in parallel separately for each interface, then on the
machine with 172.30.72.0 (my laptop) I use "arp -s" to set specific MACs
in the ARP table, then use "dig" to do one DNS query. Note the MAC
addresses in the tcpdump output. Note that the ones going to *28 work
(the process responds) while the ones going to *29 do not work.

=============================================================================
fermat/root/x1 /root 1# printf '0\n1\n' | multi -v 2 bash -c 'tcpdump -elnn -i eth% port 53 | prefix "eth% "'
2010-04-15.16:58:32 pid=1917 started: bash -c tcpdump -elnn -i eth0 port 53 | prefix "eth0 "
2010-04-15.16:58:32 pid=1918 started: bash -c tcpdump -elnn -i eth1 port 53 | prefix "eth1 "
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
eth0 16:59:12.452276 00:24:e8:e7:1e:cc > 00:15:17:da:e0:28, ethertype IPv4 (0x0800), length 75: 172.30.72.0.49210 > 172.30.0.13.53: 45033+ A? foo.example.com. (33)
eth0 16:59:12.452375 00:15:17:da:e0:28 > 00:24:e8:e7:1e:cc, ethertype IPv4 (0x0800), length 75: 172.30.0.13.53 > 172.30.72.0.49210: 45033 ServFail- 0/0/0 (33)
eth1 16:59:26.706613 00:24:e8:e7:1e:cc > 00:15:17:da:e0:29, ethertype IPv4 (0x0800), length 75: 172.30.72.0.39168 > 172.30.0.13.53: 59972+ A? foo.example.net. (33)
eth1 16:59:31.703972 00:24:e8:e7:1e:cc > 00:15:17:da:e0:29, ethertype IPv4 (0x0800), length 75: 172.30.72.0.39168 > 172.30.0.13.53: 59972+ A? foo.example.net. (33)
eth1 16:59:36.701431 00:24:e8:e7:1e:cc > 00:15:17:da:e0:29, ethertype IPv4 (0x0800), length 75: 172.30.72.0.39168 > 172.30.0.13.53: 59972+ A? foo.example.net. (33)
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
2 packets captured
2 packets received by filter
0 packets dropped by kernel
fermat/root/x1 /root 2#
=============================================================================

This isn't even a fallback case. This is just two interfaces that should be
working, and physically do work (the packets do arrive exactly where told
to go) ... but the network stack is for some reason discriminating between them.

I want to know where those last 3 packets went, and why.
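
One place to look for packets that tcpdump sees but no socket receives is the kernel's own drop counters. As a hedged sketch: on kernels that export it, the `IPReversePathFilter` counter on the `TcpExt` lines of /proc/net/netstat counts packets discarded by reverse-path filtering. Whether the kernel in this thread exports that counter is an assumption, and the names `parse_netstat`, `rp_filter_drops` and `SAMPLE` below are illustrative only.

```python
from pathlib import Path


def parse_netstat(text):
    """Parse the header/value line pairs of /proc/net/netstat into a dict."""
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The file comes in pairs: a line of names, then a line of values,
    # both starting with the same prefix such as "TcpExt:".
    for names_line, values_line in zip(lines[0::2], lines[1::2]):
        prefix, names = names_line.split(":", 1)
        value_prefix, values = values_line.split(":", 1)
        if prefix != value_prefix:
            continue  # malformed pair, skip it
        for name, value in zip(names.split(), values.split()):
            counters[f"{prefix}{name}"] = int(value)
    return counters


def rp_filter_drops(path="/proc/net/netstat"):
    """Return the reverse-path filter drop count, or None if not exported."""
    return parse_netstat(Path(path).read_text()).get("TcpExtIPReversePathFilter")


# Made-up sample in the same layout as the real file (not captured data).
SAMPLE = (
    "TcpExt: SyncookiesSent IPReversePathFilter\n"
    "TcpExt: 0 3\n"
    "IpExt: InNoRoutes InMcastPkts\n"
    "IpExt: 0 12\n"
)
```

Calling `rp_filter_drops()` before and after a failing `dig` would show whether the counter moved. On a kernel without the counter it returns `None`, and the drop has to be located some other way.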

| http://www.cyberciti.biz/howto/question/static/linux-ethernet-bonding-driver-howto.php
| http://www.howtoforge.com/network_card_bonding_centos
|
| The bonding driver supports active backup mode.

These appear to be configurations that require both ports be connected to
the same switch. That won't be my situation at deployment. There will be
two separate switches in parallel. That's one of the reasons I do not
want bonding.

All I want is passive backup ... which only needs for the network stack to
actually work (e.g. those last 3 packets to be correctly acted on and sent
on to the listening process that has a socket bound to listen on 172.30.0.13
and does not use SO_BINDTODEVICE). It just needs for the kernel to NOT
throw away packets that have a clear destination (the process that is doing
the listen on port 53 address 172.30.0.13) and actually did arrive where
they were supposed to (the interface with the link layer address specified).

David Schwartz

Apr 15, 2010, 6:41:47 PM

You are welcome to continue to insist that a configuration that makes
no logical sense should work. But it doesn't, and it won't. What you
want is called bonding under Linux, whether you think it should be or
not.

You claim this is supposed to be for failover, but you have no
failover mechanism. There are many ways to do network interface
failover in Linux, but this is not one of them.

DS

phil-new...@ipal.net

Apr 15, 2010, 8:31:59 PM

On Thu, 15 Apr 2010 15:41:47 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:

| You are welcome to continue to insist that a configuration that makes
| no logical sense should work. But it doesn't, and it won't. What you
| want is called bonding under Linux, whether you think it should be or
| not.

I continue to insist that the configuration is very logical.

| You claim this is supposed to be for failover, but you have no
| failover mechanism. There are many ways to do network interface
| failover in Linux, but this is not one of them.

The mechanism is called ARP.

For cases of trying to reach this server, other machines will try to send
an IP packet to 172.30.0.13 (for example). Initially and periodically the
ARP table will not have an entry for 172.30.0.13 (which expires in 2 to 10
minutes depending on the machine involved, configurable on most systems).
An ARP request is transmitted over the interfaces and network segments the
routing table on those machines indicates is appropriate. That ARP request
reaches this server over whichever interface is working, possibly both if
both are working (I see them coming in on both via tcpdump).

If eth0 and eth1 are both working, the ARP request can come in on either or
both interfaces. It will be answered over at least one of them, and that is
the interface on which traffic from the other machine will arrive at this server.
Traffic will proceed over either eth0 or eth1 (and possibly all of it over
just one of them, which is fine).

If eth0 is the only physically working interface, then it will be the only
interface the ARP request arrives on, and will be the only interface the
ARP answer is sent on. The other machine will get the MAC address of eth0.
Traffic will proceed over eth0.

If eth1 is the only physically working interface, then it will be the only
interface the ARP request arrives on, and will be the only interface the
ARP answer is sent on. The other machine will get the MAC address of eth1.
Traffic will proceed over eth1.

If neither eth0 nor eth1 is physically working, then this server cannot be
reached.

It may not be the fastest mechanism around because of the time it takes for
ARP table caches to expire. It can be defeated by ill-advised setting of
permanent ARP table entries. But it is a mechanism that can work. In fact
it does work around a failure in the kernel itself where the kernel is not
passing the arriving packets on for one of the interfaces (usually this is
eth1, but sometimes it is eth0).

If this isn't the mechanism you were expecting, that doesn't make it any
less a mechanism.

I don't deny that port bonding would be a valid mechanism. But port bonding
requires both ends to terminate on the same machine. So that mechanism is
not suitable for parallel redundant networks. Thus it is not suitable for
my situation, even though it would certainly be much faster to keep going
when one of the interfaces or cables or switches fails. But I do not need
rapid failover. I do not need for existing connections to be kept. If new
connections have to be made by the applications retrying many minutes later,
that is acceptable for this.

You have mentioned one mechanism so far. You claim "There are many ways to
do network interface failover in Linux". Maybe you can list or name at least
two more of them just so I can see what you have in mind. If one of them is
suitable for my situation, I'll even consider using it.

David Schwartz

Apr 15, 2010, 8:59:22 PM

On Apr 15, 5:31 pm, phil-news-nos...@ipal.net wrote:

> | You claim this is supposed to be for failover, but you have no
> | failover mechanism. There are many ways to do network interface
> | failover in Linux, but this is not one of them.
>
> The mechanism is called ARP.

You have no method to prevent one interface from responding to ARP
requests under any circumstances. If you had logic to disable one
interface when the other was working, then you could use ARP as a
failover mechanism.

> You have mentioned one mechanism so far. You claim "There are many ways to
> do network interface failover in Linux". Maybe you can list or name at least
> two more of them just so I can see what you have in mind. If one of them is
> suitable for my situation, I'll even consider using it.

VRRP and OSPF. My personal preference is to use OSPF, assigning the
service IP address to a loopback interface and advertising its
availability across both physical interfaces.

DS

phil-new...@ipal.net

Apr 16, 2010, 6:38:02 AM

Irrelevant. If the connection is not working, there will not be any ARP
requests coming in on that interface. I don't need any added logic.
The ARP requests will come in on one or more of the working interfaces.
Nothing needs to be further disabled.

|> You have mentioned one mechanism so far. You claim "There are many ways to
|> do network interface failover in Linux". Maybe you can list or name at least
|> two more of them just so I can see what you have in mind. If one of them is
|> suitable for my situation, I'll even consider using it.
|
| VRRP and OSPF. My personal preference is to use OSPF, assigning the
| service IP address to a loopback interface and advertising its
| availability across both physical interfaces.

There is no VRRP. VRRP is for failover of two or more routers as seen by
other hosts via a single virtual MAC address. This is NOT a case where I
want a quick fallback to an alternate machine that picks up that MAC when
the other fails. VRRP is similar to HSRP. OpenBSD has a similar system
called CARP. But, again, it is for two or more machines operating in a
way where they coordinate with each other to share the same virtual MAC.

There is no OSPF. OSPF is routing. There is no routing taking place here.
This is switching redundancy; nothing more.

BTW, I did try assigning the IP address only to the loopback interface a
couple days ago. The end result was that ARPs were answered as before,
but ALL packets arriving on BOTH ethernet interfaces were discarded. The
only packets that would reach the listening process were those that were
transmitted from the same machine itself. Advertising OSPF isn't even
applicable because the peer machines share the same subnet and ethernet
segments. ARP is the mechanism of neighbor discovery here (for IPv4).

Also, assigning the IP address to the loopback interface used to work in
older kernels. I know because I have done it that way before. It SHOULD
work. It doesn't now. In software development we call this "regression".

Now, back to the original problem ... why the kernel discards packets for
one interface and not the other. If it were to discard packets on both,
then it would be clear something is wrong with how they are configured.
But it discards packets for JUST ONE and not for the other. Both interfaces
are working and both are configured the same. Keep your mind away from the
why aspect of my configuration. If you don't know why the kernel does one
thing with one interface, and something different for another, when both are
configured the same, then you have nothing to contribute.

David Schwartz

Apr 16, 2010, 10:19:41 AM

On Apr 16, 3:38 am, phil-news-nos...@ipal.net wrote:

> On Thu, 15 Apr 2010 17:59:22 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:

> | You have no method to prevent one interface from responding to ARP
> | requests under any circumstances. If you had logic to disable one
> | interface when the other was working, then you could use ARP as a
> | failover mechanism.

> Irrelevant. If the connection is not working, there will not be any ARP
> requests coming in on that interface. I don't need any added logic.
> The ARP requests will come in on one or more of the working interfaces.
> Nothing needs to be further disabled.

The problem is if the connection is *working*, not if it's not
working. You're using a failover scheme, not a dual-active.

> |> You have mentioned one mechanism so far. You claim "There are many ways to
> |> do network interface failover in Linux". Maybe you can list or name at least
> |> two more of them just so I can see what you have in mind. If one of them is
> |> suitable for my situation, I'll even consider using it.
> |
> | VRRP and OSPF. My personal preference is to use OSPF, assigning the
> | service IP address to a loopback interface and advertising its
> | availability across both physical interfaces.

> There is no VRRP. VRRP is for failover of two or more routers as seen by
> other hosts via a single virtual MAC address. This is NOT a case where I
> want a quick fallback to an alternate machine that picks up that MAC when
> the other fails. VRRP is similar to HSRP. OpenBSD has a similar system
> called CARP. But, again, it is for two or more machines operating in a
> way where they coordinate with each other to share the same virtual MAC.

This is actually what you want, it just so happens that the "two
routers" are the same host. But what you want is the same -- you want
the same mac and IP to keep working even if one link goes down.

> There is no OSPF. OSPF is routing. There is no routing taking place here.
> This is switching redundancy; nothing more.

There is no switching redundancy either. If you want interface
failover to the same IP address, you can do it with routing.

> BTW, I did try assigning the IP address only to the loopback interface a
> couple days ago. The end result was that ARPs were answered as before,
> but ALL packets arriving on BOTH ethernet interfaces were discarded. The
> only packets that would reach the listening process were those that were
> transmitted from the same machine itself. Advertising OSPF isn't even
> applicable because the peer machines share the same subnet and ethernet
> segments. ARP is the mechanism of neighbor discovery here (for IPv4).

Sounds like you have some configuration problems. Did you leave
rp_filter on or something silly like that? Is IP forwarding on?

> Now, back to the original problem ... why the kernel discards packets for
> one interface and not the other. If it were to discard packets on both,
> then it would be clear something is wrong with how they are configured.
> But it discards packets for JUST ONE and not for the other. Both interfaces
> are working and both are configured the same. Keep your mind away from the
> why aspect of my configuration. If you don't know why the kernel does one
> thing with one interface, and something different for another, when both are
> configured the same, then you have nothing to contribute.

I no longer have any desire to help you. Good luck.

I will, however, one last time warn everyone else *NOT* to do what
you're doing, as it doesn't make any sense.

DS

phil-new...@ipal.net

Apr 16, 2010, 10:53:50 AM

On Fri, 16 Apr 2010 07:19:41 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:

|> Irrelevant. If the connection is not working, there will not be any ARP
|> requests coming in on that interface. I don't need any added logic.
|> The ARP requests will come in on one or more of the working interfaces.
|> Nothing needs to be further disabled.
|
| The problem is if the connection is *working*, not if it's not
| working. You're using a failover scheme, not a dual-active.

I'm doing both. When neither has failed, both are working (at least up
to the kernel network stack). The problem is inside the kernel network
stack because it treats one of the interfaces PARTIALLY as up and also
PARTIALLY as down (by not passing packets it actually does get on to the
process listening with a socket bound to the IP address the packet is
destined for).

|> |> You have mentioned one mechanism so far. You claim "There are many ways to
|> |> do network interface failover in Linux". Maybe you can list or name at least
|> |> two more of them just so I can see what you have in mind. If one of them is
|> |> suitable for my situation, I'll even consider using it.
|> |
|> | VRRP and OSPF. My personal preference is to use OSPF, assigning the
|> | service IP address to a loopback interface and advertising its
|> | availability across both physical interfaces.
|
|> There is no VRRP. VRRP is for failover of two or more routers as seen by
|> other hosts via a single virtual MAC address. This is NOT a case where I
|> want a quick fallback to an alternate machine that picks up that MAC when
|> the other fails. VRRP is similar to HSRP. OpenBSD has a similar system
|> called CARP. But, again, it is for two or more machines operating in a
|> way where they coordinate with each other to share the same virtual MAC.
|
| This is actually what you want, it just so happens that the "two
| routers" are the same host. But what you want is the same -- you want
| the same mac and IP to keep working even if one link goes down.

I don't need VRRP to achieve that. If one goes down, the other *IS* working.
I don't need a virtual MAC to accomplish that.

VRRP is for FAST failover and is designed for alternate routers.

|> There is no OSPF. OSPF is routing. There is no routing taking place here.
|> This is switching redundancy; nothing more.
|
| There is no switching redundancy either. If you want interface
| failover to the same IP address, you can do it with routing.

OSPF just updates the routing tables when you have a bunch of machines
on the same network. But the route tables are already correct, so OSPF
in this simple case is pointless. OSPF solves the problem of getting
packets to the correct machine. That isn't a problem that needs a
solution because packets already get to the correct machine.

The problem is that AFTER the packet gets to the correct machine, the
kernel network stack decides to discard some of them merely because of
which interface they arrived on, even though both interfaces are
configured exactly the same way. The route table is irrelevant because
this problem involves INCOMING packets (being handled incorrectly).

|> BTW, I did try assigning the IP address only to the loopback interface a
|> couple days ago. The end result was that ARPs were answered as before,
|> but ALL packets arriving on BOTH ethernet interfaces were discarded. The
|> only packets that would reach the listening process were those that were
|> transmitted from the same machine itself. Advertising OSPF isn't even
|> applicable because the peer machines share the same subnet and ethernet
|> segments. ARP is the mechanism of neighbor discovery here (for IPv4).
|
| Sounds like you have some configuration problems. Did you leave
| rp_filter on or something silly like that? Is IP forwarding on?

I don't know about rp_filter. I will check that when I get a chance.

Why would IP forwarding need to be on? These machines are not going to
be forwarding anyone else's traffic ... at least not with the current
intentions. I did not set it one way or the other, yet. Still, I
recognize it can be useful in some situations, such as running virtual
machines within the host on a non-bridged basis. I will check it when I
am there.

|> Now, back to the original problem ... why the kernel discards packets for
|> one interface and not the other. If it were to discard packets on both,
|> then it would be clear something is wrong with how they are configured.
|> But it discards packets for JUST ONE and not for the other. Both interfaces
|> are working and both are configured the same. Keep your mind away from the
|> why aspect of my configuration. If you don't know why the kernel does one
|> thing with one interface, and something different for another, when both are
|> configured the same, then you have nothing to contribute.
|
| I no longer have any desire to help you. Good luck.

I don't think you ever did. But if the above suggestions were set wrong
and make things work when I change them, I bet you'll sure be angry that
you did help. We'll see.

| I will, however, one last time warn everyone else *NOT* to do what
| you're doing, as it doesn't make any sense.

It is completely and totally logical. And it did work in the 2.4 kernel
(at some subversion level I cannot remember). It worked by having the
secondary addresses configured ONLY on the loopback interface (each had
a primary IP address that wasn't actually used for anything just to get
the interface up). This loopback aliasing isn't working, either, right
now. If both ways can be made to work, the loopback way would be
preferable since it would be easier to configure (just put secondary
IPs on one interface that stays up), and be more consistent with the
documented notion that all IP addresses are system-wide rather than
interface specific.

Ersek, Laszlo

Apr 16, 2010, 11:40:59 AM

On Fri, 16 Apr 2010, phil-new...@ipal.net wrote:

> OSPF just updates the routing tables when you have a bunch of machines
> on the same network. But the route tables are already correct, so OSPF
> in this simple case is pointless. OSPF solves the problem of getting
> packets to the correct machine. That isn't a problem that needs a
> solution because packets already get to the correct machine.
>
> The problem is that AFTER the packet gets to the correct machine, the
> kernel network stack decides to discard some of them merely because of
> which interface they arrived on, even though both interfaces are
> configured exactly the same way. The route table is irrelevant because
> this problem involves INCOMING packets (being handled incorrectly).

[snip]

> On Fri, 16 Apr 2010 07:19:41 -0700 (PDT) David Schwartz
> <dav...@webmaster.com> wrote:

> | Sounds like you have some configuration problems. Did you leave
> | rp_filter on or something silly like that? Is IP forwarding on?
>
> I don't know about rp_filter. I will check that when I get a chance.

"rp" stands for "reverse path", and AFAICT it does something like this:

For each incoming internet protocol packet with src addr REMOTE_IP and dst
addr ONE_OF_YOUR_IPS, arriving over iface IFACE, check whether an outgoing
packet with src addr SAME_IP_OF_YOURS and dst addr REMOTE_IP would be
routed through the same iface IFACE.

Thus it very much depends on your routing table.

http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html
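
The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-entry route list and the first-match tie-break are hypothetical stand-ins, not the kernel's actual lookup.

```python
import ipaddress

# Hypothetical routing table: (destination network, outgoing interface).
# Both interfaces sit on the same subnet, as in this thread.
ROUTES = [
    (ipaddress.ip_network("172.30.0.0/16"), "eth0"),
    (ipaddress.ip_network("172.30.0.0/16"), "eth1"),
]


def reply_interface(remote_ip, routes=ROUTES):
    """Return the interface a reply to remote_ip would leave through.

    The first matching entry wins. That tie-break is an assumption made
    for this sketch; it stands in for whatever the real stack does.
    """
    addr = ipaddress.ip_address(remote_ip)
    for network, iface in routes:
        if addr in network:
            return iface
    return None


def passes_rp_filter(remote_ip, arrival_iface, routes=ROUTES):
    """Strict reverse-path check: accept the packet only if a reply to
    its source would be routed out of the interface it arrived on."""
    return reply_interface(remote_ip, routes) == arrival_iface
```

With two routes for the same subnet, the reply path resolves to a single interface (eth0 in this sketch), so `passes_rp_filter("172.30.0.99", "eth1")` is false. That is consistent with datagrams arriving on eth1 being dropped before any process sees them.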

(David, sorry if I completely missed the point.)

Cheers,
lacos

phil-new...@ipal.net

Apr 16, 2010, 12:41:19 PM

On Fri, 16 Apr 2010 17:40:59 +0200 Ersek, Laszlo <la...@caesar.elte.hu> wrote:

| "rp" stands for "reverse path", and AFAICT it does something like this:
|
| For each incoming internet protocol packet with src addr REMOTE_IP and dst
| addr ONE_OF_YOUR_IPS, arriving over iface IFACE, check whether an outgoing
| packet with src addr SAME_IP_OF_YOURS and dst addr REMOTE_IP would be
| routed through the same iface IFACE.
|
| Thus it very much depends on your routing table.
|
| http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html

Yes, I just read up about it (Documentation/networking/ip-sysctl.txt and
RFC3704). It should not have affected this, based on how I read it.
But I just tested it on and off a few times and it does affect it, with
some funny lags. Further checking and I notice that it will have an
impact once the ARP entry for the reverse path on the interface that it
impacts expires. I suspect that cases I have seen where the failure
flips over to the other interface (e.g. eth1 works and eth0 fails) is
just a case of either eth0's ARP expiring first, or new ARP queries being
done on the other interface, instead.

The kernel has it off by default. Ubuntu turns it on by default at least
in 9.10 server edition.
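
To see what a given machine is actually using, the per-interface settings can be read from /proc. The following is a minimal sketch, assuming a Linux system that exposes /proc/sys/net/ipv4/conf/<name>/rp_filter; the function name `read_rp_filter` is made up for this example.

```python
from pathlib import Path


def read_rp_filter(conf_dir="/proc/sys/net/ipv4/conf"):
    """Return {interface_name: rp_filter value} for each entry in conf_dir.

    Returns an empty dict if the directory does not exist, for example
    on a non-Linux system.
    """
    values = {}
    base = Path(conf_dir)
    if not base.is_dir():
        return values
    for entry in sorted(base.iterdir()):
        setting = entry / "rp_filter"
        if setting.is_file():
            values[entry.name] = int(setting.read_text().strip())
    return values


if __name__ == "__main__":
    for name, value in read_rp_filter().items():
        print(f"{name}: rp_filter={value}")
```

Per ip-sysctl.txt, 0 means no source validation. The "all" entry matters as well as the per-interface ones, so it is worth checking both before concluding that the filter is off for eth1.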

Anyway, I don't actually need this particular security feature. It does
not apply, anyway, because all traffic is supposed to be able to come in
on any connected interface (maybe it isn't the best path back, but it is
valid). I tested the failover on one of the servers by alternately
unplugging the eth0 cable (traffic works over eth1) and the eth1 cable
(traffic works over eth0).

| (David, sorry if I completely missed the point.)

I'm not sure what his point ever was. If he really understood this, then
why didn't he mention rp_filter initially? And why didn't he understand
how multihoming can make a cheap form of fallback (it won't be fast, but
not everyone needs fast)?

He'll be pissed because, as it turns out, he did help ... just not in the
way he seemed to want (to get me to change my whole network around to do
ethernet bonding and/or VRRP, etc).

David Schwartz

Apr 16, 2010, 2:19:14 PM

On Apr 16, 7:53 am, phil-news-nos...@ipal.net wrote:

> OSPF just updates the routing tables when you have a bunch of machines
> on the same network. But the route tables are already correct, so OSPF
> in this simple case is pointless. OSPF solves the problem of getting
> packets to the correct machine. That isn't a problem that needs a
> solution because packets already get to the correct machine.

The challenge is getting packets to the correct machine when the
possible routes to that machine may change. OSPF is specifically made
to do exactly this.

The classic way to do interface failover is by adjusting the routing
tables so that the route to the service's IP address no longer points
to the address of the interface that failed. That keeps the switches
(and ARP) out of the failover mechanism which is a good thing, since
they're not designed to do that.

DS

phil-new...@ipal.net

Apr 16, 2010, 7:02:42 PM

On Fri, 16 Apr 2010 11:19:14 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:

| On Apr 16, 7:53 am, phil-news-nos...@ipal.net wrote:
|
|> OSPF just updates the routing tables when you have a bunch of machines
|> on the same network. But the route tables are already correct, so OSPF
|> in this simple case is pointless. OSPF solves the problem of getting
|> packets to the correct machine. That isn't a problem that needs a
|> solution because packets already get to the correct machine.
|
| The challenge is getting packets to the correct machine when the
| possible routes to that machine may change. OSPF is specifically made
| to do exactly this.

Why do you think ARP (NDP for IPv6) can't accomplish that within a subnet?

| The classic way to do interface failover is by adjusting the routing
| tables so that the route to the service's IP address no longer points
| to the address of the interface that failed. That keeps the switches
| (and ARP) out of the failover mechanism which is a good thing, since
| they're not designed to do that.

The only route table entries that are needed are:

1. A route to the interface for the subnet:

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
172.30.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
172.30.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth1

2. A route to any static gateways to other networks:

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.30.0.2      0.0.0.0         UG    1      0        0 eth0
0.0.0.0         172.30.0.2      0.0.0.0         UG    1      0        0 eth1

If I had multiple ROUTED paths to other networks, then OSPF would likely be
a very good idea. Because I do not have that in my situation, it is not
needed. If I had that, I'd rather use VRRP within each subnet so I can keep
minimal software on individual servers, and just do the OSPF on the grid of
routers. Maybe some day I will need to do that.

David Schwartz

Apr 16, 2010, 8:12:34 PM

On Apr 16, 4:02 pm, phil-news-nos...@ipal.net wrote:

> Why do you think ARP (NDP for IPv6) can't accomplish that within a subnet?

Because it is fundamentally premised around the idea that an IP
address is associated with one and only one MAC address on a LAN.
Switches are built around this assumption. Also, it includes no
failover mechanism. There is no way to detect that a link is dead and
withdraw the ARP mappings associated with it.

> | The classic way to do interface failover is by adjusting the routing
> | tables so that the route to the service's IP address no longer points
> | to the address of the interface that failed. That keeps the switches
> | (and ARP) out of the failover mechanism which is a good thing, since
> | they're not designed to do that.
>
> The only route table entries that are needed are:
>
> 1. A route to the interface for the subnet:
>
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 172.30.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
> 172.30.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth1
>
> 2. A route to any static gateways to other networks:
>
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 0.0.0.0         172.30.0.2      0.0.0.0         UG    1      0        0 eth0
> 0.0.0.0         172.30.0.2      0.0.0.0         UG    1      0        0 eth1
>
> If I had multiple ROUTED paths to other networks, then OSPF would likely be
> a very good idea. Because I do not have that in my situation, it is not
> needed. If I had that, I'd rather use VRRP within each subnet so I can keep
> minimal software on individual servers, and just do the OSPF on the grid of
> routers. Maybe some day I will need to do that.

You don't seem to understand what I'm suggesting. OSPF is just as good
for multiple paths to the same network as it is for multiple paths to
different networks. And it's just as good for networks that consist of
only a single IP address as it is for gigantic networks.

You have a "network" that consists of the one IP address that you want
to maintain reachability to. And you have two paths to it. You want to
take the working path, and if both paths are up, you don't care what
happens. OSPF can do exactly this, and it is used widely for exactly
this purpose.

DS

phil-new...@ipal.net

Apr 17, 2010, 1:51:06 AM

On Fri, 16 Apr 2010 17:12:34 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:
| On Apr 16, 4:02 pm, phil-news-nos...@ipal.net wrote:
|
|> Why do you think ARP (NDP for IPv6) can't accomplish that within a subnet?
|
| Because it is fundamentally premised around the idea that an IP
| address is associated with one and only one MAC address on a LAN.

That's "one and only one MAC address on a LAN ... at ONE TIME". Which
MAC it is can change.

| Switches are built around this assumption. Also, it includes no
| failover mechanism. There is no way to detect that a link is dead and
| withdraw the ARP mappings associated with it.

ARP mappings have expiration times. Sure, it would be great if there
was a way to instantly withdraw it. Then this method might be faster
than it actually is. Instead, it depends on ARP expirations, which do
take some time. It also depends on the administrator not publishing
permanent ARPs or overly extending the expiration time.

On which host do you think OSPF should be run on? I'm sure not going to
be running it on all the servers.

| You have a "network" that consists of the one IP address that you want
| to maintain reachability to. And you have two paths to it. You want to
| take the working path, and if both paths are up, you don't care what
| happens. OSPF can do exactly this, and it is used widely for exactly
| this purpose.

It can provide routes for a working, and usually best, path. But if the IP
addresses are in the same ethernet broadcast domain, as in being on the same
ethernet segment within the same subnet, what kind of routes are you going
to get? Or does OSPF take over and manage the ARP table?

David Schwartz

Apr 17, 2010, 3:54:05 AM

On Apr 16, 10:51 pm, phil-news-nos...@ipal.net wrote:

> That's "one and only one MAC address on a LAN ... at ONE TIME". Which
> MAC it is can change.

How does your setup ensure that?

> On which host do you think OSPF should be run on? I'm sure not going to
> be running it on all the servers.

Why not? If you want an actual failover mechanism, something has to
detect the failure and provide the failover, right?

> | You have a "network" that consists of the one IP address that you want
> | to maintain reachability to. And you have two paths to it. You want to
> | take the working path, and if both paths are up, you don't care what
> | happens. OSPF can do exactly this, and it is used widely for exactly
> | this purpose.

> It can provide routes for a working, and usually best, path. But if the IP
> addresses are in the same ethernet broadcast domain, as in being on the same
> ethernet segment within the same subnet, what kind of routes are you going
> to get? Or does OSPF take over and manage the ARP table?

Every modern IP device will take the most-specific route. If they have
a route to 1.2.3.0/24 that's directly connected but a route to
1.2.3.99/32 from OSPF, they will prefer the more specific route to
reach 1.2.3.99. But personally, I prefer to use an IP address that
isn't associated with any particular Ethernet network.
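
The most-specific-route rule is easy to sketch with Python's standard `ipaddress` module. The three-entry table below is a made-up example that mirrors the 1.2.3.0/24 and 1.2.3.99/32 case; the description strings are placeholders, not real next hops.

```python
import ipaddress


def most_specific_route(dest, routes):
    """Return the (prefix, description) entry with the longest prefix
    that contains dest, or None if nothing matches."""
    addr = ipaddress.ip_address(dest)
    matches = [r for r in routes if addr in ipaddress.ip_network(r[0])]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)


# Hypothetical table for illustration only.
routes = [
    ("0.0.0.0/0", "via default gateway"),
    ("1.2.3.0/24", "directly connected"),
    ("1.2.3.99/32", "learned from OSPF"),
]
```

Here `most_specific_route("1.2.3.99", routes)` selects the /32 entry, while any other address in 1.2.3.0/24 still uses the connected route.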

DS

phil-new...@ipal.net

Apr 17, 2010, 10:31:23 AM

On Sat, 17 Apr 2010 00:54:05 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:
| On Apr 16, 10:51 pm, phil-news-nos...@ipal.net wrote:
|
|> That's "one and only one MAC address on a LAN ... at ONE TIME". Which
|> MAC it is can change.
|
| How does your setup ensure that?

It doesn't need to. The ARP mechanism in the network interface does. It
also doesn't depend on it, so if there were two or more, it could still
work, anyway. But it works when there is only one. It works because ARP
entries eventually expire and get re-acquired. This is essential when
one is renumbering computers in a simple way (which can happen when DHCP
reallocates an IP address that was previously used).

|> On which host do you think OSPF should be run on? I'm sure not going to
|> be running it on all the servers.
|
| Why not? If you want an actual failover mechanism, something has to
| detect the failure and provide the failover, right?

The failure detection is the mechanism of an ARP entry expiring and NOT
being re-acquired for the MAC with the now-bad interface/cable/switch.
The failover is re-acquiring a MAC for that IP address over a path that
is working.
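
The expiry-driven failover described here can be modelled as a toy ARP cache. This is a sketch under loud assumptions: the 60-second lifetime is arbitrary, `make_resolver` stands in for a broadcast ARP query, and "first live interface answers" is a simplification of what happens on a real LAN.

```python
class ArpCache:
    """Toy ARP cache: entries are trusted until they are ttl seconds old."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # ip -> (mac, time the entry was learned)

    def lookup(self, ip, now, resolver):
        entry = self.entries.get(ip)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # still cached, even if that link has since died
        mac = resolver(ip)  # entry missing or expired: ask again
        if mac is None:
            self.entries.pop(ip, None)
            return None
        self.entries[ip] = (mac, now)
        return mac


def make_resolver(links):
    """links maps MAC -> True if that interface is currently reachable.

    The returned function answers with the first reachable MAC,
    or None if no interface is up.
    """
    def resolver(ip):
        for mac, is_up in links.items():
            if is_up:
                return mac
        return None
    return resolver
```

If the cached interface fails at time 10 and the entry was learned at time 0 with a lifetime of 60, lookups keep returning the dead MAC until time 60. Only the first lookup after that re-resolves to the surviving interface. That window is the "it won't be fast" part of this approach.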

|> | You have a "network" that consists of the one IP address that you want
|> | to maintain reachability to. And you have two paths to it. You want to
|> | take the working path, and if both paths are up, you don't care what
|> | happens. OSPF can do exactly this, and it is used widely for exactly
|> | this purpose.
|
|> It can provide routes for a working, and usually best, path. But if the IP
|> addresses are in the same ethernet broadcast domain, as in being on the same
|> ethernet segment within the same subnet, what kind of routes are you going
|> to get? Or does OSPF take over and manage the ARP table?
|
| Every modern IP device will take the most-specific route. If they have
| a route to 1.2.3.0/24 that's directly connected but a route to
| 1.2.3.99/32 from OSPF, they will prefer the more specific route to
| reach 1.2.3.99. But personally, I prefer to use an IP address that
| isn't associated with any particular Ethernet network.

The route to 1.2.3.99 is ... *drum roll* ... 1.2.3.99 ... for computers on
the same network segments, such as 1.2.3.44 (the subnet being /24 in size).

So what is OSPF going to do ... provide a MAC address to insert into the
ARP table instead of giving us an IP address we already know of?

David Schwartz

Apr 17, 2010, 7:15:31 PM

On Apr 17, 7:31 am, phil-news-nos...@ipal.net wrote:
> On Sat, 17 Apr 2010 00:54:05 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:
> | On Apr 16, 10:51 pm, phil-news-nos...@ipal.net wrote:
> |
> |> That's "one and only one MAC address on a LAN ... at ONE TIME". Which
> |> MAC it is can change.
> |
> | How does your setup ensure that?
>
> It doesn't need to.

Yes, it does.

> The ARP mechanism in the network interface does.

No, it doesn't.

> | Every modern IP device will take the most-specific route. If they have
> | a route to 1.2.3.0/24 that's directly connected but a route to
> | 1.2.3.99/32 from OSPF, they will prefer the more specific route to
> | reach 1.2.3.99. But personally, I prefer to use an IP address that
> | isn't associated with any particular Ethernet network.

> The route to 1.2.3.99 is ... *drum roll* ... 1.2.3.99 ... for computers on
> the same network segments, such as 1.2.3.44 (the subnet being /24 in size).

No, it's not. Check your routing table. The route to 1.2.3.99 will be,
assuming there's no more-specific route, 1.2.3.0/24, the network
route. There will be no specific route to 1.2.3.99/32 unless someone
places one there. (Think about it. If you number a network interface
inside a /16, do you really think 65,536 routes are added, a /32 to
each IP address inside the subnet?)

> So what is OSPF going to do ... provide a MAC address to insert into the
> ARP table instead of giving us an IP address we already know of?

No, OSPF will advertise /32's to the loopback address over both
interfaces. In normal cases, the routers will see both routes and
insert them as equal-cost routes. The MAC addresses will never change
-- each interface will have its own. Should an interface go down, OSPF
will withdraw the route over the failed interface, causing the only
remaining /32 to the service IP address to be the one through the
working interface.
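
As a rough model of that behaviour (not an OSPF implementation), a router's view of the service address can be treated as a set of next hops that grows with each advertisement and shrinks with each withdrawal. The class name and the next-hop addresses 10.1.1.5 and 10.1.2.5 used in the note below are hypothetical.

```python
class HostRoutes:
    """Toy table of host routes, each with a set of equal-cost next hops."""

    def __init__(self):
        self.next_hops = {}  # prefix -> set of next-hop addresses

    def advertise(self, prefix, next_hop):
        """Record that prefix is reachable through next_hop."""
        self.next_hops.setdefault(prefix, set()).add(next_hop)

    def withdraw(self, prefix, next_hop):
        """Remove one path; drop the prefix entirely once no path is left."""
        hops = self.next_hops.get(prefix, set())
        hops.discard(next_hop)
        if not hops:
            self.next_hops.pop(prefix, None)

    def usable(self, prefix):
        """Return the current next hops for prefix in sorted order."""
        return sorted(self.next_hops.get(prefix, ()))
```

After advertising 1.2.3.99/32 through both 10.1.1.5 and 10.1.2.5, withdrawing the 10.1.1.5 path leaves 10.1.2.5 as the only next hop, and no MAC address has to move between interfaces.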

DS

Michel Talon

Apr 18, 2010, 5:12:14 AM

David Schwartz <dav...@webmaster.com> wrote:
> > The route to 1.2.3.99 is ... *drum roll* ... 1.2.3.99 ... for computers on
> > the same network segments, such as 1.2.3.44 (the subnet being /24 in size).
>
> No, it's not. Check your routing table. The route to 1.2.3.99 will be,
> assuming there's no more-specific route, 1.2.3.0/24, the network
> route. There will be no specific route to 1.2.3.99/32 unless someone
> places one there. (Think about it. If you number a network interface
> inside a /16, do you really think 65,536 routes are added, a /32 to
> each IP address inside the subnet?)
>

At least on FreeBSD, specific routes are added to machines on the local
network when some exchange with them has been done:

niobe% netstat -nr
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            134.157.10.254     UGS         0   259095   fxp0
127.0.0.1          127.0.0.1          UH          0   101620    lo0
134.157.10.0/24    link#2             UC          0        0   fxp0
134.157.10.1       00:14:22:7b:ba:83  UHLW        1   157746   fxp0   1199
134.157.10.2       00:15:17:29:89:51  UHLW        1      234   fxp0   1147
.....
192.168.1.0/24     link#1             UC          0        0   bfe0

This machine has 2 NICs hence the two links in the routing table. Note
how several machines inside the 134.157.10.0/24 are present in the
table, but of course not the whole 256 numbers.

--

Michel TALON

David Schwartz

Apr 18, 2010, 2:28:32 PM

On Apr 18, 2:12 am, ta...@lpthe.jussieu.fr (Michel Talon) wrote:

> At least on FreeBSD, specific routes are added to machines on the local
> network when some exchange with them has been done:

I hope those are some kind of caching entry that's removed if a
different route is inserted.

> This machine has 2 NICs hence the two links in the routing table. Note
> how several machines inside the 134.157.10.0/24 are present in the
> table, but of course not the whole 256 numbers.

I wonder what exactly that means. In any event, I prefer to use an IP
address that's not associated with the LAN block for the service IP
address, though it is supposed to work either way.

My preferred setup is to use three ranges of IP addresses, one for
each LAN, and one just for service IP addresses advertised as /32s.
That way, the only routes to the service IP address will be the OSPF
routes. The ideal setup uses two routers, each with an interface in
each LAN and each host having two interfaces, one in each LAN. Under
non-failure conditions, each router should have two equal-cost OSPF
routes to each service IP address. Clients connect to the service IP
addresses.

DS

phil-new...@ipal.net

Apr 18, 2010, 3:23:40 PM

On Sat, 17 Apr 2010 16:15:31 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:
| On Apr 17, 7:31 am, phil-news-nos...@ipal.net wrote:
|> On Sat, 17 Apr 2010 00:54:05 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:
|> | On Apr 16, 10:51 pm, phil-news-nos...@ipal.net wrote:
|> |
|> |> That's "one and only one MAC address on a LAN ... at ONE TIME". Which
|> |> MAC it is can change.
|> |
|> | How does your setup ensure that?
|>
|> It doesn't need to.
|
| Yes, it does.

If there are somehow two MACs, one or the other would be used. If before
these expire, the interface that gets used goes down, there would be a time
period of nothing working. As soon as the ARP entry for that MAC expires,
the next ARP query would succeed over the working interface.

|> The ARP mechanism in the network interface does.
|
| No, it doesn't.

If you think something is operating to make this work, now, feel free to
say what it is in defense of your assertion that the ARP mechanism is not
the one doing it.

|> | Every modern IP device will take the most-specific route. If they have
|> | a route to 1.2.3.0/24 that's directly connected but a route to
|> | 1.2.3.99/32 from OSPF, they will prefer the more specific route to
|> | reach 1.2.3.99. But personally, I prefer to use an IP address that
|> | isn't associated with any particular Ethernet network.
|
|> The route to 1.2.3.99 is ... *drum roll* ... 1.2.3.99 ... for computers on
|> the same network segments, such as 1.2.3.44 (the subnet being /24 in size).
|
| No, it's not. Check your routing table. The route to 1.2.3.99 will be,
| assuming there's no more-specific route, 1.2.3.0/24, the network
| route. There will be no specific route to 1.2.3.99/32 unless someone
| places one there. (Think about it. If you number a network interface
| inside a /16, do you really think 65,536 routes are added, a /32 to
| each IP address inside the subnet?)

Actually, there is NO route to 1.2.3.99. There is a route FOR 1.2.3.0/24
which specifies 0.0.0.0 as the gateway (effectively meaning "no gateway").
So it treats anything destined to an address in 1.2.3.0/24 as going to that
LAN segment subnet directly ... via the MAC address known to belong to the
destination IP address ... if it is known. No other routing is needed.

|> So what is OSPF going to do ... provide a MAC address to insert into the
|> ARP table instead of giving us an IP address we already know of?
|
| No, OSPF will advertise /32's to the loopback address over both
| interfaces. In normal cases, the routers will see both routes and
| insert them as equal-cost routes. The MAC addresses will never change
| -- each interface will have its own. Should an interface go down, OSPF
| will withdraw the route over the failed interface, causing the only
| remaining /32 to the service IP address to be the one through the
| working interface.

Where will OSPF say that 1.2.3.99 is to be routed to? 1.2.3.99?

phil-new...@ipal.net

Apr 18, 2010, 3:25:18 PM

So it is routing to MAC addresses? Or is the table just a reflection of
both routing (to gateways) and ARP (to MACs)?

David Schwartz

Apr 18, 2010, 4:56:01 PM

On Apr 18, 12:23 pm, phil-news-nos...@ipal.net wrote:

> If you think something is operating to make this work, now, feel free to
> say what it is in defense of your assertion that the ARP mechanism is not
> the one doing it.

There is no mechanism to ensure that one and only one ARP entry at a
time is associated with the IP address.

> Actually, there is NO route to 1.2.3.99. There is a route FOR 1.2.3.0/24
> which specifies 0.0.0.0 as the gateway (effectively meaning "no gateway").

The route 1.2.3.0/24 is a route to 1.2.3.99, and also a route to
1.2.3.43, and it is a route to any address inside that range.

> So it treats anything destined to an address in 1.2.3.0/24 as going to that
> LAN segment subnet directly ... via the MAC address known to belong to the
> destination IP address ... if it is known. No other routing is needed.

Well, whether or not any other routing is needed depends on whether
the IP address you're trying to reach is in fact reachable by that
route. For example, 0.0.0.0/0 is a route to 1.2.3.4. But other routing
is needed if 1.2.3.4 is your default gateway.

> | No, OSPF will advertise /32's to the loopback address over both
> | interfaces. In normal cases, the routers will see both routes and
> | insert them as equal-cost routes. The MAC addresses will never change
> | -- each interface will have its own. Should an interface go down, OSPF
> | will withdraw the route over the failed interface, causing the only
> | remaining /32 to the service IP address to be the one through the
> | working interface.

> Where will OSPF say that 1.2.3.99 is to be routed to? 1.2.3.99?

In which case? In the failover case where 1.2.3.99 is reachable
through the 1.2.3.0/24 LAN, it will say that 1.2.3.99 is to be routed
to the IP address assigned to the other (still working) Ethernet
interface.

DS

phil-new...@ipal.net

Apr 19, 2010, 9:04:32 PM

On Sun, 18 Apr 2010 13:56:01 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:
| On Apr 18, 12:23 pm, phil-news-nos...@ipal.net wrote:
|
|> If you think something is operating to make this work, now, feel free to
|> say what it is in defense of your assertion that the ARP mechanism is not
|> the one doing it.
|
| There is no mechanism to ensure that one and only one ARP entry at a
| time is associated with the IP address.

What failure do you think will happen if there are two or more correct
ARP entries?

|> Actually, there is NO route to 1.2.3.99. There is a route FOR 1.2.3.0/24
|> which specifies 0.0.0.0 as the gateway (effectively meaning "no gateway").
|
| The route 1.2.3.0/24 is a route to 1.2.3.99, and also a route to
| 1.2.3.43, and it is a route to any address inside that range.

And what is the gateway? 0.0.0.0?

|> So it treats anything destined to an address in 1.2.3.0/24 as going to that
|> LAN segment subnet directly ... via the MAC address known to belong to the
|> destination IP address ... if it is known. No other routing is needed.
|
| Well, whether or not any other routing is needed depends on whether
| the IP address you're trying to reach is in fact reachable by that
| route. For example, 0.0.0.0/0 is a route to 1.2.3.4. But other routing
| is needed if 1.2.3.4 is your default gateway.

Have a static route entry for network 1.2.3.0/24 with a gateway of 0.0.0.0
and each interface device (if there are 3 interfaces, 3 such routes). Then
1.2.3.43 can send to 1.2.3.99 the usual way: broadcast an ARP query to get
a working MAC address of 1.2.3.99, and send the IP packet in an ethernet
frame destined to that MAC.

Both ethernet interfaces of the 1.2.3.99 machine will have 1.2.3.99 assigned.
Either can answer with its own MAC address when an ARP query arrives. The
one that works gets its MAC out. There is no need for other IP addresses
nor for any IP addresses to be gateways (except to get to places outside of
this redundant LAN).

David Schwartz

Apr 19, 2010, 9:27:00 PM

On Apr 19, 6:04 pm, phil-news-nos...@ipal.net wrote:

> | There is no mechanism to ensure that one and only one ARP entry at a
> | time is associated with the IP address.

> What failure do you think will happen if there are two or more correct
> ARP entries?

It doesn't matter. You cannot design a system based on whether you can
think of ways it can fail or not. The problem is that it will fail in
ways you cannot think of. (Google 'arp flux' for some examples.)

> |> Actually, there is NO route to 1.2.3.99. There is a route FOR 1.2.3.0/24
> |> which specifies 0.0.0.0 as the gateway (effectively meaning "no gateway").
> |
> | The route 1.2.3.0/24 is a route to 1.2.3.99, and also a route to
> | 1.2.3.43, and it is a route to any address inside that range.

> And what is the gateway? 0.0.0.0?

The route doesn't have a gateway, it's an interface route. Obviously,
not every route could have a gateway or you'd have an infinite
regress.

> Both ethernet interfaces of the 1.2.3.99 machine will have 1.2.3.99 assigned.

You're confusing examples again. That's your crazy ARP scheme. I'm
talking about a sane OSPF scheme where we assign the 1.2.3.99 address to
a virtual interface. (Or to just one physical interface.) In that
case, there will be two routes to 1.2.3.99 under normal conditions
(possibly one of them the network route that's always there). When one
or the other interfaces fail, OSPF will add or remove routes as needed
to make 1.2.3.99 reachable through the other interface. (Either by
adding a /32 route, removing one, or whatever).

DS

phil-new...@ipal.net

Apr 20, 2010, 7:23:36 AM

On Mon, 19 Apr 2010 18:27:00 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:
| On Apr 19, 6:04 pm, phil-news-nos...@ipal.net wrote:
|
|> | There is no mechanism to ensure that one and only one ARP entry at a
|> | time is associated with the IP address.
|
|> What failure do you think will happen if there are two or more correct
|> ARP entries?
|
| It doesn't matter. You cannot design a system based on whether you can
| think of ways it can fail or not. The problem is that it will fail in
| ways you cannot think of. (Google 'arp flux' for some examples.)

That would be a problem if there are other interfaces to other subnets
besides just the one of interest. But that is not the case for what I am
doing. In my case, where I have 2 interfaces to the same subnet (and I
am saying subnet, not segment ... there can be one segment or two in this
case), this is not an issue.

For every IP address in the attached subnet, the interface route is what
will be used ... not a route with a gateway IP address. And even for the
packets destined off-subnet, where a gateway IP is used, the interface
route is still used to get to the gateway (which itself is on subnet).
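
Put as code, the point is about which IP address gets resolved to a MAC before the frame goes out. This is a minimal sketch that reuses the subnet and gateway from the route tables earlier in the thread; the function name `arp_target` is made up for this example.

```python
import ipaddress

# (destination, gateway, interface); a gateway of "0.0.0.0" means the
# destination is on-link and is reached through the interface directly.
ROUTES = [
    ("172.30.0.0/16", "0.0.0.0", "eth0"),
    ("0.0.0.0/0", "172.30.0.2", "eth0"),
]


def arp_target(dest, routes=ROUTES):
    """Return the IP address whose MAC must be known to send to dest."""
    addr = ipaddress.ip_address(dest)
    matching = [r for r in routes if addr in ipaddress.ip_network(r[0])]
    if not matching:
        return None  # no route at all
    best = max(matching, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)
    gateway = best[1]
    # On-link: resolve the destination itself. Otherwise resolve the gateway.
    return dest if gateway == "0.0.0.0" else gateway
```

For 172.30.0.13 the function returns 172.30.0.13 itself. For an off-subnet address it returns 172.30.0.2, which is in turn an on-link address covered by the interface route.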

|> |> Where will OSPF say that 1.2.3.99 is to be routed to? 1.2.3.99?
|> |
|> | In which case? In the failover case where 1.2.3.99 is reachable
|> | through the 1.2.3.0/24 LAN, it will say that 1.2.3.99 is to be routed
|> | to the IP address assigned to the other (still working) Ethernet
|> | interface.
|
|> Both ethernet interfaces of the 1.2.3.99 machine will have 1.2.3.99 assigned.
|
| You're confusing examples again. That's your crazy ARP scheme. I'm
| talking about a sane OSPF scheme where we assign the 1.2.3.99 address to
| a virtual interface. (Or to just one physical interface.) In that
| case, there will be two routes to 1.2.3.99 under normal conditions
| (possibly one of them the network route that's always there). When one
| or the other interfaces fail, OSPF will add or remove routes as needed
| to make 1.2.3.99 reachable through the other interface. (Either by
| adding a /32 route, removing one, or whatever).

I don't have a crazy ARP scheme ... I'm using ARP's default behaviour.
It's when the behaviour deviated from default that problems happened.
Now that the deviation has been resolved, it's working.

You're still trying to route packets to some gateway that should be routed
on-subnet via the interface directly. Why do you think that is right?

The simple solution actually works for the simple requirements (my little
twist on Ockham's Razor). I suspect you still don't grok the requirements.

David Schwartz

Apr 20, 2010, 3:03:05 PM

On Apr 20, 4:23 am, phil-news-nos...@ipal.net wrote:

> You're still trying to route packets to some gateway that should be routed
> on-subnet via the interface directly. Why do you think that is right?

That's exactly how OSPF is supposed to work. If the normal route to a
destination is down, the OSPF link over that route also goes down.
OSPF detects this, and either revokes the link over the lost route or
adds a new route (that was previously not advertised because it was
not used).

I am not doing anything unusual. It just seems unusual to you because
you seem to have some confusion about network routes, as if they are
somehow different from other routes when it comes to the way routers
choose which route to take. It is quite normal for devices to have
multiple adjacencies to other devices and for a network route not to
be the preferred route even when it's operational.

Here's a simple example: Imagine you have two routers and two LANs.
Each router has one interface in each LAN. Like this:

Router 1:
100Mbps interface in 10.1.1.0/24 LAN, numbered 10.1.1.1
1000Mbps interface in 10.1.2.0/24 LAN, numbered 10.1.2.1

Router 2:
100Mbps interface in 10.1.1.0/24 LAN, numbered 10.1.1.2
1000Mbps interface in 10.1.2.0/24 LAN, numbered 10.1.2.2

Now, consider if router 1 has lots and lots of traffic destined to
10.1.1.2. It has a direct network route to 10.1.1.2 through the
10.1.1.0/24 LAN. But it should prefer the indirect route through the
10.1.2.0/24 LAN because that link has 10x the capacity. Of course, if
either router's interface in the 10.1.2.0/24 LAN goes down, traffic
between the routers will have to take the lower-capacity 10.1.1.0/24
LAN.
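
The selection in this example can be sketched as "lowest cost among the links that are up". The cost formula below (a reference bandwidth divided by the link bandwidth, with a 1000Mbps reference) is an assumption chosen for illustration; real OSPF costs are whatever the routers are configured with.

```python
def link_cost(bandwidth_mbps, reference_mbps=1000):
    """Assumed cost: inversely proportional to bandwidth, minimum 1."""
    return max(1, reference_mbps // bandwidth_mbps)


def best_link(links):
    """Return the name of the lowest-cost link that is up, or None."""
    usable = [link for link in links if link["up"]]
    if not usable:
        return None
    return min(usable, key=lambda link: link_cost(link["bandwidth_mbps"]))["name"]


# The two LANs between router 1 and router 2 from the example above.
links = [
    {"name": "10.1.1.0/24", "bandwidth_mbps": 100, "up": True},
    {"name": "10.1.2.0/24", "bandwidth_mbps": 1000, "up": True},
]
```

With both links up, `best_link(links)` picks 10.1.2.0/24 (cost 1 against cost 10). Marking that link as down makes the same call fall back to 10.1.1.0/24.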

This is 100% standard, typical OSPF failover behavior. It is what OSPF
is designed to do.

In contrast, ARP is not designed to do any kind of failover at all.

DS

phil-new...@ipal.net

Apr 20, 2010, 9:09:29 PM

On Tue, 20 Apr 2010 12:03:05 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:
| On Apr 20, 4:23 am, phil-news-nos...@ipal.net wrote:
|
|> You're still trying to route packets to some gateway that should be routed
|> on-subnet via the interface directly. Why do you think that is right?
|
| That's exactly how OSPF is supposed to work. If the normal route to a
| destination is down, the OSPF link over that route also goes down.
| OSPF detects this, and either revokes the link over the lost route or
| adds a new route (that was previously not advertised because it was
| not used).

How does OSPF select which interface when both have the same IP address?

| I am not doing anything unusual. It just seems unusual to you because
| you seem to have some confusion about network routes, as if they are
| somehow different from other routes when it comes to the way routers
| choose which route to take. It is quite normal for devices to have
| multiple adjacencies to other devices and for a network route not to
| be the preferred route even when it's operational.

I do understand routes. I just don't need them.

| Here's a simple example: Imagine you have two routers and two LANs.

OK. We'll imagine and pretend. Just keep in mind that this is NOT the
situation I have.

| Each router has one interface in each LAN. Like this:
|
| Router 1:
| 100Mbps interface in 10.1.1.0/24 LAN, numbered 10.1.1.1
| 1000Mbps interface in 10.1.2.0/24 LAN, numbered 10.1.2.1

Two different subnets, I see.

| Router 2:
| 100Mbps interface in 10.1.1.0/24 LAN, numbered 10.1.1.2
| 1000Mbps interface in 10.1.2.0/24 LAN, numbered 10.1.2.2

Same two subnets here.

| Now, consider if router 1 has lots and lots of traffic destined to
| 10.1.1.2. It has a direct network route to 10.1.1.2 through the
| 10.1.1.0/24 LAN. But it should prefer the indirect route through the
| 10.1.2.0/24 LAN because that link has 10x the capacity. Of course, if
| either router's interface in the 10.1.2.0/24 LAN goes down, traffic
| between the routers will have to take the lower-capacity 10.1.1.0/24
| LAN.

Why would traffic be destined to a router? Sure, if I were logging in to
the router to reconfigure it or pull down stats, then I would connect to
the router. Regular traffic should be destined to other places like web
servers and such.

Yes, it would seem that going over the 1000Mbps LAN would be preferred.
And yes, falling back to the 100Mbps LAN when the 1000Mbps LAN is down
is very much preferred.

| This is 100% standard, typical OSPF failover behavior. It is what OSPF
| is designed to do.

Sure.

| In contrast, ARP is not designed to do any kind of failover at all.

ARP is much more simplistic. It isn't designed to do failover specifically.
But it is designed to find where to deliver packets within an Ethernet segment
(i.e. it gets the MAC address), and it does so only over the working segment
when only one is working. The net effect is that a simplistic failover happens.
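
A toy sketch of that idea in Python. This is only a model of the
behavior I am relying on, not how the kernel implements ARP. The
function arp_resolve and the segments layout are hypothetical. The IP
and MAC addresses are the ones from my first post, seen from another
host that has one interface on each LAN.

```python
# Toy model of ARP-driven failover; all names here are hypothetical.

def arp_resolve(ip, segments):
    """Ask "who has ip?" on each attached segment; return the first answer.

    segments maps a local interface name to a dict with:
      "up":    whether that segment currently carries traffic
      "hosts": {ip: mac} for the peers attached to that segment
    Returns (interface, mac), or None if nobody answered.
    """
    for ifname, segment in segments.items():
        if segment["up"] and ip in segment["hosts"]:
            return ifname, segment["hosts"][ip]
    return None


# Static configuration: the same IP is reachable on both LANs.
segments = {
    "eth0": {"up": True, "hosts": {"172.30.0.7": "00:15:17:da:e0:28"}},
    "eth1": {"up": True, "hosts": {"172.30.0.7": "00:15:17:da:e0:29"}},
}

arp_cache = {}
arp_cache["172.30.0.7"] = arp_resolve("172.30.0.7", segments)
print(arp_cache["172.30.0.7"])  # ('eth0', '00:15:17:da:e0:28')

# The first LAN fails. The cached entry ages out and is resolved again.
segments["eth0"]["up"] = False
arp_cache.pop("172.30.0.7")
arp_cache["172.30.0.7"] = arp_resolve("172.30.0.7", segments)
print(arp_cache["172.30.0.7"])  # ('eth1', '00:15:17:da:e0:29')
```

The addresses and interfaces never change here. Only the cache entry
does, once the old one has expired.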

Your above scenario does not describe my network. So nothing has been
established. But if you'd like to take a shot, here we go:

There are 2 gigabit switches. Most servers have two interfaces, with one
connected to LAN 1, and the other connected to LAN 2. Both LANs are
numbered with the same subnet ... 172.30.0.0/16. Every server has the
same IP address on both interfaces. A couple servers have only one
interface and are connected to LAN 1 only. There is a firewall with two
interfaces, but it is not connected to LAN 2. One firewall interface
is connected to LAN 1. The other firewall interface is connected to a
different DMZ LAN. The access router is also connected to the DMZ LAN
(and the internet connection circuits). Later, a honey pot server will
also be connected to the LAN.

There are no other routers and none will be added. The bulk (99%) of
traffic is between these various servers. It's enough to saturate a
100Mbps LAN, but one gigabit LAN is handling it just fine.

If the LANs were to fail, the work of these servers stops, and that is a
big issue. If the router fails, it's not an issue. Internet access is
not critical to the server function. In fact, all but a few servers are
blocked from accessing the internet. If the firewall or router or DMZ LAN
goes down, we'll fix it in the next day or so.

If a LAN goes down, work can continue for a while. Most jobs can keep on
running. A few will wait for file server access. If the LAN comes back
within 15 minutes, things will resume smoothly. The scheme I have set up
will use LAN 2 if LAN 1 fails, or use LAN 1 if LAN 2 fails ... within a
few minutes, usually 2 minutes or so.

Whether servers use LAN 1 or LAN 2 to talk among themselves is not important.
Obviously they need to use LAN 1 to get to the internet. But they don't do
much of that most of the time (only when someone accesses the job scheduling
web site and that's generally during the day).

David Schwartz

Apr 20, 2010, 9:58:37 PM

On Apr 20, 6:09 pm, phil-news-nos...@ipal.net wrote:

> How does OSPF select which interface when both have the same IP address?

Nobody ever talked about using OSPF in an environment where multiple
interfaces have the same IP address. I suggested OSPF as a sane
alternative to giving both interfaces the same IP address.

> Why would traffic be destined to a router? Sure, if I were logging in to
> the router to reconfigure it or pull down stats, then I would connect to
> the router. Regular traffic should be destined to other places like web
> servers and such.

I used routers to make things simple. The situation is precisely the
same if they're hosts.

> Yes, it would seem that going over the 1000Mbps LAN would be preferred.
> And yes, falling back to the 100Mbps LAN when the 1000Mbps LAN is down
> is very much preferred.

Exactly. And the situation is the same even if they're in the same
LAN. And the situation is the same if there's only one interface on one end
but two on the other.

> ARP is much more simplistic. It isn't designed to do failover specifically.

Exactly.

> But it is designed to find where to deliver packets within an Ethernet segment
> (i.e. it gets the MAC address), and it does so only over the working segment
> when only one is working. The net effect is that a simplistic failover happens.

Right, but not by design. So you are trying to make a scheme work by
accident rather than by design.

This can easily be handled by OSPF. Just get rid of the ridiculous
duplicate numbering and let OSPF routes do the job. This is what OSPF
is *designed* to do.

DS

phil-new...@ipal.net

Apr 21, 2010, 12:31:53 PM

On Tue, 20 Apr 2010 18:58:37 -0700 (PDT) David Schwartz <dav...@webmaster.com> wrote:
| On Apr 20, 6:09 pm, phil-news-nos...@ipal.net wrote:
|
|> How does OSPF select which interface when both have the same IP address?
|
| Nobody ever talked about using OSPF in an environment where multiple
| interfaces have the same IP address. I suggested OSPF as a sane
| alternative to giving both interfaces the same IP address.

So you want to confuse the issue by having additional IP addresses?
How many additional IP addresses would be involved?

Doesn't sound sane to me at all.

|> Why would traffic be destined to a router? Sure, if I were logging in to
|> the router to reconfigure it or pull down stats, then I would connect to
|> the router. Regular traffic should be destined to other places like web
|> servers and such.
|
| I used routers to make things simple. The situation is precisely the
| same if they're hosts.

Sounds like you want the hosts to act as if they were routers, with the
"gateway" being the additional IP address on the same machine that the
packet is destined to. We are drifting further from KISS.

|> Yes, it would seem that going over the 1000Mbps LAN would be preferred.
|> And yes, falling back to the 100Mbps LAN when the 1000Mbps LAN is down
|> is very much preferred.
|
| Exactly. And the situation is the same even if they're in the same
| LAN. And the situation is the same if there's only one interface on one end
| but two on the other.

What if there are two LANs of equal speed?

|> ARP is much more simplistic. It isn't designed to do failover specifically.
|
| Exactly.

Your mindset seems to be that failover requires something to change some
element of configuration. Thus the configuration is dynamic. What I am
using is a static configuration. The configuration is not changed. The
state of awareness of the environment (the ARP table) is all that changes
in my scheme. ARP does not "do" a failover ... it "is" a failover.

|> But it is designed to find where to deliver packets within an Ethernet segment
|> (i.e. it gets the MAC address), and it does so only over the working segment
|> when only one is working. The net effect is that a simplistic failover happens.
|
| Right, but not by design. So you are trying to make a scheme work by
| accident rather than by design.

I see no accident involved. It's state change, not configuration change.

[snip]

| This can easily be handled by OSPF. Just get rid of the ridiculous
| duplicate numbering and let OSPF routes do the job. This is what OSPF
| is *designed* to do.

I look at it the other way. Just get rid of the ridiculous redundant IP
addresses and use ARP table states to get things to go directly where
they are supposed to go by whatever path is working.

Let me know when OSPF learns how to update the ARP table. I have no
intention to pretend that I am doing routing over an ethernet segment
where every host is directly reachable from every other. For a network
of multiple LANs where there are multiple paths, then I will likely be
running OSPF ... on the routers ... but not the individual hosts.