CoreOS EC2 Multiple ENI and multiple static IP ping problem -- source routing issue?


Johnny

Jul 1, 2017, 1:36:44 PM
to CoreOS User
Hello,

Newbie here, trying to sort through some confusion I hope someone can help me with.  This is long-winded because it follows how I've tried to figure things out so far (it probably wouldn't need to be if I knew more!), so grab a coffee if you actually want to read this ;).

On AWS, I'm launching some CoreOS EC2 instances via a custom CloudFormation (CF) template.  At the moment this is just a t2.micro launch where I'm attaching two ENIs (network interfaces) and assigning two private IP addresses per interface to each EC2 instance, which the t2.micro size allows.  There is no custom user data being passed for Ignition or anything else at this point (I still have to learn about that).  After creating the instances with CF, the EC2 console shows each instance has two network interfaces attached and that two IPs per interface have been assigned successfully.

As an example, for one of these instances the assignment for interface 0 is 172.16.11.11 (primary interface IP) and 172.16.11.12 (secondary IP).  For interface 1 the IPs are 172.16.11.13 (primary interface IP) and 172.16.11.14 (secondary).

These IPs are all in the same subnet and all in the same VPC.  The subnet is 172.16.0.0/17, the gateway is 172.16.0.1 and DNS is 172.16.0.2.

To test connectivity, I'm pinging from a bastion host in a different subnet (172.16.255.0/24), at IP 172.16.255.11.

When I test ping from the bastion host to each of the CoreOS's IPs after the initial instance creation, I get:

PING 172.16.11.11 : No Response
PING 172.16.11.12 : No Response
PING 172.16.11.13 : Success, but some ICMP sequence numbers are missing (only 2, 4, 6, 8 and 9-12 appear below; 1, 3, 5 and 7 are lost):

PING 172.16.11.13 (172.16.11.13) 56(84) bytes of data.
64 bytes from 172.16.11.13: icmp_seq=2 ttl=64 time=0.476 ms
64 bytes from 172.16.11.13: icmp_seq=4 ttl=64 time=0.502 ms
64 bytes from 172.16.11.13: icmp_seq=6 ttl=64 time=0.442 ms
64 bytes from 172.16.11.13: icmp_seq=8 ttl=64 time=0.496 ms
64 bytes from 172.16.11.13: icmp_seq=9 ttl=64 time=0.482 ms
64 bytes from 172.16.11.13: icmp_seq=10 ttl=64 time=0.532 ms
64 bytes from 172.16.11.13: icmp_seq=11 ttl=64 time=2.10 ms
64 bytes from 172.16.11.13: icmp_seq=12 ttl=64 time=0.468 ms

PING 172.16.11.14 : Fail, with the messages below.  Note also that the responding IPs differ (.11 and .13) and the ICMP sequence numbers are out of order:

PING 172.16.11.14 (172.16.11.14) 56(84) bytes of data.
From 172.16.11.13 icmp_seq=1 Destination Host Unreachable
From 172.16.11.13 icmp_seq=2 Destination Host Unreachable
From 172.16.11.13 icmp_seq=3 Destination Host Unreachable
From 172.16.11.13 icmp_seq=4 Destination Host Unreachable
From 172.16.11.13 icmp_seq=6 Destination Host Unreachable
From 172.16.11.13 icmp_seq=8 Destination Host Unreachable
From 172.16.11.11 icmp_seq=5 Destination Host Unreachable
From 172.16.11.11 icmp_seq=7 Destination Host Unreachable
From 172.16.11.13 icmp_seq=10 Destination Host Unreachable
From 172.16.11.11 icmp_seq=9 Destination Host Unreachable
From 172.16.11.13 icmp_seq=12 Destination Host Unreachable
From 172.16.11.11 icmp_seq=11 Destination Host Unreachable

If I run the same ping tests from a host within the same subnet (172.16.99.35), I get:

PING 172.16.11.11 : No Response
PING 172.16.11.12 : No Response
PING 172.16.11.13 : Success, and this time all ICMP response sequence numbers are there and in order, in contrast to above.
PING 172.16.11.14 : Fail, with a mixture of errors:

PING 172.16.11.14 (172.16.11.14) 56(84) bytes of data.

From 172.16.11.13: icmp_seq=1 Redirect Host(New nexthop: 172.16.11.14)
From 172.16.11.13: icmp_seq=2 Redirect Host(New nexthop: 172.16.11.14)
From 172.16.11.13: icmp_seq=3 Redirect Host(New nexthop: 172.16.11.14)
From 172.16.11.13: icmp_seq=4 Redirect Host(New nexthop: 172.16.11.14)
From 172.16.11.13 icmp_seq=5 Destination Host Unreachable


After SSHing into the CoreOS instance from the bastion box in the other subnet (via 172.16.11.13, and having to try at least twice before it works, presumably due to the same issue that caused some ICMP responses to be lost when pinging that IP above), I can review the following info:

ip addr shows the following for eth0 and eth1.  Nothing is seen for .12 and .14, the secondary addresses on each interface:

eth0: inet 172.16.11.11/17 brd 172.16.127.255 scope global dynamic eth0
eth1: inet 172.16.11.13/17 brd 172.16.127.255 scope global dynamic eth1


"route -n" and "ip route" show (and I don't really understand routing info yet -- I need to look at this further):


core@ip-172-16-11-11 ~ $ route -n

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.0.1      0.0.0.0         UG    1024   0        0 eth1
0.0.0.0         172.16.0.1      0.0.0.0         UG    1024   0        0 eth0
172.16.0.0      0.0.0.0         255.255.128.0   U     0      0        0 eth1
172.16.0.0      0.0.0.0         255.255.128.0   U     0      0        0 eth0
172.16.0.1      0.0.0.0         255.255.255.255 UH    1024   0        0 eth1
172.16.0.1      0.0.0.0         255.255.255.255 UH    1024   0        0 eth0

core@ip-172-16-11-11 ~ $ ip route

default via 172.16.0.1 dev eth1  proto dhcp  src 172.16.11.13  metric 1024
default via 172.16.0.1 dev eth0  proto dhcp  src 172.16.11.11  metric 1024
172.16.0.0/17 dev eth1  proto kernel  scope link  src 172.16.11.13
172.16.0.0/17 dev eth0  proto kernel  scope link  src 172.16.11.11
172.16.0.1 dev eth1  proto dhcp  scope link  src 172.16.11.13  metric 1024
172.16.0.1 dev eth0  proto dhcp  scope link  src 172.16.11.11  metric 1024


"networkctl status" shows both adapters are grabbing DHCP addresses, which makes sense given the zz-default.network file and explains why the secondary IP assignments are not made:

core@ip-172-16-11-11 ~ $ networkctl status eth0
2: eth0
       Link File: /usr/lib64/systemd/network/99-default.link
    Network File: /usr/lib64/systemd/network/zz-default.network
            Type: ether
           State: routable (configured)
            Path: xen-vif-0
          Driver: vif
         Address: 172.16.11.11
         Gateway: 172.16.0.1
             DNS: 172.16.0.2
  Search Domains: us-east-2.compute.internal

core@ip-172-16-11-11 ~ $ networkctl status eth1
3: eth1
       Link File: /usr/lib64/systemd/network/99-default.link
    Network File: /usr/lib64/systemd/network/zz-default.network
            Type: ether
           State: routable (configured)
            Path: xen-vif-1
          Driver: vif
         Address: 172.16.11.13
         Gateway: 172.16.0.1
             DNS: 172.16.0.2
  Search Domains: us-east-2.compute.internal


So I looked at the networkd document on the CoreOS site (https://coreos.com/os/docs/latest/network-config-with-networkd.html) and tried placing the following two files in /etc/systemd/network.  (The file format here is changed a little from what the CoreOS docs show, based on an issue an Ubuntu user posted who said this format resolved his problem.  Both this format and the one shown in the CoreOS link above result in the same outcome described below.)


Filename: 10-static.network


[Match]
Name=eth0

[Network]
Gateway=172.16.0.1
DNS=172.16.0.2

[Address]
Address=172.16.11.11/17

[Address]
Address=172.16.11.12/17


and


Filename: 20-static.network


[Match]
Name=eth1

[Network]
Gateway=172.16.0.1
DNS=172.16.0.2

[Address]
Address=172.16.11.13/17

[Address]
Address=172.16.11.14/17



Now, after rebooting:

ip addr shows that the secondary IPs have been successfully added to each interface:

eth0: inet 172.16.11.11/17 brd 172.16.127.255 scope global eth0
        inet 172.16.11.12/17 brd 172.16.127.255 scope global secondary eth0

eth1: inet 172.16.11.13/17 brd 172.16.127.255 scope global eth1
        inet 172.16.11.14/17 brd 172.16.127.255 scope global secondary eth1


"route -n" and "ip route" show:


core@ip-172-16-11-11 ~ $ route -n

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.0.1      0.0.0.0         UG    0      0        0 eth1
0.0.0.0         172.16.0.1      0.0.0.0         UG    0      0        0 eth0
172.16.0.0      0.0.0.0         255.255.128.0   U     0      0        0 eth1
172.16.0.0      0.0.0.0         255.255.128.0   U     0      0        0 eth0


core@ip-172-16-11-11 ~ $ ip route

default via 172.16.0.1 dev eth1  proto static
default via 172.16.0.1 dev eth0  proto static
172.16.0.0/17 dev eth1  proto kernel  scope link  src 172.16.11.13
172.16.0.0/17 dev eth0  proto kernel  scope link  src 172.16.11.11


Networkctl status shows:


core@ip-172-16-11-11 ~ $ networkctl status eth0

2: eth0
       Link File: /usr/lib64/systemd/network/99-default.link
    Network File: /etc/systemd/network/10-static.network
            Type: ether
           State: routable (configured)
            Path: xen-vif-0
          Driver: vif
         Address: 172.16.11.11
                  172.16.11.12
         Gateway: 172.16.0.1
             DNS: 172.16.0.2

core@ip-172-16-11-11 ~ $ networkctl status eth1

3: eth1
       Link File: /usr/lib64/systemd/network/99-default.link
    Network File: /etc/systemd/network/20-static.network
            Type: ether
           State: routable (configured)
            Path: xen-vif-1
          Driver: vif
         Address: 172.16.11.13
                  172.16.11.14
         Gateway: 172.16.0.1
             DNS: 172.16.0.2


So the final results are:

For ping testing from a host in the same subnet (172.16.99.35/17):

PING 172.16.11.11 : No Response
PING 172.16.11.12 : No Response
PING 172.16.11.13 : Success, and all ICMP response sequence numbers are there and in order.
PING 172.16.11.14 : Success, and all ICMP response sequence numbers are there and in order.


For ping testing from a host in a different subnet (172.16.255.11/24):

PING 172.16.11.11 : No Response
PING 172.16.11.12 : No Response
PING 172.16.11.13 : Some success, but some ICMP response sequence numbers are missing.
PING 172.16.11.14 : Success, and all ICMP response sequence numbers are there and in order.


If I ping from the CoreOS instance to the other host in the same subnet and specify the IP and interface to ping from, the results are:


ping -I 172.16.11.11 172.16.99.35: No response
ping -I 172.16.11.12 172.16.99.35: No response
ping -I 172.16.11.13 172.16.99.35: Success, and all ICMP response sequence numbers are there and in order.
ping -I 172.16.11.14 172.16.99.35: Success, and all ICMP response sequence numbers are there and in order.
ping -I eth0 172.16.99.35: No response
ping -I eth1 172.16.99.35: Success, and all ICMP response sequence numbers are there and in order.


The same results are obtained when pinging from the CoreOS instance to the bastion host in the other subnet (172.16.255.11).


So, clearly eth0 is still not working properly even though the addresses are assigned.  There also still seems to be an intermittent ICMP packet loss / out-of-order sequence issue when pinging from a different subnet.  I'm hoping the latter issue disappears when the former gets straightened out.

I'm still on a really steep learning curve with all this, but based on a bunch of Googling and reading, it seems like my issue is now a "source routing" issue and I should learn more about the routing tables, possibly using multiple routing tables with rules (one table and rule for each IP) to make sure each inbound network stream on a given IP gets its outbound response from the same IP and interface.
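Based on my reading so far, I think the general shape is something like the following, though I haven't actually tried it yet (these are my addresses from above, and the table numbers 100/101 are just ones I picked):

# eth0 addresses (.11, .12) -> table 100, so replies go back out eth0
sudo ip route add 172.16.0.0/17 dev eth0 src 172.16.11.11 table 100
sudo ip route add default via 172.16.0.1 dev eth0 table 100
sudo ip rule add from 172.16.11.11 table 100
sudo ip rule add from 172.16.11.12 table 100

# eth1 addresses (.13, .14) -> table 101, so replies go back out eth1
sudo ip route add 172.16.0.0/17 dev eth1 src 172.16.11.13 table 101
sudo ip route add default via 172.16.0.1 dev eth1 table 101
sudo ip rule add from 172.16.11.13 table 101
sudo ip rule add from 172.16.11.14 table 101

I don't know yet whether that's complete, or how to make it persistent on CoreOS with networkd.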

If anyone is familiar with this and can offer some advice, that would be great.  I think if I do it alone I'm probably looking at many more hours of work to get it figured out.

Thanks,

Johnny



Seán C. McCord

Jul 1, 2017, 7:39:34 PM
to Johnny, CoreOS User

This is normal behaviour for your setup, and it is not unique to CoreOS Container Linux.  The problem is that you are setting up two different interfaces on the same subnet.  You can do this, but not without setting up some kernel routing rules.  The switch is expecting traffic to match the port ARP tables.  Instead, you're sending responses out whichever interface currently holds the preferred route for that subnet.  The switch drops any of those which don't match what it thinks should be the correct source.
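If you want to watch it happen, one thing you can try (assuming the replies really are leaving the wrong interface) is to capture ICMP on both interfaces of the instance while pinging the eth0 address from another host, e.g. with tcpdump (via toolbox if it isn't in the base image):

# run each in its own terminal on the instance, then ping 172.16.11.11 from elsewhere
tcpdump -nni eth0 icmp
tcpdump -nni eth1 icmp

You should see the echo requests arrive on eth0 while the replies, still sourced from 172.16.11.11, leave via eth1 and never make it back.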

The real question is why you would want to do this.  If it is for aggregation, use interface bonding.  If it is for software separation, use local interface aliases.

Otherwise, it makes no sense to assign two interfaces on the same host to the same subnet.


--
Seán C McCord
CyCore Systems, Inc

Johnny

Jul 1, 2017, 10:02:56 PM
to CoreOS User, johnnyp...@gmail.com
Hi Sean, thanks for the reply.

Some software being developed will need access to a large number of ports to avoid port exhaustion.  My understanding is that each additional IP adds another ~65k of port availability (e.g.: https://mrotaru.wordpress.com/2013/10/10/scaling-to-12-million-concurrent-connections-how-migratorydata-did-it/).  On EC2 in AWS, the number of IP addresses allowed per interface is determined by the instance size.  There is no instance size that allows the desired number of IPs on a single network interface, but if multiple interfaces are used, each with the allowed number of IPs per interface, that would work.  So that's why I'm looking at this approach.

By "local interface aliases", I think you mean aliasing the desired IPs onto a single interface.  If you're referring to something else, let me know; I googled the quoted phrase and only got a few hits.

So unless I've misunderstood you or am missing your point, my next step will be to learn some more about routing and traffic control, e.g. http://lartc.org/lartc.pdf.  Maybe there's also a way to avoid port exhaustion with containerization and CoreOS, but for now I'm working this angle and hoping to resolve the issue so it remains an option going forward.

I haven't really found any documentation or examples for implementing routing tables and rules in CoreOS with networkd.  If you, or anyone else, can offer any pointers or resources, that would be awesome.

Thanks,

Johnny

Seán C. McCord

Jul 2, 2017, 11:30:45 AM
to Johnny, CoreOS User
You are correct about the port exhaustion.  In a normal environment, this would be absurd, but it sounds like AWS is making the absurd a requirement.  If only that were novel.

Oh, well.  Yes, you'll want to tag traffic on ingress to send it back out the same interface.  If you are going to be sending packets outside the local network, you will also likely need multiple routing tables.  All of this is covered in the Linux Advanced Router documentation you referenced.  This is also a common scheme for "multi-homed" systems... there are many examples of that available.
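I haven't done this on Container Linux specifically, but if the networkd on your image is new enough to support [RoutingPolicyRule], a rough, untested sketch of that scheme in your existing 10-static.network would look something like this (your addresses; table number 100 is arbitrary, and eth1 would get the same treatment with its own table):

[Match]
Name=eth0

[Network]
Gateway=172.16.0.1
DNS=172.16.0.2

[Address]
Address=172.16.11.11/17

[Address]
Address=172.16.11.12/17

# send anything sourced from eth0's addresses to table 100
[RoutingPolicyRule]
From=172.16.11.11/32
Table=100

[RoutingPolicyRule]
From=172.16.11.12/32
Table=100

# table 100 only knows how to leave via eth0
[Route]
Gateway=172.16.0.1
Table=100

If your networkd doesn't have [RoutingPolicyRule] yet, a oneshot systemd unit that runs the equivalent "ip rule" and "ip route ... table N" commands at boot accomplishes the same thing.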





Johnny

Jul 2, 2017, 1:24:51 PM
to CoreOS User, johnnyp...@gmail.com
Hi Sean, thanks again for this feedback.  I'll post again if I run into a brick wall trying to get this working in CoreOS ;).  Happy 4th!

Johnny