Hello,
Newbie here, and I'm trying to sort through some confusion I hope someone can help me with. This is long-winded because of how I've tried to figure this out so far (and it probably wouldn't need to be if I knew more!) -- so grab a coffee if you actually want to read this ;).
On AWS, I'm launching some CoreOS EC2 instances via a custom CloudFormation (CF) template. At the moment, this is just a t2.micro launch where I attach 2 ENI network interfaces to each EC2 instance and assign 2 private IP addresses per interface, which the t2.micro size allows. There is no custom user data being passed for Ignition or anything else at this point (I still have to learn about that). After creating the instances with CF, the EC2 console shows each instance has 2 network interfaces attached and 2 IPs per interface have been assigned successfully.
As an example, for one of these instances the assignment for interface 0 is 172.16.11.11 (primary interface IP) and 172.16.11.12 (secondary IP). For interface 1 the IPs are 172.16.11.13 (primary interface IP) and 172.16.11.14 (secondary).
These IPs are all in the same subnet and all in the same VPC. The subnet is 172.16.0.0/17, the gateway is 172.16.0.1 and DNS is 172.16.0.2.
To test ping, I'm doing so from a bastion host located in a different subnet: 172.16.255.0/24 at IP 172.16.255.11.
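As a sanity check on the layout above, here is a minimal sketch in POSIX sh that tests whether an address falls inside a CIDR block. The function names ip_to_int and in_cidr are made up for illustration, and it assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) if address $1 lies inside CIDR block $2.
in_cidr() {
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# The instance, the same-subnet test host and the bastion host from this post.
for ip in 172.16.11.11 172.16.99.35 172.16.255.11; do
  if in_cidr "$ip" 172.16.0.0/17; then
    echo "$ip is inside 172.16.0.0/17"
  else
    echo "$ip is outside 172.16.0.0/17"
  fi
done
```

A /17 on 172.16.0.0 covers 172.16.0.0 through 172.16.127.255, so only the bastion address should be reported as outside.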
When I test ping from the bastion host to each of the CoreOS's IPs after the initial instance creation, I get:
PING 172.16.11.11 : No Response
PING 172.16.11.12 : No Response
PING 172.16.11.13 : Success, but some ICMP sequence numbers are missing (see 2, 4, 6, 8, 9 - 12 below):
PING 172.16.11.13 (172.16.11.13) 56(84) bytes of data.
64 bytes from 172.16.11.13: icmp_seq=10 ttl=64 time=0.532 ms
64 bytes from 172.16.11.13: icmp_seq=12 ttl=64 time=0.468 ms
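Spotting dropped replies by eye gets tedious, so here is a rough sketch that lists the missing icmp_seq values in ping output. The function name missing_seqs and the sample lines (using the documentation address 192.0.2.10) are hypothetical; it only assumes a standard awk.

```shell
#!/bin/sh
# Read ping output on stdin and print the icmp_seq values that never
# appeared, checking the range 1..highest sequence number seen.
missing_seqs() {
  awk '
    match($0, /icmp_seq=[0-9]+/) {
      # "icmp_seq=" is 9 characters long; keep only the number after it.
      n = substr($0, RSTART + 9, RLENGTH - 9) + 0
      seen[n] = 1
      if (n > max) max = n
    }
    END {
      out = ""; sep = ""
      for (i = 1; i <= max; i++)
        if (!(i in seen)) { out = out sep i; sep = " " }
      print out
    }
  '
}

# Made-up sample input: replies 1, 3 and 6 arrived.
missing_seqs <<'EOF'
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.5 ms
64 bytes from 192.0.2.10: icmp_seq=3 ttl=64 time=0.5 ms
64 bytes from 192.0.2.10: icmp_seq=6 ttl=64 time=0.5 ms
EOF
# prints: 2 4 5
```

On a real run this would be used as: ping -c 12 172.16.11.13 | missing_seqs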
PING 172.16.11.14 : Fail with the message below, also note that the response IPs differ (11 and 13) and the ICMP sequence is not sequential:
PING 172.16.11.14 (172.16.11.14) 56(84) bytes of data.
From 172.16.11.13 icmp_seq=1 Destination Host Unreachable
From 172.16.11.13 icmp_seq=2 Destination Host Unreachable
From 172.16.11.13 icmp_seq=3 Destination Host Unreachable
From 172.16.11.13 icmp_seq=4 Destination Host Unreachable
From 172.16.11.13 icmp_seq=6 Destination Host Unreachable
From 172.16.11.13 icmp_seq=8 Destination Host Unreachable
From 172.16.11.11 icmp_seq=5 Destination Host Unreachable
From 172.16.11.11 icmp_seq=7 Destination Host Unreachable
From 172.16.11.13 icmp_seq=10 Destination Host Unreachable
From 172.16.11.11 icmp_seq=9 Destination Host Unreachable
From 172.16.11.13 icmp_seq=12 Destination Host Unreachable
From 172.16.11.11 icmp_seq=11 Destination Host Unreachable
If I run the same ping tests from a host within the same subnet (172.16.99.35), I get:
PING 172.16.11.11 : No Response
PING 172.16.11.12 : No Response
PING 172.16.11.13 : Success, and this time all ICMP response sequence numbers are there and in order, in contrast to above.
PING 172.16.11.14 : Fail, with a mixture of errors:
PING 172.16.11.14 (172.16.11.14) 56(84) bytes of data.
From 172.16.11.13: icmp_seq=1 Redirect Host(New nexthop: 172.16.11.14)
From 172.16.11.13: icmp_seq=2 Redirect Host(New nexthop: 172.16.11.14)
From 172.16.11.13: icmp_seq=3 Redirect Host(New nexthop: 172.16.11.14)
From 172.16.11.13: icmp_seq=4 Redirect Host(New nexthop: 172.16.11.14)
From 172.16.11.13 icmp_seq=5 Destination Host Unreachable
After SSHing into the CoreOS instance from the bastion box located in a different subnet (via 172.16.11.13, and having to try at least twice before it works, presumably due to the same issue that caused some ICMP sequence responses to be lost when pinging the same IP above), I can review the following info:
ip addr shows the following for eth0 and eth1. Nothing is seen for .12 and .14, the secondary addresses on each interface:
"route -n" and "ip route" show (and I don't really understand routing info yet -- I need to look at this further):
core@ip-172-16-11-11 ~ $ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.16.0.1 0.0.0.0 UG 1024 0 0 eth1
0.0.0.0 172.16.0.1 0.0.0.0 UG 1024 0 0 eth0
172.16.0.0 0.0.0.0 255.255.128.0 U 0 0 0 eth1
172.16.0.0 0.0.0.0 255.255.128.0 U 0 0 0 eth0
172.16.0.1 0.0.0.0 255.255.255.255 UH 1024 0 0 eth1
172.16.0.1 0.0.0.0 255.255.255.255 UH 1024 0 0 eth0
core@ip-172-16-11-11 ~ $ ip route
default via 172.16.0.1 dev eth1 proto dhcp src 172.16.11.13 metric 1024
default via 172.16.0.1 dev eth0 proto dhcp src 172.16.11.11 metric 1024
172.16.0.1 dev eth1 proto dhcp scope link src 172.16.11.13 metric 1024
172.16.0.1 dev eth0 proto dhcp scope link src 172.16.11.11 metric 1024
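One way to see which of the two default routes the kernel would actually pick for a reply is "ip route get" with a source address. This is only a sketch: the run helper and show_route_choice are names made up here, and by default it just prints the commands (set DRY_RUN=0 on the instance to execute them).

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Ask the kernel which route it would use for a packet to $1 sourced from $2.
show_route_choice() {
  run ip route get "$1" from "$2"
}

# 172.16.255.11 is the bastion host; the sources are the two primary addresses.
for src in 172.16.11.11 172.16.11.13; do
  show_route_choice 172.16.255.11 "$src"
done
```

If both lookups report the same outgoing device, replies sourced from the other interface's address would be leaving through the wrong interface, which I assume (but have not confirmed) is related to the behaviour above.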
Networkctl status shows both adapters are grabbing DHCP addresses which makes sense due to the zz-default.network file and explains why the secondary IP assignments are not made:
core@ip-172-16-11-11 ~ $ networkctl status eth0
2: eth0
Link File: /usr/lib64/systemd/network/99-default.link
Network File: /usr/lib64/systemd/network/zz-default.network
Type: ether
State: routable (configured)
Path: xen-vif-0
Driver: vif
Address: 172.16.11.11
Gateway: 172.16.0.1
DNS: 172.16.0.2
Search Domains: us-east-2.compute.internal
core@ip-172-16-11-11 ~ $ networkctl status eth1
3: eth1
Link File: /usr/lib64/systemd/network/99-default.link
Network File: /usr/lib64/systemd/network/zz-default.network
Type: ether
State: routable (configured)
Path: xen-vif-1
Driver: vif
Address: 172.16.11.13
Gateway: 172.16.0.1
DNS: 172.16.0.2
Search Domains: us-east-2.compute.internal
So I looked at the networkd document found on the CoreOS site (https://coreos.com/os/docs/latest/network-config-with-networkd.html) and tried placing the following two files in /etc/systemd/network. (The file format here is changed a little from what is shown in the CoreOS docs, following a post by an Ubuntu user who said this format resolved his issue. Both this file format and the one shown in the CoreOS link above result in the same outcome described below.)
Filename: 10-static.network
[Match]
Name=eth0
[Network]
Gateway=172.16.0.1
DNS=172.16.0.2
[Address]
[Address]
and
Filename: 20-static.network
[Match]
Name=eth1
[Network]
Gateway=172.16.0.1
DNS=172.16.0.2
[Address]
[Address]
Now, after rebooting:
ip addr shows that the secondary IPs have been successfully added to each interface:
"route -n" and "ip route" show:
core@ip-172-16-11-11 ~ $ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.16.0.1 0.0.0.0 UG 0 0 0 eth1
0.0.0.0 172.16.0.1 0.0.0.0 UG 0 0 0 eth0
172.16.0.0 0.0.0.0 255.255.128.0 U 0 0 0 eth1
172.16.0.0 0.0.0.0 255.255.128.0 U 0 0 0 eth0
core@ip-172-16-11-11 ~ $ ip route
default via 172.16.0.1 dev eth1 proto static
default via 172.16.0.1 dev eth0 proto static
Networkctl status shows:
core@ip-172-16-11-11 ~ $ networkctl status eth0
2: eth0
Link File: /usr/lib64/systemd/network/99-default.link
Network File: /etc/systemd/network/10-static.network
Type: ether
State: routable (configured)
Path: xen-vif-0
Driver: vif
Address: 172.16.11.11
172.16.11.12
Gateway: 172.16.0.1
DNS: 172.16.0.2
core@ip-172-16-11-11 ~ $ networkctl status eth1
3: eth1
Link File: /usr/lib64/systemd/network/99-default.link
Network File: /etc/systemd/network/20-static.network
Type: ether
State: routable (configured)
Path: xen-vif-1
Driver: vif
Address: 172.16.11.13
172.16.11.14
Gateway: 172.16.0.1
DNS: 172.16.0.2
So the final results are:
PING 172.16.11.11 : No Response
PING 172.16.11.12 : No Response
PING 172.16.11.13 : Success, and all ICMP response sequence numbers are there and in order.
PING 172.16.11.14 : Success, and all ICMP response sequence numbers are there and in order.
PING 172.16.11.11 : No Response
PING 172.16.11.12 : No Response
PING 172.16.11.13 : Some success, but some ICMP response sequence numbers are missing.
PING 172.16.11.14 : Success, and all ICMP response sequence numbers are there and in order.
If I ping from the CoreOS instance to the other host in the same subnet and specify the IP and interface to ping from, the results are:
ping -I 172.16.11.13 172.16.99.35: Success, and all ICMP response sequence numbers are there and in order.
ping -I 172.16.11.14 172.16.99.35: Success, and all ICMP response sequence numbers are there and in order.
ping -I eth1 172.16.99.35: Success, and all ICMP response sequence numbers are there and in order.
The same results are obtained upon pinging from the CoreOS instance to the bastion host in the other subnet (172.16.255.11).
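To repeat this outbound test for every assigned address in one go, something like the sketch below could be used. The helper names run and ping_from_each are made up, the count of 3 pings is arbitrary, and by default the script only prints the commands (set DRY_RUN=0 on the instance to execute them).

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Ping target $1 three times from each source address given after it.
ping_from_each() {
  target=$1
  shift
  for src in "$@"; do
    run ping -c 3 -I "$src" "$target"
  done
}

# All four addresses on the instance, against the same-subnet host.
ping_from_each 172.16.99.35 172.16.11.11 172.16.11.12 172.16.11.13 172.16.11.14
```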
So, clearly eth0 is still not working properly even though the addresses are assigned. There also seems to be a remaining intermittent ICMP packet loss or wrong-sequence issue when pinging from a different subnet. I'm hoping the latter issue disappears when the former gets straightened out.
I'm still on a really steep learning curve with all this, but based on a bunch of Googling and reading, it seems like my issue is now a "source routing" issue and I should learn more about the routing table, possibly using multiple routing tables with rules (one table and rule for each IP) to make sure each inbound network stream on a given IP receives an outbound response from the same IP.
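For reference, the multiple-table idea might look roughly like the sketch below. It is untested against this setup: the table numbers 100 and 101 are arbitrary, the helper names run and add_source_routes are made up, and it uses one table per interface (with one rule per IP) rather than one table per IP. By default it only prints the ip commands; set DRY_RUN=0 and run as root to apply them.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Send traffic sourced from the given addresses out of one interface.
# $1 = interface, $2 = table number, $3 = gateway, $4 = subnet, rest = source IPs
add_source_routes() {
  dev=$1 table=$2 gw=$3 subnet=$4
  shift 4
  # Per-interface table: on-link subnet route plus a default route.
  run ip route add "$subnet" dev "$dev" table "$table"
  run ip route add default via "$gw" dev "$dev" table "$table"
  # One rule per source address pointing at that table.
  for src in "$@"; do
    run ip rule add from "$src" lookup "$table"
  done
}

add_source_routes eth0 100 172.16.0.1 172.16.0.0/17 172.16.11.11 172.16.11.12
add_source_routes eth1 101 172.16.0.1 172.16.0.0/17 172.16.11.13 172.16.11.14
```

Rules and routes added this way do not survive a reboot, so they would still need to be made persistent somehow once the approach is confirmed to work.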
If anyone is familiar with this and can offer some advice, that would be great. I think if I do it alone I'm probably looking at many more hours of work to get this figured out.
Thanks,
Johnny