VMware scrolling NIC Link is Up 10000 Mbps


Chris Featherstone

Apr 13, 2016, 4:19:41 PM
to CoreOS User
I am pretty new to CoreOS. I am setting up a 3-node cluster in our lab on VMware vSphere 5.5. I have been able to get the three nodes set up through discovery, and etcd2 seems to start and synchronize keys between the nodes.

The issue I am running into is that eventually the nodes die because of something going on with the network. The console constantly scrolls "vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps". There is never a corresponding link-down event while the NIC is connected; if I disconnect the NIC in VMware, a link-down event does occur, and if I leave the NIC disconnected the console scrolls "NIC Link is Down" instead.

Being new to CoreOS and etcd, I am unsure whether this is an issue with VMware or a bug in CoreOS. I have followed the docs on building my nodes. I have tried the OVA files from stable, beta, and alpha with the same results. In my VMware environment I have tried both the e1000 NIC and vmxnet3, as well as different hosts and datastores, and I always seem to get the same result. Performance is so degraded that the cluster is essentially unusable, and many times the VM panics after a while.

I pass these parameters to the VM via the .vmx file:

guestinfo.hostname = "intcoreos1"
guestinfo.interface.0.role = "private"
guestinfo.dns.server.0 = "100.81.225.2"
guestinfo.interface.0.route.0.gateway = "100.81.35.254"
guestinfo.interface.0.route.0.destination = "0.0.0.0/0"
guestinfo.interface.0.mac = "00:50:56:a3:33:28"
guestinfo.interface.0.dhcp = "no"
guestinfo.interface.0.ip.0.address = "100.81.34.52/22"
guestinfo.coreos.config.data.encoding = "base64"
guestinfo.coreos.config.data = "I2Nsb3VkLWNvbmZpZwoKc3NoX2F1dGhvcml6ZWRfa2V5czoKICAgIC0gInNzaC1yc2EgQUFBQUIzTnphQzF5YzJFQUFBQUJKUUFBQUlCelF0VWIyN2I4V2RIOXVEN0Y3S3hoUnYxazhjaDVBVTNjWGFnZVZaZHdBTGJsVW5NSlA2UVY5Wk5VeVdwSDkxN1N6d0hSTFdleHV2bGIvcE9aeWJZMXgvVGpGVncrWDNXTjZ4WHQrMFlBRkhTQWhyOEhyWHdnY0NwclBmakxLTnFsU2FCc0NqOUdpZUxoK2l3NmN6KytKdjhRcE1wVXcwNG1hTSsveWhzbEt3PT0iCgpjb3Jlb3M6CiAgdW5pdHM6CiAgICAtIG5hbWU6IGV0Y2QyLnNlcnZpY2UKICAgICAgY29tbWFuZDogc3RhcnQKICAgIC0gbmFtZTogZmxlZXQuc2VydmljZQogICAgICBjb21tYW5kOiBzdGFydAogICAgLSBuYW1lOiB2bXRvb2xzZC5zZXJ2aWNlCiAgICAgIGNvbW1hbmQ6IHN0YXJ0CiAgICAgIGNvbnRlbnQ6IHwKICAgICAgICBbVW5pdF0KICAgICAgICBEZXNjcmlwdGlvbj1WTXdhcmUgVG9vbHMgQWdlbnQKICAgICAgICBEb2N1bWVudGF0aW9uPWh0dHA6Ly9vcGVuLXZtLXRvb2xzLnNvdXJjZWZvcmdlLm5ldC8KICAgICAgICBDb25kaXRpb25WaXJ0dWFsaXphdGlvbj12bXdhcmUKCiAgICAgICAgW1NlcnZpY2VdCiAgICAgICAgRXhlY1N0YXJ0UHJlPS91c3IvYmluL2xuIC1zZlQgL3Vzci9zaGFyZS9vZW0vdm13YXJlLXRvb2xzIC9ldGMvdm13YXJlLXRvb2xzCiAgICAgICAgRXhlY1N0YXJ0PS91c3Ivc2hhcmUvb2VtL2Jpbi92bXRvb2xzZAogICAgICAgIFRpbWVvdXRTdG9wU2VjPTUKCiAgICAtIG5hbWU6IG9lbS1jbG91ZGluaXQuc2VydmljZQogICAgICBjb21tYW5kOiByZXN0YXJ0CiAgICAgIHJ1bnRpbWU6IHllcwogICAgICBjb250ZW50OiB8CiAgICAgICAgW1VuaXRdCiAgICAgICAgRGVzY3JpcHRpb249Q2xvdWRpbml0IGZyb20gVk13YXJlIEJhY2tkb29yCgogICAgICAgIFtTZXJ2aWNlXQogICAgICAgIFR5cGU9b25lc2hvdAogICAgICAgIEV4ZWNTdGFydD0vdXNyL2Jpbi9jb3Jlb3MtY2xvdWRpbml0IC0tb2VtPXZtd2FyZQoKICBldGNkMjoKICAgIGRpc2NvdmVyeTogaHR0cHM6Ly9kaXNjb3ZlcnkuZXRjZC5pby82NmRhNjhhN2M3OTc2MTJlNTcwMzUwZThhNjZmZTU2OQogICAgYWR2ZXJ0aXNlLWNsaWVudC11cmxzOiBodHRwOi8vJHByaXZhdGVfaXB2NDoyMzc5LGh0dHA6Ly8kcHJpdmF0ZV9pcHY0OjQwMDEKICAgIGluaXRpYWwtYWR2ZXJ0aXNlLXBlZXItdXJsczogaHR0cDovLyRwcml2YXRlX2lwdjQ6MjM4MAogICAgIyBsaXN0ZW4gb24gYm90aCB0aGUgb2ZmaWNpYWwgcG9ydHMgYW5kIHRoZSBsZWdhY3kgcG9ydHMKICAgICMgbGVnYWN5IHBvcnRzIGNhbiBiZSBvbWl0dGVkIGlmIHlvdXIgYXBwbGljYXRpb24gZG9lc24ndCBkZXBlbmQgb24gdGhlbQogICAgbGlzdGVuLWNsaWVudC11cmxzOiBodHRwOi8vMC4wLjAuMDoyMzc5LGh0dHA6Ly8wLjAuMC4wOjQwMDEKICAgIGxpc3Rlbi1wZWVyLXVybHM6IGh0dHA6Ly8kcHJpdmF0ZV9pcHY0OjIzODAKCiAgb2VtOgogICAgYnVnLXJlcG9ydC11cmw6IGh0dHBzOi8vZ2l0aHViLmNvbS9jb3Jlb3MvYnVncy9pc3N1ZXMKICAgIGlkOiB2bXdhcmUKICAgIG5hbWU6IFZNV2FyZQogICAgdmVyc2lvbi1pZDogOS4xMC4wLXIzCg=="
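For reference, the base64 payload can be decoded on any Linux machine to double-check what cloud-config the node actually receives. A rough sketch (file names are just examples, and the --validate pass assumes the coreos-cloudinit binary supports that flag):

# save the guestinfo.coreos.config.data string above to config.b64, then decode it
base64 --decode config.b64 > cloud-config.yml

# optional sanity check of the decoded config on one of the nodes
/usr/bin/coreos-cloudinit --validate --from-file=cloud-config.yml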

Sample dmesg output:
[  263.059045] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
[  273.296078] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 3 vectors allocated
[  273.297974] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
[  283.525365] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 3 vectors allocated
[  283.527866] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
[  293.783001] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 3 vectors allocated
[  293.784755] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
[  304.056872] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 3 vectors allocated
[  304.059129] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
[  314.295719] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 3 vectors allocated
[  314.297675] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
[  324.540258] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 3 vectors allocated
[  324.542472] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
[  334.801820] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 3 vectors allocated
[  334.803693] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps

Sample dmesg output after disconnecting the NIC in VMware:
[   96.544444] vmxnet3 0000:0b:00.0 ens192: NIC Link is Down
[   99.031733] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 3 vectors allocated
[   99.033595] vmxnet3 0000:0b:00.0 ens192: NIC Link is Down
[   99.035318] IPv6: ADDRCONF(NETDEV_UP): ens192: link is not ready
[  109.252009] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 3 vectors allocated
[  109.253969] vmxnet3 0000:0b:00.0 ens192: NIC Link is Down
[  109.261308] IPv6: ADDRCONF(NETDEV_UP): ens192: link is not ready
[  119.505295] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 3 vectors allocated
[  119.506648] vmxnet3 0000:0b:00.0 ens192: NIC Link is Down
[  119.507611] IPv6: ADDRCONF(NETDEV_UP): ens192: link is not ready
[  124.462412] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
[  124.463607] IPv6: ADDRCONF(NETDEV_CHANGE): ens192: link becomes ready

Is there something I am missing in my configs that is causing a loop condition?

Brandon Philips

Apr 13, 2016, 10:19:07 PM
to Chris Featherstone, CoreOS User

Hrm, it might be a bug in networkd, VMware, or the kernel. Hard to tell.


As a first swing, can you enable networkd debugging and attach the logs? https://coreos.com/os/docs/latest/network-config-with-networkd.html#debugging-networkd
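For reference, the linked doc boils down to raising networkd's log level with a systemd drop-in and then grabbing the unit's journal; a rough sketch of those steps, using standard systemd drop-in paths:

# drop-in that turns on debug logging for systemd-networkd
sudo mkdir -p /etc/systemd/system/systemd-networkd.service.d
sudo tee /etc/systemd/system/systemd-networkd.service.d/10-debug.conf <<'EOF'
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
EOF

# pick up the drop-in, restart networkd, then capture its log for this boot
sudo systemctl daemon-reload
sudo systemctl restart systemd-networkd
journalctl -b -u systemd-networkd > networkd.txt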



Chris Featherstone

Apr 13, 2016, 11:59:31 PM
to Brandon Philips, CoreOS User
Thank you, Brandon. I've attached the debug logs. I am unsure why line 43 shows a DHCPv4 lease being acquired, since the interface is statically configured. There are also some IPv6 lines that seem strange and repeat throughout the log; for example, around line 924 there is a "gained carrier" and the IPv6 address is updated, then shortly after a flag change, "lost carrier", the address is removed, and the service stops.

Below is the 00-ens192.network file that was created by my cloud-config:

[Match]
Name=ens192
MACAddress=00:50:56:a3:25:ca

[Network]
DNS=100.81.225.2

[Address]

[Route]
Destination=0.0.0.0/0
Gateway=100.81.35.254
[Attachment: networkd.txt]

Brandon Philips

Apr 14, 2016, 2:03:48 PM
to Chris Featherstone, CoreOS User
Weird, networkd looks like it keeps restarting. Do you have any idea why? Are you doing that?
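One rough way to confirm that from the node itself is to count how many times the unit has been started in the current boot's journal (the exact message text can vary by systemd version):

# count how many times systemd-networkd has been started this boot
journalctl -b -u systemd-networkd | grep -c "Starting Network Service"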

Chris Featherstone

Apr 14, 2016, 4:56:17 PM
to Brandon Philips, CoreOS User
At the risk of looking foolish and new, I have located the issue. In my early attempts at writing a cloud-config, I believe at some point I took parts of /usr/share/oem/cloud-config.yml as a template for my own.

I believe I added this section from the /usr/share/oem cloud-config to my own config when it shouldn't have been there:

coreos:
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: vmtoolsd.service
      command: start
      content: |
        [Unit]
        Description=VMware Tools Agent
        Documentation=http://open-vm-tools.sourceforge.net/
        ConditionVirtualization=vmware

        [Service]
        ExecStartPre=/usr/bin/ln -sfT /usr/share/oem/vmware-tools /etc/vmware-tools
        ExecStart=/usr/share/oem/bin/vmtoolsd
        TimeoutStopSec=5

    - name: oem-cloudinit.service
      command: restart
      runtime: yes
      content: |
        [Unit]
        Description=Cloudinit from VMware Backdoor

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/coreos-cloudinit --oem=vmware

From my further reading, /usr/share/oem/cloud-config.yml seems to be a built-in config that executes on the first run of the machine. So having these units in my own cloud-config as well as in the built-in config was causing a loop. I think the sequence of events went something like this:

- CoreOS boots and launches oem-cloudinit.service
- cloudinit reads my guestinfo VMware config
- that config launches oem-cloudinit with the --oem=vmware switch again via the restart command
- cloudinit reads my guestinfo again
- and so on, indefinitely

I stripped my cloud-config down to essentially just SSH keys and etcd config options and injected that into guestinfo. The looping seems to be gone. Thank you for pointing me to the networkd debugging; that eventually led me to pull a larger journalctl capture, which exposed the looping service.
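For anyone else who hits this, the trimmed-down guest config ends up looking roughly like the sketch below. The SSH key and discovery token are placeholders, and the etcd2 URL settings follow the usual CoreOS clustering example rather than being copied verbatim from my file; the important part is that the vmtoolsd and oem-cloudinit units from /usr/share/oem/cloud-config.yml are not repeated here.

#cloud-config

ssh_authorized_keys:
  - "ssh-rsa AAAA... your-key-here"

coreos:
  etcd2:
    # placeholder token; generate your own at https://discovery.etcd.io/new
    discovery: https://discovery.etcd.io/<token>
    advertise-client-urls: http://$private_ipv4:2379,http://$private_ipv4:4001
    initial-advertise-peer-urls: http://$private_ipv4:2380
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$private_ipv4:2380
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start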

Brandon Philips

Apr 14, 2016, 8:44:14 PM
to Chris Featherstone, CoreOS User
Glad you tracked it down! Very useful lesson :)