Lost Network Connectivity

Bill Richards

Jul 5, 2018, 4:59:13 PM
to gce-discussion
My Google Cloud instance (10.128.0.3) lost network connectivity sometime just after 04:00 this morning. I am running CentOS 6.10. The network interfaces are up and have IP addresses, but I am unable to ping the default gateway (10.128.0.1). Firewall rules (Google and local) have not been changed. This instance has been online for several years with no recent changes made. Any suggestions would be helpful and appreciated.

David Siuta

Jul 5, 2018, 7:00:08 PM
to gce-discussion
Hi Bill,

I think I'm having the same problem you are. I noticed that my machine was upgraded from CentOS 6.9 to 6.10 shortly after 4 AM on July 4 by the nightly yum update, along with a bunch of other packages. The next time the machine rebooted, which happened to be this morning, I could not reconnect to it.

I haven't found a solution yet, but I was able to confirm that this upgrade happened by running the "yum history" command from a startup script set in the Cloud Console and then reading the serial console output. I've also been able to reproduce the error from a stored snapshot: it works on CentOS 6.9, but fails as soon as the 6.10 update is applied and the instance is restarted.
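A minimal sketch of the kind of startup script described above, assuming it runs as root and that serial port 1 is exposed as /dev/ttyS0 (the usual device on Compute Engine). The function name log_to_serial, the SERIAL_DEV variable, and the [startup-debug] prefix are hypothetical, chosen only so the lines are easy to find in the serial console output.

```shell
#!/bin/bash
# Hypothetical startup script: copy a command's output to serial port 1 so it
# can be read from the instance's serial console output in the Cloud Console.
# Assumption: serial port 1 is /dev/ttyS0; override SERIAL_DEV if it is not.
SERIAL_DEV="${SERIAL_DEV:-/dev/ttyS0}"

# log_to_serial: run the given command, tag every output line (stdout and
# stderr) with a fixed prefix, and append the result to $SERIAL_DEV.
log_to_serial() {
  "$@" 2>&1 | sed 's/^/[startup-debug] /' >> "$SERIAL_DEV"
}

# Only attempt this where yum exists (i.e. on the CentOS instance itself).
if command -v yum >/dev/null 2>&1; then
  log_to_serial yum history
fi
```

After the next boot, the tagged lines should show up in the serial port 1 output, which can also be fetched with `gcloud compute instances get-serial-port-output`.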

One thing I see in my serial console output is a series of "martian source" errors, which start appearing as soon as I try to SSH to the machine after the update. In case any solutions turn up in my own thread, the link is: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/gce-discussion/jRM0EbpXYPY/L5hFiU3SAgAJ

-David

Justin Reiners

Jul 5, 2018, 7:07:34 PM
to David Siuta, gce-discussion
This is strange. It's not happening anywhere in us-central1-a for me, and I run nothing but CentOS hosts. I also have an instance in west-1 that isn't having this issue.


I've never had yum run by itself. Do you have a nightly cron job?

Bill Richards

Jul 5, 2018, 11:39:30 PM
to gce-discussion
Yes, there is a nightly cron job that runs and performs updates. The problem appears to be CentOS 6.10; last week the server was at 6.9. I've posted a message to the CentOS forums for help.

Bill Richards

Jul 5, 2018, 11:39:30 PM
to gce-discussion
I'm apparently not the only one who had trouble with an auto-upgrade from 6.9 to 6.10; I heard from someone with the same issue. So it seems the upgrade is what broke the servers. I've posted to the CentOS forums in hopes of getting some suggestions for fixing it.

Bill

David Siuta

Jul 6, 2018, 2:37:46 PM
to gce-discussion
Hello Bill,

I think I've been able to track down the problem. After the yum updates to 6.10, the kernel IP routing table no longer has a default gateway (there is no "default" route). Once I was able to get on through the serial port, I ran the "route" command on my affected machine and saw:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.142.0.1      *               255.255.255.255 UH    0      0        0 eth0
link-local      *               255.255.0.0     U     1002   0        0 eth0

I compared that to a pre-upgrade snapshot still running CentOS 6.9, which showed that a default gateway was present:

[root@instance-2 sysconfig]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.128.0.1      *               255.255.255.255 UH    0      0        0 eth0
link-local      *               255.255.0.0     U     1002   0        0 eth0
default         10.128.0.1      0.0.0.0         UG    0      0        0 eth0
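The only difference between the two tables above is the "default ... UG" line. A small check like the following can flag the broken state from a startup script or the serial console. This is a sketch: the function name has_default_route is hypothetical, and it simply looks for a "default" or "0.0.0.0" destination whose Flags column contains G, so it accepts both the symbolic output of `route` and the numeric output of `route -n`.

```shell
#!/bin/bash
# has_default_route: read `route` or `route -n` output on stdin and exit 0
# only if a default route is present (destination "default" or "0.0.0.0"
# with the G flag in the 4th column); otherwise exit 1.
has_default_route() {
  awk '($1 == "default" || $1 == "0.0.0.0") && $4 ~ /G/ { found = 1 }
       END { exit (found ? 0 : 1) }'
}

# Example use on the instance itself:
#   route -n | has_default_route || echo "default route is missing"
```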

So to fix the error, I added a default route through the gateway on the upgraded machine:

[david@instance-1 network-scripts]$ sudo route add default gw 10.142.0.1 eth0
[sudo] password for david: 
[david@instance-1 network-scripts]$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.142.0.1      *               255.255.255.255 UH    0      0        0 eth0
link-local      *               255.255.0.0     U     1002   0        0 eth0
default         10.142.0.1      0.0.0.0         UG    0      0        0 eth0

After this, I was able to connect to my instance as normal.  I hope this helps with your situation too!
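A route added with `route add` lasts only until the next reboot, which is why re-applying it at boot (as suggested later in the thread) is useful. The sketch below looks the gateway up instead of hard-coding it. It assumes, as in both tables above, that the host route with flags exactly "UH" on eth0 points at the subnet gateway; that holds for the output shown here but is not something `route` guarantees. Both function names are hypothetical, and restore_default_route must run as root.

```shell
#!/bin/bash
# gateway_from_host_route: read `route -n` output on stdin and print the
# destination of the first host route (Flags exactly "UH") on the given
# interface (default eth0). Assumption: that host route is the gateway.
gateway_from_host_route() {
  local iface="${1:-eth0}"
  awk -v ifc="$iface" '$4 == "UH" && $8 == ifc { print $1; exit }'
}

# restore_default_route: add a default route via the detected gateway,
# but only if no default route exists yet. Requires root.
restore_default_route() {
  local iface="${1:-eth0}" gw
  if route -n | awk '$1 == "0.0.0.0" && $4 ~ /G/ { f = 1 } END { exit (f ? 0 : 1) }'; then
    return 0  # a default route is already present; nothing to do
  fi
  gw="$(route -n | gateway_from_host_route "$iface")"
  if [ -z "$gw" ]; then
    echo "no UH host route found on $iface" >&2
    return 1
  fi
  route add default gw "$gw" "$iface"
}
```

Calling `restore_default_route eth0` from a startup script makes the fix idempotent: on a healthy boot it does nothing, and on a broken one it re-adds the same route David added by hand.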

Bill Richards

Jul 6, 2018, 3:21:21 PM
to gce-discussion
David 

Yep! That was it. I looked at the routing table and saw my gateway listed, but didn't notice the missing "default" route. Good catch! My server is back up!


On Thursday, July 5, 2018 at 3:59:13 PM UTC-5, Bill Richards wrote:

Fady (Google Cloud Platform)

Jul 6, 2018, 7:01:22 PM
to gce-dis...@googlegroups.com

This is a known issue when updating to kernel 2.6.32-754. It affects both Red Hat and CentOS images and seems to be related to this DHCP update. The Compute Engine team is already aware of it.


Meanwhile, in addition to the great suggestions above, you can also fix this with a startup script that adds your default gateway's IP address as the default route, and then restart your instance:


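#!/bin/bash
# Re-add the missing default route at boot. Replace [default_gateway_ip]
# with your subnet's gateway address (for example, 10.128.0.1).
route add default gw [default_gateway_ip] eth0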


For further information and updates about this issue, you can check this issue tracker link.


