Internal network connection between VMs on same network frequently drops out for extended periods

Poo Bah

Dec 9, 2016, 3:44:19 AM
to gce-discussion
Hi,

I have a simple project setup with a few VMs in the same region:
* A VM running standard debian-8 created in August this year
* A VM running standard Ubuntu 16.04 created early this month (Dec)

Both machines are on the same internal network, with 10.128.0.* internal addresses. Both also have public static IP addresses, but this is irrelevant to the problem. Neither machine is doing anything CPU, memory, or network intensive; they are pretty much idle.

I find that for a period of time I can ping and otherwise talk to one machine from the other. However, for minutes or occasionally hours, they can't see each other: ping and all other communication between the two machines fail, although both remain fully functional and reachable from the internet.

The only way to get them talking again - albeit temporarily - is to bounce the Ubuntu VM. Bouncing the Debian VM seems to make no difference. Restarting the network service (/etc/init.d/network restart) also seems to make no difference.

Some time ago, when this first happened, I did some packet sniffing and saw that the destination machine was receiving the ping request and sending the echo reply, but the reply never made it back to the source machine.

Has anyone experienced this, or does anyone have any suggestions? I wanted to launch my site before Christmas, but I don't feel confident enough in the Google Cloud platform to do so. I was also going to use Cloud SQL with my app, but again, I am worried the DB connection will drop out too.

Thanks,
Craig

George (Google Cloud Support)

Dec 9, 2016, 11:19:26 AM
to gce-discussion
Hello Craig,

Note that idle TCP connections are disconnected after 10 minutes. If your instance initiates or accepts long-lived connections with an external host, you can adjust TCP keep-alive settings to prevent these timeouts from dropping connections. You can configure the keep-alive settings on the Compute Engine instance, your external client, or both, depending on the host that typically initiates the connection. You should set the keep-alives to less than 600 seconds to ensure that connections are refreshed before the timeout occurs.

You can adjust the TCP Keep Alive on your VM instances using the following command:

sudo /sbin/sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5
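As a rough illustration of what those three values mean in practice, the arithmetic can be sketched as below. This is a back-of-the-envelope check, not an exact model of the kernel's timers; the variable names are illustrative, and the 600-second figure is the idle timeout quoted above.

```python
# Rough timing check for the sysctl values above (a sketch, not a kernel model).
IDLE_TIMEOUT_S = 600   # idle-connection timeout quoted in this reply

keepalive_time = 60    # seconds a connection may sit idle before the first probe
keepalive_intvl = 60   # seconds between probes that go unanswered
keepalive_probes = 5   # unanswered probes before the connection is considered dead

# The first probe goes out well before the idle timeout would apply.
first_probe_at = keepalive_time

# Approximate time for an idle connection to an unreachable peer to be dropped.
dead_peer_detected_after = keepalive_time + keepalive_intvl * keepalive_probes

print(first_probe_at < IDLE_TIMEOUT_S)
print(dead_peer_detected_after)
```

With these settings the first probe is sent after 60 seconds of idleness, and an unresponsive peer is given up on after roughly 360 seconds.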

Note that applications running on Linux systems don't enable keep-alive by default, so the server or client needs to explicitly set the SO_KEEPALIVE socket option when opening TCP connections (see also the Linux TCP Keepalive HOWTO).
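A minimal sketch of setting that option from an application, assuming Python's standard socket module. The helper name enable_keepalive is hypothetical, and the per-socket TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT options are Linux-specific, so the code only applies them where the platform exposes them.

```python
import socket

def enable_keepalive(sock, idle=60, interval=60, probes=5):
    """Turn on TCP keep-alive for a single socket (hypothetical helper)."""
    # Without SO_KEEPALIVE the sysctl values are never used for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Optional per-socket overrides of the system-wide sysctl defaults.
    # These constants are not available on every platform, hence the guards.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock

# Create a TCP socket, enable keep-alive, and read the option back.
sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
```

Many client libraries and database drivers expose an equivalent setting of their own, so it is worth checking their documentation before patching sockets by hand.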

On a side note, during the term of the Google Cloud Platform License Agreement, Google Cloud Storage, Google Prediction API, Google BigQuery Service, Google Cloud SQL and Google Compute Engine License Agreement, or Google Cloud Platform Reseller Agreement (as applicable, the "Agreement"), the Covered Service will provide a Monthly Uptime Percentage to Customer of at least 99.95% (the "Service Level Objective" or "SLO"). More information about the Google Cloud Platform Service Level Agreements can be found in this Help Center article.

I hope this helps.

Sincerely,
George