Health check log

Pavel Madr

Oct 18, 2017, 4:14:11 AM
to gce-discussion
Hi,

Yesterday we had an issue with a managed instance group.
Within 5 hours we had 22 restarts of a VM, probably caused by a failing health check. Strangely, it was always the VM from the us-central1-b zone. The machines in the other zones were OK for the whole period.

Is there any option to see all health check requests in the GCE Logs Viewer or somewhere else, so that we could analyze the reason for the restarts?

Best Regards,
Pavel

Karthick (Cloud Platform Support)

Oct 18, 2017, 3:24:52 PM
to gce-discussion
A failing health check does not restart the instance. If autoscaling is enabled, it will create instances depending on its configuration. I have verified that no known issues were reported in the us-central1-b zone yesterday that would have caused its VM instances to restart.

Unfortunately, GCP does not currently have an option to view the health check requests sent to the instances.

If you can provide me with your project ID and instance name through private message, I’ll try to investigate the issue further to find the root cause of the reboot.

Ron Skantz

Oct 19, 2017, 2:09:29 PM
to gce-dis...@googlegroups.com

I’m not sure whether this is what you are looking for, but I contacted GCE support to find out how I can tell when GCE reboots one of my instances.

I was able to do that using Stackdriver (for free).

In Stackdriver Logging:

Then I created an alert from this user-defined metric in Stackdriver:

--
© 2017 Google Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043
 
Email preferences: You received this email because you signed up for the Google Compute Engine Discussion Google Group (gce-dis...@googlegroups.com) to participate in discussions with other members of the Google Compute Engine community and the Google Compute Engine Team.
---
You received this message because you are subscribed to the Google Groups "gce-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gce-discussio...@googlegroups.com.
To post to this group, send email to gce-dis...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gce-discussion/36e3fb1b-3536-420f-baa6-de0b0f1e8140%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Karthick (Cloud Platform Support)

Oct 19, 2017, 3:58:12 PM
to gce-discussion
I checked the logs. A failing autohealer health check recreated the instance. A short timeout on the health check could have contributed to this behaviour. I am continuing to investigate and will let you know my findings.
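To illustrate how the timeout interacts with the failure threshold, here is a minimal Python sketch of a simplified health check state machine. It is only a model under stated assumptions, not the autohealer's actual logic: the latencies and the threshold values are hypothetical, and a real autohealer would recreate the instance once it is marked unhealthy rather than wait for it to recover.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class HealthCheckConfig:
    timeout_sec: float        # a probe slower than this counts as a failure
    unhealthy_threshold: int  # consecutive failures before "UNHEALTHY"
    healthy_threshold: int    # consecutive successes before "HEALTHY" again


def simulate(latencies: Iterable[float], cfg: HealthCheckConfig) -> List[str]:
    """Return the health state after each probe, given probe latencies in seconds."""
    state = "HEALTHY"
    failures = successes = 0
    states = []
    for latency in latencies:
        if latency > cfg.timeout_sec:
            failures += 1
            successes = 0
            if failures >= cfg.unhealthy_threshold:
                state = "UNHEALTHY"
        else:
            successes += 1
            failures = 0
            if successes >= cfg.healthy_threshold:
                state = "HEALTHY"
        states.append(state)
    return states


# Hypothetical latencies (seconds) containing a short burst of slow responses.
latencies = [0.2, 0.3, 1.4, 1.6, 1.2, 0.3, 0.2]

# Hypothetical settings: same thresholds, different timeouts.
strict = HealthCheckConfig(timeout_sec=1.0, unhealthy_threshold=3, healthy_threshold=2)
lenient = HealthCheckConfig(timeout_sec=2.0, unhealthy_threshold=3, healthy_threshold=2)

print(simulate(latencies, strict))   # the slow burst trips the 1-second timeout
print(simulate(latencies, lenient))  # the same burst stays within 2 seconds
```

With the 1-second timeout the burst of three slow probes marks the instance unhealthy, while the 2-second timeout tolerates the same burst.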

Pavel Madr

Oct 20, 2017, 7:41:38 AM
to Ron Skantz, gce-dis...@googlegroups.com
Hello Ron,

I'll try alerting. Thank you.

Best Regards,
Pavel

Karthick (Cloud Platform Support)

Oct 25, 2017, 5:00:42 PM
to gce-discussion
Hello Pavel,

Does the health check on port 1717 cause any DB requests? I ask because, if the health check triggers a DB query, latency could be slightly higher for us-central1-b, since the Cloud SQL instance is not co-located there.

Can you install the Stackdriver Logging agent to get request logs from the health check?

Pavel Madr

Oct 27, 2017, 3:30:24 AM
to Karthick (Cloud Platform Support), gce-discussion
Hello Karthick,

The health check doesn't make any DB requests. It just accesses the uWSGI stats, so the response should be very fast.
I can't add Cloud logging to this request because the health check talks directly to uWSGI ("health check->UWSGI").

Here is our UWSGI stats config:

[uwsgi]
stats = :1717
stats-http = true
stats-min = true
stats-no-cores = true
stats-no-metrics = true
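
Since the stats port itself keeps no request log, one way to get timing data is to probe it from the VM in the same way a health check would and record each result. The following Python sketch assumes the stats server answers plain HTTP GET requests on port 1717 as configured above; the function names, timeout, interval, and count are hypothetical and chosen only for illustration.

```python
import time
import urllib.request
from typing import Optional, Tuple


def probe(url: str, timeout_sec: float) -> Tuple[bool, float, Optional[str]]:
    """Send one GET request and return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        # Note: urllib applies the timeout to each blocking socket operation,
        # not to the request as a whole, so this only approximates a prober.
        with urllib.request.urlopen(url, timeout=timeout_sec) as resp:
            resp.read()
            ok = resp.status == 200
            error = None if ok else f"HTTP {resp.status}"
    except Exception as exc:  # timeout, refused connection, HTTP error status
        ok, error = False, repr(exc)
    return ok, time.monotonic() - start, error


def watch(url: str, timeout_sec: float, interval_sec: float, count: int) -> int:
    """Probe `count` times, print one line per probe, return the failure count."""
    failures = 0
    for _ in range(count):
        ok, elapsed, error = probe(url, timeout_sec)
        failures += 0 if ok else 1
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S%z")
        print(f"{stamp} ok={ok} elapsed={elapsed:.3f}s error={error}")
        time.sleep(interval_sec)
    return failures


# Example (hypothetical values; the port comes from the stats config above):
# watch("http://127.0.0.1:1717/", timeout_sec=1.0, interval_sec=5.0, count=12)
```

A local probe does not cross the network path that the real health check uses, so it can only show whether the stats endpoint itself responds slowly.
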

Regards,
Pavel

Karthick (Cloud Platform Support)

Oct 30, 2017, 9:57:31 AM
to gce-discussion

Pavel,

Installing the Stackdriver Logging Agent will make the nginx access logs available in the Cloud Console. There is no need to change the application. This helps us to get a view of what is happening on the backend, as we believe it is the instance that failed to respond to the health check in time.

Pavel Madr

Nov 1, 2017, 4:34:54 AM
to gce-discussion
Hello Karthick,

Our health check does not go through nginx. It goes directly to the uWSGI port.

It would be great to see the health check requests made by Google servers in the Logging section (as is possible, e.g., for the HTTP load balancer), so we could inspect the system logs at the times when health checks failed.
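
If some component in front of the stats port did record the remote address of each request (a firewall log or a small logging proxy, both hypothetical here), the health check requests could be picked out by source IP. The Python sketch below assumes the probes come from the 35.191.0.0/16 and 130.211.0.0/22 ranges that Google documents for health checks; these ranges should be verified against the current documentation, and the log line format is invented for illustration.

```python
import ipaddress
from typing import Iterable, Iterator

# Assumption: documented source ranges for Google health check probes.
# Verify against the current documentation before relying on them.
HEALTH_CHECK_RANGES = [
    ipaddress.ip_network("35.191.0.0/16"),
    ipaddress.ip_network("130.211.0.0/22"),
]


def is_health_check_probe(remote_addr: str) -> bool:
    """Return True if the address falls inside one of the prober ranges."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in HEALTH_CHECK_RANGES)


def filter_probe_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield log lines whose first whitespace-separated field is a prober IP."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            if is_health_check_probe(fields[0]):
                yield line
        except ValueError:
            # The first field is not an IP address; skip the line.
            continue


# Hypothetical log lines in a made-up "ip method path status" format.
sample = [
    "35.191.10.20 GET / 200",
    "10.128.0.2 GET / 200",
    "130.211.1.5 GET / 200",
    "not-an-ip something else",
]

for line in filter_probe_lines(sample):
    print(line)
```
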

Pavel Madr

Nov 2, 2017, 8:08:44 PM
to gce-discussion
Hello Karthick,

A few hours ago the same situation happened again. The VM was recreated 17 times, and again only in zone us-central1-b.
It can't be a coincidence, because it is always zone us-central1-b. The other 2 zones were OK.
Could it be a hardware problem in that zone, or a network issue between the machines running the health checks and the us-central1-b zone?

With Best Regards,
Pavel

Karthick (Cloud Platform Support)

Nov 3, 2017, 3:30:37 PM
to gce-discussion
Hello Pavel,

I have shared this information with the engineering team. I will update you as soon as I have any information.

Karthick (Cloud Platform Support)

Nov 8, 2017, 9:21:14 AM
to gce-discussion
Hello Pavel,

Have there been any recurrences of the issue since around 2017-11-07 08:00 PDT? The investigation into the root cause is still underway and relates to some unhealthy load balancers in us-central1-b. If you can confirm that the issue has not recurred, this may be the likely cause.