Instance group autoscale size drops to 0 automatically

Augusta Tech Support

May 14, 2018, 9:06:51 AM
to gce-discussion
Hi Team,

I've created a managed instance group with autoscaling set to a minimum of 1 and a maximum of 5 instances. However, the group's size dropped to 0 on its own and returned to 1 a few minutes later. Could you please help me with this?


FYI, there was no CPU load. Please see the attached screenshot for details.
instance-grp.PNG

Fady (Google Cloud Platform)

May 14, 2018, 4:08:45 PM
to gce-dis...@googlegroups.com

It seems that the instance group manager is recreating the instance for you. This can happen in two scenarios: either when the initial creation of the instance fails (per this document), or when the health check fails (autohealing). Looking at the graph, the latter seems to be causing the recreation. As this document states, “Health checks that you apply to managed instance groups will proactively signal to the managed instance group to delete and recreate instances if they become UNHEALTHY.” While the group's only instance is being deleted and recreated, the group briefly has no running instances, which would explain the drop to 0 even though the autoscaler minimum is 1. You can check whether this is the case in the operations log, as explained in this old thread (the log message is the same, but the status could be different).
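To make that sequence concrete, here is a toy simulation in Python. It is not how Compute Engine is implemented; it only assumes, for illustration, that an instance is marked UNHEALTHY after `unhealthy_threshold` consecutive failed probes and that recreating it takes a fixed, hypothetical number of probe intervals (`recreate_ticks`). With a single-instance group, the running count dips to 0 during the recreation and then returns to 1, which matches the shape of the graph described in the question.

```python
from typing import List


def running_instances(probe_results: List[bool],
                      unhealthy_threshold: int = 2,
                      recreate_ticks: int = 3) -> List[int]:
    """Toy model: running-instance count, per probe interval, for a one-instance group.

    Assumptions (illustrative only): `unhealthy_threshold` consecutive failed
    probes mark the instance UNHEALTHY, and deleting plus recreating it takes
    `recreate_ticks` probe intervals, during which no instance is running.
    """
    fail_streak = 0
    downtime_left = 0
    sizes = []
    for ok in probe_results:
        if downtime_left > 0:
            # The instance is still being recreated; probe results are ignored.
            downtime_left -= 1
            sizes.append(0)
            continue
        fail_streak = 0 if ok else fail_streak + 1
        if fail_streak >= unhealthy_threshold:
            # Marked UNHEALTHY: autohealing deletes the instance and recreates it.
            fail_streak = 0
            downtime_left = recreate_ticks - 1  # this tick counts as the first one
            sizes.append(0)
        else:
            sizes.append(1)
    return sizes


if __name__ == "__main__":
    # Healthy, then two failed probes in a row (e.g. timeouts), then healthy again.
    print(running_instances([True, False, False, True, True, True, True]))
```

Run as a script, this prints `[1, 1, 0, 0, 0, 1, 1]`: the group never drops below its minimum because of the autoscaler; the 0 comes from the repair window.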


This does not necessarily mean that the instance itself is unhealthy; the cause may instead be the health check configuration. For example, if you created an HTTP health check and you do not have a web server returning a “valid HTTP response with code 200, (that) close the connection normally within the configured period,” several probes will fail and the instance will be marked unhealthy. You may check this document for further explanation about configuring health checks. I hope this helps.
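For an HTTP health check to pass, something on the instance has to answer the probe with a 200. As a minimal sketch, assuming Python 3 is available on the instance and that the health check's request path is `/` (adjust `HEALTH_PATH` to match yours), the standard-library server below returns 200 on that path and 404 elsewhere. `HEALTH_PATH`, `start_health_server`, and `self_check` are hypothetical names for illustration; in practice your existing web server would normally serve this endpoint.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_PATH = "/"  # assumption: must match the health check's request path


class HealthHandler(BaseHTTPRequestHandler):
    # BaseHTTPRequestHandler speaks HTTP/1.0 by default, so the connection is
    # closed after each response, as the health check expects.

    def do_GET(self):
        if self.path == HEALTH_PATH:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)  # anything other than 200 counts as a failed probe

    def log_message(self, format, *args):
        pass  # silence per-request logging; probes arrive frequently


def start_health_server(port: int = 80) -> HTTPServer:
    """Serve the health endpoint on a background thread (port 80 usually needs root)."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def self_check(port: int, path: str = HEALTH_PATH, timeout: float = 5.0) -> int:
    """Request the health path locally, roughly as a probe would; return the status."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}", timeout=timeout) as resp:
        return resp.status
```

For a local test, `start_health_server(0)` binds a free port, and `self_check(server.server_address[1])` should return 200.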


Augusta Tech Support

May 15, 2018, 6:18:47 AM
to Fady (Google Cloud Platform), gce-discussion
Hello Fady,

I suspect scenario 2 in my case. I've also configured the autohealing health check to check port 80. I don't think the port went down or became unreachable, because I've enabled Nagios monitoring of port 80 on my Compute Engine instance and didn't receive any alert about port 80 being down.

I checked the Stackdriver logs for health checks but couldn't find any entries. Please let me know where I can find the logs, and any troubleshooting steps to resolve this issue. Thanks.


On Tue, May 15, 2018 at 1:38 AM, 'Fady (Google Cloud Platform)' via gce-discussion <gce-dis...@googlegroups.com> wrote:

It seems that the instance group manager is re-creating the instance for you. This may happen in two scenarios. Either when the initial creation of the instance fails per this document, or when the health check fails (auto-healing). Looking at the graph, it seems the latter is causing a recreation of the instance, as per this document, “Health checks that you apply to managed instance groups will proactively signal to the managed instance group to delete and recreate instances if they become UNHEALTHY.”  You may check if this is the case by checking the operations log as explained in this old thread. (could be a different operation log message)


This does not necessarily mean that the instance itself is not healthy, but rather a configuration of the health check. For example, if you created an "http" health check, and you do not have a web server returning “valid HTTP response with code 200, (that) close the connection normally within the configured period.”, several probes will fail and the instance will be marked unhealthy". You may check this document for further explanation about configuring health checks. I hope this helps.


--
© 2018 Google Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043
 
Email preferences: You received this email because you signed up for the Google Compute Engine Discussion Google Group (gce-discussion@googlegroups.com) to participate in discussions with other members of the Google Compute Engine community and the Google Compute Engine Team.
---
You received this message because you are subscribed to the Google Groups "gce-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gce-discussion+unsubscribe@googlegroups.com.
To post to this group, send email to gce-discussion@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gce-discussion/ea54784a-01c6-4b87-add6-3b9dccd840a3%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Fady (Google Cloud Platform)

May 15, 2018, 3:10:54 PM
to gce-discussion

As mentioned in this thread, you should see an entry “Recreate an instance” in the operations log, with the operation type “compute.instances.repair.recreateInstance”. The same entry should also be logged in Stackdriver Logging under “GCE VM instance”. However, you may get more information from the “Status message” in the operations log (it is the status message that differs between cases, not the operation log message above).
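If you export the operations list as JSON (for example with `gcloud compute operations list --format=json`), a short script can pick out the autohealing recreations. The sketch below assumes the output is a JSON array of Operation records with `operationType`, `insertTime`, `targetLink`, and `statusMessage` fields, and filters on the operation type quoted above; the sample records are made up for illustration. Narrowing the list server-side with gcloud's `--filter` flag would work just as well.

```python
import json
from typing import List, Tuple

RECREATE_OP = "compute.instances.repair.recreateInstance"


def recreate_events(operations_json: str) -> List[Tuple[str, str, str]]:
    """Return (insertTime, targetLink, statusMessage) for each autohealing recreate.

    Results are sorted as strings, which orders RFC 3339 timestamps correctly
    as long as they share one format and time zone.
    """
    events = []
    for op in json.loads(operations_json):
        if op.get("operationType") == RECREATE_OP:
            events.append((op.get("insertTime", ""),
                           op.get("targetLink", ""),
                           # The status message is where more detail may appear.
                           op.get("statusMessage", "")))
    return sorted(events)


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = json.dumps([
        {"operationType": RECREATE_OP, "insertTime": "2018-05-14T09:05:00Z",
         "targetLink": "https://example.invalid/instances/vm-1", "statusMessage": "example"},
        {"operationType": "insert", "insertTime": "2018-05-14T08:00:00Z",
         "targetLink": "https://example.invalid/instances/vm-1"},
    ])
    for event in recreate_events(sample):
        print(event)
```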


To verify the health check configuration, you may check this document. Note that an HTTP health check does not just check whether port 80 is open; rather, “the instance must return a valid HTTP response with code 200 and close the connection normally within the configured period”. Therefore, depending on your web server, you may need to adjust the check-interval, unhealthy-threshold, etc. That said, the most common HTTP health check failure is due to the timeout period. I hope this helps.
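The difference between "port 80 is open" (what a port-level monitor confirms) and "the HTTP check passes" can be seen with two small probes. This is a rough approximation under stated assumptions, not Google's checker: `tcp_port_open` only completes a TCP handshake, while the hypothetical `http_check_passes` sends a GET and requires a 200 within `timeout` seconds. A listener that accepts connections but never answers passes the first and fails the second, which is the timeout case described above.

```python
import http.client
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Port-level check: succeeds as soon as the TCP handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check_passes(host: str, port: int, path: str = "/", timeout: float = 5.0) -> bool:
    """Rough HTTP probe: passes only if a 200 response arrives in time.

    Note: http.client applies `timeout` to each blocking socket operation,
    so this approximates, rather than enforces, a whole-response deadline.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and malformed responses all count as failures.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    # A listener that accepts TCP connections but never sends an HTTP response.
    silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    silent.bind(("127.0.0.1", 0))
    silent.listen(8)
    port = silent.getsockname()[1]
    print("port open:", tcp_port_open("127.0.0.1", port))
    print("HTTP check passes:", http_check_passes("127.0.0.1", port, timeout=1.0))
    silent.close()
```

How many such failures in a row mark the instance UNHEALTHY is governed by the unhealthy-threshold, so a web server that is occasionally slower than the configured timeout can be enough to trigger autohealing even while a port monitor stays green.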