How highly available is the master zone in GKE?

kust...@gmail.com

Jun 5, 2017, 1:29:06 PM
to Kubernetes user discussion and Q&A
GKE creates a cluster with a single master zone. Although there is an SLA of 99.5% uptime, that still allows almost 4 hours of downtime in a month. This is not acceptable for my application in production.
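
The downtime figure follows from simple arithmetic. Here is a minimal sketch, assuming a 30-day month; the helper name `downtime_hours` is illustrative and not part of any GKE tooling:

```python
# Downtime budget implied by an availability target.
# Assumption: a month is taken to be 30 days (720 hours).
HOURS_PER_MONTH = 30 * 24


def downtime_hours(availability_percent: float,
                   period_hours: float = HOURS_PER_MONTH) -> float:
    """Return the allowed downtime in hours for the given availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Targets mentioned in this thread.
    for target in (99.5, 99.95, 99.999):
        hours = downtime_hours(target)
        print(f"{target}% uptime allows {hours:.4f} h "
              f"({hours * 60:.2f} min) of downtime per 30-day month")
```

For 99.5% this works out to 3.6 hours per 30-day month, which matches the "almost 4 hours" estimate. At 99.999% the budget shrinks to well under a minute per month.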

Furthermore, a cluster with a single master zone is vulnerable to the corresponding availability zone going down. How likely is it that an availability zone goes down? I understand that my app will continue to run when the master is down, but it is still a big concern that I cannot scale up my service, deploy a new version of my app, etc.

What is the GKE best practice to deploy a business critical app that strives for 99.999% uptime?

Thanks for your help in advance.

Robert Bailey

Jun 6, 2017, 2:04:12 PM
to kubernet...@googlegroups.com
On Mon, Jun 5, 2017 at 10:29 AM, <kust...@gmail.com> wrote:
GKE creates a cluster with a single master zone. Although there is an SLA of 99.5% uptime, that still allows almost 4 hours of downtime in a month. This is not acceptable for my application in production.

Furthermore, a cluster with a single master zone is vulnerable to the corresponding availability zone going down. How likely is it that an availability zone goes down?

I can't find any published documentation about the availability of a single zone. A zone can go down, as can a whole region, but both events are rare.
 
I understand that my app will continue to run when the master is down, but it is still a big concern that I cannot scale up my service, deploy a new version of my app, etc.

What is the GKE best practice to deploy a business critical app that strives for 99.999% uptime?

To get to five nines, you'll want multiple independent control planes (e.g. multiple clusters) spread across not only zones but regions as well. You would then use a global load balancer and/or DNS records to spread incoming load across service backends in your multiple clusters.
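
As a rough illustration of why several independent clusters help, the sketch below computes the chance that at least one cluster is up. It assumes that cluster failures are fully independent and that the load balancer or DNS layer never fails; both are idealizations. The 99.5% per-cluster figure is only a placeholder borrowed from the SLA number quoted in this thread, and the function names are hypothetical:

```python
def combined_availability(per_cluster: float, clusters: int) -> float:
    """Probability that at least one of `clusters` independent clusters is up.

    Assumes failures are independent, which real zones and regions only
    approximate.
    """
    if not 0.0 < per_cluster <= 1.0:
        raise ValueError("per_cluster must be in (0, 1]")
    if clusters < 1:
        raise ValueError("clusters must be at least 1")
    # All clusters are down at once with probability (1 - a) ** n.
    return 1 - (1 - per_cluster) ** clusters


def clusters_needed(per_cluster: float, target: float) -> int:
    """Smallest number of independent clusters that meets `target` availability."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    n = 1
    while combined_availability(per_cluster, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    print(combined_availability(0.995, 2))   # two clusters at 99.5% each
    print(clusters_needed(0.995, 0.99999))   # clusters needed for five nines
```

Under these assumptions, two clusters at 99.5% each reach about 99.9975%, and a third is needed to clear 99.999%. Correlated failures, such as a bad rollout pushed to every cluster at once, would make the real number worse.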

This allows you to do staged rollouts of both your application and the cluster control plane so that you can route traffic away from problematic clusters to maintain availability. 
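
The idea of routing traffic away from a problematic cluster can be sketched as a weight calculation. This is only an illustration: in practice the global load balancer's health checks or DNS updates do this work, and the `Cluster` type, the cluster names, and `routing_weights` are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str       # hypothetical cluster identifier
    weight: float   # relative share of traffic when healthy
    healthy: bool   # result of some external health check


def routing_weights(clusters: list[Cluster]) -> dict[str, float]:
    """Return the fraction of traffic each cluster should receive.

    Unhealthy clusters get 0.0; their share is redistributed among the
    healthy clusters in proportion to their configured weights.
    """
    total = sum(c.weight for c in clusters if c.healthy)
    if total <= 0:
        raise RuntimeError("no healthy cluster available to receive traffic")
    return {c.name: (c.weight / total if c.healthy else 0.0) for c in clusters}


if __name__ == "__main__":
    fleet = [
        Cluster("us-cluster", weight=1.0, healthy=True),
        Cluster("eu-cluster", weight=1.0, healthy=False),  # e.g. mid-rollout problem
        Cluster("asia-cluster", weight=2.0, healthy=True),
    ]
    for name, share in routing_weights(fleet).items():
        print(f"{name}: {share:.2%}")
```

During a staged rollout, the cluster being upgraded can be marked unhealthy (or given a small weight) until it has been verified, so a bad control plane or application version affects only a limited share of traffic.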
 

Thanks for your help in advance.

Ahmet Alp Balkan

Jun 6, 2017, 10:31:26 PM
to kubernet...@googlegroups.com
On Mon, Jun 5, 2017 at 10:29 AM, <kust...@gmail.com> wrote:
GKE creates a cluster with a single master zone. Although there is an SLA of 99.5% uptime, that still allows almost 4 hours of downtime in a month. This is not acceptable for my application in production.

I believe you recently asked this on Stack Overflow as well. It looks like your best bet for higher master availability is to set up your own cluster with multiple masters on top of GCE, using kubeadm or similar tools.
 

Furthermore, a cluster with a single master zone is vulnerable to the corresponding availability zone going down. How likely is it that an availability zone goes down?

I don't think you can find information on the likelihood of an availability zone going down. Such an outage may impact one or multiple services, and either case may or may not impact the availability of a GKE master. Moreover, I don't think there have been any natural disasters that impacted an entire zone in the history of public cloud; however, that is not an indicator of how likely such a disaster is.

If you think of the GKE master as a Compute Engine instance running in a zone, the SLA is 99.95% uptime (https://cloud.google.com/compute/sla), where downtime is defined as a period of more than 1 consecutive minute.