Support active-standby applications in Kubernetes


xiaoni...@huawei.com

May 2, 2017, 11:50:52 PM
to kubernetes-sig-apps

Hello everyone,


I’d like to check whether there are any ongoing community activities to support active-standby style applications in Kubernetes. If not, I’d like to file a feature request and work with the community on it.


By “active-standby” applications I mean applications that have several running instances, of which only one is active and serving workloads. All other instances are idle (standby). Once the active instance goes down, one of the standby instances is promoted to active.


Many legacy enterprise applications work this way. In fact, the Kubernetes scheduler and controller manager do too: in an HA Kubernetes deployment they run in active-standby mode.


We are working on an internal project which involves lots of such applications. These applications share a few requirements, such as role assignment, role switching and traffic routing. These requirements cannot be handled by StatefulSet yet. Instead of duplicating the implementation in every such application, we’re wondering whether we can extract these common parts into Kubernetes.


Regards,

Xiaoning

Brandon Philips

May 3, 2017, 1:15:33 PM
to xiaoni...@huawei.com, kubernetes-sig-apps
Who would do the election in your thinking? Do you want some sort of active heart beating?



XiaoNing Ding

May 3, 2017, 2:03:12 PM
to Brandon Philips, kubernetes-sig-apps

Yes, I’ve read the blog before. Currently these applications do the election by themselves. Typically they compete for a global lock. To ease the burden on applications, I’m thinking of two possible approaches:

 

1. A Kubernetes controller decides the roles: which instance is active and which instances are standby. This information is passed to the application containers through labels or annotations.

2. Or, we can integrate the “sidecar” containers into the application pods, so that applications are resilient to control plane failures.
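
As a rough illustration of the lock-based election mentioned above, here is a minimal in-memory sketch in Python. Everything in it is hypothetical: the `Lease` type, the `try_acquire` and `role_labels` helpers, and the `active`/`standby` label values are invented for this example and are not Kubernetes APIs. A real implementation would need an atomic compare-and-swap against a shared store rather than a local object.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Lease:
    """A global lock with an expiry time (hypothetical, in-memory only)."""
    holder: Optional[str] = None
    expires_at: float = 0.0


def try_acquire(lease: Lease, candidate: str, now: float, ttl: float) -> bool:
    """Acquire or renew the lease; True if `candidate` holds it afterwards.

    The lease can be taken when it is free, already held by the candidate
    (a renewal), or expired. In a real system this check-and-set must be atomic.
    """
    if lease.holder in (None, candidate) or now >= lease.expires_at:
        lease.holder = candidate
        lease.expires_at = now + ttl
        return True
    return False


def role_labels(lease: Lease, instances: List[str]) -> Dict[str, str]:
    """Role each instance would be labelled with, derived from the lease holder."""
    return {
        name: "active" if name == lease.holder else "standby"
        for name in instances
    }


if __name__ == "__main__":
    lease = Lease()
    pods = ["app-0", "app-1", "app-2"]

    # At t=0 every instance competes; the first caller wins the lock.
    for pod in pods:
        try_acquire(lease, pod, now=0.0, ttl=2.0)
    print(role_labels(lease, pods))

    # app-0 stops renewing; once the TTL has passed, app-1 can take over.
    try_acquire(lease, "app-1", now=2.5, ttl=2.0)
    print(role_labels(lease, pods))
```

Whether this loop runs in a central controller or in a per-pod sidecar is exactly the choice between the two approaches; the sketch only shows the lock semantics that both would share.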

 

For heartbeating, maybe we can reuse the existing liveness and readiness probes. But a key concern here is latency. Some internal critical applications need failure detection and role switching to finish within 2 seconds. This is hard to achieve with the current Kubernetes heartbeat implementation. We are also thinking about some other improvements to speed up this process.
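
To make the latency concern concrete, here is a back-of-the-envelope estimate in Python. The parameter names mirror the probe fields `periodSeconds`, `failureThreshold` and `timeoutSeconds`; the formula itself is a simplifying assumption (failure happens right after a successful probe, and every failing probe runs into its timeout) and it ignores kubelet scheduling, endpoint propagation and the role switch itself.

```python
def worst_case_detection_seconds(
    period_seconds: int, failure_threshold: int, timeout_seconds: int
) -> int:
    """Rough upper bound on the time until a probe marks a pod as failed.

    Assumed model: the instance dies just after a successful probe, so
    `failure_threshold` further probe periods must pass, and the last
    failing probe only reports after its timeout.
    """
    return period_seconds * failure_threshold + timeout_seconds


if __name__ == "__main__":
    # Probe defaults: period 10 s, threshold 3, timeout 1 s.
    print(worst_case_detection_seconds(10, 3, 1))  # 31

    # Tightest allowed settings: period 1 s, threshold 1, timeout 1 s.
    print(worst_case_detection_seconds(1, 1, 1))  # 2
```

Under this model even the tightest probe settings already use up the whole 2-second budget on detection alone, before any role switching starts.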

Brandon Philips

May 3, 2017, 2:17:30 PM
to XiaoNing Ding, kubernetes-sig-apps
You may need to consider running etcd on top of Kubernetes if you need lower latency or more complicated failover. It is fairly easy to run an etcd cluster on top of Kubernetes with the etcd Operator.

Feel free to email https://groups.google.com/forum/#!forum/etcd-dev if that is something you want to consider.


XiaoNing Ding

May 3, 2017, 2:43:10 PM
to Brandon Philips, kubernetes-sig-apps

This is one option, though it requires another deployment of etcd, and applications need to be aware of this etcd deployment. I’m wondering whether we can make this more transparent to applications and reduce their deployment effort.

 

Leader election is only one of the problems. There are other problems, like traffic routing: these applications expect a single exposed service, but the traffic should always be routed to the current active instance. Another is controlled update: when updating such a pod collection, we should update the standby instances first and then update the active one.
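
The routing and update-ordering requirements can be sketched in a few lines of Python. This is only an illustration under assumed names: the pod-to-role map, the `active`/`standby` values, and both helper functions are hypothetical, and a real controller would read roles from pod labels or annotations instead of a dict.

```python
from typing import Dict, List


def service_endpoints(roles: Dict[str, str]) -> List[str]:
    """Pods a single service would route to if it only selected the active role."""
    return [pod for pod, role in roles.items() if role == "active"]


def update_order(roles: Dict[str, str]) -> List[str]:
    """Order for a controlled update: standby pods first, the active pod last."""
    # False sorts before True, so standby pods (key False) come first;
    # the pod name breaks ties to keep the order deterministic.
    return sorted(roles, key=lambda pod: (roles[pod] == "active", pod))


if __name__ == "__main__":
    roles = {"app-0": "standby", "app-1": "active", "app-2": "standby"}
    print(service_endpoints(roles))  # ['app-1']
    print(update_order(roles))       # ['app-0', 'app-2', 'app-1']
```

Updating the active pod last means at most one role switch is needed per rollout, assuming a standby that has already been updated takes over.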

XiaoNing Ding

May 3, 2017, 2:52:06 PM
to Brandon Philips, kubernetes-sig-apps

I summarized my thoughts and filed an issue to track this feature request:

 

https://github.com/kubernetes/kubernetes/issues/45300

 

 


XiaoNing Ding

May 3, 2017, 3:06:32 PM
to Brandon Philips, Quinton Hoole, Deepak Vij (A), kubernetes-sig-apps

(Adding Quinton and Deepak explicitly from our team so they won’t miss the thread. They were involved in the discussion before.)

 


Brandon Philips

May 3, 2017, 4:21:08 PM
to XiaoNing Ding, Quinton Hoole, Deepak Vij (A), kubernetes-sig-apps
Replied on the bug.