watch for leader change

400 views

Joseph Swaminathan

Sep 18, 2015, 2:48:02 PM
to etcd-dev
Is there a way I can watch for leader changes? I tried the following, but it is not helping.

etcdctl  watch /stats --recursive

Using {"releaseVersion":"2.0.4","internalVersion":"2"}


Yicheng Qin

Sep 18, 2015, 5:20:15 PM
to Joseph Swaminathan, etcd-dev
There is no way to do it.

We think users have no need to know about leader election, because it is an underlying implementation detail. Is there any reason you need to do that?

--
You received this message because you are subscribed to the Google Groups "etcd-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to etcd-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Joseph Swaminathan

Sep 19, 2015, 9:40:17 PM
to etcd-dev
I am building a distributed application using etcd. The application needs to elect a leader, as there are certain things only one of the distributed app instances should do. I have two choices:

a) Implement an app-level election process
b) Piggyback on the etcd election process

When there is already an election process in the underlying DB, piggybacking makes much sense to me. Also, since writes are always serialized through the etcd master, aligning the app-instance master with the etcd master provides a further optimization for writes. Of course, the app can do a simple poll on the stats and check whether id == leader, but a notification would simplify this a lot and would be instantaneous.

thanks
Joseph

Seán C. McCord

Sep 20, 2015, 9:00:07 AM
to Joseph Swaminathan, etcd-dev
etcd's (and similar systems') leader election is needfully complicated.  However, you can _leverage_ etcd to implement your own leader/master system much more simply.  For most cases, a simple CAS (compare-and-swap) is sufficient.

An example from one of my systems:
1) Each node of a certain classification runs "redis"
2) Each node which runs "redis" runs "redisd" (a sidekick process)
3) "redisd" watches an etcd key (e.g., "/redis/master")
4) If/when "/redis/master" is not defined, "redisd" will CAS it to itself
5) If "/redis/master" changes, "redisd" will set its associated "redis" instance with the correct new master

In this situation, etcd performs all of the cluster-level synchronization required.  You are guaranteed that only one request for master will succeed.
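The CAS race in steps 3–5 can be sketched in a few lines. This is a minimal simulation of etcd v2's create-if-absent semantics (prevExist=false) against an in-memory store; MiniStore and try_become_master are hypothetical names for illustration, not part of any etcd client.

```python
# Sketch of the CAS-based election, assuming create-if-absent semantics
# (etcd v2's prevExist=false). MiniStore is a stand-in for the etcd key
# space; it is NOT an etcd client.

class MiniStore:
    def __init__(self):
        self._kv = {}

    def create(self, key, value):
        """Create the key only if it is absent (like PUT with prevExist=false)."""
        if key in self._kv:
            return False  # someone else already holds the key
        self._kv[key] = value
        return True

    def get(self, key):
        return self._kv.get(key)


def try_become_master(store, node_id, key="/redis/master"):
    """Each redisd sidekick races to CAS the master key to itself."""
    if store.create(key, node_id):
        return node_id        # we won the election
    return store.get(key)     # we lost; follow the winner


store = MiniStore()
winners = {try_become_master(store, n) for n in ["node-a", "node-b", "node-c"]}
print(winners)  # every node agrees on the single winner
```

Whatever order the nodes race in, only the first create succeeds; every later contender reads back the same winner, which is the "one and only one" guarantee described above.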

Of course, if you are running fleet, there is an even easier way:  just tell fleet to run one instance of your "master" service.  Again, etcd will be doing the heavy lifting, and fleet will handle the "one and only one" part.  You can still have sidekick processes for announcements (or simple ExecStartPost statements in your primary units).


--
Seán C McCord
CyCore Systems, Inc

Joseph Swaminathan

Sep 20, 2015, 10:53:42 AM
to etcd-dev, joeswam...@gmail.com
Hi Sean,

     Thanks for the solution. I have a question on this proposal: what if the redis master instance dies (say, the node goes offline)? How does a new master get elected?

     I was initially going along these lines with a timestamp or an incrementing counter. But that still requires all the nodes to read periodically and compete when the timestamp/counter becomes stale, which involves periodic writing, polling, and setting. By watching instead, I avoid the periodic writes and need only one CAS.

thanks
Joseph

Seán C. McCord

Sep 20, 2015, 11:46:55 AM
to Joseph Swaminathan, etcd-dev
I did forget to point out a critical element: the master key has a TTL (time-to-live, or expiration), and "redisd" continually updates it while the master is alive.

Should the master fail, that key will disappear, and the other "redisd" instances will be notified (because they are watching that key).  All of these (TTL, CAS, and watch) are handled by the etcd client API, so all you have to do is glue them together.

You don't need to "read or write periodically"; just watch the key.  When a change occurs, each instance will attempt a CAS.  If the CAS succeeds, you are the master; if it fails, read the key to learn who the master is.
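The failover flow above (TTL lapses → watchers are notified → each attempts one CAS) can be sketched like this. It simulates etcd semantics with a fake store: expire() stands in for the TTL lapsing and the registered callbacks stand in for watchers; TTLStore and make_watcher are hypothetical names, not an etcd API.

```python
# Sketch of TTL + watch failover. expire() simulates the master key's TTL
# lapsing because the master stopped refreshing it; callbacks simulate
# watchers on the key. This is NOT an etcd client.

class TTLStore:
    def __init__(self):
        self._kv = {}
        self._watchers = []

    def create(self, key, value):
        """Create-if-absent, as in the CAS election."""
        if key in self._kv:
            return False
        self._kv[key] = value
        return True

    def get(self, key):
        return self._kv.get(key)

    def watch(self, callback):
        self._watchers.append(callback)

    def expire(self, key):
        """Simulate the TTL lapsing; notify every watcher of the change."""
        self._kv.pop(key, None)
        for cb in list(self._watchers):
            cb(key)


def make_watcher(store, node_id, log):
    def on_expire(key):
        # Each surviving node reacts to the expiry with exactly one CAS.
        if store.create(key, node_id):
            log.append(f"{node_id} is the new master")
        else:
            log.append(f"{node_id} follows {store.get(key)}")
    return on_expire


store = TTLStore()
store.create("/redis/master", "node-a")      # node-a is the initial master
log = []
for node in ["node-b", "node-c"]:            # the other nodes watch the key
    store.watch(make_watcher(store, node, log))

store.expire("/redis/master")                # node-a dies; its TTL lapses
print(log)
```

No node polls: everyone sits idle until the expiry notification arrives, and exactly one CAS succeeds afterward.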

Joseph Swaminathan

Sep 20, 2015, 11:54:58 AM
to etcd-dev, joeswam...@gmail.com
 This method still depends on periodic writes. But I agree that using a TTL is a much better way than using a timestamp/counter.

thanks
Joseph

Seán C. McCord

Sep 20, 2015, 11:56:57 AM
to Joseph Swaminathan, etcd-dev
It requires periodic writes only in the sense of updating the TTL, which is a single periodic write for the whole system; each node does _not_ perform a periodic write.  I don't think that should be load-intensive.
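That point is easy to see in a sketch: per heartbeat tick, only the current master refreshes the TTL, so the whole cluster performs one write per tick no matter how many nodes there are. The names refresh_ttl and heartbeat below are hypothetical, not an etcd API.

```python
# Sketch: the TTL heartbeat is a single periodic write for the system.
# Followers only watch the key; they never write on a tick.

write_count = 0
master_key = {"value": "node-a", "ttl": 30}  # stand-in for /redis/master

def refresh_ttl(node_id):
    """Re-set the key with a fresh TTL: one write against the store."""
    global write_count
    write_count += 1
    master_key["ttl"] = 30

def heartbeat(node_id):
    # Only the node that currently holds the master key writes anything.
    if master_key["value"] == node_id:
        refresh_ttl(node_id)

for _ in range(10):                          # ten heartbeat ticks
    for node in ["node-a", "node-b", "node-c"]:
        heartbeat(node)

print(write_count)  # 10: one write per tick, regardless of cluster size
```

Doubling the number of follower nodes would leave write_count unchanged; only the election itself (a burst of CAS attempts on failover) scales with cluster size.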

Joseph Swaminathan

Sep 20, 2015, 12:31:24 PM
to etcd-dev, joeswam...@gmail.com
When a TTL is updated, doesn't the log get propagated to all nodes (at least the member nodes), and the TTL updated on all of them? In essence, it is a modification of the key, isn't it?

Nevertheless, this is a better solution, as it avoids any polling and all other nodes will be notified asynchronously.  Thank you very much for this solution; I will use it.

I still think that if I am building a distributed application, it is much better to have only one cluster-management mechanism, rather than each layer having to come up with an independent one. The ability to watch for leader changes would help here. But I understand that one use case may not justify the cause, and there might be reasons not to expose it (to avoid dependencies that would need to be sustained and could impede future changes in etcd).

Joseph

Seán C. McCord

Sep 20, 2015, 1:42:29 PM
to Joseph Swaminathan, etcd-dev
I would also consider that, in general, the master of any particular service should not be tied to the etcd master.  There are a number of reasons for this, but the two main ones for my consideration are:
a) Tying too much "master" load onto one particular node is counter-productive.
b) The range of possible etcd master nodes may be smaller than the desired range of service "master" nodes.  In particular, if you scale to running a dedicated etcd cluster, tying a service "master" to the etcd master would no longer even be an option.

Joseph Swaminathan

Sep 20, 2015, 2:15:36 PM
to Seán C. McCord, etcd-dev
Good points. Thanks, Seán. Much appreciated.

thanks
Joseph