Greetings,
I have a use case for a highly-available singleton service: exactly one host may be running the service (call it singleserv) at any time. If singleserv terminates on that host, or the host dies, another host must take over responsibility for running it.
I'm currently using Redis to elect the managing host (using the Redis-Semaphore Ruby Gem), but Consul looks like an interesting alternative.
I read the documentation and ran a few simulations, but what's escaping me is how to manage session IDs: if the node holding the lock associated with a session is deemed failed, the session is destroyed once the lock-delay expires. The Leader Election guide never mentions this, and in fact seems to suggest it isn't a problem:
> Watching for changes is done by doing a blocking query against key. If we ever notice that the Session of the key is blank, then there is no leader, and we should retry acquiring the lock.
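(For concreteness, the check the guide describes reduces to a small predicate over the parsed blocking-query response; this is my own sketch and `leader_vacant?` is my name for it, not the guide's:)

```ruby
# Sketch: decide from one blocking-query response whether leadership is vacant.
# `body` is the parsed JSON array from GET /v1/kv/<key>?index=<last_index>;
# a 404 (key doesn't exist yet) is passed in as nil.
def leader_vacant?(body)
  entry = body && body.first
  # No key, or a key whose Session field is blank => no leader; retry acquire.
  entry.nil? || entry["Session"].nil? || entry["Session"].empty?
end
```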
But you can't acquire a lock associated with a destroyed session!
To acquire the lock, all candidate nodes must know the session ID. When we first set up the candidate nodes for singleserv, we can generate a new session manually and communicate its ID to them out of band. That's a bit annoying, but workable.
But if that session is destroyed by a node failure, recovering singleserv requires manual intervention: someone must generate a new session and communicate its ID to the remaining candidate nodes so that they can hold a successful election. In the meantime, singleserv is down, which is obviously bad for service availability.
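Here is roughly the flow I'm describing, against the local agent's HTTP API (a sketch only: the endpoint paths and parameters are Consul's, but the function names, the KV key, and the 15s lock-delay are my own choices):

```ruby
require "net/http"
require "json"

# PUT /v1/session/create returns {"ID": "..."}. That ID is what every
# candidate node would need to know -- and what vanishes if the session
# is destroyed.
def create_session(http)
  req = Net::HTTP::Put.new("/v1/session/create")
  req.body = { "Name" => "singleserv", "LockDelay" => "15s" }.to_json
  JSON.parse(http.request(req).body)["ID"]
end

# PUT /v1/kv/<key>?acquire=<session> returns "true" iff the lock was taken.
# With a destroyed session ID this can only ever return "false".
def acquire_lock(http, session_id)
  req = Net::HTTP::Put.new("/v1/kv/service/singleserv/leader?acquire=#{session_id}")
  http.request(req).body.strip == "true"
end
```

(Usage would be something like `http = Net::HTTP.new("127.0.0.1", 8500)` followed by `acquire_lock(http, create_session(http))`.)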
What are your recommendations?
Thanks,
--Michael