From: Rajiv Kurian
Sent: Saturday, December 14, 2013 12:37 AM
Subject: How do applications commonly reverse transitions on the follower state machine?
It's a little late, so bear with me :)

1. Log entries are only applied to the state machine once they're committed. Committed entries are never rolled back.
2. I think there are two questions here:
   - "How do clients discover the cluster leader?"
   - "Should reads require consensus?"
2a. This is implementation dependent. You could have the client send its request to an arbitrary server. If that server is the leader, it processes the request; if not, it rejects the request and returns the leader's location. Alternatively, you could have a separate directory service that stores the leader's location. Or, you could have follower servers forward requests to the leader. There are probably more options that would make sense for particular use cases.
2b. I believe the paper covers this. If you absolutely need the latest system state, a read could be implemented as a state machine command, which would require a log entry to be committed before a result is returned. Alternatively, reads could be serviced locally. In that case, yes, there's a potential for the client to receive stale data. But through the use of timeouts you can place a bound on how old that data is.
For example, if a follower services a read request, the data could be up to election_timeout time units old. I think things are a little more complicated if you somehow manage to talk to an old leader that doesn't realize another leader has been elected. You'd probably have to implement some sort of leader lease mechanism to deal with that. (I think that's right...)
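To make 2a and 2b a bit more concrete, here is a minimal Python sketch of a follower that either rejects a read and hands back the leader it last heard from, or serves the read locally only while its last contact with a leader is within election_timeout. All names (Follower, handle_read, leader_hint, FollowerReadResult) are hypothetical illustrations, not LogCabin's or any library's API. `state` is assumed to hold only applied (that is, committed) entries, per point 1. The staleness bound is approximate, and the sketch does nothing about the old-leader case mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FollowerReadResult:
    ok: bool
    value: Optional[str] = None
    leader_hint: Optional[str] = None  # where the client could retry (hypothetical)


@dataclass
class Follower:
    # Hypothetical follower state, for illustration only.
    election_timeout: float                      # seconds
    leader_id: Optional[str] = None              # last known leader, if any
    last_leader_contact: float = float("-inf")   # time of last AppendEntries
    state: Dict[str, str] = field(default_factory=dict)  # committed entries only

    def on_append_entries(self, leader_id: str, now: float) -> None:
        # Any valid AppendEntries (including an empty heartbeat) refreshes contact.
        self.leader_id = leader_id
        self.last_leader_contact = now

    def handle_read(self, key: str, allow_stale: bool, now: float) -> FollowerReadResult:
        if not allow_stale:
            # Option 2a: reject and tell the client where the leader is.
            return FollowerReadResult(ok=False, leader_hint=self.leader_id)
        if now - self.last_leader_contact > self.election_timeout:
            # Haven't heard from a leader recently, so the staleness bound no
            # longer holds; redirect instead of answering.
            return FollowerReadResult(ok=False, leader_hint=self.leader_id)
        # Option 2b: serve locally; the data may be roughly election_timeout old.
        return FollowerReadResult(ok=True, value=self.state.get(key))
```

Passing `now` in explicitly, rather than calling time.monotonic() inside, just keeps the sketch deterministic; a real server would read a monotonic clock.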
On Sat, Dec 14, 2013 at 10:56 AM, Rajiv Kurian <geet...@gmail.com> wrote:
> On Saturday, December 14, 2013 3:09:54 AM UTC-8, Allen George wrote:
>>
>> 1. Log entries are only applied to the state machine once they're committed. Committed entries are never rolled back.
>
> Aah this makes sense. Am I correct in saying that every commit is then a 2PC between the leader and any majority of followers?
I wouldn't call it 2PC, to avoid confusion, since two-phase commit
refers to a different protocol with different behavior (see the
Wikipedia page on two-phase commit).
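One way to see the difference: in two-phase commit every participant must vote yes in a prepare round, whereas a Raft leader treats an entry as committed once it is stored on a majority of servers. Below is a minimal Python sketch of that leader-side bookkeeping, modeled on the commit rule in the Raft paper's matchIndex/commitIndex/lastApplied state; the function names and list-based log are hypothetical. It also illustrates point 1 from earlier in the thread: only entries up to the commit index are ever handed to the state machine.

```python
from typing import Callable, List


def advance_commit_index(current_term: int, log_terms: List[int],
                         match_index: List[int], commit_index: int) -> int:
    """Return the leader's new commit index.

    log_terms[i] is the term of log entry i + 1 (Raft logs are 1-indexed).
    match_index has one value per server, the leader included: the highest
    log index known to be stored on that server.
    """
    majority = len(match_index) // 2 + 1
    # Look for the highest index above commit_index stored on a majority.
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        # Only entries from the leader's current term are committed by
        # counting replicas; older entries become committed indirectly.
        if replicated >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


def apply_committed(log: List[str], last_applied: int, commit_index: int,
                    apply_fn: Callable[[str], None]) -> int:
    """Apply entries (last_applied, commit_index] in order; return new last_applied."""
    while last_applied < commit_index:
        last_applied += 1
        apply_fn(log[last_applied - 1])
    return last_applied
```

With three servers whose match indexes are 4, 3 and 1 in term 2, entry 3 is on two of three servers and is committed, while entry 4 is not yet.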
>> 2b. I believe the paper covers this. If you absolutely need the latest system state a read could be implemented as a state machine command, which would require a log entry to be committed before a result is returned. Alternatively, reads could be serviced locally. In that case yes, there's a potential for the client to receive stale data. But through the use of timeouts you can place a bound on how old that data is.
>
>
>>
>> For example, if a follower services a read request the data is potentially election_timeout time units old. I think things are a little more complicated if you somehow manage to talk to an old leader that doesn't realize that another leader has been elected. You'd probably have to implement some sort of leader lease mechanism to deal with that. (I think that's right...)
>
> Yeah I was worried about the case where the client connects to an old leader who doesn't know about the existence of a new leader. I'll have to look up what leader lease is.
To guarantee the leader has the latest system state, it needs to make
sure it's still the leader. Without relying on clocks, this requires
communication with a majority of the cluster, but it doesn't require a
new log entry to be committed (so there's no need for any disk
writes). LogCabin implements this so that leaders never return stale
information on reads.
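A minimal Python sketch of that idea: before answering a read, the leader records its commit index, then confirms it is still leader by getting heartbeat replies from a majority in its current term, with no new log entry and no disk write. The names (Leader, send_heartbeat, NotLeader, committed_entry_in_term) are hypothetical, and this is not LogCabin's code. Following the read-only query procedure described in the Raft dissertation, it assumes the leader has already committed an entry (for example a no-op) from its current term, and that applying committed entries happens elsewhere.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class NotLeader(Exception):
    """Raised when this server cannot show it is still the leader."""


@dataclass
class Leader:
    current_term: int
    peers: List[str]
    # Hypothetical RPC hook: send an empty AppendEntries (heartbeat) for
    # current_term to a peer; return the term in its reply, or None on timeout.
    send_heartbeat: Callable[[str, int], Optional[int]]
    commit_index: int = 0
    last_applied: int = 0
    committed_entry_in_term: bool = False  # assumed set once e.g. a no-op commits
    state: Dict[str, str] = field(default_factory=dict)

    def confirm_leadership(self) -> bool:
        acks = 1  # the leader counts itself
        # Sequential for simplicity; a real server would contact peers in parallel.
        for peer in self.peers:
            reply_term = self.send_heartbeat(peer, self.current_term)
            if reply_term is None:
                continue  # no answer, so it doesn't count toward the majority
            if reply_term > self.current_term:
                return False  # a newer term exists: this leader is deposed
            acks += 1
        return acks >= (len(self.peers) + 1) // 2 + 1

    def read(self, key: str) -> Optional[str]:
        if not self.committed_entry_in_term:
            raise NotLeader("no entry committed in the current term yet")
        read_index = self.commit_index  # commit point to serve the read from
        if not self.confirm_leadership():
            raise NotLeader("could not reach a majority in the current term")
        # The state machine must reflect everything up to read_index.
        if self.last_applied < read_index:
            raise RuntimeError("state machine not caught up; retry later")
        return self.state.get(key)
```

Note that no log entry is appended for the read: the heartbeat round is what rules out the "old leader that doesn't know about the new one" case from the question.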