follower reads when there's no quorum

Andrei Matei

Jun 12, 2018, 5:40:41 PM
to CockroachDB, Tobias Schottdorf, Spencer Kimball, Ben Darnell
Heya Tobi, Spencer,

I've said twice today to people that follower reads might allow us to perform historical reads even when a range has lost its quorum. I'm wondering though if I'm full of shit.
So the question is whether reads as of, say, 10s behind present time would continue working (as time and the read timestamp move forward) when the respective range has lost quorum, according to the current design of the feature. I guess it wouldn't work when the leaseholder dies, because none of the survivors can get a new lease. But would it work if the leaseholder happens to survive? I'm assuming that the leaseholder can maintain its lease because the liveness range still has quorum. But would it still be able to broadcast the timestamps under which it promises not to accept writes? Or is even that broadcast broken by the lack of quorum, given that it's intertwined with Raft heartbeating?

Thanks,

- a_m

Tobias Schottdorf

Jun 12, 2018, 7:06:09 PM
to Andrei Matei, CockroachDB, Spencer Kimball, Ben Darnell
I don't think this will work optimally in the first version of the feature, but theoretically you have the guarantee that whenever a given timestamp was proven safe to serve, it remains safe (until the GC queue bumps the GCThreshold past the timestamp). This means that in principle, when quorum is lost, surviving followers can serve data from ~10s before the loss of quorum event, and that that can continue indefinitely. There's probably some place in the current WIP implementation where we eject that state too aggressively (especially for quiescent ranges) and even if we don't, there's still the problem of determining the "most up to date timestamp that works", for which you could imagine running a binary search if you don't have anything better (this is related to [1]). My TL;DR for this would be that yes, this feature will help that use case eventually, but I'm not sure we'll have our ducks in a row by 2.1.
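
The binary search mentioned above could look roughly like the following. This is a minimal sketch, not code from the follower reads WIP: `maxServableTimestamp` and the `canServe` probe are hypothetical names, timestamps are plain `int64` values, and the search assumes that servability is monotone (once a timestamp cannot be served, no later one can).

```go
package main

import (
	"fmt"
	"sort"
)

// maxServableTimestamp returns the largest timestamp in [lo, hi] for which
// canServe reports true. It assumes canServe is monotone: true up to some
// timestamp and false for everything after it. ok is false when the range is
// empty or when even lo cannot be served.
// Hypothetical helper; not part of CockroachDB.
func maxServableTimestamp(lo, hi int64, canServe func(ts int64) bool) (ts int64, ok bool) {
	if hi < lo {
		return 0, false
	}
	n := int(hi - lo + 1)
	// sort.Search returns the smallest offset at which the predicate holds,
	// i.e. the first timestamp that can NOT be served (or n if none).
	firstBad := sort.Search(n, func(i int) bool { return !canServe(lo + int64(i)) })
	if firstBad == 0 {
		return 0, false
	}
	return lo + int64(firstBad) - 1, true
}

func main() {
	// Assumed closed timestamp of a surviving follower, for illustration only.
	closed := int64(1000)
	probe := func(ts int64) bool { return ts <= closed }

	ts, ok := maxServableTimestamp(900, 1100, probe)
	fmt.Println(ts, ok)
}
```

Each probe stands in for an attempted follower read, so the search needs a number of attempts logarithmic in the width of the window rather than one per candidate timestamp.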

--

-- Tobias

Nathan VanBenschoten

Jun 12, 2018, 7:18:52 PM
to Tobias Schottdorf, Andrei Matei, CockroachDB, Spencer Kimball, Ben Darnell
> This means that in principle, when quorum is lost, surviving followers can serve data from ~10s before the loss of quorum event, and that that can continue indefinitely.

My understanding of Andrei's question when he asked earlier today was that he's wondering if the closed timestamp will continue to rise without quorum. Since the closed timestamp is only incremented by a leaseholder and no replica can acquire the lease without quorum, I don't think this is the case in the WIP implementation if the leaseholder is lost.
 
> There's probably some place in the current WIP implementation where we eject that state too aggressively (especially for quiescent ranges)

I don't think we ever revoke a closed timestamp in the current follower reads WIP, so I think a follower will always continue to serve data from ~10s before the loss of quorum event.
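
As a rough illustration of that "never revoked" property, here is a small sketch of a follower that only ever forwards its closed timestamp and checks reads against it and against the GC threshold mentioned earlier in the thread. The type `followerReplica`, its fields, and the exact boundary conditions are assumptions made for the example, not the WIP implementation.

```go
package main

import "fmt"

// followerReplica is a hypothetical, simplified model of the state a follower
// needs in order to decide whether it can serve a historical read.
type followerReplica struct {
	closedTS    int64 // highest timestamp the leaseholder promised not to write at or below
	gcThreshold int64 // reads below this may miss garbage-collected versions
}

// forwardClosed applies an update from the leaseholder. The closed timestamp
// only moves forward; older or duplicate updates are ignored, so a timestamp
// that was once safe is never revoked by this path.
func (f *followerReplica) forwardClosed(ts int64) {
	if ts > f.closedTS {
		f.closedTS = ts
	}
}

// canServe reports whether a read at readTS is safe on this follower.
// The inclusive comparison against gcThreshold is an assumption.
func (f *followerReplica) canServe(readTS int64) bool {
	return readTS <= f.closedTS && readTS >= f.gcThreshold
}

func main() {
	f := &followerReplica{closedTS: 100, gcThreshold: 50}
	f.forwardClosed(90) // stale update, ignored

	fmt.Println(f.canServe(95))  // at or below the closed timestamp
	fmt.Println(f.canServe(101)) // above the closed timestamp

	f.gcThreshold = 96 // GC threshold bumped past the read timestamp
	fmt.Println(f.canServe(95))
}
```

In this model the only thing that can take away a previously servable timestamp is the GC threshold moving past it, which matches the caveat raised earlier in the thread.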

Tobias Schottdorf

Jun 12, 2018, 7:27:51 PM
to Nathan VanBenschoten, Andrei Matei, CockroachDB, Spencer Kimball, Ben Darnell
> I don't think we ever revoke a closed timestamp in the current follower reads WIP, so I think a follower will always continue to serve data from ~10s before the loss of quorum event.

No, I don't think that's correct for quiescent ranges. If a range is quiescent from $beginning_of_time and then an outage happens, it won't get to follower-read anything because the liveness is dead at that point. I was too lazy to type this out in the initial comment so thanks for making me do it. We can work around that, and probably will have to, once we really want to use follower reads for recovery.
--

-- Tobias

Spencer Kimball

Jun 13, 2018, 12:29:30 PM
to Tobias Schottdorf, Andrei Matei, Ben Darnell, CockroachDB, Nathan VanBenschoten
Follower replicas won’t even get an updated closed timestamp unless they’re non-quiescent or utilized for follower reads. An obvious improvement here would be to periodically scan through the replicas and make sure we update their closed timestamps. We also probably want to persist them so they can keep serving at earlier historical timestamps, even on restart.

I think a “read-only” mode on crdb is a good idea but would probably need a new execution pathway which prefers but doesn’t require leaseholders, and throws a new kind of error if unable to read at the requested timestamp. This would be similar to a “ReadWithinUncertaintyIntervalError”, but would do the reverse: regress the timestamp to the minimum indicated by errors until the query can be fully satisfied at that timestamp. This would then be available to the SQL client to be reported (e.g., “query satisfied at t=x”).
--
Spencer Kimball | Co-founder & CEO
Cockroach Labs

Andrei Matei

Jun 13, 2018, 12:50:51 PM
to Spencer Kimball, Tobias Schottdorf, Ben Darnell, CockroachDB, Nathan VanBenschoten
I'm learning all sorts of things, but I'm still curious about my original question: does the closed timestamp advance under any circumstances when quorum is lost? So, assuming the best possible scenario (nobody's quiesced, everybody's "utilized for follower reads" (what does that mean exactly?)), a replica is capable of maintaining its (epoch-based) lease, but there's no quorum for the respective range, will the closed timestamp advance then?

Spencer Kimball

Jun 13, 2018, 1:22:42 PM
to Andrei Matei, Ben Darnell, CockroachDB, Nathan VanBenschoten, Spencer Kimball, Tobias Schottdorf
If quorum is lost, then the closed timestamp cannot be advanced. There would always be the possibility that the lost majority are on the other side of a network partition and have written values subsequent to the minority’s last known closed timestamp. So no way to advance it.

“Utilized for follower reads” means that a replica’s closed timestamp won’t advance unless the replica is either part of a non-quiescent range and receiving heartbeats, or is actively serving follower reads.


Andrei Matei

Jun 13, 2018, 1:25:19 PM
to Spencer Kimball, Ben Darnell, CockroachDB, Nathan VanBenschoten, Spencer Kimball, Tobias Schottdorf
> If quorum is lost, then the closed timestamp cannot be advanced. There would always be the possibility that the lost majority are on the other side of a network partition and have written values subsequent to the minority’s last known closed timestamp. So no way to advance it.

Well, I'm referring to replicas that can still talk to the leaseholder (which maintains its lease because the liveness range has quorum). So there are no writes going on on the other side of a partition.

Spencer Kimball

Jun 13, 2018, 1:45:32 PM
to Andrei Matei, Ben Darnell, CockroachDB, Nathan VanBenschoten, Spencer Kimball, Tobias Schottdorf
In that case you could advance but that’s a case where you’re pretty likely to have quorum. At least with r=3. To really make this kind of thing work would be like squeezing blood from a rock: difficult and don’t expect much blood.

With the current implementation the leaseholder still sends updates even when quorum is lost for any individual range. And followers will still advance closed timestamps; having quorum is orthogonal to that mechanism.

Assuming the node liveness range has quorum, this could go on indefinitely. But if the node liveness range loses quorum, the leases will expire and closed timestamps won’t move forward any longer.
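
The conditions discussed across this thread can be condensed into a small predicate. This is only a summary of the claims made here about the June 2018 WIP, written as a sketch: `rangeConditions` and `closedTimestampCanAdvance` are hypothetical names, and the real system has more state than these booleans.

```go
package main

import "fmt"

// rangeConditions is a hypothetical summary of the situation of one follower
// replica, using only the factors named in this thread.
type rangeConditions struct {
	leaseholderAlive           bool // the current leaseholder survived
	livenessRangeHasQuorum     bool // needed to keep an epoch-based lease valid
	followerReachesLeaseholder bool // follower still receives the leaseholder's updates
	followerActive             bool // non-quiescent and heartbeated, or serving follower reads
	rangeHasQuorum             bool // recorded, but deliberately not consulted below
}

// closedTimestampCanAdvance encodes the thread's conclusion: the range's own
// quorum is orthogonal; what matters is a live leaseholder with a valid lease
// and an active follower that can still hear from it.
func closedTimestampCanAdvance(c rangeConditions) bool {
	leaseValid := c.leaseholderAlive && c.livenessRangeHasQuorum
	return leaseValid && c.followerReachesLeaseholder && c.followerActive
}

func main() {
	// Range lost quorum, but the leaseholder survived and liveness is healthy.
	c := rangeConditions{
		leaseholderAlive:           true,
		livenessRangeHasQuorum:     true,
		followerReachesLeaseholder: true,
		followerActive:             true,
		rangeHasQuorum:             false,
	}
	fmt.Println(closedTimestampCanAdvance(c))

	// Same situation, but the liveness range has lost quorum: the lease expires.
	c.livenessRangeHasQuorum = false
	fmt.Println(closedTimestampCanAdvance(c))
}
```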
