As you hint, I believe our linearizable mode is broken. It was never really a thing; you could enable commit waits with the COCKROACH_EXPERIMENTAL_LINEARIZABLE environment variable, but I don't think anybody's ever used that. COCKROACH_EXPERIMENTAL_LINEARIZABLE does not give you linearizability. It does, however, help to properly order writes that don't overlap from the client's perspective.
Namely, the following execution is possible generally in CRDB, but not with COCKROACH_EXPERIMENTAL_LINEARIZABLE:
- r1 starts a read
- w1 writes row A; the write completes
- w2 writes row B; the write completes
- r1 retrieves B and not A
However, the following execution is possible with COCKROACH_EXPERIMENTAL_LINEARIZABLE, even though it's not linearizable:
- r1 starts a read
- w1 writes row A; the write doesn't complete yet (lingers in commit-wait)
- r2 reads A; completes
- w2 writes row B; the write completes
- w1 completes
- r1 retrieves B and not A
This execution is consistent with the serializable schedule w2, r1, w1, r2. This schedule is not linearizable: linearizability wants r2 < w2, because r2 completes before w2 starts, so the two don't overlap.
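To make the real-time argument concrete, here is a minimal sketch of the check. It is not anything from CRDB. The function name and the integer timestamps are made up for illustration; the timestamps only encode the order of events in the two executions above. It reports every pair of operations where one finished before the other started, yet the serial schedule orders them the other way around.

```python
def real_time_violations(intervals, schedule):
    """Return pairs (a, b) where a completed before b started in real time,
    but the serial schedule places b before a."""
    position = {op: i for i, op in enumerate(schedule)}
    violations = []
    for a, (_, a_end) in intervals.items():
        for b, (b_start, _) in intervals.items():
            if a != b and a_end < b_start and position[b] < position[a]:
                violations.append((a, b))
    return violations


# Hypothetical (start, end) times that only encode the event order.
# First execution: w1 completes before w2 starts; r1 spans both.
first = {"r1": (1, 6), "w1": (2, 3), "w2": (4, 5)}
first_schedule = ["w2", "r1", "w1"]  # r1 sees B but not A

# Second execution: w1 lingers in commit-wait, so it overlaps r2 and w2.
second = {"r1": (1, 8), "w1": (2, 7), "r2": (3, 4), "w2": (5, 6)}
second_schedule = ["w2", "r1", "w1", "r2"]

print(real_time_violations(first, first_schedule))    # [('w1', 'w2')]
print(real_time_violations(second, second_schedule))  # [('r2', 'w2')]
```

In the first execution the violated pair is the two non-overlapping writes, which is the case the commit wait takes care of. In the second, w1 and w2 overlap, so nothing constrains them, but r2 and w2 still end up in the wrong order.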
As @Nathan VanBenschoten points out, the Spanner paper says:
"
Before allowing any coordinator replica to apply the commit record, the coordinator leader waits until TT.after(s), so as to obey the commit-wait rule described in Section 4.1.2. Because the coordinator leader chose s based on TT.now().latest, and now waits until that timestamp is guaranteed to be in the past, the expected wait is at least 2 ∗ ε̄. This wait is typically overlapped with Paxos communication. After commit wait, the coordinator sends the commit timestamp to the client and all other participant leaders. Each participant leader logs the transaction’s outcome through Paxos. All participants apply at the same timestamp and then release locks.
"
So that sounds more similar to what you're saying - holding locks during the commit wait.
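For what it's worth, the commit-wait rule in that quote can be sketched with a toy clock. This is a simulation under my own assumptions, not Spanner's or CRDB's code: SimulatedTrueTime, commit_with_wait, and the tick parameter are hypothetical names, and the clock only moves when the loop advances it. It shows why picking s = TT.now().latest and waiting for TT.after(s) costs at least twice the error bound.

```python
from dataclasses import dataclass


@dataclass
class SimulatedTrueTime:
    """Toy stand-in for TrueTime: a manually advanced clock with error bound epsilon."""
    true_time: float
    epsilon: float

    def now(self):
        # (earliest, latest): an interval guaranteed to contain the true time.
        return (self.true_time - self.epsilon, self.true_time + self.epsilon)

    def after(self, t):
        # True only once t is guaranteed to be in the past.
        earliest, _ = self.now()
        return t < earliest

    def advance(self, dt):
        self.true_time += dt


def commit_with_wait(tt, tick=1.0):
    """Choose s = TT.now().latest, then wait until TT.after(s).

    Returns (s, waited). In the protocol quoted above, locks stay held
    until after this wait.
    """
    _, s = tt.now()
    waited = 0.0
    while not tt.after(s):
        tt.advance(tick)  # stands in for sleeping on a real clock
        waited += tick
    return s, waited


tt = SimulatedTrueTime(true_time=100.0, epsilon=4.0)
s, waited = commit_with_wait(tt)
print(s, waited, waited >= 2 * tt.epsilon)
```

The point relevant to this thread is where the locks are released relative to the wait: in the quoted protocol they are released after it, so a reader like r2 above could not observe the write while it is still in commit-wait.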
In the upcoming 21.1 release, we have a thing called "global tables", on which readers and writers interact with a different protocol than on normal tables, making readers generally able to use any replica (not just the current leaseholder). This protocol introduces a lot more waiting in the face of overlapping readers and writers, which I think brings us a lot closer to linearizability (for those tables). But I'm not sure it gets us all the way there.
- Andrei