Commit wait and linearizable configuration

Unmesh Joshi

Apr 9, 2021, 12:53:47 PM
to cockro...@googlegroups.com
Hi,

I was going through the commit wait implementation in cockroachdb which happens if the linearizable flag is set in the configuration.
I see that the commit wait happens after the EndTransaction request is processed and all the write intents are resolved.
So the wait / sleep in the TxnCoordSender affects only the client which originated the transaction. But because the write intents are already resolved at the transaction timestamp, any other client trying to read will be able to read the values at the transaction timestamp, even before maybeSleepForLinearizable returns?
I was thinking that the maybeSleepForLinearizable should happen before the write intents are resolved, so that all the clocks in the system are guaranteed to be past the transaction timestamp once the transactional writes are available in the MVCC storage. I am sure I am missing something obvious here.

Thanks,
Unmesh

Andrei Matei

Apr 9, 2021, 3:51:46 PM
to Unmesh Joshi, Nathan VanBenschoten, CockroachDB
I was going through the commit wait implementation in cockroachdb which happens if the linearizable flag is set in the configuration.
I see that the commit wait happens after the EndTransaction request is processed and all the write intents are resolved.
So the wait / sleep in the TxnCoordSender affects only the client which originated the transaction. But because the write intents are already resolved at the transaction timestamp, any other client trying to read will be able to read the values at the transaction timestamp, even before maybeSleepForLinearizable returns?

That's right.
 
I was thinking that the maybeSleepForLinearizable should happen before the write intents are resolved, so that all the clocks in the system are guaranteed to be past the transaction timestamp once the transactional writes are available in the MVCC storage. I am sure I am missing something obvious here.

As you hint, I believe our linearizable mode is broken. Our linearizable mode was never really a thing; you could enable commit waits with the COCKROACH_EXPERIMENTAL_LINEARIZABLE environment variable, but I don't think anybody's ever used that. COCKROACH_EXPERIMENTAL_LINEARIZABLE does not give you linearizability. It does, however, help to properly order writes that don't overlap from the client's perspective.
Namely, the following execution is generally possible in CRDB, but not with COCKROACH_EXPERIMENTAL_LINEARIZABLE:
- r1 starts a read
- w1 writes row A; the write completes
- w2 writes row B; the write completes
- r1 retrieves B and not A
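To make the role of the commit wait in that first execution concrete, here is a minimal sketch in Python. It is a toy model and not CockroachDB code: the two gateway nodes, their clock skews, the `MAX_OFFSET` value, and the function names are all assumptions chosen for illustration. It shows that without the wait, w2 can get a lower timestamp than w1 even though w2 starts after w1 was acknowledged, and that waiting out the maximum clock offset before acknowledging w1 rules this out.

```python
# Toy model, not CockroachDB code. All names and numbers are illustrative.
MAX_OFFSET = 500  # assumed maximum clock offset between nodes, in ms


def run(commit_wait):
    """Return (ts_w1, ts_w2) for two writes that do not overlap in real time."""
    real_time = 1000
    skew = {"fast": 400, "slow": 0}  # per-node clock error, within MAX_OFFSET

    # w1 commits through the gateway with the fast clock.
    ts_w1 = real_time + skew["fast"]
    real_time += 10  # time spent processing the commit
    if commit_wait:
        # Sleep long enough that every clock in the cluster has passed ts_w1.
        real_time += MAX_OFFSET
    # w1 is acknowledged here. Only now does w2 start, through the gateway
    # with the slow clock.
    ts_w2 = real_time + skew["slow"]
    return ts_w1, ts_w2


def anomaly_possible(ts_w1, ts_w2):
    """True if some read timestamp can see B (w2) but not A (w1)."""
    return ts_w2 < ts_w1


print(run(False), anomaly_possible(*run(False)))
print(run(True), anomaly_possible(*run(True)))
```

In this model the reader r1 only needs a read timestamp between the two write timestamps to observe B without A, which is why the ordering of the timestamps is the whole story here.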

However, the following execution is possible with COCKROACH_EXPERIMENTAL_LINEARIZABLE, even though it's not linearizable:
- r1 starts a read
- w1 writes row A; the write doesn't complete yet (lingers in commit-wait)
- r2 reads A; completes
- w2 writes row B; the write completes
- w1 completes
- r1 retrieves B and not A

This execution is consistent with the serializable schedule w2, r1, w1, r2. This schedule is not linearizable - linearizability wants r2<w2, as they don't overlap.
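The claim above can be checked by brute force. The following Python sketch is only an illustration: the real-time intervals are made-up numbers chosen to match the order of events in the list, and the helper names are hypothetical. It enumerates every serial order of the four operations, keeps those that explain what the reads observed, and then filters for orders that also respect real-time precedence.

```python
from itertools import permutations

# Real-time (start, end) of each operation. The numbers are made up;
# only their relative order matters.
INTERVALS = {"r1": (0, 10), "w1": (1, 8), "r2": (2, 3), "w2": (4, 5)}

# What each read observed: (writes it saw, writes it did not see).
OBSERVED = {"r1": ({"w2"}, {"w1"}), "r2": ({"w1"}, set())}


def explains_reads(order):
    """A serial order explains the reads if seen writes come before the
    read and unseen writes come after it."""
    pos = {op: i for i, op in enumerate(order)}
    for read, (seen, unseen) in OBSERVED.items():
        if any(pos[w] > pos[read] for w in seen):
            return False
        if any(pos[w] < pos[read] for w in unseen):
            return False
    return True


def respects_real_time(order):
    """If a finished before b started, a must precede b in the order."""
    pos = {op: i for i, op in enumerate(order)}
    return all(
        pos[a] < pos[b]
        for a in INTERVALS
        for b in INTERVALS
        if INTERVALS[a][1] < INTERVALS[b][0]
    )


serial_orders = [o for o in permutations(INTERVALS) if explains_reads(o)]
linearizable_orders = [o for o in serial_orders if respects_real_time(o)]
print(serial_orders)        # the only order that explains the reads
print(linearizable_orders)  # empty: that order puts w2 before r2
```

The only non-overlapping pair in this history is r2 and w2, so that single real-time constraint is what the serial order violates.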

As @Nathan VanBenschoten points out, the Spanner paper says:
"
Before allowing any coordinator replica to apply the commit record, the coordinator leader waits until TT.after(s), so as to obey the commit-wait rule described in Section 4.1.2. Because the coordinator leader chose s based on TT.now().latest, and now waits until that timestamp is guaranteed to be in the past, the expected wait is at least 2 ∗ ε̄. This wait is typically overlapped with Paxos communication. After commit wait, the coordinator sends the commit timestamp to the client and all other participant leaders. Each participant leader logs the transaction's outcome through Paxos. All participants apply at the same timestamp and then release locks.
"
So that sounds more similar to what you're saying - holding locks during the commit wait.
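The ordering described in the quoted passage can be sketched as follows. This is a simplified Python simulation, not Spanner's or CockroachDB's implementation: `FakeTrueTime`, `EPSILON`, and `commit` are hypothetical names, the clock is simulated, and Paxos is left out entirely. The point is only that the wait happens while locks are still held, so the write becomes visible after the commit timestamp is guaranteed to be in the past.

```python
from dataclasses import dataclass

EPSILON = 4  # assumed clock uncertainty bound, in ms (illustrative)


@dataclass
class FakeTrueTime:
    """Simulated TrueTime: now() returns an interval containing true time."""
    true_time: int = 100

    def now(self):
        return (self.true_time - EPSILON, self.true_time + EPSILON)

    def after(self, t):
        # True only once t is guaranteed to be in the past.
        return self.now()[0] > t

    def sleep(self, ms):
        self.true_time += ms


def commit(tt, log):
    s = tt.now()[1]  # commit timestamp chosen as TT.now().latest
    log.append("locks held")
    waited = 0
    while not tt.after(s):  # commit wait, with locks still held
        tt.sleep(1)
        waited += 1
    log.append("apply commit record")
    log.append("release locks")
    return s, waited


log = []
s, waited = commit(FakeTrueTime(), log)
print(s, waited, log)
```

In this simulation the wait is never shorter than 2 * EPSILON, which matches the bound in the quote. The behavior discussed earlier in the thread corresponds to moving the wait loop after "release locks".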

In the upcoming 21.1 release, we have a thing called "global tables", on which readers and writers interact with a different protocol than on normal tables, making readers generally able to use any replica (not just the current leaseholder). This protocol introduces a lot more waiting in the face of overlapping readers and writers, which I think brings us a lot closer to linearizability (for those tables). But I'm not sure if all the way.

- Andrei




 
