exchanging clock info between nodes


Andrei Matei

Jan 9, 2019, 5:33:23 PM
to Ben Darnell, Nathan VanBenschoten, Tobias Schottdorf, CockroachDB
Y'all,

I was pulling on a thread with Nathan and we got to the way nodes exchange clock information.
Here's my forensics about how it works: on the incoming request path, a store forwards its clock to ba.Timestamp. ba.Timestamp comes from ba.SetActiveTimestamp() which, for transactional requests, sets it to the OrigTimestamp or RefreshedTimestamp, and for non-transactional requests sets it to the node's (the receiver's) current clock.

So, for non-transactional requests there's no clock info, and for transactional requests there's stale clock info (pertaining to a transaction's start or read time, not even to its current write timestamp).
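To make the point above concrete, here is a minimal Go sketch of what "forwarding" a clock means and why a stale or self-generated timestamp carries no information. The `Timestamp` and `Clock` types and the `Forward` method are hypothetical simplifications for illustration, not CockroachDB's actual hlc package.

```go
package main

import (
	"fmt"
	"sync"
)

// Timestamp is a simplified hybrid logical timestamp (hypothetical type,
// not CockroachDB's real one).
type Timestamp struct {
	WallTime int64 // physical component, e.g. nanoseconds
	Logical  int32 // tie-breaker within the same WallTime
}

// Less reports whether t orders before o.
func (t Timestamp) Less(o Timestamp) bool {
	return t.WallTime < o.WallTime ||
		(t.WallTime == o.WallTime && t.Logical < o.Logical)
}

// Clock is a toy clock that can only be ratcheted forward.
type Clock struct {
	mu  sync.Mutex
	now Timestamp
}

// Forward ratchets the clock up to remote if remote is ahead of it and
// returns the resulting reading. It never moves the clock backwards.
func (c *Clock) Forward(remote Timestamp) Timestamp {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.now.Less(remote) {
		c.now = remote
	}
	return c.now
}

func main() {
	receiver := &Clock{now: Timestamp{WallTime: 100}}
	// Transactional request: carries a timestamp from the txn's past,
	// which may already be behind the receiver.
	fmt.Println(receiver.Forward(Timestamp{WallTime: 90}))
	// Non-transactional request: stamped with the receiver's own reading,
	// so forwarding to it cannot move the clock.
	fmt.Println(receiver.Forward(receiver.Forward(Timestamp{})))
	// A reading taken from the sender's clock at send time is what
	// actually transfers clock information.
	fmt.Println(receiver.Forward(Timestamp{WallTime: 120}))
}
```

In this toy model only the third call moves the receiver, which is the gap the thread is pointing at.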

Isn't all this just bizarre?
Am I missing a different, more straightforward mechanism by which clock information is exchanged directly from clock to clock? Should I look into adding a gRPC interceptor to piggy-back clock signals on RPCs independently of the type of RPC and the transactional details?

Thanks,

- a_m

Radu Berinde

Jan 9, 2019, 8:23:45 PM
to Andrei Matei, Ben Darnell, Nathan VanBenschoten, Tobias Schottdorf, CockroachDB
There is an old issue about this: https://github.com/cockroachdb/cockroach/issues/7526

Best to discuss in there, especially if someone sees a problem with this. If not, I think we should do it. It's much easier to reason about things when all messages guarantee an HLC update.

-Radu


--
You received this message because you are subscribed to the Google Groups "CockroachDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cockroach-db...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cockroach-db/CAPqkKg%3DiEgix5hhX%3DC8c6j6KA1%2BmdEBxaCh7RNQr8M4JvWe3Ug%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Ben Darnell

Jan 17, 2019, 9:19:46 AM
to Andrei Matei, Nathan VanBenschoten, Tobias Schottdorf, CockroachDB
On Thu, Jan 10, 2019 at 6:33 AM Andrei Matei <and...@cockroachlabs.com> wrote:
Am I missing a different, more straightforward mechanism by which clock information is exchanged directly from clock to clock? Should I look into adding a gRPC interceptor to piggy-back clock signals on RPCs independently of the type of RPC and the transactional details?

 
I don't think you're missing anything. The intention was always to exchange timestamp data on every RPC but I don't think we ever built anything beyond the `ba.Timestamp` part (which doesn't even really try to use the right timestamp in transactions). Adding a gRPC interceptor seems like the way to go.
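As a rough illustration of the interceptor idea, the sketch below models the pattern with the standard library only. It does not use the real gRPC API: the `Call` type stands in for one RPC hop with its metadata, and `clockKey`, `ClientInterceptor`, and `ServerInterceptor` are hypothetical names. The toy clock is a bare counter that is only ever ratcheted; a real HLC would also advance with physical time.

```go
package main

import (
	"fmt"
	"strconv"
)

// clockKey is a hypothetical metadata key carrying the sender's clock reading.
const clockKey = "clock-wall-nanos"

// Clock is a toy clock that can only be ratcheted forward.
type Clock struct{ now int64 }

// Forward moves the clock up to remote if remote is ahead.
func (c *Clock) Forward(remote int64) {
	if remote > c.now {
		c.now = remote
	}
}

// Call stands in for one RPC hop: metadata and payload in, metadata and
// payload out. It is an assumption of this sketch, not a gRPC type.
type Call func(md map[string]string, req string) (map[string]string, string)

// forwardFrom ratchets c using the clock reading found in md, if any.
func forwardFrom(c *Clock, md map[string]string) {
	if v, err := strconv.ParseInt(md[clockKey], 10, 64); err == nil {
		c.Forward(v)
	}
}

// stamp returns a copy of md with c's current reading attached.
func stamp(c *Clock, md map[string]string) map[string]string {
	out := map[string]string{clockKey: strconv.FormatInt(c.now, 10)}
	for k, v := range md {
		if k != clockKey {
			out[k] = v
		}
	}
	return out
}

// ClientInterceptor attaches the client's reading to every request and
// ratchets the client's clock with the reading on the response.
func ClientInterceptor(c *Clock, next Call) Call {
	return func(md map[string]string, req string) (map[string]string, string) {
		respMD, resp := next(stamp(c, md), req)
		forwardFrom(c, respMD)
		return respMD, resp
	}
}

// ServerInterceptor ratchets the server's clock with the reading on the
// request and attaches the server's reading to every response.
func ServerInterceptor(c *Clock, next Call) Call {
	return func(md map[string]string, req string) (map[string]string, string) {
		forwardFrom(c, md)
		respMD, resp := next(md, req)
		return stamp(c, respMD), resp
	}
}

func main() {
	client, server := &Clock{now: 100}, &Clock{now: 250}
	handler := func(md map[string]string, req string) (map[string]string, string) {
		return nil, "echo:" + req
	}
	rpc := ClientInterceptor(client, ServerInterceptor(server, handler))
	_, resp := rpc(nil, "ping")
	fmt.Println(resp, client.now, server.now)
}
```

The handler never touches the clocks, so the exchange happens regardless of the RPC type or any transactional details.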

-Ben

Andrei Matei

Jan 27, 2019, 1:39:54 PM
to Ben Darnell, Nathan VanBenschoten, Tobias Schottdorf, CockroachDB
I was also thinking of a somewhat related thing and wanna see if anybody's juices overflow: I think we could be more aggressive in how we use clock signals for our txn uncertainty windows. At the moment I think the clock signals are just used to ratchet clocks up, but they could also be used for limiting uncertainty. 
For background:
Each txn maintains a list of ObservedTimestamps, with ObservedTimestamp[NodeID] being the upper bound of the uncertainty window for values read from the respective node.
Currently, this map is maintained by each txn, individually, recording what timestamp it has observed the first time it goes to interact with another node. The idea is that, if txn Tx has observed ts on node B, then any value at a higher timestamp encountered on that node must have been written after Tx started, and so doesn't need to generate an uncertainty restart. In other words, the txn that has written such a value can be ordered after Tx.
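A minimal sketch of how such a per-node bound could narrow the uncertainty check, under stated assumptions: the `Txn` struct, its `ReadTS` and `MaxTS` fields (with `MaxTS` standing for the default end of the uncertainty window), and the exact boundary conditions are illustrative choices, not CockroachDB's implementation.

```go
package main

import "fmt"

// NodeID identifies a node (hypothetical type for this sketch).
type NodeID int

// Txn holds only the fields needed for the illustration.
type Txn struct {
	ReadTS   int64            // timestamp the txn reads at
	MaxTS    int64            // assumed default end of the uncertainty window
	Observed map[NodeID]int64 // per-node reading recorded on first contact
}

// upperBound returns the end of the uncertainty window for values on node n:
// the observed timestamp for n when one exists and is tighter, else MaxTS.
func (t *Txn) upperBound(n NodeID) int64 {
	if obs, ok := t.Observed[n]; ok && obs < t.MaxTS {
		return obs
	}
	return t.MaxTS
}

// IsUncertain reports whether a value written at valueTS on node n falls in
// the window (ReadTS, upperBound(n)] and would therefore force a restart.
func (t *Txn) IsUncertain(n NodeID, valueTS int64) bool {
	return valueTS > t.ReadTS && valueTS <= t.upperBound(n)
}

func main() {
	txn := &Txn{ReadTS: 100, MaxTS: 150, Observed: map[NodeID]int64{2: 120}}
	fmt.Println(txn.IsUncertain(2, 110)) // inside the narrowed window
	fmt.Println(txn.IsUncertain(2, 130)) // above the observed timestamp for node 2
	fmt.Println(txn.IsUncertain(3, 130)) // node 3 has no observation yet
}
```

The value at 130 is treated differently on nodes 2 and 3 only because node 2 has an observed timestamp below it.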

I think we could be more aggressive in populating this ObservedTimestamp map. At the moment, each transaction is on its own for figuring out its uncertainty wrt each other node. But we could be taking advantage of the following fact: once a transaction has started, every clock measurement from another node that's known to have been taken afterwards can be used to limit the uncertainty for that other node. I'm thinking of a scheme like:
- txn Tx starts on node A, gets ts ta0
- the txn runs for a while, does its thing
- node A does some RPC to node B, at ts ta1 (from A's clock) and gets a response with a measurement of tb2 (from B's clock).
- Tx needs to read something from B. To see if it can bound the uncertainty, it could look through its history of clock signals and see that ta1 > ta0, and so tb2 can be used as the upper bound for B's uncertainty window
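The steps above can be sketched roughly as follows. `Signal`, `History`, and `BoundFor` are hypothetical names, and the choice to return the smallest qualifying remote reading (the tightest bound) is an assumption of this sketch rather than something settled in the thread.

```go
package main

import "fmt"

// NodeID identifies a node (hypothetical type for this sketch).
type NodeID int

// Signal is one clock measurement obtained as a side effect of an RPC.
type Signal struct {
	LocalSendTS int64  // local clock when the RPC was sent (ta1)
	Remote      NodeID // node that answered the RPC
	RemoteTS    int64  // remote clock reading carried by the response (tb2)
}

// History is a node-wide log of clock signals shared by all local txns.
type History struct {
	signals []Signal
}

// Record appends a signal to the history.
func (h *History) Record(s Signal) {
	h.signals = append(h.signals, s)
}

// BoundFor returns an upper bound for the uncertainty window on node n for a
// txn that started locally at txnStart (ta0). Only signals whose RPC was sent
// strictly after txnStart qualify; the smallest such reading is returned.
func (h *History) BoundFor(n NodeID, txnStart int64) (int64, bool) {
	var best int64
	found := false
	for _, s := range h.signals {
		if s.Remote != n || s.LocalSendTS <= txnStart {
			continue
		}
		if !found || s.RemoteTS < best {
			best, found = s.RemoteTS, true
		}
	}
	return best, found
}

func main() {
	hist := &History{}
	hist.Record(Signal{LocalSendTS: 5, Remote: 2, RemoteTS: 40})  // before the txn started
	hist.Record(Signal{LocalSendTS: 15, Remote: 2, RemoteTS: 60}) // usable
	hist.Record(Signal{LocalSendTS: 20, Remote: 2, RemoteTS: 70}) // usable, looser

	bound, ok := hist.BoundFor(2, 10) // txn started at ta0 = 10
	fmt.Println(bound, ok)
}
```

A real history would need pruning and concurrency control, both omitted here.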

Ben Darnell

Jan 28, 2019, 12:22:45 PM
to Andrei Matei, Nathan VanBenschoten, Tobias Schottdorf, CockroachDB
Yes, this seems sound to me. 

-Ben

Ben Darnell

Jan 29, 2019, 1:40:22 PM
to Andrei Matei, Nathan VanBenschoten, Tobias Schottdorf, CockroachDB
And a related idea: an application that anticipates a high likelihood of uncertainty errors may want to preemptively populate the ObservedTimestamp map to avoid having to restart after the transaction has begun. As a strawman, this could be a special value for AOST: `AS OF SYSTEM TIME certain_timestamp()` (similar to the `follower_timestamp()` mechanism being proposed for follower reads). This would trigger a round of RPCs to all live nodes to populate the ObservedTimestamp map and set the transaction's timestamp to the maximum timestamp seen. This would prevent all uncertainty errors in the absence of node failures.
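A minimal sketch of what the strawman might do internally, assuming a hypothetical `ClockReader` callback in place of the real RPC and polling nodes sequentially for simplicity (a real implementation would presumably fan out in parallel). `CertainTimestamp` is an illustrative name, not an existing function.

```go
package main

import "fmt"

// NodeID identifies a node (hypothetical type for this sketch).
type NodeID int

// ClockReader stands in for an RPC that returns a node's current clock reading.
type ClockReader func(n NodeID) (int64, error)

// CertainTimestamp polls every live node, records each reading as that node's
// observed timestamp, and returns the maximum reading as the txn timestamp.
// Any failure aborts the attempt, matching the "absence of node failures" caveat.
func CertainTimestamp(live []NodeID, read ClockReader) (int64, map[NodeID]int64, error) {
	observed := make(map[NodeID]int64, len(live))
	var txnTS int64
	for _, n := range live {
		ts, err := read(n)
		if err != nil {
			return 0, nil, fmt.Errorf("polling node %d: %w", n, err)
		}
		observed[n] = ts
		if ts > txnTS {
			txnTS = ts
		}
	}
	return txnTS, observed, nil
}

func main() {
	clocks := map[NodeID]int64{1: 100, 2: 130, 3: 115}
	read := func(n NodeID) (int64, error) { return clocks[n], nil }

	txnTS, observed, err := CertainTimestamp([]NodeID{1, 2, 3}, read)
	fmt.Println(txnTS, observed, err)
}
```

Because the txn timestamp is at least every observed timestamp, no per-node uncertainty window extends above it.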

-Ben

Andrei Matei

Apr 6, 2019, 1:28:48 PM
to Ben Darnell, Nathan VanBenschoten, Tobias Schottdorf, CockroachDB
I've started work on pushing clock signals from gRPC clients to servers (and I also want to do servers->clients) here:
https://github.com/andreimatei/cockroach/pull/new/grpc.clock-interceptors
