I was also thinking about a somewhat related thing and wanna see if it gets anybody's juices flowing: I think we could be more aggressive in how we use clock signals for our txn uncertainty windows. At the moment, as far as I can tell, clock signals are only used to ratchet clocks up, but they could also be used for limiting uncertainty.
For background:
Each txn maintains a list of ObservedTimestamps, with ObservedTimestamp[NodeID] being the upper bound of the uncertainty window for values read from the respective node.
Currently, this map is maintained by each txn individually, recording the timestamp it observed the first time it interacted with each other node. The idea is that, if txn Tx has observed ts on node B, then any value at a higher timestamp encountered on that node must have been written after Tx started, and so doesn't need to generate an uncertainty restart. In other words, the txn that wrote such a value can be ordered after Tx.
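To make the current behavior concrete, here's a rough sketch of the per-txn bookkeeping. This is not the actual code: `Txn`, `Observe`, `UncertaintyLimit` and `IsUncertain` are names made up for illustration, timestamps are plain integers, and `MaxTS` is assumed to be the read timestamp plus the max clock offset.

```go
package main

// NodeID and Timestamp are simplified stand-ins for the real types.
type NodeID int
type Timestamp int64

// Txn is a hypothetical, stripped-down transaction.
type Txn struct {
	ReadTS   Timestamp            // timestamp the txn reads at
	MaxTS    Timestamp            // assumed: ReadTS + max clock offset
	Observed map[NodeID]Timestamp // per-node observed timestamps
}

// Observe records a node's clock reading the first time the txn
// interacts with that node. Later readings are ignored here.
func (t *Txn) Observe(n NodeID, now Timestamp) {
	if _, ok := t.Observed[n]; !ok {
		t.Observed[n] = now
	}
}

// UncertaintyLimit returns the upper bound of the uncertainty window
// for values read from node n: the observed timestamp if there is one
// and it is lower than the global limit, otherwise MaxTS.
func (t *Txn) UncertaintyLimit(n NodeID) Timestamp {
	limit := t.MaxTS
	if obs, ok := t.Observed[n]; ok && obs < limit {
		limit = obs
	}
	return limit
}

// IsUncertain reports whether a value at valueTS on node n would force
// an uncertainty restart, i.e. it lies in (ReadTS, limit].
func (t *Txn) IsUncertain(n NodeID, valueTS Timestamp) bool {
	return valueTS > t.ReadTS && valueTS <= t.UncertaintyLimit(n)
}
```

With `ReadTS = 100` and `MaxTS = 600`, a value at 500 on node 2 is uncertain until the txn has observed, say, 300 on that node; after that only values in (100, 300] are.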
I think we could be more aggressive in populating this ObservedTimestamp map. At the moment, each transaction is on its own for figuring out its uncertainty wrt each other node. But we could be taking advantage of the following fact: once a transaction has started, every clock measurement from another node that's known to have been taken afterwards can be used to limit the uncertainty for that other node. I'm thinking of a scheme like:
- txn Tx starts on node A, gets ts ta0
- the txn runs for a while, does its thing
- node A does some RPC to node B, at ts ta1 (from A's clock) and gets a response with a measurement of tb2 (from B's clock).
- Tx needs to read something from B. To see if it can bound the uncertainty, it could look through the node's history of clock signals, see that ta1 > ta0, and so use tb2 as the upper bound for B's uncertainty window
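The steps above could be sketched roughly as follows. Everything here is hypothetical (`ClockSignal`, `SignalHistory`, `Record`, `BoundFor` don't exist anywhere); it assumes node A keeps a per-remote-node history of clock signals and that A's clock is monotonic, so ta1 > ta0 implies the RPC was sent after the txn started. Pruning the history is ignored.

```go
package main

// NodeID and Timestamp are simplified stand-ins for the real types.
type NodeID int
type Timestamp int64

// ClockSignal is one measurement: an RPC sent at LocalSent (local
// clock, ta1) whose response carried RemoteClock (remote clock, tb2).
type ClockSignal struct {
	LocalSent   Timestamp
	RemoteClock Timestamp
}

// SignalHistory is a hypothetical node-level record of clock signals,
// keyed by the remote node they came from. It grows without bound in
// this sketch; a real version would need to prune old entries.
type SignalHistory struct {
	signals map[NodeID][]ClockSignal
}

func NewSignalHistory() *SignalHistory {
	return &SignalHistory{signals: make(map[NodeID][]ClockSignal)}
}

// Record stores a signal received from the given remote node.
func (h *SignalHistory) Record(remote NodeID, s ClockSignal) {
	h.signals[remote] = append(h.signals[remote], s)
}

// BoundFor returns the lowest remote clock reading among the signals
// that were sent strictly after txnStart (ta0). Any such reading was
// taken after the txn started, so it can serve as the upper bound of
// the uncertainty window for that node. ok is false if no signal
// qualifies.
func (h *SignalHistory) BoundFor(remote NodeID, txnStart Timestamp) (bound Timestamp, ok bool) {
	for _, s := range h.signals[remote] {
		if s.LocalSent > txnStart && (!ok || s.RemoteClock < bound) {
			bound, ok = s.RemoteClock, true
		}
	}
	return bound, ok
}
```

A txn that started at ta0 would call `BoundFor(B, ta0)` before its first read from B and, if it gets a bound, seed its observed timestamp for B with it instead of waiting for its own first interaction.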