Hybrid Clock values across disjoint sets of servers


Unmesh Joshi

May 23, 2021, 1:25:32 PM
to cockro...@googlegroups.com
Hi,

The HLC mechanism allows causally consistent timestamps across the set of servers involved in transactions. But when two clients interact with two separate, disjoint sets of servers, the HLC mechanism alone is not enough.
I see that CockroachDB uses an expected maximum clock offset configuration, but in the worst case, if clients always talk to two disjoint sets of servers, the clocks across those sets will never be synced by the HLC mechanism.
I think this is safe, as it still guarantees serializability. But would gossiping the HLC from each server make the HLC values converge and keep timestamps closer, even when clients always talk to totally disjoint sets of servers?
I see that MongoDB uses a gossip protocol to converge clusterTime across servers. Would this also be useful in CockroachDB?
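
To make the scenario concrete, here is a minimal HLC sketch in Go, following the send/receive rules from the Kulkarni et al. HLC paper (the type and method names are mine, not CockroachDB's). It shows why the HLCs of two groups of servers that never exchange messages only ever track their own physical clocks:

package hlc

import "sync"

// HLC is a hybrid logical clock: the max physical time seen so far,
// plus a logical counter to order events within the same wall time.
type HLC struct {
	mu       sync.Mutex
	wallTime int64 // highest physical clock reading seen, nanoseconds
	logical  int32 // breaks ties within a single wallTime
}

// Now produces a timestamp for a local or send event.
func (c *HLC) Now(physicalNow int64) (int64, int32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if physicalNow > c.wallTime {
		c.wallTime, c.logical = physicalNow, 0
	} else {
		c.logical++
	}
	return c.wallTime, c.logical
}

// Update merges a timestamp received from another node. Two disjoint
// groups of servers never call Update with each other's timestamps,
// which is exactly the situation in question: their HLCs advance only
// with their own physical clocks and never converge.
func (c *HLC) Update(physicalNow, recvWall int64, recvLogical int32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	switch {
	case physicalNow > c.wallTime && physicalNow > recvWall:
		c.wallTime, c.logical = physicalNow, 0
	case recvWall > c.wallTime:
		c.wallTime, c.logical = recvWall, recvLogical+1
	case c.wallTime > recvWall:
		c.logical++
	default: // wall times are equal
		if recvLogical > c.logical {
			c.logical = recvLogical
		}
		c.logical++
	}
}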

Thanks,
Unmesh

Tobias Grieger

May 24, 2021, 3:02:03 PM
to Unmesh Joshi, cockroach-team
Hi Unmesh,

CRDB does not necessarily enforce real-time ordering between non-overlapping clients. You can read this blog post:

for more (search for "causal reverse"). The clock offset bound is something that CRDB *requests* to be true (it does not adjust the clocks to make them converge). We do, however, expect the clock bounds provided by major cloud hosting providers to improve over time, meaning the bounds CRDB relies on can be tightened, which allows for improved performance in some cases.
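
As a rough sketch of what "requesting" the bound means (illustrative Go, not CRDB source; the 80%/majority thresholds are from CRDB's documented self-termination behavior as I recall it): each node measures its clock offset from peers over RPC and removes itself from the cluster when it can no longer promise the bound, instead of slewing any clock:

package clockcheck

import (
	"log"
	"os"
	"time"
)

// checkOffsets inspects the clock offsets measured against each peer.
// If this node disagrees with at least half of its peers by more than
// a tolerated fraction of maxOffset, it can no longer promise the
// configured bound, so it shuts down rather than risk serving stale,
// non-linearizable reads.
func checkOffsets(peerOffsets []time.Duration, maxOffset time.Duration) {
	tolerated := time.Duration(float64(maxOffset) * 0.8)
	bad := 0
	for _, off := range peerOffsets {
		if off < 0 {
			off = -off
		}
		if off > tolerated {
			bad++
		}
	}
	if len(peerOffsets) > 0 && bad*2 >= len(peerOffsets) {
		log.Print("clock offset out of bounds; self-terminating")
		os.Exit(1)
	}
}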

Best,
Tobias


Unmesh Joshi

May 25, 2021, 9:24:52 AM
to Tobias Grieger, cockroach-team
Hi Tobias,

I went through the documentation, and I think the causal reverse anomaly is fine as long as the key ranges are non-overlapping.
What I did not get is why 'single-key linearizability' would be violated. The documentation says:

 'While serializable consistency is maintained regardless of clock skew, skew outside the configured clock offset bounds can result in violations of single-key linearizability between causally dependent transactions'

Why would this ever happen?

Thanks,
Unmesh 

Unmesh Joshi

May 25, 2021, 9:25:02 AM
to Tobias Grieger, cockroach-team
Just to confirm I understand this:
I write k=val1 at T=2000. This goes to the range leaseholder (the leader of the Raft group).
Before the Raft commit index is propagated to the follower replicas, I issue a read request:
read k at timestamp T=2000
This request is served by a follower whose clock drift is > 1000, so its HLC is well past 2000, and it will serve the request. But val1 is not yet available in its store.

So the issue is about clock drift between the leader and followers of the same Raft group (i.e., the replicas for the same key range)?


On Tue, May 25, 2021 at 12:16 PM Tobias Grieger <tob...@cockroachlabs.com> wrote:
> You might end up writing k=val1 first at timestamp T, then hit another node for a follow-up read, be assigned a timestamp T-1000, and fail to read k=val1. (Here we assume 1000 > MaxOffset.) CockroachDB will only forcibly order operations within the configured MaxOffset. When the actual clock skew exceeds that, you may get stale reads, which are non-linearizable.

Tobias Grieger

May 25, 2021, 9:25:04 AM
to Unmesh Joshi, cockroach-team
You might end up writing k=val1 first at timestamp T, then hit another node for a follow-up read, be assigned a timestamp T-1000, and fail to read k=val1. (Here we assume 1000 > MaxOffset.) CockroachDB will only forcibly order operations within the configured MaxOffset. When the actual clock skew exceeds that, you may get stale reads, which are non-linearizable.
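
A toy model of the read side makes this concrete. Reads carry an uncertainty window of MaxOffset above their timestamp; a value in that window forces a restart at a higher timestamp, which is what "forcibly order within MaxOffset" buys. Illustrative Go, not CRDB internals:

package uncertainty

// outcome says how a read at readTS treats a value written at valueTS.
type outcome int

const (
	visible   outcome = iota // value is part of the read's snapshot
	uncertain                // value may causally precede the read: restart at a higher timestamp
	invisible                // value is strictly in the read's future
)

// classify implements the uncertainty-interval rule. All arguments are
// HLC wall times in nanoseconds; maxOffset is the configured bound.
func classify(readTS, valueTS, maxOffset int64) outcome {
	switch {
	case valueTS <= readTS:
		return visible
	case valueTS <= readTS+maxOffset:
		// The writer's clock may be ahead of the reader's by up to
		// maxOffset, so this write could have happened before the read
		// in real time. Restart rather than return a stale result.
		return uncertain
	default:
		return invisible
	}
}

In the example above, readTS = T-1000 and valueTS = T with MaxOffset < 1000, so the write lands beyond readTS+MaxOffset, classifies as invisible rather than uncertain, and the stale read goes undetected.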


Tobias Grieger

May 25, 2021, 9:25:05 AM
to Unmesh Joshi, cockroach-team
The CockroachDB node servicing your connection picks the timestamp. So "I issue a read request, read k" becomes: the node I talk to chooses timestamp `now()`, which happens to be a past timestamp at which `T=2000` is not visible, so the read fails to see the value.

Unmesh Joshi

May 25, 2021, 9:25:08 AM
to Tobias Grieger, cockroach-team
Got it.
So going back to my original question, and comparing with how MongoDB manages clusterTime through gossip: would making the HLC part of the gossip state reduce the window for this kind of thing to happen?
I think this is an inherent problem with HLC, with two alternative solutions (see the sketch after this list):
1. Rely on the underlying clock synchronization, put an upper bound on clock drift, and track only the max clock drift through something like gossip (or heartbeating).
2. Have a continuous HLC sync happening across cluster nodes through gossip.
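
Here is a sketch of option 2 under assumed names (not MongoDB's or CRDB's actual code): piggyback the highest HLC wall time seen onto gossip messages, so a node with a slow physical clock still hands out timestamps close to the rest of the cluster:

package gossipclock

import (
	"sync/atomic"
	"time"
)

// maxClusterTime is the highest HLC wall time heard via gossip, in
// nanoseconds. Ratcheting it forward narrows the window in which a
// node with a slow clock can assign a timestamp far in the past.
var maxClusterTime atomic.Int64

// onGossip is invoked for every incoming gossip message carrying a
// peer's HLC wall time; it only ever moves maxClusterTime forward.
func onGossip(remoteWall int64) {
	for {
		cur := maxClusterTime.Load()
		if remoteWall <= cur || maxClusterTime.CompareAndSwap(cur, remoteWall) {
			return
		}
	}
}

// now returns a timestamp that is never behind the gossiped cluster
// time, regardless of the local physical clock.
func now() int64 {
	phys := time.Now().UnixNano()
	if ct := maxClusterTime.Load(); ct > phys {
		return ct
	}
	return phys
}

This would shrink the stale-read window to roughly one gossip round, but cannot eliminate it, so it looks like a mitigation alongside option 1 rather than a replacement.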

Unmesh Joshi

Dec 21, 2021, 10:34:47 AM
to Tobias Grieger, cockroach-team
Hi,

Just reviving this thread, as I was checking out the ClockBound library from AWS (https://github.com/aws/clock-bound). I assume this is probably how all the cloud providers will expose time APIs. With this kind of API, the possibility of 'causal reverse' can be avoided in CRDB, assuming commit-wait and read restarts (if there are conflicting writes?).
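
A sketch of that combination (the Bounds type and BoundedClock interface below are hypothetical stand-ins, not the actual aws/clock-bound client API): assign the commit timestamp at the upper bound of true time, then commit-wait until the lower bound has passed it, Spanner-style:

package commitwait

import "time"

// Bounds brackets true time: the real time is guaranteed to lie in
// [Earliest, Latest]. This mirrors the guarantee a ClockBound-style
// daemon provides, but the type itself is a stand-in.
type Bounds struct {
	Earliest, Latest time.Time
}

// BoundedClock is a hypothetical interface over such a daemon.
type BoundedClock interface {
	Now() Bounds
}

// commitWait assigns the commit timestamp at the current upper bound,
// then blocks until the lower bound has passed it. After the wait the
// commit timestamp is definitely in the past in real time, so any
// causally later transaction is assigned a higher timestamp on every
// node, which rules out the 'causal reverse' anomaly.
func commitWait(clock BoundedClock) time.Time {
	commitTS := clock.Now().Latest
	for clock.Now().Earliest.Before(commitTS) {
		// Poll; a real implementation would sleep for the remaining gap.
		time.Sleep(100 * time.Microsecond)
	}
	return commitTS
}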

Thanks,
Unmesh

Tobias Grieger

Dec 21, 2021, 10:34:53 AM
to Unmesh Joshi, cockroach-team
Yes, that sounds correct.