Clock offset between nodes

Kathy Spradlin

Sep 22, 2014, 5:51:07 PM
to cockro...@googlegroups.com
Moving this discussion out of #63

I've been thinking about the problem of measuring clock offset that Spencer outlined, attached below. I wanted to make sure I have the full picture, however.

Is my goal to actually use NTP for synchronization, and then have the node commit suicide if it fails to synchronize within some epsilon error bound? Or is it just to exchange messages with other nodes and use the general algorithm employed by NTP, or another method, to estimate a node's offset from the cluster as a whole? The latter assumes that the local machine's clock is synchronized some other way, possibly by the default NTP implementation, which is left to the user to configure.

Well there is an important missing piece of the distributed transactions
puzzle right now: correctly measuring clock offset between nodes,
propagating this info via the gossip network and enforcing it by having
nodes commit suicide as soon as they notice they're out of band.
Further, I'd like to find some way to create a merged histogram between all
nodes for measured clock skews and surface it in our status endpoint
(probably via gossip to start, though eventually we'll want to write this
information as time series information to the database itself).
The algorithm I have in mind is the one used by NTP:
http://en.wikipedia.org/wiki/Network_Time_Protocol
There are others. This paper discusses their preferred method using linear
programming and compares it to three other algorithms:
https://www.cs.umd.edu/class/spring2007/cmsc711/papers/1999-infocom-clock.pdf

Kathy 

Spencer Kimball

Sep 22, 2014, 6:06:30 PM
to Kathy Spradlin, cockro...@googlegroups.com
I don't think it's practical to have nodes adjust their clocks. Instead, my feeling is that it's better to have NTP (or whatever) handle clock synchronization independently (so your latter option!).

Internally, we send heartbeats between any two connected nodes in the system and this is where I imagine we'd measure clock offsets (cockroach as currently designed will end up with a very strongly-connected graph, though this may have to change to support larger clusters). Nodes will be run with a --max_offset (tbd) command line flag and if they measure offsets in excess of that, they should noisily commit suicide. The code for the heartbeats is in rpc/client.go and rpc/server.go.
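
For illustration, here is a rough sketch of how an NTP-style offset could be computed from the four timestamps of one heartbeat exchange and checked against the maximum offset. It is only a sketch under assumptions: the HeartbeatSample type, its field names, and ExceedsMaxOffset are hypothetical and are not taken from rpc/client.go or rpc/server.go, and the estimate is only as good as the assumption that network delay is symmetric.

```go
package main

import (
	"fmt"
	"time"
)

// HeartbeatSample holds the four timestamps of one heartbeat exchange,
// in nanoseconds since the Unix epoch. Hypothetical type, for illustration.
type HeartbeatSample struct {
	ClientSend int64 // local clock when the heartbeat was sent
	ServerRecv int64 // remote clock when the heartbeat arrived
	ServerSend int64 // remote clock when the reply was sent
	ClientRecv int64 // local clock when the reply arrived
}

// Offset estimates how far the remote clock is ahead of the local clock,
// using the NTP formula and assuming symmetric network delay.
func (s HeartbeatSample) Offset() time.Duration {
	return time.Duration(((s.ServerRecv - s.ClientSend) + (s.ServerSend - s.ClientRecv)) / 2)
}

// Delay is the round-trip time with the remote processing time removed.
func (s HeartbeatSample) Delay() time.Duration {
	return time.Duration((s.ClientRecv - s.ClientSend) - (s.ServerSend - s.ServerRecv))
}

// ExceedsMaxOffset reports whether the magnitude of offset is larger than
// maxOffset, i.e. whether the node should consider itself out of band.
func ExceedsMaxOffset(offset, maxOffset time.Duration) bool {
	if offset < 0 {
		offset = -offset
	}
	return offset > maxOffset
}

func main() {
	// Remote clock 10ms ahead, 5ms each way on the wire, 1ms of processing.
	s := HeartbeatSample{ClientSend: 0, ServerRecv: 15000000, ServerSend: 16000000, ClientRecv: 11000000}
	fmt.Println("offset:", s.Offset(), "delay:", s.Delay())
	fmt.Println("out of band:", ExceedsMaxOffset(s.Offset(), 5*time.Millisecond))
}
```

A real implementation would presumably act on several samples rather than a single exchange before deciding to shut a node down.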

Propagating information on measured offsets and merging that into a global picture will be very important. We need to figure out what the reality of clock offsets in production is before I'm going to feel very confident about our guarantees. Certainly, we'll want to set our --max_offset at the 99th percentile of measured offsets.
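
As a small sketch of the percentile idea, assuming the measured offsets from all nodes have already been merged into one slice of nanosecond values: the function below takes the nearest-rank percentile of their absolute values. PercentileAbs is a hypothetical helper, and nearest-rank is just one of several reasonable percentile definitions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// PercentileAbs returns the p-th percentile (0 < p <= 100) of the absolute
// values of the given offsets (in nanoseconds), using the nearest-rank method.
// It returns 0 when there are no samples.
func PercentileAbs(offsetsNanos []int64, p float64) time.Duration {
	n := len(offsetsNanos)
	if n == 0 {
		return 0
	}
	abs := make([]int64, n)
	for i, o := range offsetsNanos {
		if o < 0 {
			o = -o
		}
		abs[i] = o
	}
	sort.Slice(abs, func(i, j int) bool { return abs[i] < abs[j] })

	// Nearest rank: the smallest value with at least p% of samples at or below it.
	rank := int(math.Ceil(p * float64(n) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > n {
		rank = n
	}
	return time.Duration(abs[rank-1])
}

func main() {
	merged := []int64{-4000000, 1000000, 3000000, -2000000}
	fmt.Println("p50:", PercentileAbs(merged, 50))
	fmt.Println("p99:", PercentileAbs(merged, 99))
}
```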

Kathy Spradlin

Sep 22, 2014, 6:21:11 PM
to cockro...@googlegroups.com, kathys...@gmail.com
Thanks! I think I understand, and I agree that the database doesn't logically need to be in charge of synchronization, though its correctness relies on it.

Spencer Kimball

Sep 22, 2014, 6:24:53 PM
to Kathy Spradlin, cockro...@googlegroups.com
What's your current thinking on the algorithm used to measure offset?

Kathy Spradlin

Sep 22, 2014, 9:16:36 PM
to cockro...@googlegroups.com, kathys...@gmail.com
Hmm, well I suspect that the NTP algorithm could estimate a given clock's offset from the average of the nodes' clocks fairly well. Still, it is only reliable if you statistically filter the data for noise, so I would need to be careful in that regard.
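
One simple way to filter for noise, loosely modeled on NTP's clock filter, is to look at the last few samples and trust the one with the smallest round-trip delay, since it had the least room for asymmetric queuing. The sketch below assumes that approach; OffsetSample and BestOfRecent are hypothetical names, and the window size is up to the caller (NTP itself keeps the last eight samples).

```go
package main

import (
	"fmt"
	"time"
)

// OffsetSample is one offset measurement together with the round-trip
// delay observed while taking it. Hypothetical type, for illustration.
type OffsetSample struct {
	Offset time.Duration
	Delay  time.Duration
}

// BestOfRecent considers the most recent `window` samples (oldest first in
// the slice) and returns the one with the smallest delay. The boolean is
// false when there is nothing to choose from.
func BestOfRecent(samples []OffsetSample, window int) (OffsetSample, bool) {
	if len(samples) == 0 || window <= 0 {
		return OffsetSample{}, false
	}
	if len(samples) > window {
		samples = samples[len(samples)-window:]
	}
	best := samples[0]
	for _, s := range samples[1:] {
		if s.Delay < best.Delay {
			best = s
		}
	}
	return best, true
}

func main() {
	samples := []OffsetSample{
		{Offset: 9 * time.Millisecond, Delay: 1 * time.Millisecond},
		{Offset: 4 * time.Millisecond, Delay: 20 * time.Millisecond},
		{Offset: 5 * time.Millisecond, Delay: 3 * time.Millisecond},
		{Offset: 7 * time.Millisecond, Delay: 12 * time.Millisecond},
	}
	if best, ok := BestOfRecent(samples, 3); ok {
		fmt.Println("filtered offset:", best.Offset, "delay:", best.Delay)
	}
}
```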

If you want an estimate of skew (the relative frequency between clocks) as well as offset, just for the sake of monitoring the database, you could also apply the linear programming method in the paper. NTP doesn't particularly deal with skew, it seems.
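
To make the notion of skew concrete: if offset is sampled over time, skew is the slope of offset against time. The sketch below estimates that slope with an ordinary least-squares fit. To be clear, this is not the linear programming method from the paper, just a simpler stand-in for illustration, and EstimateSkew is a hypothetical helper. Least squares is sensitive to delay spikes, so it would need filtered samples.

```go
package main

import "fmt"

// EstimateSkew fits offset = a + skew*t by ordinary least squares and
// returns the slope. Both slices are in seconds, so the result is
// dimensionless (seconds of drift per second). The boolean is false when
// a slope cannot be computed.
func EstimateSkew(tSeconds, offsetSeconds []float64) (float64, bool) {
	n := len(tSeconds)
	if n < 2 || n != len(offsetSeconds) {
		return 0, false
	}
	var meanT, meanO float64
	for i := range tSeconds {
		meanT += tSeconds[i]
		meanO += offsetSeconds[i]
	}
	meanT /= float64(n)
	meanO /= float64(n)

	var num, den float64
	for i := range tSeconds {
		dt := tSeconds[i] - meanT
		num += dt * (offsetSeconds[i] - meanO)
		den += dt * dt
	}
	if den == 0 {
		// All samples were taken at the same instant.
		return 0, false
	}
	return num / den, true
}

func main() {
	// Offset starts at 1ms and grows by 50 microseconds every second.
	t := []float64{0, 1, 2, 3}
	off := []float64{0.001, 0.00105, 0.0011, 0.00115}
	if skew, ok := EstimateSkew(t, off); ok {
		fmt.Printf("skew: %.1f ppm\n", skew*1e6)
	}
}
```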

Spencer Kimball

Sep 24, 2014, 7:20:51 PM
to Kathy Spradlin, cockro...@googlegroups.com
Not sure we need to start with skew. However, if one method makes computing skew much simpler and is nearly equivalent in terms of implementation complexity, I'd say go with that one over NTP.