PBS: Better understanding latency vs. consistency trade-offs in Riak

PBS: Better understanding latency vs. consistency trade-offs in Riak Peter Bailis 1/19/12 10:03 AM
As the Jan. 4th Riak Recap briefly mentioned, we recently completed research at UC Berkeley that's highly relevant to Riak, and we're interested in feedback from the Riak community. In brief, eventually consistent replication (which is often faster than strongly consistent replication) provides no *guarantees* about the recency of data returned. However, we can accurately provide *expectations* of data recency. Our work, which we call Probabilistically Bounded Staleness (PBS), helps make these predictions. Using PBS, we can optimize the trade-off between latency and consistency provided by partial quorums (R+W <= N) by predicting both latency and consistency with high accuracy.

Currently, in quorum-replicated data stores like Riak, there's no good way to predict the benefit of using partial quorums or the consistency they provide. By measuring the latency of messaging and using modeling techniques we've developed, Riak can do better by describing the probability of consistency according to both time and versions (see an interactive demo in your browser at http://cs.berkeley.edu/~pbailis/projects/pbs/#demo). 

Thus far, in addition to general Dynamo-style replication analysis, we've developed a patch for Cassandra that performs the required latency profiling, and we're interested in potentially working to integrate PBS analysis into additional data stores like Riak, which can also benefit from it (see code and documentation at https://github.com/pbailis/cassandra-pbs). These techniques are broadly applicable: for example, in our Technical Report (http://cs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-4.pdf), in addition to examining Cassandra and Voldemort, we predicted the latency-consistency trade-offs of a production Riak deployment at Yammer (thanks to Coda Hale for providing the data!). The sample analysis script we've written for Cassandra will work for Riak as well given the proper input format.

We'd welcome any feedback or questions you might have.

Thanks!
Peter Bailis
UC Berkeley

More info:
DataStax wrote a great explanatory blog post on PBS last week: http://www.datastax.com/dev/blog/your-ideal-performance-consistency-tradeoff
You can read an overview of PBS on our project page: http://cs.berkeley.edu/~pbailis/projects/pbs/
You can also read our technical report on PBS that has more technical detail: http://cs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-4.pdf

Daniel Abadi recently blogged about the latency-consistency trade-off: http://dbmsmusings.blogspot.com/2011/12/replication-and-latency-consistency.html
Henry Robinson (Cloudera) also blogged about PBS: http://the-paper-trail.org/blog/?p=334
Re: PBS: Better understanding latency vs. consistency trade-offs in Riak David Smith 1/30/12 10:29 AM
Hi Peter,

Thanks for the email re: PBS -- it's quite interesting stuff.

On Thu, Jan 19, 2012 at 6:03 PM, Peter Bailis <pba...@cs.berkeley.edu> wrote:
>
> Thus far, in addition to general Dynamo-style replication analysis, we've
> developed a patch for Cassandra that performs the required latency profiling
> and are interested in potentially working to integrate PBS analysis into
> additional data stores like Riak, which can also benefit from PBS analysis

Can you describe the class of changes you had to make to Cassandra to
gather the information? We already track a fair bit of latency
information across percentiles -- are there any other inputs you need?

Thanks,

D.

--
Dave Smith
Director, Engineering
Basho Technologies, Inc.
diz...@basho.com

_______________________________________________
riak-users mailing list
riak-...@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: PBS: Better understanding latency vs. consistency trade-offs in Riak Peter Bailis 1/30/12 12:36 PM
Dave,

Thanks for the follow-up!

For PBS, we need four latency distributions.

For writes:
--Time from write coordinator to replica plus replica write processing time (W): How long does it take from the time the coordinator sends out a write request to the time the replica starts serving that write?
--Time from replica to write coordinator (A): How long do write acknowledgements take to be delivered to the coordinator?

For reads:
--Time from read coordinator to replica and time for replica to read corresponding value from its local store (R)
--Time from replica to read coordinator (S): How long do read responses take to be delivered?

W and R are a function of network delay and local processing time, while A and S should be mostly network delay.  It's best to know these distributions for each of the N ordered responses (i.e., the first, the second, ..., the Nth fastest for each of {W,A,R,S}), but it's not strictly necessary.  Does that make sense?
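To make the model concrete, here's a rough Monte Carlo sketch of how the four distributions combine into a staleness probability. This is not our actual implementation; it assumes a hypothetical sample_latency() drawing i.i.d. exponential delays (in a real profile you'd sample from the measured W/A/R/S distributions instead), and it estimates the chance that a read issued t ms after a write is acknowledged sees that write:

```python
import random

def sample_latency():
    # Hypothetical stand-in: exponential delay with a 5 ms mean.
    # Replace with draws from measured W/A/R/S distributions.
    return random.expovariate(1 / 5.0)

def prob_fresh(n, r, w, t, trials=100_000):
    """Estimate P(read returns the write), issued t ms after the write ack.

    Per trial, for each of the n replicas we draw the four PBS delays:
      W: coordinator -> replica write delivery + local write time
      A: replica -> coordinator write acknowledgement time
      R: coordinator -> replica read delivery + local read time
      S: replica -> coordinator read response time
    """
    fresh = 0
    for _ in range(trials):
        Ws = [sample_latency() for _ in range(n)]
        As = [sample_latency() for _ in range(n)]
        Rs = [sample_latency() for _ in range(n)]
        Ss = [sample_latency() for _ in range(n)]
        # The write is acknowledged once the w-th fastest ack arrives.
        commit = sorted(Wi + Ai for Wi, Ai in zip(Ws, As))[w - 1]
        # The read, issued t ms after commit, uses the r replicas
        # whose responses arrive first.
        responders = sorted(range(n), key=lambda i: Rs[i] + Ss[i])[:r]
        # Fresh if any responder had applied the write when the read arrived.
        if any(Ws[i] <= commit + t + Rs[i] for i in responders):
            fresh += 1
    return fresh / trials
```

With strict quorums (R+W > N, e.g. n=3, r=2, w=2) this returns 1.0 as expected, and with partial quorums (e.g. r=1, w=1) it shows how the staleness probability shrinks as t grows.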

Once you have (empirical) distributions, it's easy to calculate the probability of staleness (see a simple implementation, calc_prob_fresh @ https://github.com/pbailis/cassandra-pbs/blob/trunk/pbs/pbs_utils.py).

Thanks!
Peter