Eventually consistent pn-counter generator/checker

24 views
Skip to first unread message

JS

unread,
Jul 26, 2022, 10:42:40 PM7/26/22
to Jepsen Talk
Hi,

Testing Antidote's eventually consistent pn-counter has led to experimenting with evolving jepsen.checker/counter and  maelstrom.workload.pn_counter/checker.

A naive approach of modeling the counter and every client process' possible PoV with it's own TreeRangeSet allows:
  • track/test every (possible) transaction :value
  • more explicit assertions re:
    • client :read must always conform to its PoV re :ok/:info :adds
    • :monotonic? :read deltas must conform to client's PoV of possible values
    • all nodes must return a :final? read that is equal to the others
Spicing up the generators will also help give the checker more texture to work with:
  • diverse generators:
    • grow only
    • swing (try to swing the counter back and forth)
    • random
  • with :add values:
    • >= # transactions + 1
    • unique sequence of random numbers
This creates a more unique/sparse TreeRangeSet for the counter and client PoVs.

It's finding bugs:

An initial implementation is here.

Although this is not that far from the initial counter checkers, it does seem to find slightly more total anomalies and can at times provide a more specific/earlier error.

Going to keep experimenting looking for a better fit for explicitly eventually consistent counters.

P.S. here's a best case example of the generators, checker and db anomaly all lining up for a clear explanation:

; results.edn
{:valid? false,
:errors
({:value 5852405,
:type :ok,
:checker-error :value-not-possible,
:node "n2",
:monotonic? true,
:f :read})
...
:final-reads
{:valid? false,
:errors ({:checker-error :final-reads-not-equal}),
:final-reads
(["n1" 5862735]
["n2" 5852405]
["n3" 5862735]
["n4" 5862735]
["n5" 5862735]),
:counter [[5862735 5862735]],
:suspicious
[{:type :ok,
:f :increment,
:value 10330,
:node "n2"}]}}


; node n2 in history.edn

; last valid read
{:type :ok, :f :read, :value 5651785, :monotonic? true, :time 64261497446, :process 1, :node "n2", :index 11335}
...
; this next read is not possible:
; - the op's seen by this client plus any combination of other op's cannot equal this value
; - the difference is equal to dropping a single :ok transaction that happened on this node
{:type :ok, :f :read, :value 5852405, :monotonic? true, :time 66418677772, :process 1, :node "n2", :index 11465}
...
; all further reads return same not possible value
{:type :ok, :f :read, :value 5852405, :monotonic? true, :time 68087898812, :process 1, :node "n2", :index 11557}
...
; final read is invalid, the difference is a missing :ok op on this node
{:index 11889, :value 5852405, :time 75466391766, :process 1, :type :ok, :node "n2", :final? true, :monotonic? true, :f :read}

Kyle Kingsbury

unread,
Jul 27, 2022, 3:19:46 PM7/27/22
to ta...@jepsen.io
On Tue, 2022-07-26 at 19:42 -0700, JS wrote:
> Hi,
>
> Testing Antidote's eventually consistent pn-counter has led to
> experimenting with evolving jepsen.checker/counter and 
> maelstrom.workload.pn_counter/checker.

Ooooh, very cool! The PN-counter checker is something I'd like to do
more work with, but I haven't actually worked on a system that used
counters in a few years and haven't had an opportunity to fold it into
Jepsen proper.

--Kyle

Reply all
Reply to author
Forward
0 new messages