Any examples of a specification of an S3-like object store API?


Steve Loughran

Jul 13, 2016, 8:22:56 AM
to tlaplus

I'm looking for some example specifications of an eventually consistent object store, such as amazon S3.

This isn't because I plan to implement one; it's because I want to do some things against such an object store, specifically using a consistent database to address the inconsistency problems (similar to Netflix's s3mper - http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html ), then implement an O(1) output committer for Hadoop/Tez/Spark which handles race conditions between multiple (speculative) competing executors and is resilient to failures: precisely the features that you don't get when working with S3 today. And of course, to show that such a mechanism works, a bit of formality can only help.

I know AWS and perhaps the Azure team have been using TLA+ internally: are there any specifications of the exposed behaviours of the S3 and Azure Data Lake stores around? Something from the service owners themselves would be best. Otherwise: has anyone written example specifications of object stores with create consistency, queued metadata updates for object listings, or asynchronous delete/update operations?

For extra fun, S3 appears to briefly cache negative lookups, so that while the creation sequence always holds

~ exists(path)
PUT(path)
GET(path)

an initial failing GET can cache a negative result which a subsequent GET may then return, so the following sequence is not guaranteed to succeed, at least if there is "not enough" delay between the PUT and the subsequent GET.

~GET (path)
PUT(path)
GET(path)

While I don't want to go near that problem, it exists, so I'd better write it down and code for it.

Right now I'm not even sure how best to define that "eventually" concept, except to say that after some time t the observed state of updated/deleted objects will change; nor do I know what the values of t are for different infrastructure instances.


-Steve


Steve Loughran

Jul 25, 2016, 6:58:14 AM
to tlaplus


On Wednesday, 13 July 2016 13:22:56 UTC+1, Steve Loughran wrote:

I'm looking for some example specifications of an eventually consistent object store, such as amazon S3.


In the absence of any such spec, I've sat down to start writing my own, and appear to have got stuck due to my lack of understanding of the TLA+ syntax and/or fundamental misunderstandings of the concepts therein. Some help would be appreciated.


1. Defining a function now() which returns a timestamp.

I've done something ugly which fails to capture the fact that the timestamp can/should be considered monotonically increasing from some external source.

CONSTANT now(_)
ASSUME \A t \in Nat: now(t) \in Timestamp


Why the meaningless argument? To distinguish now(0) from "now", which is too like an unchanging constant for my liking, and because now() is illegal.

What should I be doing here?



2. What's the best way to specify postconditions of an operation?

 Example: after a delete I want to declare the fact "the path is no longer in the store"
 

doDelete(path, result) ==
  LET
    validArgs == path \in Paths
    exists == has_entry(store, path)
  IN    
    \/  /\ ~validArgs
        /\ result' = BadRequest
        /\ UNCHANGED store
    \/  /\ validArgs
        /\ ~exists
        /\ result' = NotFound
        /\ UNCHANGED store
    \/  /\ validArgs
        /\ exists
        /\ result' = Success
        /\ store' = [p \in (DOMAIN store \ path) |-> store[p]]
        /\ ~has_entry(store', path)  \* HERE


 
Now, that's somewhat superfluous given the definition of has_entry:

has_entry(s, p) == p \in DOMAIN s

the definition of what store' becomes already declares that path isn't in the domain, but by doing it this way I'm spelling out the test explicitly. Except if I do it where I have, I'm essentially creating a new outcome, aren't I?

The alternative would be some invariant, though as well as showing up my confusion about predicates vs set refinement, I also have to consider the problem that it only holds if nothing else does a PUT operation between the delete and the check.

DeleteInvariant == \A p \in Paths: doDelete(p, Success) ==> ~has_entry(store', p)

I don't want to go there yet.

3. Invariants of consistency across operations. How do I declare that the metadata from the doGet and the doHead operations are consistent?

GetAndHeadConsistent ==
  \A path \in DOMAIN store, sysMd \in SystemMetadata, data \in Data :
    doGet(path, Success, data, sysMd) ==> doHead(path, Success, sysMd)

   
Again, people may think this is obvious, but OpenStack Swift can return different object lengths in its doList from its doHead operation. Calling it out provides a clear declaration of an invariant people should write tests for.

I know these will all sound trivial to people who actually understand the details of TLA+, but I'm only getting there: first with a simple definition of a consistent object store API, from which I can then extrapolate to one with eventual consistency, where things like that delete invariant downgrade to "holds after some bounded time period".

-Steve
 
 
 

Andrew Helwer

Jul 25, 2016, 4:20:35 PM
to tla...@googlegroups.com
Hi Steve, all good questions.

1. Defining now()

Monotonically-increasing variables make model-checking difficult, because absent restrictions this means the system has an infinite number of possible states and so the model checker will never halt. My advice here is to think carefully about whether a monotonically-increasing global clock is really required for you to achieve the desired semantics. If you absolutely need the global clock, define it thus:

VARIABLE GlobalClock

TypeInvariant == GlobalClock \in Nat

Init == GlobalClock = 0

Tick == GlobalClock' = GlobalClock + 1


Model-checking this is difficult. You'll have to define Nat to be a finite set like 0 .. 100, which means there exist behaviors which probably don't conform to the real world system - like an uninterrupted series of Tick relations until you hit GlobalClock = 100, after which other actions take place. You can mitigate this by not allowing Tick if other actions are enabled, but really this whole approach is messy and you're better off without a monotonically-increasing global clock. Why do you need one?
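Andrew's bounding advice can be sketched as a guard on the Tick action itself; this is a hypothetical fragment, with MaxTime as an assumed model constant rather than anything from a real spec:

```tla
---- MODULE BoundedClock ----
EXTENDS Naturals
CONSTANT MaxTime          \* finite bound for model checking, e.g. 100
VARIABLE clock

Init == clock = 0

\* Tick is enabled only below the bound, so the state space stays finite.
Tick == /\ clock < MaxTime
        /\ clock' = clock + 1
====
```

This keeps TLC's state space finite, but still admits the "uninterrupted run of Ticks" behaviors described above.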

2. Specifying postconditions

It's best to view primed variable "assignments" and postconditions as one and the same. This is tricky to understand, but when you have an action like this:

ExampleAction ==
/\ x \in {0, 1}
/\ x' = 1 - x

you aren't saying "if x is 0 or 1, then assign 1 - x to x after this action executes". You're saying "the ExampleAction formula is true of a step in a behavior if in the first state x is either 0 or 1, and in the second state x' = 1 - x". This isn't pedantry; moving from the state machine paradigm to the behavior paradigm is critical for understanding the temporal logic component of TLA+. The purpose of a spec is to define a set of correct system behaviors, where a behavior is a sequence of states and a state is an assignment of values to variables. So when you see this:

Init == x = 0

Spec == Init /\ [][ExampleAction]_<<x>>


You understand that spec defines a subset of all behaviors such that Init is true in the first state, and for all steps either the ExampleAction boolean formula is true or it is a stuttering step.

How does this all relate back to your question? Simply, a postcondition of the type you've given is perfectly fine. You can use primed variables in whatever logical statements and checks you want and it will not create a new outcome. What adding conjuncts can do is remove outcomes. For example, if we have the following:

ExampleAction ==
/\ x \in {0, 1}
/\ x' = 1 - x
/\ x' \in {2, 3, 4}

Here we have the "postcondition" check that x' is in the set {2, 3, 4}. It's impossible for ExampleAction to be true of any step, since the conjuncts are clearly contradictory and can never all be true. Thus this step can never be taken. Tying this back to your example, we have these conjuncts:


/\ store' = [p \in (DOMAIN store \ path) |-> store[p]]
/\ ~has_entry(store', path)  \* HERE

If the first conjunct is true (which will probably always be the case) and the second conjunct (your postcondition) is not, TLC will simply not take this step. Personally I would use the first conjunct as the postcondition in and of itself.

A minor stylistic note: rather than modifying the domain of your store variable, consider mapping undefined paths to a placeholder null value instead.
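That placeholder style might be sketched as follows; NoEntry and the deleted operator are hypothetical names, assumed here for illustration:

```tla
CONSTANTS Paths, Data, NoEntry
VARIABLE store     \* store \in [Paths -> Data \cup {NoEntry}]

\* A path "exists" when it is not mapped to the placeholder.
has_entry(s, p) == s[p] /= NoEntry

\* A delete no longer shrinks DOMAIN store; it just maps the
\* path back to the placeholder value.
deleted(s, p) == [s EXCEPT ![p] = NoEntry]
```

With this style, DOMAIN store is always the full set Paths, so the domain never has to be recomputed on delete.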

3. Invariants

I'm not sure I understand this question. Could you specify what doGet and doHead do, and what it means for them to be consistent?

Andrew

Leslie Lamport

Jul 25, 2016, 5:44:06 PM
to tlaplus

Andrew didn't notice that the conjunct


           /\ ~has_entry(store', path) 


is not "somewhat superfluous".  It is completely superfluous because
it's implied by the preceding conjunct


        /\ store' = [p \in (DOMAIN store \ path) |-> store[p]]


and the definition of has_entry.  More precisely, it would be if you had
written that preceding conjunct correctly.  I don't remember what the
precedence of the operators is, so I don't know how


   DOMAIN store \ path


is parsed.  I presume you meant to write


   (DOMAIN store) \ path


But S \ T is the set of elements in S but not in T, so the expression
you wanted is


   (DOMAIN store) \ {path}


To give you a more general answer, the next-state relation specifies
the new values of variables.  A postcondition is a condition that
should hold on the new values of the variables.  It should be implied
by the next-state relation and whatever precondition you assume about
the old values of the variables.  As Andrew pointed out, sometimes you
can make the desired postcondition a conjunct of an action to rule out
some possible new values of variables that the rest of the action
allows.


----


TLC will check if a formula Inv is an invariant of the specification,
meaning that Inv is true on all reachable states.  If you want to
put the assertion that Inv is an invariant in the specification, you
can write


   THEOREM  Spec => []Inv


where Spec is the TLA+ temporal formula that constitutes the
system's specification.  However, TLC does not check THEOREMs.
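In practice, then, the invariant is checked not through the THEOREM but by listing it in the TLC model's configuration file; a sketch, assuming the spec is named Spec and the invariant Inv:

```
SPECIFICATION Spec
INVARIANT Inv
```

TLC will then report any reachable state in which Inv is false, along with the behavior leading to it.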


----


As Andrew indicated, if you want to add a concept of time to your
spec, you should add a variable that represents a clock.  If you want
your specification to model real time, see my article "Real Time is
Really Simple" at


   http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#real-simple


You need real time if you are interested in properties such as "if X
happens then Y happens within 5 seconds".  However, if you just need
properties such as "if X happens then Y eventually happens", then you
don't want to introduce time.  You should instead learn about temporal
logic, liveness, and fairness--e.g., in Chapter 8 of "Specifying Systems".


Leslie


Steve Loughran

Jul 26, 2016, 5:08:17 AM
to tla...@googlegroups.com
On 25 July 2016 at 21:20, Andrew Helwer <andrew...@gmail.com> wrote:
Hi Steve, all good questions.

1. Defining now()

Monotonically-increasing variables make model-checking difficult, because absent restrictions this means the system has an infinite number of possible states and so the model checker will never halt. My advice here is to think carefully about whether a monotonically-increasing global clock is really required for you to achieve the desired semantics. If you absolutely need the global clock, define it thus:

VARIABLE GlobalClock

TypeInvariant == GlobalClock \in Nat

Init == GlobalClock = 0

Tick == GlobalClock' = GlobalClock + 1


Model-checking this is difficult. You'll have to define Nat to be a finite set like 0 .. 100, which means there exist behaviors which probably don't conform to the real world system - like an uninterrupted series of Tick relations until you hit GlobalClock = 100, after which other actions take place. You can mitigate this by not allowing Tick if other actions are enabled, but really this whole approach is messy and you're better off without a monotonically-increasing global clock. Why do you need one?

I really want to set a wall time when objects are created. So it's purely a state thing. I suppose I could do something like Apache Spark's clock code, which has more than one implementation of the clock: the wall clock and the test clock.

 

2. Specifying postconditions

It's best to view primed variable "assignments" and postconditions as one and the same. This is tricky to understand, but when you have an action like this:

ExampleAction ==
/\ x \in {0, 1}
/\ x' = 1 - x

you aren't saying "if x is 0 or 1, then assign 1 - x to x after this action executes". You're saying "the ExampleAction formula is true of a step in a behavior if in the first state x is either 0 or 1, and in the second state x' = 1 - x". This isn't pedantry; moving from the state machine paradigm to the behavior paradigm is critical for understanding the temporal logic component of TLA+. The purpose of a spec is to define a set of correct system behaviors, where a behavior is a sequence of states and a state is an assignment of values to variables. So when you see this:

Init == x = 0

Spec == Init /\ [][ExampleAction]_<<x>>


You understand that spec defines a subset of all behaviors such that Init is true in the first state, and for all steps either the ExampleAction boolean formula is true or it is a stuttering step.

I suspect that I've been too used to other specification systems, ones where things are explicitly split into precondition and postcondition tests. If instead I'm looking more at the state changes during correct operation, then I can just mix things up in the /\ conjuncts.

 

How does this all relate back to your question? Simply, a postcondition of the type you've given is perfectly fine. You can use primed variables in whatever logical statements and checks you want and it will not create a new outcome. What adding conjuncts can do is remove outcomes. For example, if we have the following:

ExampleAction ==
/\ x \in {0, 1}
/\ x' = 1 - x
/\ x' \in {2, 3, 4}

Here we have the "postcondition" check that x' is in the set {2, 3, 4}. It's impossible for ExampleAction to be true of any step, since the conjuncts are clearly contradictory and can never all be true. Tying this back to your example, we have these conjuncts:


/\ store' = [p \in (DOMAIN store \ path) |-> store[p]]
/\ ~has_entry(store', path)  \* HERE

If the first conjunct is true (which will probably always be the case) and the second conjunct (your postcondition) is not, TLC will simply not take this step. Personally I would use the first conjunct as the postcondition in and of itself.

I agree, it is superfluous, except I like to spell out clearly those assertions that are good for testing.

What I think I will try to do is look more at sequences of actions, so that a doPut followed immediately by a doGet will return the item that was put. That's the kind of observable state that tests like.

 

A minor stylistic note: rather than modifying the domain of your store variable, consider mapping undefined paths to a placeholder null value instead.


Noted, except that I do like to use (DOMAIN store) elsewhere to define the set of paths which have data behind them. I think my life would get more complicated.
 
3. Invariants

I'm not sure I understand this question. Could you specify what doGet and doHead do, and what it means for them to be consistent?


They're essentially the HTTP HEAD/GET calls. HEAD returns whether there's something at a path, and if so: metadata, including the length of the file.

GET returns the same metadata and the contents too. For valid implementations: the metadata returned by GET and HEAD must be the same; the data coming back with a GET is that which was most recently submitted with a PUT (absent any intermediate DELETE).

There's also the internal consistency requirement of the GET operation: the data returned in the response is the same length as the length declared in the metadata. Obvious, yes, but evidence implies it doesn't always hold: https://issues.apache.org/jira/browse/HADOOP-11202

And again, any list operation should return the lengths of all the objects in the object store, consistent with the actual lengths of the data returned when those objects are read.

These are all the kind of things which filesystems do, so fundamental that everyone takes them for granted: ls, stat and cat are consistent in their reporting of data. It turns out that the further you get from your POSIX-compatible local FS, the less you can rely on these things. What I'm trying to do here is write them down, so that implementors of object store clients can make sure they've implemented them, and so that users of the APIs know what they can expect.
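Those internal-consistency requirements could be sketched as invariants over the store state; the record fields data and metadata.length are assumptions about how the store is modelled, not part of any existing spec:

```tla
EXTENDS Sequences

\* Internal consistency of GET: the length declared in an object's
\* metadata matches the length of its actual data.
GetLengthConsistent ==
  \A p \in DOMAIN store :
    Len(store[p].data) = store[p].metadata.length
```

The GET/HEAD and list/HEAD consistency requirements would follow the same pattern, quantified over DOMAIN store.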

Andrew Wilcox

Jul 26, 2016, 6:54:46 PM
to tlaplus
I really want to set a wall time when objects are created.

In your original post you said you wanted to be able to model that for example ~GET(path), PUT(path), GET(path) wasn't guaranteed to get what you put.  What else do you need to model, that you'd need a wall time for?

Steve Loughran

Jul 27, 2016, 6:19:06 AM
to tla...@googlegroups.com

On 26 July 2016 at 23:54, Andrew Wilcox <andrew...@gmail.com> wrote:
I really want to set a wall time when objects are created.

In your original post you said you wanted to be able to model that for example ~GET(path), PUT(path), GET(path) wasn't guaranteed to get what you put.  What else do you need to model, that you'd need a wall time for?

I'm trying to set a created date on files. That's all. I suppose I could have a specific construct where the current time is passed in to the object store as another argument, so the state of the inner system isn't running on a clock; there is just something, somewhere, that has a notion of time.

Regarding that sequence of ~GET(path), PUT(path), GET(path): that's apparently due to some caching of negative lookups which AWS S3 does as part of a DDoS defence; both positive and negative results are cached.

It's interesting as it complicates testing; you can't execute that exact sequence to demonstrate that the store appears to have create consistency, or if you do, you'd better have some idea of what a good delay between the negative GET and the PUT should be. And it'd have to be a coded-in delay, as the usual probe+sleep strategy doesn't work: it just refreshes the entry in the cache.

Beyond the scope of this first exercise anyway; here I want to declare that after a PUT, a HEAD or a GET returns the length of the data submitted, the time it was submitted; the GET also returns the exact data supplied.

Andrew Wilcox

Jul 27, 2016, 9:01:41 PM
to tlaplus
I'm trying to set a created date on files.

Maybe this is what you meant by a test clock... what you could do is have each event that needs a timestamp increment an integer index variable t, and time[t] is the time of the event with the time index t.  In your model you wouldn't need to have the "time", the actual wall clock time, you could just have t=1, t=2, etc.  Then to show that a HEAD or GET returns the correct time the data was submitted, you could have the methods return t=2 as the created date, where the data was submitted in event 2, without having to model which particular time t=2 was.
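That logical-clock idea might look something like this; doPut and the record fields are hypothetical names, assumed purely for illustration:

```tla
VARIABLES t, store

\* Each PUT advances the logical clock and stamps the new object
\* with it; no wall-clock time appears anywhere in the model.
doPut(p, d) ==
  /\ t' = t + 1
  /\ store' = [store EXCEPT ![p] = [data |-> d, created |-> t']]
```

A HEAD or GET then returns store[p].created, and "correct created date" becomes a claim about logical event order rather than about wall time.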

It's interesting as it complicates testing; you can't execute that exact sequence to demonstrate that the store appears to have create consistency, or if you do, you'd better have some idea of what a good delay between the negative GET and the PUT should be. And it'd have to be a coded-in delay, as the usual probe+sleep strategy doesn't work: it just refreshes the entry in the cache.

A thought: you could add a "DELAY" event to your model; so that e.g. a "~GET DELAY PUT GET" would successfully get what was put; without having to model how long the delay was in wall clock time.
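A sketch of such a DELAY action, assuming (hypothetically) that the negative-lookup cache is modelled as a set negCache of paths:

```tla
\* DELAY abstracts "enough wall-clock time passes": the cached
\* negative lookups expire, with no model of how long that takes.
Delay ==
  /\ negCache' = {}
  /\ UNCHANGED <<store, t>>
```

A GET on a path in negCache would return NotFound regardless of the store's contents, so ~GET DELAY PUT GET succeeds while ~GET PUT GET need not.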
