Reasoning more easily about distributed systems


Liron Shapira

Nov 17, 2016, 1:53:52 AM
to Eve talk
It makes intuitive sense that Eve's unordered semantics will make reasoning about distributed systems easier, I'm just trying to understand how so in a bit more detail.

Within each search/bind/commit section, the individual lines of code are reorderable, right?

Is each bind or commit section a transaction? I.e. Eve guarantees that every search block will see a "consistent snapshot", wherein it appears that each transaction has either fully run or hasn't run at all?

So whenever you need things to happen in order, the trick is to group them into a single commit or bind block, while otherwise you're being explicit about not needing order?


co...@kodowa.com

Nov 17, 2016, 2:45:58 PM
to Eve talk
> Within each search/bind/commit section, the individual lines of code are reorderable, right?

Correct. Furthermore, even search/bind/commit actions can be reordered, so you can commit before you search and it's still valid.

> Is each bind or commit section a transaction? I.e. Eve guarantees that every search block will see a "consistent snapshot", wherein it appears that each transaction has either fully run or hasn't run at all?

Right, I think this program illustrates that:

Set my salary to 10

```
commit
  [#corey salary: 10]
```

This block does three things:

1. Binds my salary to the browser for display
2. Adds a new salary
3. Removes the old salary

```
search
  corey = [#corey]
 
bind @browser
  [#div text: corey.salary]

commit
  corey.salary += 30
  corey.salary -= 10
```

The question is, what shows up in the browser? 10? 30 and 10? 30? nothing? Depending on ordering, these are all valid intermediate states of the program. But you'll only see 30 in the browser. So the computation runs through intermediate states until it can't anymore, and then a transaction is formed, which other blocks can see. This is important for integrity constraints, for instance. We don't want employees with more than one salary, or no salary at all, so we can write this block to enforce that:

```
search
  [#corey salary]
  1 = count[given: salary]
 
bind @error
  [#error message: "Corey's salary is not right!"]
```

Given our semantics, this error will not trigger. Even though `salary` might have intermediate states that violate this rule, when the program converges, the rule isn't violated so no error is ever raised.

> So whenever you need things to happen in order, the trick is to group them into a single commit or bind block, while otherwise you're being explicit about not needing order?

I think there is more work that needs to be done here, in order to make expressing ordered computations easier. Honestly users shouldn't feel like there's a "trick" to doing ordered computation. Since you're thinking about this now, did you have any suggestions?

Corey

Liron Shapira

Nov 17, 2016, 2:59:34 PM
to Eve talk
Cool thanks for the reply. Hm I didn't personally run into an issue of wanting to do ordered computation, I've just been trying to wrap my head around all the implications of Eve.

co...@kodowa.com

Nov 17, 2016, 4:29:30 PM
to Eve talk
Liron,

> Is each bind or commit section a transaction?

I need to amend my comment to address this more precisely. The process works like this: a program starts and all blocks have a chance to run. Once all blocks have run (or not if they couldn't), and the computation doesn't move forward (no blocks have anything new to say), then the net changes to the database are packaged as a "transaction" for the next round of computation.

The program I gave you executes in two transactions. In the first, Block 1 (B1), commits [#corey salary: 10]. Block 2 (B2) cannot run because it can't see the results of B1 since the transaction has not finished. The same goes for Block 3 (B3), which is the integrity constraint. Now that all blocks have had a chance to run, the transaction is complete, and the net effect is we added [#corey salary] to the database.

At the start of the second transaction, B1 has nothing to do. However, this time B2 can run, and it does the three things I mentioned in my last post: adds a record to @browser, removes the value 10 from salary, adds the value 30 to salary.

During this transaction there will either be 0 or 2 salaries, depending on which order the += and -= are executed. Either way, this would be problematic if we didn't have transactions, because B3 could see this intermediate state. But B3 cannot see this state; it only sees a single salary (the one committed by B1) during this transaction, so it does not run. Now that all blocks have had a chance to run, and no blocks propose any further changes, the transaction is complete.
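The round-based model described above can be sketched as a toy simulation. This is my own illustrative Python, not Eve's actual engine; `b1` and `b2` stand in for the blocks in the example program, each reading the current snapshot and proposing changes for t + 1, with the run continuing until no block has anything new to say:

```python
# Toy model of Eve's timestep semantics (illustrative only). Blocks
# return (adds, removes) proposals against the set of salary values.

def make_b1():
    fired = [False]
    def b1(salary):
        # `commit [#corey salary: 10]` has no search, so it fires once
        if not fired[0]:
            fired[0] = True
            return {10}, set()
        return set(), set()
    return b1

def b2(salary):
    # search [#corey] ... commit salary += 30, salary -= 10
    # (set semantics make this a no-op once 30 is in and 10 is out)
    if salary:
        return {30}, {10}
    return set(), set()

def run_to_fixpoint(salary, blocks):
    """Run timesteps until no block proposes anything new; only the
    converged state forms the 'transaction' other blocks can see."""
    snapshots = [set(salary)]
    while True:
        adds, removes = set(), set()
        for block in blocks:
            a, r = block(salary)
            adds |= a
            removes |= r
        nxt = (salary - removes) | adds
        if nxt == salary:
            break
        salary = nxt
        snapshots.append(set(salary))
    return salary, snapshots

final, snapshots = run_to_fixpoint(set(), [make_b1(), b2])

assert snapshots == [set(), {10}, {30}]  # per-timestep states
assert final == {30}                     # the browser only ever sees 30
assert len(final) == 1                   # the integrity constraint holds
```

The point of the sketch is that the intermediate snapshots exist, but only the fixpoint state is published, so the constraint block never observes a violating state.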

(as an aside in B3 I should have written 1 != count[given: salary], to say that the error occurs only when the salary isn't 1)

Corey

Liron Shapira

Nov 17, 2016, 4:45:04 PM
to Eve talk
Thanks for the clarification, this is fascinating.

To confirm my understanding: If we didn't have the `corey.salary -= 10` part, the third block would notice the problem in the third tick, when it looks at the snapshot following the second tick?

I'm still curious how Eve might make things easier when replication comes into the picture. Would be great to see an example of, say, a write operation going to a replication slave, which normally is a pain to handle, but somehow nicer with Eve.

By the way, this is also for a series of blog posts about Eve that I've been working on.

Liron Shapira

Nov 17, 2016, 6:57:28 PM
to Eve talk
I've been doing some reading on Dedalus...

It seems that Eve's `commit` and `bind` semantics correspond to Dedalus's `@next`.

Will there be a separate Eve language feature corresponding to Dedalus's `@async`?

co...@kodowa.com

Nov 17, 2016, 7:00:59 PM
to Eve talk
> By the way, this is also for a series of blog posts about Eve that I've been working on.

Oh great, glad to help in any way we can!

> To confirm my understanding: If we didn't have the `corey.salary -= 10` part, the third block would notice the problem in the third tick, when it looks at the snapshot following the second tick?

Yup, you got it.

> Would be great to see an example of, say, a write operation going to a replication slave, which normally is a pain to handle, but somehow nicer with Eve.

Not too sure of the domain you're working in, but perhaps someone else on the team has a good example.

Corey

Nick Smith

Nov 20, 2016, 6:12:31 AM
to Eve talk
> Once all blocks have run (or not if they couldn't), and the computation doesn't move forward (no blocks have anything new to say), then the net changes to the database are packaged as a "transaction" for the next round of computation.

This sounds to me like you're implying there can be multiple rounds of calculation within a transaction. Don't all blocks work exclusively with data from the prior timestep/transaction* and (re-)execute at most once per timestep? If so, what is the significance of this part of what you said? My understanding is that you're probably referring to the "fixed point" that occurs when nothing changes between timesteps (as opposed to within one). I may be mistaken.

I'm just trying to wrap my head around the fine details of the semantics here. This deserves an excruciatingly detailed blog post / doc.

* so using terminology from the Dedalus paper, blocks comprise "inductive rules", and not "deductive rules"

Chris Granger

Nov 20, 2016, 12:49:26 PM
to Nick Smith, Eve talk
@Liron

> Would be great to see an example of, say, a write operation going to a replication slave, which normally is a pain to handle, but somehow nicer with Eve.

Unlike Dedalus, our goal isn't to create a language to create distributed systems in, but instead to make Eve be a distributed system by default. So while it could ostensibly be good at writing a replication protocol, we don't want you to ever have to do that. By leveraging the semantics that make it simple to reason about building distributed systems, we can make a platform that is itself automatically distributed based on a set of constraints that you set. All of that will come much later though as part of the world scale computer milestone.

> It seems that Eve's `commit` and `bind` semantics correspond to Dedalus's `@next`.

Correct, though we layer the notion of a "transaction" over that. See below.

> Will there be a separate Eve language feature corresponding to Dedalus's `@async`?

We were just talking about this the other day and I'm pretty sure there will be. For most things you don't need it given our semantics, but the one case where you really do is if you want to express a long-running computation without blocking.

@Nick

We depart from Dedalus's exact semantics, but we resolve down to them. The current formulation is that a commit (which bind is a macro over) is an "inductive rule", meaning that it looks at what it can see now and says something about what will be true in the next time step. We don't currently have "deductive rules", which allow you to talk about changes in the current time step. Because of that decision, we might go through time steps where we're in the middle of a computation and the results don't yet make any sense. This would be fine if we wanted to think of our entire system as being eventually consistent, but then things like integrity constraints are largely meaningless.

So to recover our ability to reason about correctness, we have the concept of a "transaction". During a time step all the blocks in your program have the opportunity to execute once and propose changes to t + 1. Once they've all done so, we run the next time step. We continue to do this until none of the blocks propose a change. Once we've reached a stable point, we consider that the end of the "transaction". At that point, we'd check if there were any errors and assuming there are none, we atomically notify the world of the complete set of changes we accumulated in all of our time steps.

We'll talk more about this as we work more on the world scale computer stuff, but the reason for doing it this way is that it opens up the possibility of having a strongly consistent system with an eventually consistent performance profile. Also, by only having inductive rules - rather than deductive rules that run to a fixed point within a single time step - more programs are able to terminate because they're stratified across time. For example, if you had a block like this:

```
search @event @session
  app = [#app]
  [#click]
commit
  app.counter := app.counter + 1
```

If that weren't stratified across time it wouldn't terminate, because the click would still be there and we'd add 1 to the counter infinitely. Since commit goes across time and there's a block in the event database that removes the click if it sees one, everything is fine. At t0 we add a click, at t1 we update the counter and remove the click, and at t2 the block no longer matches because there's no click. In general, our goal was to make it so that you only really need one notion of time - that every change will happen in the next time step - and that you can ignore some of the harder things to reason about when talking about "now" by default.
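As a toy illustration (hypothetical Python model, not how Eve is implemented), stratifying the commit across time is what lets this loop reach a fixed point:

```python
# Each timestep, the counter block and the event database's click-removal
# behavior read the *current* state and propose changes for t + 1. If the
# increment applied "now" instead (a deductive rule), the click would
# still be visible in the same step and the counter would grow forever.

def step(state):
    nxt = dict(state)
    if state["click"]:
        nxt["counter"] = state["counter"] + 1  # app.counter := app.counter + 1
        nxt["click"] = False                   # event db removes the click
    return nxt

state = {"click": True, "counter": 0}
history = [state]
while True:
    nxt = step(state)
    if nxt == state:  # fixpoint: no block has anything new to say
        break
    state = nxt
    history.append(state)

# t0: click present; t1: counter bumped, click gone; t2: nothing matches
assert state == {"click": False, "counter": 1}
assert len(history) == 2
```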


Liron Shapira

Nov 20, 2016, 1:38:50 PM
to Eve talk, nmsm...@gmail.com
> Unlike Dedalus, our goal isn't to create a language to create distributed systems in, but instead to make Eve be a distributed system by default. So while it could ostensibly be good at writing a replication protocol, we don't want you to ever have to do that. By leveraging the semantics that make it simple to reason about building distributed systems, we can make a platform that is itself automatically distributed based on a set of constraints that you set. All of that will come much later though as part of the world scale computer milestone.

That's what I figured, but do you have a hypothetical example of an Eve program with a specific guarantee that it would give when run on a distributed system, which for a typical non-Eve program would be harder to achieve due to the complexity of managing replication race conditions?

 
> Once we've reached a stable point, we consider that the end of the "transaction". At that point, we'd check if there were any errors and assuming there are none, we atomically notify the world of the complete set of changes we accumulated in all of our time steps.


So if I understand correctly, there are two levels of atomicity:

  1. Each time frame is atomic. Each `search` block is evaluated exactly once per timeframe.

  2. A multi-frame "transaction" is atomic, not with respect to `search` blocks, but with respect to "the world", which is outside the scope of Eve 0.2.

Nick Smith

Nov 20, 2016, 9:03:34 PM
to Eve talk, nmsm...@gmail.com
@Chris
 
Thanks for that explanation. That helps clarify things quite a bit! (You need an official doc page for it ☺). Some thoughts:

> We don't currently have "deductive rules", which allow you talk about changes in the current time step. Because of that decision, we might go through time steps where we're in the middle of a computation and the results don't yet make any sense.

> Once we've reached a stable point, we consider that the end of the "transaction". At that point [...] we atomically notify the world of the complete set of changes we accumulated in all of our time steps.

We need something that works like deductive rules in order to write functions, and I think we can get it by (beautifully) reusing inductive rules, and without requiring any special function semantics! You mention that a transaction is completed when a stable point is reached, at which time the results are visible to "the world". Well, what if we could define our own micro-worlds in Eve that can be nested into one another? Put some code blocks into a micro-world, then when data is injected (this is function input), execution iterates over micro-timesteps within the micro-world (i.e. iteration/looping is just time progression). This all happens within a single "regular" timestep. Once it reaches a stable point, its records/outputs are visible to the blocks working with these "regular" timesteps.

Roughly, you can think of Eve execution like integration in calculus. Blocks define the derivative/rate of change of the system. The initial database state defines the initial value of the system being integrated. From this vantage point, having two levels of time is like integrating twice! Micro-worlds consequently allow us to express arbitrarily complex ODEs, rather than just first-order ones.

This is precisely the semantics I was proposing for "contexts" in our prior discussion (to which I am still constructing an informed response). It solves the problem of "needing to be able to say something instantly" that you touched on. You were concerned that contexts would add ordering to the semantics, and they would, but only insofar as it stratifies subcomputations in the same way that inductive rules stratify iterations of a computation. The semantics are as simple as: if you modify a subcontext within a timestep of a parent context, then the subcontext runs to fixpoint before the parent context's timestep ends.
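A rough sketch of that proposal (hypothetical Python; none of this exists in Eve today): a micro-world is just a rule set run to its own fixpoint inside one parent timestep, so iteration is micro-time progression and the parent only ever sees the converged result.

```python
# Sketch of nested fixpoint semantics. The names (fact_rule,
# parent_step) are invented for illustration.

def run_to_fixpoint(state, rule):
    """Iterate an inductive rule over micro-timesteps until it
    proposes no change."""
    while True:
        nxt = rule(state)
        if nxt == state:
            return state
        state = nxt

def fact_rule(s):
    # A micro-world computing factorial: each micro-timestep
    # multiplies the accumulator and counts down.
    n, acc = s
    return (n - 1, acc * n) if n > 1 else s

def parent_step(db):
    # Data injected into the sub-world is the "function input"; the
    # sub-world runs to fixpoint before this parent timestep ends.
    result = run_to_fixpoint((db["n"], 1), fact_rule)[1]
    return {**db, "n!": result}

db = parent_step({"n": 5})
assert db["n!"] == 120  # the parent never observes the intermediate states
```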

Note that we don't need to give up blocks for this as was suggested by my prior post. I expressed multiple independent ideas, which should probably be discussed separately. It's been hard to tease them apart because I've conceived them mostly through intuition about an ideal semantics.

Liron Shapira

Nov 22, 2016, 12:25:17 AM
to Eve talk, nmsm...@gmail.com
The website for Bloom has this profound-seeming quote:

> Much of the pain in traditional distributed programming comes from this mismatch: programmers are expected to bridge from an ordered programming model into a disordered reality that executes their code.

I've been trying to show an example of what this means in Part III of my blog series, but I realized I don't actually know.

Chris Granger

Nov 22, 2016, 2:50:23 PM
to Liron Shapira, Eve talk, nmsm...@gmail.com
@nick - yeah, see Eric's response to Zubair on functions. Assuming they're referentially transparent, it's fine to use dedalus's idea of deductive rules. One thing missing from that though, is how you search for the "function call" to augment. Right now we're thinking that'll be a new action called "input" and the only thing you'll be able to do as a result is modify the record that was matched.

@liron

That quote is talking about the fact that our programs, even functional ones, are foundationally predicated on the idea of having an ordered execution, but in a distributed system there's no guarantee of order at all. You can receive events from the future or the past in any order at any time. Even worse, you can just not get events at all, creating gaps in the timeline. So while our code pretends that we have this nice orderly model of execution the reality is something much more terrifying. Because of that mismatch, creating distributed systems is not only difficult but hugely error prone even by experts.

You were asking for examples. A classic demonstration of this problem is an ATM. Let's pretend we implemented an ATM network as if it were synchronous, but was in reality a distributed eventually consistent system. In this system the assumption is that a person can withdraw up to some amount safely based on their current balance. When they withdraw, the balance is reduced by the amount withdrawn and there's a check to make sure that balance can't go below zero. The problem with this system is that there's a fundamental ordering assumption: that withdrawals will happen in some specific order. In an eventually consistent system, it's not really possible to uphold that guarantee and there's a very simple way to steal money in such a system: simultaneously withdraw the maximum amount from several ATMs. Eventually they will reconcile and the bank will find out, but at the time all of the ATMs thought the balance was high enough to make the withdrawal and so allowed it. Dedalus doesn't make writing this system any easier, but it instills the idea that the world is "disorderly" when you're talking about distributed systems. Through analysis it could tell you that you are open to this kind of attack. More often than not the problems with distributed systems arise because we can't figure out how all the possible failure modes might compound into some crazy scenario. Having a platform that can tell you where things could go sideways or where guarantees can't be upheld is hugely beneficial. 
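The attack can be sketched as a toy simulation (illustrative Python; the replica count and amounts are made up):

```python
# Three eventually-consistent ATM replicas each check a *local*, stale
# copy of the balance before dispensing. Every local check passes, so
# the account holder walks away with 3x the real balance.

balance = 100
replicas = [{"balance": balance} for _ in range(3)]  # stale local views

dispensed = 0
for atm in replicas:
    amount = 100
    if atm["balance"] >= amount:  # ordering assumption: "my read is current"
        atm["balance"] -= amount
        dispensed += amount

# The replicas reconcile later; only then does the bank find out.
final_balance = balance - dispensed
assert dispensed == 300
assert final_balance == -200
```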

Eve *could* help you write this though. We have a strong consistency model, which can guarantee that transactions will complete as if they were ordered in some way. This means the fundamental ordering assumption in our ATM software is valid. We don't know what order the transactions will be completed in, but they won't semantically happen all at the same time and allow theft.

Typically the reason why systems don't choose to use a strong consistency model is that most implementations require a lot of coordination between the members of your system. Coordination means waiting for members to respond, and waiting means both more latency and less throughput. By building Eve on a model that is fundamentally unordered, but then carefully adding the notion of a transaction to it, we have a way of recovering something akin to the performance of an eventually consistent system with strong consistency's guarantees. Doing so requires a fairly strange set of prerequisites - lack of order, a complete timed log, incremental execution, and so on - which is part of the reason why we haven't seen it before.

I don't think any individual thing in Eve is particularly novel, but part of what we're showing and will continue to show as we move on is that this system is much greater than the sum of its parts, and that *does* lead to something fairly novel. For example, because of these characteristics, not only could we have a very interesting distributed system underlying Eve, we could automatically distribute things *for* you. You could make arbitrary cuts on what you want run on the client or the server, or let us just figure it out. You could transfer control to other machines, have the system scale automatically, and so on. Removing order is a big part of what enables these things. Finding a way to recover strong consistency is what will make it reasonable for us mere mortals to work with.

Liron Shapira

Nov 22, 2016, 3:36:15 PM
to Eve talk, wis...@gmail.com, nmsm...@gmail.com
Hm, I'm not sure I see the takeaway from this ATM example.

We want to (1) check balance and (2) withdraw. If I write a SQL-powered remote API, I might forget to put these two operations into a transaction. Or I might err on the side of safety and have all my REST API handlers wrap all their DB operations in a transaction by default, and then have a hard time knowing for sure when I can take things out of the transaction.

I get that the status quo is problematic, but what's the delta with Eve? Won't the programmer have basically the same experience telling Eve to group these two operations into a transaction?

Chris Granger

Nov 22, 2016, 3:57:28 PM
to Liron Shapira, Eve talk, nmsm...@gmail.com
Ah, that's not really a distributed system. A single instance of SQL or a single server talking to an instance of SQL doesn't have any of the problems I described because it's effectively a system with a globally coordinated component in it: the database. Let's say you're Bank of America though, and there's no way one SQL database could keep up with the load of ATM withdrawals that happen every minute. In that case you need multiple SQL databases which you then need to keep in sync. How do you do that? How long are you willing to wait when you check with the other databases? How do you determine which database has the most accurate representation of the balance? Local transactions are a relatively solved problem in databases (though there's a lot left to be desired in terms of performance and complexity), but distributed transactions are a mess.

Let's take another example of a distributed system: microservice-based architectures that are all the rage right now. How would you implement a transaction across an entire set of services? How do you guarantee that the balance you read from service A is still correct when service B attempts to transfer it to another bank? Or in the case of an intra-bank transfer, how could you ensure that the total balance across all accounts remains the same?

In a single member system, like your SQL example, there's no question of who knows what, how up to date that knowledge is, and whether we all know the same thing. In a distributed system though, none of that can be guaranteed. One demonstration of this is the Two Generals' Problem. I chose the bank example because it's a commonly used one, but the reality is that basically every system today has some element of distribution to it. If you have code that runs on a local version of data on the client and communicates with a server, you have a distributed system. You can see these fail in all sorts of funny ways on the internet now. E.g. on twitter you can reply to a tweet that has been deleted while you replied, orphaning the reply. You can receive responses to tweets before the original. You can get messages in chat programs out of order or see facebook posts appear and disappear after a refresh. These systems are all eventually consistent, which is fine for something like a tweet. But for systems that run your business, that could lead to disastrous results.

Chris Granger

Nov 22, 2016, 4:09:16 PM
to Liron Shapira, Eve talk, nmsm...@gmail.com
I forgot to answer your last question: "Won't the programmer have basically the same experience telling Eve to group these two operations into a transaction?"

Eve is implicitly transactional by default, so there's no effort on the programmer's part to ensure that invariants are upheld. If you write a rule that says the balance can never go below zero, it never will - Eve will throw out the inputs that started the transaction and store a record saying that it did so. The goal is that with Eve you won't have to think about any of this. You won't need to think about distribution, transactions, where things are, who knows what, etc. Instead you get to pretend like you have a single infinitely sized computer and we'll take care of the rest - the holy grail of distributed systems research. There's a *lot* of work we need to do to get there and we will likely fall short of that lofty goal, but it's plausible to present something significantly better with this foundation.
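A toy model of what "implicitly transactional" means here (my own sketch in Python, not Eve's implementation): if an invariant fails at the end of a transaction, the inputs that started it are thrown out and a record of the rejection is stored.

```python
# Hypothetical invariant-enforcing transaction wrapper. The invariant
# names and the db layout are invented for illustration.

def apply_transaction(db, changes, invariants):
    candidate = {**db, **changes}
    failed = [name for name, check in invariants if not check(candidate)]
    if failed:
        # Inputs are discarded; the state is unchanged except for a
        # record saying the transaction was rejected and why.
        db.setdefault("rejected", []).append((changes, failed))
        return db
    return candidate

invariants = [("non-negative balance", lambda d: d["balance"] >= 0)]

db = {"balance": 50}
db = apply_transaction(db, {"balance": -30}, invariants)  # overdraw: rejected
assert db["balance"] == 50
db = apply_transaction(db, {"balance": 10}, invariants)   # valid: applied
assert db["balance"] == 10
assert len(db["rejected"]) == 1
```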

Liron Shapira

Nov 22, 2016, 4:26:29 PM
to Eve talk, wis...@gmail.com, nmsm...@gmail.com
Ok, let's consider two distributed-system scenarios:

(1) Transactions with a lot of moving parts. For Bank of America's scale, they can simply shard their data to get back transaction semantics. So it seems like you can always architect single-record transactions into your system. But you often need transactions involving fancy queries over multiple parts of data, multiple microservices, etc.

(2) Transactions that don't get to know the full state of the system. I don't know if there's anything interesting to say about the ATM example. Let's say the bank is okay with simultaneous withdrawals from multiple ATMs as long as the most risk it's taking is getting the total balance down to -$500. The total balance is not something that an individual ATM can count on knowing before it has to decide whether to dispense money.

For either/both of these, can we come up with some Eve pseudocode that's somehow nicer than the mainstream equivalent?

Chris Granger

Nov 22, 2016, 5:09:52 PM
to Liron Shapira, Eve talk, nmsm...@gmail.com
Sure!

1) 

> For Bank of America's scale, they can simply shard their data to get back transaction semantics. So it seems like you can always architect single-record transactions into your system.

Sadly that's not true. A simple demonstration is transferring from one account to someone else's account. How would you shard to guarantee that every person I would ever transfer money to is on the same shard as I am? It would be nice if sharding could magically fix the distributed system problem, but that hasn't been true in practice - the real world is messy. 

So let's take that as our example. We want to implement a transfer from some arbitrary account, to some other arbitrary account and we need to make sure that the total balance of those two accounts remains the same.

We want to capture the starting total of our transfer so that we can ensure everything works out:

```
search @account @transfer
  transfer = [#balance-transfer account account2 not(starting-total)]
  total = account.balance + account2.balance
commit @transfer
  transfer.starting-total = total
```

Once we have a filled out transfer, we do the actual balance adjustment

```
search @account @transfer
  transfer = [#balance-transfer account account2 amount starting-total]
commit @account
  account.balance := account.balance - amount
  account2.balance := account2.balance + amount
```

If somehow the balances don't finally add up to the starting total, something very bad has happened and we should bail out:

```
search @account @transfer
  transfer = [#balance-transfer account account2 amount starting-total]
  account.balance + account2.balance != starting-total
bind @error
  [#error description: "Balance transfer didn't maintain a constant amount of money" transfer]
```

That's it. No locking, no sharding, no explicit transactions, no coordinating, none of the dozens of other things that you'd normally need to think about in this context. The Eve code you would write if this were just running in your browser is the same Eve code you would write if this was running on a 1000 machine cluster with millions of requests a second. All of this is meant to be the basis of Eve's model as opposed to something that you need to engage with as a programmer.

2) Same thing, we don't need to be aware of whether this is distributed or local:

```
search @account @withdrawal @atm
  withdrawal = [#withdrawal atm account amount]
  not([#dispense withdrawal])
  account.balance > amount
commit @atm
  [#dispense atm withdrawal amount]
commit @account
  account.balance := account.balance - amount
```

Ensure the sum dispensed is equal to the amount withdrawn

```
search @withdrawal @atm
  transaction = [#withdrawal]
  cash = [#dispense]
  sum[value: cash.amount, given: cash] != sum[value: transaction.amount, given: transaction]
bind @error
  [#error description: "Cash to dispense is not equal to withdrawl amount", transaction, cash]
```

Ensure that an account can never go below 0

```
search @account
  account = [#account balance < 0]
bind @error
  [#error description: "Account balance is below 0", account, balance: account.balance]
```

In both of these cases the only thing you express in Eve is the functionality and the invariants you want upheld. Regardless of execution strategy, it's then the platform's job to make sure they are upheld.

Liron Shapira

Nov 22, 2016, 9:17:29 PM
to Eve talk, wis...@gmail.com, nmsm...@gmail.com
@Chris thanks, this is helpful. I agree about the limitations of sharding. I'm just still having some issues understanding where Eve improves the status quo...


For #1, it seems a slightly-modified version of your "actual balance adjustment" block is all we actually need:

```
search @account @transfer
  transfer = [#balance-transfer account account2 amount done: false]
commit @account
  account.balance := account.balance - amount
  account2.balance := account2.balance + amount
commit @transfer
  transfer.done := true
```

But isn't this basically identical to the code we'd write on a LAMP stack? :p

While it's nice that Eve is declaratively stating the transaction that needs to happen, and leaving it up to the implementation layer to figure out how to efficiently make the transaction happen in a distributed database, doesn't the mainstream implementation (SQL transactions) already offer this?

(I grant that Eve's abstraction also blurs the line between a distributed database and a local operation, which is nice, but I'm specifically wondering how Eve improves the database layer.)

Re the validation blocks: I also grant that Eve's expressive power to detect error conditions is a big step above what mainstream programmers currently have, due to the fact that you can write these validation blocks. And yes, this is a way in which Eve helps build more reliable distributed systems, but I see it as a separate discussion (for somewhere in my Part IV).



For #2, you say we don't need to be aware if it's distributed or local, but it seems we *do* need to know that @account is a single global value.

Normally, in the spirit of "layer fluidity", I imagine that the underlying client implementation of Eve is allowed to smartly compensate for server lag by having its own replicated copy of (part of) any given database, right? E.g. Meteor and Firebase clients do this.

But for the @account database in particular, we have to know that this isn't okay. I believe this property of @account should be expressed somewhere in the example code. In other words, I think Eve has to distinguish when it's "making an API call" in this example. And I'm afraid this will undermine the coolness of the example.

Chris Granger
Nov 23, 2016, 6:56:26 PM
to Liron Shapira, Eve talk, nmsm...@gmail.com

> But isn't this basically identical to the code we'd write on a LAMP stack? ... doesn't the mainstream implementation (SQL transactions) already offer this?


Let's explore that assumption. Since you brought up the LAMP stack, I'll look at using MySQL and PHP to accomplish a sharded, transactional bank transfer. The clustered implementation of MySQL does appear to support distributed transactions, but with a whole list of caveats. Here are a couple choice excerpts from that document:


> when a transaction modifying a number of rows commits concurrently with a transaction reading the same rows, the transaction performing the read can observe “before” values, “after” values, or both, for different rows among these, due to the fact that a given row read request can be processed either before or after the commit of the other transaction.


That's pretty terrifying (we'll get to that in a second), but I thought this one was even better - basic operations like count can't be trusted:


> it is not possible to guarantee the transactional consistency of the COUNT() function on the slave. In other words, when performing on the master a series of statements (INSERT, DELETE, or both) that changes the number of rows in a table within a single transaction, executing SELECT COUNT(*) FROM table queries on the slave may yield intermediate results.


So this boils down to the first couple of paragraphs in that document, which say that the clustered implementation of MySQL only supports transactions at the "read committed" level. The only guarantee that isolation level provides is that if something is committed you will read it - but it makes no guarantees about *when* you will read it with respect to other transactions. This means that while our transfer is happening, a select can read rows from entirely different snapshots of the database. In the case of a bank, that means I can read values during my transaction that are no longer correct by the end. That alone is sufficient for the ATM attack, which means distributed transactions aren't enough on their own.

Fortunately, MySQL also provides row-level locking, so we could lock the row until the entire distributed transaction is complete. The problem with that approach is that the distributed transaction is already at least 10x slower than a normal one, and because we're locking the row to prevent any other updates, we're now queuing transactions behind us. With just one slow network trip and a little bit of contention, your entire banking system can trivially come to a halt, and there's a good chance it may never catch back up.

To top all of this off, MVCC isn't enabled in the clustered implementation, so your per-machine throughput is down, and the increase in memory usage per transaction dramatically decreases the size of data we can work with.
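To make the read-committed hazard concrete, here's a tiny Python simulation (not MySQL itself; the shards, accounts, and balances are invented for the sketch) of a reader summing balances across two shards while a cross-shard transfer is mid-flight:

```python
# Two shards, each holding one account (names invented for the sketch).
shard_a = {"alice": 100}
shard_b = {"bob": 100}

def transfer_steps(amount):
    """A cross-shard transfer as two separate commits, pausing after each."""
    shard_a["alice"] -= amount   # commit on shard A
    yield "debited"
    shard_b["bob"] += amount     # commit on shard B
    yield "credited"

def read_total():
    """A read-committed reader: sees whatever each shard has committed so far."""
    return shard_a["alice"] + shard_b["bob"]

t = transfer_steps(10)
next(t)             # the debit has committed, the credit hasn't
mid = read_total()  # inconsistent snapshot: $10 has vanished from the bank
next(t)             # the credit commits
end = read_total()
print(mid, end)     # 190 200
```

The reader did everything "right" and still observed a total that was never true of any single consistent state, which is exactly the opening the ATM attack exploits.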


After digging into the MySQL docs for a bit, it's not looking good. Distributed transactions in this context are basically useless, and the associated performance issues negate the scaling we were after to begin with. The queuing and locking also open us up to really straightforward attacks (e.g. a bad actor constantly attempts to transfer $1 from one account to another, causing contention that cascades into the whole system stalling). We could conceivably put a cache in front of the database to address some of that, but then we're back to non-transactional stale reads and updates. So where do we go from here?


I guess we could skip clustered MySQL and use InnoDB instead. That pushes sharding and distributed transaction management up to the application level, so we'd have to do a lot of this in our PHP. The naive approach would be to write something like this pseudocode:


```
$amount = 10;
$from = "chris's account";
$to = "joe's account";

// queryShard will find the right database to query based on our account
$balance = queryShard($from, "
   BEGIN;
   SELECT balance FROM account WHERE account.id = '$from' AND account.balance > $amount;
   UPDATE account SET balance = balance - $amount WHERE account.id = '$from';
   COMMIT;
");

if (!$balance) {
  log("insufficient balance for transfer");
  return;
}

// these can't be in one db transaction because they can be on different shards; we could
// potentially add some more complex batching logic to try and take advantage of that when
// we can, but it'll make the rest of the code more complicated
$balance2 = queryShard($to, "
   BEGIN;
   SELECT balance FROM account WHERE account.id = '$to';
   UPDATE account SET balance = balance + $amount WHERE account.id = '$to';
   COMMIT;
");

// it's possible that the $to account doesn't exist
if (!$balance2) {
  // we need to roll our transfer back
  queryShard($from, "
     BEGIN;
     SELECT balance FROM account WHERE account.id = '$from';
     UPDATE account SET balance = balance + $amount WHERE account.id = '$from';
     COMMIT;
  ");
  return;
}

$startingTotal = $balance + $balance2;

// ensure the total balance remained the same
$updatedBalance = queryShard($from, "SELECT balance FROM account WHERE account.id = '$from'");
$updatedBalance2 = queryShard($to, "SELECT balance FROM account WHERE account.id = '$to'");

if ($updatedBalance + $updatedBalance2 != $startingTotal) {
  // reset the balances
  queryShard($from, "
     BEGIN;
     SELECT balance FROM account WHERE account.id = '$from';
     UPDATE account SET balance = balance + $amount WHERE account.id = '$from';
     COMMIT;
  ");
  queryShard($to, "
     BEGIN;
     SELECT balance FROM account WHERE account.id = '$to';
     UPDATE account SET balance = balance - $amount WHERE account.id = '$to';
     COMMIT;
  ");
  log("balance transfer failed for $from $to $amount");
}
```


Note that this code is magically doing a bunch of stuff that it couldn't do in reality - it'd probably be several times longer if we wrote out all the helpers and things I'm glossing over. Disregarding that, though, it's also fraught with issues. Every time we call queryShard, we're exposing ourselves to inconsistency. Even if we checked previously that an account exists, there's no guarantee that by the time we attempt to make the transfer the account still exists or that its balance is high enough. So to counter that we check the balance and update it in one transaction, but in between the time we withdraw the funds and attempt to deposit them into the $to account, that account may have disappeared. If it did, we need to undo the withdrawal, which we can't really guarantee will succeed either, since more time has passed.

We could try to address this by writing a bunch of business logic that locks down what operations are valid while we're going through this process, but we'd need to make sure we accounted for that at every single place we make an update. We'd also need to add some sort of queue to handle the updates we intend to make but can't yet because we're in our custom transaction process.
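That failure window can be sketched in a few lines of Python (a hypothetical stand-in for the PHP above, with invented balances and a toy queryShard): the withdrawal commits, then both the deposit and the compensating credit fail.

```python
# Hypothetical sketch: the withdrawal commits, then everything else fails.
class ShardDown(Exception):
    pass

accounts = {"from": 100, "to": 50}    # invented starting balances
shard_up = {"from": True, "to": True}

def query_shard(acct, delta):
    """Stand-in for queryShard: one per-shard commit, or a network error."""
    if not shard_up[acct]:
        raise ShardDown(acct)
    accounts[acct] += delta

def transfer(amount):
    query_shard("from", -amount)       # the withdrawal commits
    shard_up["to"] = False             # a partition strikes right here
    try:
        query_shard("to", +amount)     # the deposit fails...
    except ShardDown:
        shard_up["from"] = False       # ...and the from shard can vanish too
        try:
            query_shard("from", +amount)  # so the compensating credit fails
        except ShardDown:
            pass                       # and the money has simply disappeared

transfer(10)
print(sum(accounts.values()))  # 140, though nobody was ever paid the missing 10
```

Nothing in the application code can close this window on its own; that's why a real fix needs coordination below the application level.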


Next, when we attempt to check that our total balance has remained constant throughout this process, we again run into the possibility that other transactions have taken place and our balances have been changed. If we then attempt to reset, it may be the case that the funds have already been withdrawn from the $to account and we've just had a lot of money stolen. Nowhere in any of this are we handling the case where one of our queries just fails because of a partition or the machine disappearing on us. We also have no guarantees about what would happen if the machine that is running this code goes down in the middle of the process. We'd need to implement mechanisms to make sure that if we do experience a failure, money hasn't just disappeared.


To do that correctly, we'd at least need to implement project-wide account locking and even then, for something dealing with money, we shouldn't rely solely on us being diligent enough to check the lock everywhere in our code. Realistically we probably need to implement a two-phase commit or some other consistency protocol here to ensure that we all agree what the values are at the start and what they should be at the end. Now that we're implementing our own consistency protocol, our very simple code for a relatively simple problem requires us to be experts in an intensely difficult field - one that even the experts constantly mess up.
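For vocabulary's sake, the prepare/commit skeleton of such a protocol looks roughly like this in-memory Python toy; it deliberately omits persistence, timeouts, coordinator failure, and recovery, which are exactly the parts that make real implementations hard:

```python
# Toy two-phase commit over in-memory "shards" (illustrative only).
class Shard:
    def __init__(self, balance):
        self.balance = balance
        self.pending = None

    def prepare(self, delta):
        """Phase 1: vote yes only if the update would keep the balance valid."""
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        """Phase 2: apply the prepared update."""
        self.balance += self.pending
        self.pending = None

    def abort(self):
        self.pending = None

def two_phase_transfer(src, dst, amount):
    participants = [(src, -amount), (dst, +amount)]
    if all(shard.prepare(delta) for shard, delta in participants):
        for shard, _ in participants:
            shard.commit()   # every participant voted yes: commit everywhere
        return True
    for shard, _ in participants:
        shard.abort()        # any "no" vote aborts everywhere
    return False

a, b = Shard(100), Shard(50)
print(two_phase_transfer(a, b, 30), a.balance, b.balance)   # True 70 80
print(two_phase_transfer(a, b, 500), a.balance, b.balance)  # False 70 80
```

Even this toy shows the cost: every transfer now takes two round trips to every participant, and the moment you add real machines you have to answer what happens when a participant votes yes and then never hears the commit.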


I'm not even going to attempt to show what that code would start to look like, because a) I'd get it wrong and b) it's... a lot. So back to the question at hand: "But isn't this basically identical to the code we'd write on a LAMP stack?" Not even close.


> I believe this property of @account should be expressed somewhere in the example code.


Fair enough, that would be an attribute on the database record:


```
commit @database
  [#database name: "account" shared: true secure: true]
```


Also, if we're including that as part of the cost of the implementation, it seems like we should include all of the configs and schemas that the LAMP stack would entail as part of the comparison too. Those are going to be quite a lot more than two lines.


> In other words, I think Eve has to distinguish when it's "making an API call" in this example.


Nope. :) A database is a database. It doesn't matter if it's on the server or somewhere else. It doesn't matter if it needs to be protected or if it can be optimistically shared. Those are all properties of the system, not of the code. Your program doesn't need to change even if how the system is laid out does.


All of that being said, if there's an alternative approach, I'd love to hear it. While MySQL is probably an extreme case here, most databases will have similar but different caveats in their implementations. If anyone knows of other solutions that would allow you to write the simple code we started with, we'd certainly love to hear about them.




Liron Shapira
Nov 23, 2016, 7:47:37 PM
to Eve talk, wis...@gmail.com, nmsm...@gmail.com
1)

Thanks for the clarification, I hadn't realized that Eve's value-add in the ATM example is to *scalably implement a nontrivial transaction*, and that the status quo of databases supposedly can't do that.

This is great; I'm just worried that there's no Eve-specific shortcut to tackling this monster of a problem.

Are you thinking that making Eve able to implement this ATM example will require less work than upgrading the transaction capabilities of the best SQL database? Or do you think there's a ton of work to do either way, but at least with Eve the work pays off more, because the semantics let you express lots of other cool things once you can express the ATM thing?


2)

Well, the `shared: true` attribute can't be treated by the language as an ordinary attribute. For one thing, the language semantics need to say that the consequences of a `@commit` operation on such a record aren't yet visible in the next timestep, right?

While there's a lot of uniformity between interacting with local and remote data, it seems that if you have the luxury of expecting synchronous effects at the next timestep, then it tips you off that you are running locally, rather than making an API call.
...

Chris Granger
Nov 23, 2016, 8:19:20 PM
to Liron Shapira, Eve talk, nmsm...@gmail.com
I'm just worried that there's no Eve-specific shortcut to tackling this monster of a problem. Are you thinking that making Eve able to implement this ATM example will require less work than upgrading the transaction capabilities of the best SQL database?

It requires a very different set of base assumptions/semantics, e.g. only having sets, implementing the database as a materialization over a timeless log, and so on. Here are a couple papers that start to lay that out:


Specifically on how Eve might handle transactions, the closest thing would be LogicBlox's transaction repair:


For one thing, the language semantics need to say that the consequences of a `@commit` operation on such a record aren't yet visible in the next timestep, right?

We maintain the synchronous programming abstraction, so it will be available in the next timestep. As we've mentioned in a couple of other threads we'll probably want an explicit async operator for some things, but this wouldn't be one of them. Looking past that for a moment though, whether or not something is available in the next timestep shouldn't really matter - Eve blocks are reactive, when the data *is* available, the right things will happen.

Liron Shapira
Nov 24, 2016, 2:59:33 AM
to Eve talk, wis...@gmail.com, nmsm...@gmail.com
Ok great, that helps. I'll write a section about Eve's superior potential for distributed transactions in post #3 and post the draft here on Friday.

I'm still a little unclear on the remote API stuff, but I understand it's coming soon, so looking forward to learning more at that time.
...