Online banking with GAE?


DennisP

Apr 23, 2008, 4:20:50 PM
to Google App Engine
Suppose I have some large number of users, each with an account
holding virtual dollars. Each user can transfer dollars to any other
user's account.

I can't use transactions, because they only work within entity groups,
and if I put everybody in the same group Google says my site will
grind to a halt.

I can't just retrieve current balances, add/subtract, and save them
back, because two people might retrieve at the same time, and whoever
is quickest to save back loses his update.

I could just save commands (from:Jim to:Dave) and have a single
backend process which applies them in sequence, except there are no
backend processes.

Generally I'd get around the lack of backend processes by dividing
them up into little chunks and executing the chunks on the backs of
some requests somewhere...but then I'm back into a situation where
conflicts can occur.

Is there any way to do this reliably on GAE?

Brett Morgan

Apr 23, 2008, 6:41:29 PM
to google-a...@googlegroups.com

I think your last option is actually going to be the most workable. By
having the command entities you have a datestamped transaction log,
thus you won't lose updates. That's the most important part from a
reliability point of view. Transactional conflicts don't impact on
reliability, only on scalability. And if you have web clients sit
there and ajax ping a worker that is applying the command entities
until they are all done, then you can even show the user a progress
meter or other appropriate ui feedback.
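
The command-log idea can be sketched in plain Python (the `Command`
class and in-memory dict below are hypothetical stand-ins for
datastore entities, not the GAE API): because each command carries a
timestamp, replaying the log in order is deterministic, so no update
is lost even if balances have to be rebuilt.

```python
# A minimal sketch of the datestamped transaction log: commands are
# stored as their own records and applied in timestamp order.

from dataclasses import dataclass

@dataclass
class Command:
    timestamp: int   # stands in for an auto-assigned datetime
    src: str         # account to debit
    dst: str         # account to credit
    amount: int

def apply_commands(balances, commands):
    # Replaying the log in timestamp order always yields the same result.
    for cmd in sorted(commands, key=lambda c: c.timestamp):
        balances[cmd.src] -= cmd.amount
        balances[cmd.dst] += cmd.amount
    return balances

balances = {"Jim": 100, "Dave": 50}
log = [Command(2, "Dave", "Jim", 10), Command(1, "Jim", "Dave", 30)]
apply_commands(balances, log)
assert balances == {"Jim": 80, "Dave": 70}
```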

DennisP

Apr 23, 2008, 7:05:02 PM
to Google App Engine
And yet...each web request is its own process, so now I have multiple
processes, picking up commands and executing them. I could still have
two processes attempting to update the same account at the same time
and having conflicts.

Seems to me the only way to make it reliable is if there's only one
process which can update a given entity's total. This would be doable
if Google implements backend processes, but not until then.



Brett Morgan

Apr 23, 2008, 9:48:10 PM
to google-a...@googlegroups.com
The ordering of the updates via the journal log means that the
conflicts won't cause indeterminacy in output. It's just that you
have multiple processes all attempting to update the data in the same
way, and only one will succeed. This means that Chubby is going to
hate you, but you still get correctness.

This is something that will go away once we get background processing.
Something which the GAE team have said is on their radar. =)

javaDinosaur

Apr 23, 2008, 9:54:44 PM
to Google App Engine
> And yet...each web request is its own process, so now I have multiple
> processes, picking up commands and executing them.

Yes, that is a risk, though in practice under light to moderate load
I think many requests will be serviced by a resident CGI Python
process already hosting your application.

Is there any attribute in your application that allows accounts to be
clustered into branches? If so, I think you should cluster around
1,000 accounts per branch and also keep a pending-transaction list in
the same transactional domain. Then, like a real 1980s banking
network, a supervisor process can poll the branches and coordinate
interbranch transfers using some wide-area 2-phase-commit application
logic.

The supervisor process is going to be tough to write until Google
releases some type of queued message-processor service that can
invoke application logic outside the scope of an HTTP request. In the
meantime you could install some DataStore entities to act as mutexes
to prevent multiple HTTP threads of activity from processing a
pending transaction batch concurrently.
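
The mutex-entity idea might look like this in outline (plain Python,
with a dict standing in for the datastore; on GAE the check-and-set
would have to run inside a datastore transaction, and the lease
expiry keeps a crashed request from holding the lock forever):

```python
# Hypothetical in-memory stand-in for a datastore "Mutex" entity: one
# record per batch, holding the owner and a lease expiry.

LEASE_SECONDS = 30
locks = {}   # batch id -> (owner, lease expiry time)

def acquire(batch_id, owner, now):
    held = locks.get(batch_id)
    if held is None or held[0] == owner or held[1] <= now:
        locks[batch_id] = (owner, now + LEASE_SECONDS)   # take or renew
        return True
    return False

def release(batch_id, owner):
    # Only the current owner may release the lock.
    if locks.get(batch_id, (None, 0))[0] == owner:
        del locks[batch_id]

assert acquire("batch-1", "req-A", now=0)        # first request wins
assert not acquire("batch-1", "req-B", now=10)   # lease still held
assert acquire("batch-1", "req-B", now=31)       # lease expired at t=30
```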

Are you anticipating the collapse of the global banking system and
preparing a standby system?

Brett Morgan

Apr 23, 2008, 9:58:10 PM
to google-a...@googlegroups.com

Oddly enough, you have exactly the same set of concerns when you want
to write an online MMORPG...

DennisP

Apr 24, 2008, 3:54:44 PM
to Google App Engine
Heheh, I doubt my points will ever be convertible to real currency or
coin, though one can always dream.

No real clustering I can do...hmm, I could maybe put each user in his
own entity group, and insert my commands there...(just "add 3",
"subtract 4", etc)....and also insert to a global list of users with
pending transactions.

Then my request-based processes pop a user off the global list, go to
that user and process all pending adjustments. Now the adjustments are
wrapped in a transaction so it oughta work ok.

At worst, two processes pop the same guy off the global list, but that
just means the slower one will get there and have nothing to do.
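
This scheme can be sketched in plain Python (the dicts below are
hypothetical stand-ins for entity groups and the global list; on GAE
each `process_one` call would run as a transaction on that user's
entity group):

```python
# Each user's pending adjustments live alongside the user; a global
# list names users with work outstanding.

pending = {"joe": [3, -4], "sue": [5]}   # per-user adjustment queues
balances = {"joe": 10, "sue": 20}
dirty = ["joe", "sue", "joe"]            # global list; duplicates allowed

def process_one(user):
    # Atomically drain the user's queue and apply it to the balance.
    adjustments, pending[user] = pending[user], []
    balances[user] += sum(adjustments)

while dirty:
    process_one(dirty.pop())

# Popping "joe" twice is harmless: the second call drains an empty
# queue and applies nothing, so joe ends at 10 + 3 - 4 = 9.
assert balances == {"joe": 9, "sue": 25}
```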

DennisP

Apr 24, 2008, 4:05:35 PM
to Google App Engine
Perhaps I'm not understanding correctly...my thinking is,

1) Process A retrieves a command to add 3 points to Joe.
2) Process B retrieves a command to add 5 points to Joe.
3) Process A retrieves Joe's balance "10"
4) Process B retrieves Joe's balance "10"
5) Process A adds 3 to 10 and sets Joe's balance to "13"
6) Process B adds 5 to 10 and sets Joe's balance to "15"

Now the 3 points from Process A are lost. Each process needs to be
able to lock Joe's balance, from retrieval until writing it back.
Seems to me that to do that we need a transaction, so we need entity
groups involved somehow.
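
That interleaving can be replayed literally in plain Python to show
the lost update:

```python
# Both processes read the same starting balance, so the second write
# silently discards the first process's update.

balance = {"joe": 10}

read_a = balance["joe"]        # step 3: process A reads 10
read_b = balance["joe"]        # step 4: process B reads 10
balance["joe"] = read_a + 3    # step 5: A writes 13
balance["joe"] = read_b + 5    # step 6: B writes 15; A's 3 points are lost

assert balance["joe"] == 15    # not the correct 18
```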

Brett Morgan

Apr 24, 2008, 6:14:37 PM
to google-a...@googlegroups.com
I need to spend time understanding what we can achieve with GAE's
transactions. *facedesk*

dsw

May 4, 2008, 4:49:13 AM
to Google App Engine
This is a real flaw in the App Engine / Big Table transaction model:
there is no way to maintain a global conservation law, that is, an
invariant-sum predicate that always holds on variables in different
entity groups.

In the example of implementing a bank, you can violate it in only one
direction, so that there is never extra money in the system, as
follows: decrement account A, then increment account B.

However, the problem is that the server request instance may time out
before you can increment B. Therefore you need a place to log that B
should be incremented later. This log must (1) be guaranteed to let
you write to it without timing out (it can't conflict with other
transactions) and (2) be deletable in the same transaction as
account B (therefore it has to be in the same entity group as B).

If a transaction gets a lock on an entire Entity Group, then this is
not possible. It would only be possible if you could append to the
list of name-value pairs in B in a way that was guaranteed to complete
"eventually" after the current transactions waiting for B finished.

A much better solution to the problem would be for Google to change
Big Table to allow TWO Entity Groups in ONE transaction. Two is
enough to solve the whole problem for any global conservation law.

dsw

May 4, 2008, 5:00:15 AM
to Google App Engine
I filed this as a bug: http://code.google.com/p/googleappengine/issues/detail?id=313


Brett Morgan

May 4, 2008, 9:12:12 AM
to google-a...@googlegroups.com
I'm suspecting that tying up multiple entity groups is going to hurt a
lot. Poor Chubby won't be at all happy.

I wonder if there is another way out of the box.

I suppose the question is, how often are you going to be making
changes to the transactional store vs reading the results pages?

--

Brett Morgan http://brett.morgan.googlepages.com/

dsw

May 4, 2008, 5:07:09 PM
to Google App Engine
In my previous post I suggested two solutions: (1) what I'll call an
"eventual consistency" (a term I got indirectly from Armando Fox that
I hope I'm using to mean the same thing) protocol where money can
disappear but "eventually" will reappear somewhere else, and (2) that
having transactions across two entity groups allows transactions
across N entity groups. In this post we give a solution of form (1).

I said that my first eventual-consistency suggestion did not work in
the presence of timeouts, but it can be made to work in the presence
of timeouts. My friend Simon Goldsmith suggests that this is possible
using eventual-idempotent inter-node communication, detailed below.
Hacking on his original protocol, we came up with the following.

Transfer money from A to B with money temporarily disappearing as
follows. Say that A and B start in states A0 and B0 respectively. To
initiate a transfer create a "Transfer" Object C, in initial state C0;
when C is deleted, the transfer is complete. We need Google to
guarantee that if the creation of this object times out and
therefore fails, the user gets an error page; I haven't checked
what happens when a request times out, but if Google doesn't provide
this then there is nothing the user program can do.

Have an external server keep checking the set of Transfer objects. If
it finds a Transfer object C in state C0(A,B) then it tells A to go
into state A1, which means atomically A decrements its account and
writes a message to a log local to A saying that it is in state
A1(C0).

Note that this step, and in fact all other steps in the protocol, has
the following two properties.

Eventuality: the thread doing this may timeout after reading
state C0 and before getting A into state A1, but this step will
"eventually" happen because the thread keeps trying.

Idempotency: Because of A's local log message, A going into state
A1 is idempotent and so the thread can keep trying safely; that is,
since C and A cannot change in the same transaction, the thread could
easily tell A fifty times to go into state A1; A will ignore all but
the first request because it knows from its internal log message that
it is already in state A1.

When A gets a message from C telling it to go into state A1(C), after
doing that as above, A sends a message to C saying that A is now in
A1. When C gets this message it locally checks-off that A is in state
A1(C). If C does not get this message, A just sends it again when C
tells it to go into A1(C) the next time. Again, this is idempotent
and eventual.

C in state C1 now repeats this process with B, ending up with B in
state B1(C) and C in state C2. Now A has been decremented and B
incremented, but we need to collect the garbage of the two log
messages and C.

When C gets the message that B is in state B1(C) and has checked that
off, C now sends a message to both A and B to delete their log
messages for Transfer object C. A and B eventually do that. When
they do they respond that they have deleted the log message and C
checks that fact off locally. This is idempotent because if an
account object such as A or B gets a message to delete a log message
for a Transfer object C that it has never heard of, it responds
yes anyway. When C locally checks off that A and B are done, then C
deletes itself.
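
The whole protocol can be simulated in a few lines of plain Python
(in-memory objects standing in for the datastore entities and the
external server; message-passing and timeouts are collapsed into
direct calls and a deliberate retry):

```python
# Account's local log makes every state transition idempotent, so a
# driver can safely retry each step after a timeout until the
# Transfer is complete.

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.log = set()      # ids of transfers already applied here

    def apply(self, transfer_id, delta):
        # Idempotent: a repeated request is acknowledged but ignored.
        if transfer_id not in self.log:
            self.balance += delta
            self.log.add(transfer_id)
        return True

    def forget(self, transfer_id):
        # Garbage-collect the local log entry; also safe to repeat.
        self.log.discard(transfer_id)
        return True

def run_transfer(tid, src, dst, amount):
    # The external server's loop; the duplicated call plays the role
    # of a retry after a timeout.
    src.apply(tid, -amount)   # C tells A: go to state A1 (debit)
    src.apply(tid, -amount)   # retry after a fake timeout: a no-op
    dst.apply(tid, +amount)   # C tells B: go to state B1 (credit)
    src.forget(tid)           # C collects the garbage...
    dst.forget(tid)           # ...and may now delete itself

a, b = Account(100), Account(0)
run_transfer("t1", a, b, 30)
assert (a.balance, b.balance) == (70, 30)   # money conserved
```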

Daniel S. Wilkerson
Simon F. Goldsmith

dsw

May 4, 2008, 5:39:29 PM
to Google App Engine
In this post I follow up on how to provide N-group transactions given
a primitive for 2-group transactions (what I called (2) in the first
follow-up post above).

All we need is an N-way lock on all N groups. We do this by creating
a Lock object L containing a list of all of the groups we want a lock
on.

We need a shared, global, but otherwise non-semantic (not meaning
anything) total order on all the groups. For i = 0 to N-1 in this
order, that is for each object A_i in our list of N objects, we do our
hypothesized two-way transaction between the Lock object and A_i as
follows. In the transaction we (1) set a flag or "local lock" on A_i
that says Locked(L) and (2) check off in L that we succeeded in
getting the local lock on object A_i. A_i now refuses to participate
in any transactions that do not include Lock L.

Assume for the moment that L manages to get a local lock on all of
the A objects. Now the N-way transaction has been completely initialized
and our program may do whatever it likes to the N objects.

When our N-way transaction is done we want to release the N-way lock
L. To do this for each object we follow a similar protocol as
acquiring it, this time deleting the local lock in A_i and checking
off that it was deleted in L. When the last check-box is checked in L
we delete L.

One of two things will happen. Either L will get a lock on all the A
objects or it will block. It is a standard result that we cannot get
deadlock because it is impossible to have a directed cycle of N-way
locks, each waiting on another local lock A_i held by another lock L':
eventually that cycle has to go against the order and there would have
to be an N-way lock L-bad that has a local lock A_i and is waiting on
another local lock A_j where j < i, violating our order invariant.

I was thinking that when you fail to get all of your locks you should
release them all and try again but Simon points out that the system
will continue to make progress even if you don't. This is a
performance question. I suspect that if temporary inconsistency can
be tolerated, the eventual-consistency protocol gives better
performance.

In any case Google should provide a 2-group locking primitive if they
can do it, and if so, libraries for building N-way transactions on
top so we don't have to.
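
The deadlock-avoidance argument can be sketched in plain Python (a
dict stands in for the per-group "local lock" flags; every N-way lock
tries its groups in one shared global order, here sorted by key):

```python
held = {}   # group key -> id of the N-way lock holding it

def try_acquire_all(lock_id, groups):
    """Acquire local locks in global order; return the groups obtained."""
    got = []
    for g in sorted(groups):          # the shared total order
        if held.get(g) in (None, lock_id):
            held[g] = lock_id
            got.append(g)
        else:
            break                     # blocked: wait here, never skip ahead
    return got

# L1 wants {A, B}; L2 wants {B, C}. Both sort first, so neither can
# hold a later group while waiting on an earlier one, and no cycle of
# waiters can form.
got1 = try_acquire_all("L1", ["B", "A"])   # gets A then B
got2 = try_acquire_all("L2", ["C", "B"])   # blocked on B, holds nothing
```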

Mikolaj Habryn

May 5, 2008, 1:46:11 AM
to google-a...@googlegroups.com
(resending after tweaking subscription address, sorry if it dups)

What if you were to treat transactions (financial ones, not appengine
ones) as entity groups, rather than the account balances themselves?

Transferring funds from A to B would consist of creating a credit
object for B and a debit object for A in one atomic transaction. Basic
double-entry bookkeeping, and it guarantees that either the transfer
happens or it doesn't, no partial balance updates left in limbo.

Getting the current balance of an account gets more interesting, of
course - you have to run a query for all credit and debit objects for
an account and add them up. To optimize, you'd keep a
balance-at-point-in-time record attached to the account that you'd
update lazily, and then only search for debits and credits younger
than that.
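
A plain-Python sketch of that balance computation (the entry list and
checkpoint dict are hypothetical stand-ins for datastore queries): a
balance is the cached checkpoint plus the sum of all entries newer
than it.

```python
entries = [                      # (timestamp, account, amount)
    (1, "A", -30), (1, "B", +30),   # one transfer = one atomic pair
    (2, "A", -10), (2, "B", +10),
]
checkpoint = {"A": (0, 100), "B": (0, 0)}   # (as_of_timestamp, balance)

def balance(account):
    # Cached checkpoint plus all entries newer than it.
    as_of, cached = checkpoint[account]
    return cached + sum(amt for ts, acct, amt in entries
                        if acct == account and ts > as_of)

def compact(account, up_to):
    # The lazy optimization: fold old entries into the checkpoint.
    # Entries at or before up_to never change, so this can fail or
    # race without affecting correctness.
    as_of, cached = checkpoint[account]
    folded = sum(amt for ts, acct, amt in entries
                 if acct == account and as_of < ts <= up_to)
    checkpoint[account] = (up_to, cached + folded)

assert balance("A") == 60 and balance("B") == 40
compact("A", up_to=1)
assert checkpoint["A"] == (1, 70) and balance("A") == 60   # unchanged
```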

The downside of this is that it's hard to protect an account from
being overdrawn, but it does guarantee consistent transactions.
There are things you can do to reduce the risk of overdrawing (if it
matters for your application), but I think solving it perfectly boils
down to a 2-phase commit anyway.

m.

Brett Morgan

May 5, 2008, 2:03:58 AM
to google-a...@googlegroups.com

The bit that worries me with this approach is that we turn what was a
straight read into a query + potential write. But, as always, I should
really write some code and test it before I open my trap =)

--

Brett Morgan http://brett.morgan.googlepages.com/

MattCruikshank

May 6, 2008, 1:39:18 PM
to Google App Engine
Mikolaj,

Thanks for describing this system - it's what I had in my head when I
first read this thread.

I worry that it's too hard in practice, though. The consistency is
what really matters, and I think guaranteeing the consistency - while
handling an unexpected failure at any moment - gets to be rough.

For instance, if I add Debit A, but fail before adding Credit B, then
we have to assert that no transaction took place. Sorry, please try
again.

Makes sense to me, but then the act of lazy compression, the creation
of the balance-at-point-in-time record, becomes just as
failure-sensitive, and I'm not sure how to do it.

-Matt

Mikolaj Habryn

May 6, 2008, 6:00:33 PM
to google-a...@googlegroups.com
On Tue, May 6, 2008 at 10:39 AM, MattCruikshank
<mattcru...@gmail.com> wrote:
> For instance, if I add Debit A, but fail before adding Credit B, then
> we have to assert that no transaction took place. Sorry, please try
> again.

That's the bit that the datastore would look after for you
automatically. It will even retry a few times just in case, depending
on how or where it failed.

In some ways, you're less likely to trigger this sort of problem with
this model than if you have an entity group per account anyway. This
way, you're guaranteed that multiple transactions won't contend on
locks (since they're all independent entity groups). With one group
per account and n simultaneous transactions to the same account, n - 1
of them will have to be retried.

> Makes sense to me, but then the act of lazy compression, the creation
> of the balance-at-point-in-time record, becomes just as
> failure-sensitive, and I'm not sure how to do it.

Caching like this tends to be application-specific - you have to rely
on the nature of your data. Financial transactions are append-only:
they never vanish or change. Also, they are created with monotonically
increasing timestamps. So, based on this, a query for all transactions
for a given account that have a timestamp older than x, as long as x
is safely in the past, will always return the same data - meaning that
you can cache the results of any function over it, and you don't
really care whether it succeeds or fails, since it's just an optional
optimization.

m.


ryan

May 13, 2008, 9:12:35 PM
to Google App Engine
Thanks for the posts, guys! First, I'd clarify that the datastore is
based on optimistic concurrency, not locking, so there are no locks
for a single machine to hold. Changing the datastore to allow
transactions across entity groups would be a big design change, so
it's unlikely in the near future.

The "eventual consistency" algorithm looks good. It's basically the
standard distributed transaction algorithm that I'm familiar with.
I've implemented it a couple times, and it works well.

Happily, you can strengthen it to be fully consistent, not just
eventually consistent. Whenever you load an entity X, you first run a
query to see if there are any unfinished C (transfer) entities that
reference X. If there are, you first roll those forward, then fetch X.
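
That roll-forward-on-read step might look like this in outline (plain
Python, with dicts standing in for entities; `roll_forward` and
`read_balance` are hypothetical helpers, not a GAE API):

```python
# t1 debited A but "timed out" before crediting B. Reading B first
# rolls t1 forward, so the read sees a fully consistent state.

balances = {"A": 70, "B": 0}   # A already debited by t1
transfers = [{"id": "t1", "src": "A", "dst": "B", "amount": 30,
              "applied_src": True, "applied_dst": False}]

def roll_forward(account):
    # Finish any unfinished transfer that mentions this account,
    # then drop completed transfers.
    for t in transfers:
        if account in (t["src"], t["dst"]):
            if not t["applied_src"]:
                balances[t["src"]] -= t["amount"]
                t["applied_src"] = True
            if not t["applied_dst"]:
                balances[t["dst"]] += t["amount"]
                t["applied_dst"] = True
    transfers[:] = [t for t in transfers
                    if not (t["applied_src"] and t["applied_dst"])]

def read_balance(account):
    roll_forward(account)   # query for unfinished transfers first
    return balances[account]

assert read_balance("B") == 30
assert balances == {"A": 70, "B": 30} and transfers == []
```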

The design for local locks and n-way locks also works. You'd want an
expiration timestamp on each local lock, so that if a lock isn't
released (due to e.g. the request deadline), it will expire and can
be acquired again. Otherwise, looks good. Thanks again for posting
both of these!

We'd love to include more Python libraries that do things like this.
Right now, though, we're spending most of our time on other
priorities. We're excited that developers are writing and publishing
algorithms and libraries like these themselves, since in most cases
problems like these can be solved without us. We'd be excited to see
this posted more widely, and we'd also love to see a reference
implementation. Thanks for starting the conversation!