Globally monotonic counter

105 views
Skip to first unread message

Nathan Gray

unread,
Jul 13, 2009, 8:39:45 PM7/13/09
to google-a...@googlegroups.com
Hi there,

I'm writing a game backend in app engine where progress in each game
is described by timestamped events. So any time a player takes a turn
or types a chat message it generates one or more events, each
timestamped using datetime.utcnow(). Each game and its data form an
entity group in the datastore, and each user can have multiple games
in progress at any given time. On the client side the game keeps
track of the timestamp of the last event it's seen (across *all* the
user's games) and periodically requests all newer events.

But of course GAE is a distributed system and utcnow() is not going to
be consistent across different nodes of the system, so this won't
work. Eventually there will be two near-simultaneous events in
different games that get written to the datastore in the "wrong" order
and one of them will never be seen because it's "older" than the
"newest" event.

So what's the right way to solve this kind of problem? I like the
concept of a global event stream for each user, but maybe I have to
abandon that model and use one event stream per game. Or maybe it
would be better to assume that the clocks won't be *too* far off and
request events with timestamps greater than N seconds before the last
event the client's seen. I'd be grateful for any recommendations or
suggestions.

Thanks,
-n8

--
http://n8gray.org

Albert

unread,
Jul 13, 2009, 10:06:57 PM7/13/09
to Google App Engine
Hi!

This is a quick suggestion.

How about using a global counter (just like your title suggests). You
can use a sharded global counter to facilitate your counting.

And use that counter as a "timestamp" / bookmark.

On every "event", you read from the global counter, use that value as
your "timestamp", and then increment the global counter.

I'm not sure of it's implications, though. I'm not also sure if it
actually guarantees uniqueness of "timestamps" when two events happen
almost at the same time.

Perhaps you can get an idea from this.

Enjoy!

n8gray

unread,
Jul 14, 2009, 2:16:02 AM7/14/09
to Google App Engine
Hi Albert,

Thanks for the suggestion. I think I can live without uniqueness as
long as timestamps don't go backwards. But I think my problem still
exists with a sharded counter. Having a counter is certainly better
than using datetime.utcnow() and lets me assign an order to all events
that have been generated, but that's not really the problem. The
tricky part is deciding, on the client end, which events to request
based on the events you've received. When the client asks for "all
events after time T" it gets some "last" event with a new timestamp
S. But I don't think you can trust this S because there might be some
other event in some corner of GAE with an earlier timestamp that
hasn't yet been observed by the server that answered the client's
request.

I guess the root of the problem is that I know that transactions on
entity groups give me the ACID properties but when it comes to
operations outside of transactions I have no idea what the consistency
model is. Has this been described somewhere?

Thanks,
-n8

Nick Johnson (Google)

unread,
Jul 14, 2009, 5:27:30 AM7/14/09
to google-a...@googlegroups.com
Hi Nathan,

Your best options are either to keep track of one event stream per
game, or to use system time, and 'rewind' the timestamp a bit to
capture any missed events, as you suggest. Global monotonic counters
aren't very practical in large distributed systems.

-Nick Johnson
--
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
Number: 368047

n8gray

unread,
Jul 16, 2009, 6:47:50 PM7/16/09
to Google App Engine
Thanks for the advice, Nick. I'd still like to know more about the
consistency model though. For example, I wonder if there's any
guarantee that two transactions on different entity groups executed by
one process in a given order will be observed in the same order. I
suspect the answer is no. I'm starting to think that the "GAE takes
care of the messy details of distributed systems programming" claim is
a bit overstated...

Cheers,
-n8

On Jul 14, 2:27 am, "Nick Johnson (Google)" <nick.john...@google.com>
wrote:

Andy Freeman

unread,
Jul 17, 2009, 1:29:25 AM7/17/09
to Google App Engine
> I think I can live without uniqueness as
> long as timestamps don't go backwards.

Timestamps on a given server probably don't go backwards, but there's
no guarantee about the relationship between clocks on different
servers.
> > Enjoy!- Hide quoted text -
>
> - Show quoted text -

Andy Freeman

unread,
Jul 17, 2009, 1:35:19 AM7/17/09
to Google App Engine
> I'm starting to think that the "GAE takes
> care of the messy details of distributed systems programming" claim is
> a bit overstated...

Global clock consistency requires very expensive clocks accessible
from every server with known latency (and even that's a bit dodgy).
AFAIK, GAE doesn't provide that, but who does?

GAE doesn't do the impossible, but also doesn't say that it does. WRT
the latter, would you really prefer otherwise?
> > -Nick Johnson- Hide quoted text -

n8gray

unread,
Jul 17, 2009, 2:40:30 AM7/17/09
to Google App Engine
On Jul 16, 10:35 pm, Andy Freeman <ana...@earthlink.net> wrote:
> >                                              I'm starting to think that the "GAE takes
> > care of the messy details of distributed systems programming" claim is
> > a bit overstated...
>
> Global clock consistency requires very expensive clocks accessible
> from every server with known latency (and even that's a bit dodgy).
> AFAIK, GAE doesn't provide that, but who does?
>
> GAE doesn't do the impossible, but also doesn't say that it does.  WRT
> the latter, would you really prefer otherwise?

But that's just it -- in many places it's claimed that GAE makes it
all a cakewalk. From the datastore docs:

"""
Storing data in a scalable web application can be tricky. A user could
be interacting with any of dozens of web servers at a given time, and
the user's next request could go to a different web server than the
one that handled the previous request. All web servers need to be
interacting with data that is also spread out across dozens of
machines, possibly in different locations around the world.

Thanks to Google App Engine, you don't have to worry about any of
that. App Engine's infrastructure takes care of all of the
distribution, replication and load balancing of data behind a simple
API—and you get a powerful query engine and transactions as well.
"""

You could argue that that's not claiming to do the impossible, but
"you don't have to worry about any of that" is certainly not true.
Nowhere in the documentation is there a discussion of the kinds of
subtle gotchas that you need to be aware of when programming for this
kind of system. It's all just "golly isn't this so gosh-darn easy!"
You have to go digging to find the article on transaction isolation
where you find out that your queries can return results that, um,
don't match your queries. And AFAICT you *do* have to worry about
subsequent requests being handled by different servers, since there
doesn't seem to be any guarantee that the datastore writes made in one
request will be seen in the next. Memcache doesn't have transactions,
so it seems like guaranteeing coherence with the datastore is tricky.

I worked in a distributed systems group for many years, so I know that
many of these problems are simply inherent to distributed systems. It
doesn't disturb me that they exist. What bothers me is the way these
issues are broadly *ignored* by GAE's documentation. If I wasn't a
bit savvy about distributed systems I probably wouldn't have realized
that clock skew could cause problems, and nothing I read in GAE's docs
would have helped me figure it out. So no, I don't want GAE to claim
to do the impossible, I want them to *stop* claiming to do the
impossible. I would love to see some articles about the pitfalls of
the system and how to avoid them or mitigate them. The transaction
isolation article is great in that respect -- I hope people at Google
are planning more along those lines.

Cheers,
-n8

Andy Freeman

unread,
Jul 17, 2009, 11:29:18 AM7/17/09
to Google App Engine
You're mis-parsing the sentence. Note that they even tell you what
they mean by "take care of messy details".

Let's look at another example. MS's Visual {whatever} documentation
claims that it makes programming easy. Do you think that that claim
implies that using said product will let anyone produce an O(N)
solution to an NP-complete problem?

> > I worked in a distributed systems group for many years, so I know that
> many of these problems are simply inherent to distributed systems. It
> doesn't disturb me that they exist.

You're complaining that GAE doesn't solve them.

> What bothers me is the way these
> issues are broadly *ignored* by GAE's documentation.

GAE documentation doesn't teach you how to program. It doesn't teach
you how to make money with a web site. It doesn't even tell you how
to do things with other systems so you can compare. It tells you how
to do things with GAE.

Sure, I'd like better tutorials. But, if I had a choice between
documentation that makes it possible to use a new subsystem and more
information on distributed systems programming with a GAE twist, I'll
take the former every time.

n8gray

unread,
Jul 17, 2009, 5:08:02 PM7/17/09
to Google App Engine
On Jul 17, 8:29 am, Andy Freeman <ana...@earthlink.net> wrote:
> You're mis-parsing the sentence.  Note that they even tell you what
> they mean by "take care of messy details".
>
> Let's look at another example.  MS's Visual {whatever} documentation
> claims that it makes programming easy.  Do you think that that claim
> implies that using said product will let anyone produce an O(N)
> solution to an NP-complete problem?

I'm sorry, but "you don't have to worry about any of that" right after
a description of a massively distributed database is a much stronger,
more specific statement than your strawman, and it's also demonstrably
untrue.

> > > I worked in a distributed systems group for many years, so I know that
> > many of these problems are simply inherent to distributed systems.  It
> > doesn't disturb me that they exist.
>
> You're complaining that GAE doesn't solve them.

Where, specifically? You seem to be reacting to this statement:

"""
I'm starting to think that the "GAE takes care of the messy details of
distributed systems programming" claim is a bit overstated...
"""

You may notice that I'm criticizing GAE's advertising, not its
implementation.

Cheers,
-n8

Nick Johnson (Google)

unread,
Jul 22, 2009, 12:56:22 PM7/22/09
to google-a...@googlegroups.com
Hi n8gray,

The Datastore is strongly consistent. As such, all processes will see
changes happen in the same order, even across entity groups - and at
any one point in time, all processes will have a consistent view of
the datastore.

-Nick Johnson

n8gray

unread,
Jul 23, 2009, 4:44:54 PM7/23/09
to Google App Engine
Hi Nick,

Wow, that's impressive! That's a very useful bit of information. Are
memcache writes guaranteed ordered wrt datastore writes as well, or is
it possible for another part of the system to see them in different
orders?

Cheers,
-n8

On Jul 22, 9:56 am, "Nick Johnson (Google)" <nick.john...@google.com>
wrote:

Nick Johnson (Google)

unread,
Jul 23, 2009, 6:18:38 PM7/23/09
to google-a...@googlegroups.com
On Thu, Jul 23, 2009 at 9:44 PM, n8gray <n8g...@gmail.com> wrote:

Hi Nick,

Wow, that's impressive!  That's a very useful bit of information.  Are
memcache writes guaranteed ordered wrt datastore writes as well, or is
it possible for another part of the system to see them in different
orders?

Memcache is also strongly consistent. Since both APIs are synchronous, by the time your API call returns, the changes are visible everywhere - so a datastore write followed by a memcache write will be seen in that order everywhere. There's no guarantee of atomicity across the two, though, so you need to assume that either or both operations could fail independently.

-Nick Johnson
 


Cheers,
-n8

On Jul 22, 9:56 am, "Nick Johnson (Google)" <nick.john...@google.com>
wrote:
> Hi n8gray,
>
> The Datastore is strongly consistent. As such, all processes will see
> changes happen in the same order, even across entity groups - and at
> any one point in time, all processes will have a consistent view of
> the datastore.


n8gray

unread,
Jul 23, 2009, 8:12:13 PM7/23/09
to Google App Engine
On Jul 23, 3:18 pm, "Nick Johnson (Google)" <nick.john...@google.com>
wrote:
>
> Memcache is also strongly consistent. Since both APIs are synchronous, by
> the time your API call returns, the changes are visible everywhere - so a
> datastore write followed by a memcache write will be seen in that order
> everywhere. There's no guarantee of atomicity across the two, though, so you
> need to assume that either or both operations could fail independently.

Ah, I had no idea those APIs were synchronous! In fact, I assumed
that writes would be asynchronous. That makes things much easier. It
would be good to mention that in the documentation for those
subsystems.

Cheers,
-n8
Reply all
Reply to author
Forward
0 new messages