How should we count with Google App Engine? (It's not as simple as 1,2,3)

Aral Balkan

Jun 5, 2008, 5:34:19 PM
to Google App Engine
I know this may sound silly, but one of the biggest issues I'm running
into with Google App Engine is counting.

If you look in the datastore in your admin dashboard, you'll find that
a count of objects is conspicuously missing.

So for example, how do you find out how many users have an account on
your app?

You could use count(), which (a) is documented as slow and (b) we've
been warned not to use. And it will only work until your app has
1,000 users. C'mon, I want more users than that! :)

You could keep a count field in the datastore.

This appears to be the only choice and yet I've seen numerous warnings
against using the datastore to keep global values like a counter.

So how about various ways of lessening the load? You could use
memcache and then, every so often, update the datastore.
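
A rough sketch of that idea, in plain Python so it runs anywhere. The two dicts are only stand-ins for memcache and the datastore, and `BufferedCounter` and `flush_every` are made-up names for illustration, not App Engine APIs:

```python
class BufferedCounter:
    """Buffers increments in a cache and writes them to the store in batches."""

    def __init__(self, flush_every=10):
        self.flush_every = flush_every
        self.cache = {}                 # stand-in for memcache (can be wiped at any time)
        self.datastore = {"count": 0}   # stand-in for the persistent counter entity

    def increment(self):
        pending = self.cache.get("pending", 0) + 1
        if pending >= self.flush_every:
            # One datastore write covers a whole batch of increments.
            self.datastore["count"] += pending
            pending = 0
        self.cache["pending"] = pending

    def approximate_total(self):
        # Persisted count plus whatever is still waiting in the cache.
        return self.datastore["count"] + self.cache.get("pending", 0)

    def evict_cache(self):
        # Simulates the cache being cleared before the next flush.
        self.cache.clear()


counter = BufferedCounter(flush_every=10)
for _ in range(25):
    counter.increment()
print(counter.datastore["count"], counter.approximate_total())  # 20 25

counter.evict_cache()
print(counter.approximate_total())  # 20
```

The last two lines show the trade-off: anything still buffered when the cache is cleared never reaches the datastore, so the count can drift low by up to `flush_every - 1` per eviction.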

I'm not sure how dirty a solution that is but I was talking to Simon
Willison today and he suggested the same thing (and apparently they're
wondering about how best to count too).

So I pose the question to you -- how should we count with Google App
Engine?

(And if anyone from the engineering team wants to chime in with a
best-practice suggestion, please do!) :)

Mahmoud

Jun 5, 2008, 6:09:12 PM
to Google App Engine
I thought vrypan had a pretty good approach here:
http://groups.google.com/group/google-appengine/browse_thread/thread/f2ce195dc989aa2d/a8acc8c5fc1d25c9

Basically you have a set of mini-counters. When you add a user (or an
entity that increases the count), you just pick one of the counters at
random and update it. To get the total count, you just query all
counters and sum their count.

You can increase the number of mini-counters with traffic, to decrease
the possibility of collisions. However, one would need to make sure
that the overall "count of counters" is reasonable.
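
A minimal sketch of the mini-counter idea, in plain Python rather than against the real datastore. Each list slot stands in for one counter entity; `ShardedCounter`, `num_shards` and `add_shards` are illustrative names, not part of any App Engine API:

```python
import random


class ShardedCounter:
    """In-memory stand-in for a set of counter entities (shards)."""

    def __init__(self, num_shards=20, rng=None):
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self._rng = rng or random.Random()
        # Each slot plays the role of one mini-counter entity.
        self._shards = [0] * num_shards

    def increment(self, delta=1):
        # Pick one shard at random so concurrent writers rarely collide.
        index = self._rng.randrange(len(self._shards))
        self._shards[index] += delta

    def total(self):
        # Reading the count means fetching every shard and summing.
        return sum(self._shards)

    def add_shards(self, extra):
        # New shards start at zero, so growing the set never changes the total.
        self._shards.extend([0] * extra)


counter = ShardedCounter(num_shards=5, rng=random.Random(0))
for _ in range(1000):
    counter.increment()
counter.add_shards(5)   # traffic went up, so spread writes over more shards
for _ in range(500):
    counter.increment()
print(counter.total())  # 1500
```

Writes touch a single shard, while reads touch all of them, which is why the number of shards has to stay reasonable.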

Augmenting this approach with memcache would further speed things up.

-Mahmoud

Frank

Jun 5, 2008, 6:14:31 PM
to Google App Engine
I second that, some advice from google's engineering team would be
really appreciated

from my understanding, updating a datastore count is only discouraged
if you do it too often, like on every http request.
but for something like a user count it should be ok, since
you're not gonna have new accounts created on every http request...
well, hopefully anyway

the aspect that bothers me more is the possible error in the count
(the 'desynchro'? is that english?), since between the moment you
first load the entity to update the count and the moment you call
put(), this count might have been changed by another request...
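
to make that concrete, here is a small plain-Python simulation of the lost update, followed by one common remedy (check a version number on write and retry). `VersionedEntity` and its methods are invented for this sketch and are not datastore calls; datastore transactions are meant to serve a similar purpose.

```python
class VersionedEntity:
    """Stand-in for a stored counter entity that tracks a version number."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write_if_unchanged(self, new_value, seen_version):
        # Refuse the write if someone else saved since we read.
        if seen_version != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


# Naive read-modify-write: both requests load before either one saves.
naive = {"user_count": 10}
a_seen = naive["user_count"]
b_seen = naive["user_count"]
naive["user_count"] = a_seen + 1
naive["user_count"] = b_seen + 1
print(naive["user_count"])  # 11, although two accounts were created

# Same interleaving with a version check: request B notices and retries.
entity = VersionedEntity(10)
a_value, a_version = entity.read()
b_value, b_version = entity.read()
assert entity.write_if_unchanged(a_value + 1, a_version)       # A succeeds
assert not entity.write_if_unchanged(b_value + 1, b_version)   # B is rejected
b_value, b_version = entity.read()                             # B reloads
assert entity.write_if_unchanged(b_value + 1, b_version)       # B succeeds
print(entity.value)  # 12
```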

it's not that bad for some applications, but can definitely be for
others...

I've been tempted to use memcache also, but it's even more prone to
error since you don't control when the cache is cleared, and that can
happen at any time...

Frank

Tom Offermann

Jun 5, 2008, 6:46:45 PM
to google-a...@googlegroups.com
Brett Slatkin talked about how to count efficiently with the datastore
in his Google IO presentation last week: "Building Scalable Web
Applications with Google App Engine."

The problem with having a global counter is that using a single entity
for all writes will result in contention and limited throughput. He
recommended using a technique he called "write shards". In other
words, use numerous counter entities to increment the count. Then,
when you want to display the total count, you query all of the counter
entities and sum them up.

Hopefully, Brett's slides (with code examples!) and a video of his
presentation will be posted soon, but until then you can read this
blog post from one of the attendees that describes the technique:

http://blog.appenginefan.com/2008/06/efficient-global-counters.html

Tom

--
Blog: http://offermann.us

Aral Balkan

Jun 6, 2008, 4:21:58 AM
to Google App Engine
@Mahmoud: Thanks for the link -- I'd read the technique but forgot to
mention it in my post. It does look like the best solution so far,
especially considering @Tom's comment about Brett Slatkin recommending
it.

Aral

On Jun 5, 11:46 pm, "Tom Offermann" <tofferm...@gmail.com> wrote:
> Brett Slatkin talked about how to count efficiently with the datastore
> in his Google IO presentation last week: "Building Scalable Web
> Applications with Google App Engine."
<snip>

Aral Balkan

Jun 6, 2008, 4:22:19 AM
to Google App Engine
Oh, and thanks for that link, Tom -- going to check it out now! :)

Tim Hoffman

Jun 6, 2008, 9:20:10 AM
to Google App Engine
How about each time someone registers, you push a message into Amazon's
Simple Queue Service (SQS)?
Do the same when they deregister.

Then, when you are interested in a total, read all the messages from the
Amazon queue.
Add them to your statistics record. As you are the only person/process
that will ever update that record, you can be guaranteed you will store
the correct number ;-)
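
A toy version of that flow in plain Python. The `deque` is only a stand-in for the external queue, and `register`, `deregister` and `drain` are made-up helper names, not SQS calls:

```python
from collections import deque

queue = deque()          # stand-in for the external message queue
stats = {"users": 0}     # the statistics record, updated by one process only


def register():
    queue.append(+1)     # one message per new account


def deregister():
    queue.append(-1)     # one message per removed account


def drain(record):
    # The single reader applies every pending message, then reports the total.
    while queue:
        record["users"] += queue.popleft()
    return record["users"]


for _ in range(5):
    register()
for _ in range(2):
    deregister()
print(drain(stats))  # 3
```

Because only `drain` ever writes to the record, there is no competing writer to overwrite its update.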

Rgds

Tim

Philip

Jun 6, 2008, 10:57:07 AM
to Google App Engine
The techniques discussed can certainly provide feasible solutions for
static counters, but what about dynamic ones? How would one implement a
counter based on a set of dynamic criteria (e.g. provided by the user)
that requires filtering through the datastore and determining the size
of the resulting dataset, given that it is larger than 1k entries?

Chad

Jun 6, 2008, 11:23:41 AM
to Google App Engine
Howdy.. I just started using the engine a few days ago and I freakin
love it!

It seems very limiting to be unable to do simple aggregate operations
like "COUNT" or "SUM". Does anyone know if google is ever going to add
this functionality?

Chad

max7

Jun 7, 2008, 4:37:10 AM
to Google App Engine
Google is not going to add this functionality, since the whole goal of
App Engine is to be scalable, and it gets there by leaving out
functionality like COUNT and SUM.

If Google added COUNT or SUM, App Engine would not work well with
trillion-record datasets.

If you need different functionality, you could possibly set up a
dedicated server with MySQL.

That solution would not scale like App Engine, but you would have COUNT
and SUM.

Aral Balkan

Jun 7, 2008, 5:55:03 AM
to Google App Engine
Hi Max,

Going forward, I am sure that Google is going to be implementing
features and methodologies (like the one for counting that Brett
suggested and Tom documented) to tackle real-world use cases. Although
setting up a separate server for some features that are not supported
right now is what I'm doing also (for payment processing mostly),
that's only a stop-gap measure. In the future, I have no doubt that
Google will steadily address the issues that are raised so that
eventually there will not be a need for that.

Aral

nchauvat (Logilab)

Jun 11, 2008, 5:48:19 AM
to Google App Engine
> Brett Slatkin talked about how to count efficiently with the datastore
> in his Google IO presentation last week: "Building Scalable Web
> Applications with Google App Engine."

I wrote an article to summarize what I learned from that talk:
http://www.logilab.org/blog/5223
