How to determine top 10 scoring users from last 24 hours?


Amir Michail

Jan 8, 2009, 3:33:07 PM1/8/09
to Google App Engine
Hi,

How can this be done efficiently with the current version of GAE?

I suppose you could keep track of all events affecting scoring for the
last 24 hours. Whenever you handle a request -- even one that does
not affect scoring -- you could prune out events older than 24 hours
and add a new one if the player has received more points as a result
of the current request.

As you take out old events and add in new ones, you also update the
top 10 ranking for the last 24 hours accordingly.

The problem here is that taking out all the old events can be quite
expensive. As an extreme example, if you don't get any requests for
over 24 hours, then on the next request, you will need to clear out a
full day's worth of scoring events.
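For illustration only, here is a minimal in-memory sketch of that prune-on-request idea (the function and variable names are hypothetical; a real app would keep the events in the datastore, which is exactly where the pruning cost becomes a problem):

```python
import heapq
import time

WINDOW = 24 * 60 * 60  # rolling window, in seconds

# (timestamp, user, points) tuples, kept in chronological order
events = []

def record_event(user, points, now=None):
    now = time.time() if now is None else now
    events.append((now, user, points))

def top10(now=None):
    now = time.time() if now is None else now
    # prune events older than 24 hours -- this is the expensive step,
    # proportional to however many events have expired since last time
    cutoff = now - WINDOW
    while events and events[0][0] < cutoff:
        events.pop(0)
    # re-total the remaining events per user
    totals = {}
    for _, user, points in events:
        totals[user] = totals.get(user, 0) + points
    return heapq.nlargest(10, totals.items(), key=lambda kv: kv[1])
```

The while-loop is the part that can run arbitrarily long after a quiet day, which is the cost being described above.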

Amir

boson

Jan 8, 2009, 3:53:31 PM1/8/09
to Google App Engine
I've been planning something like this for my app, and so far this is
the best I've come up with:

First of all, you don't want to be querying to sum up scores for
individual users, so you need to keep a tally in one place. This is
BigTable business as usual.

In each relevant entity where you are tallying, track a day and a
score for that day:
    user = "Kobe"
    weight = "180"
    ...
    lifetime_score = 14382  # total score over eternity
    daily_day = "20090108"  # the last day a score was recorded
    daily_score = 153       # the score for daily_day

When you update the score, do the following in a transaction:

    if today's date matches the entity's daily_day:
        lifetime_score += new_score
        daily_score += new_score
    else:
        lifetime_score += new_score
        daily_day = today
        daily_score = new_score

Then your final query is:
    filter("daily_day =", today_as("YYYYMMDD")).order("-daily_score")

That requires a fairly simple (but custom) index.

The query ignores people who haven't scored today (the equality filter
drops them), and sorts in descending order on the daily score.

And of course you should memcache the query results.
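To make the transaction logic concrete, here's a runnable sketch with a plain dict standing in for the datastore entity (in the real version this function body would run inside a datastore transaction):

```python
def apply_score(entity, new_score, today):
    """Update the lifetime and per-day tallies on one user entity.

    `entity` is a dict standing in for the datastore entity;
    `today` is a "YYYYMMDD" string.
    """
    entity["lifetime_score"] += new_score
    if entity["daily_day"] == today:
        # same day: accumulate onto today's tally
        entity["daily_score"] += new_score
    else:
        # first score of a new day: reset the daily tally
        entity["daily_day"] = today
        entity["daily_score"] = new_score
    return entity
```

Resetting lazily like this means users who never score simply keep a stale daily_day, and the equality filter in the leaderboard query skips them for free.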

Hope this helps!

Amir Michail

Jan 8, 2009, 4:00:09 PM1/8/09
to google-a...@googlegroups.com
On Thu, Jan 8, 2009 at 3:53 PM, boson <dan.k...@gmail.com> wrote:
>
> I've been planning something like this for my app, and so far this is
> the best I've come up with:

But how would you do this with a 24 hour rolling window?

Amir
--
http://readmytweets.com
http://chatbotgame.com
http://numbrosia.com
http://twitter.com/amichail

boson

Jan 8, 2009, 5:34:31 PM1/8/09
to Google App Engine
On Jan 8, 1:00 pm, "Amir Michail" <amich...@gmail.com> wrote:
> But how would you do this with a 24 hour rolling window?

My solution was for simple fixed 24-hour chunks.

But you could adapt it to store 24 fixed hourly chunks in each entity
and a timestamp to identify the window. Then roll the data back by
the proper # of hours on each transactional write.

    hourly_scores = [13, 0, 0, 18, 28, ...]  # always 24 items
    hourly_score = 182                       # sum of the 24 items
    hourly_hour = "2009010814"               # YYYYMMDDHH

The update logic involves shifting and pushing and some array
shuffling, and summing the 24 values into hourly_score. But it's all
still relatively straightforward.
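One way that shifting could look, again with a dict standing in for the entity (the helper name and the exact rolling scheme are my own assumption; the newest hour is the last item of the list):

```python
from datetime import datetime

def _hours_between(old_hour, new_hour):
    # hour strings are "YYYYMMDDHH", as in hourly_hour above
    fmt = "%Y%m%d%H"
    delta = datetime.strptime(new_hour, fmt) - datetime.strptime(old_hour, fmt)
    return int(delta.total_seconds() // 3600)

def add_hourly_score(entity, new_score, current_hour):
    """Shift the 24 hourly buckets forward, then add to the newest one."""
    shift = _hours_between(entity["hourly_hour"], current_hour)
    if shift > 0:
        # roll the window: drop the oldest buckets, pad with empty ones
        entity["hourly_scores"] = (
            entity["hourly_scores"][shift:] + [0] * min(shift, 24)
        )
        entity["hourly_hour"] = current_hour
    entity["hourly_scores"][-1] += new_score
    entity["hourly_score"] = sum(entity["hourly_scores"])
    return entity
```

The list stays at exactly 24 items no matter how large the gap is, so the entity's size is bounded regardless of how long a user was idle.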

I believe this would accomplish the goal (approximating a rolling 24
hour window, with 1 hour granularity), and in a way that still only
requires a single transaction for updates and a simple indexed query
for fetching the results.

Amir Michail

Jan 8, 2009, 6:44:01 PM1/8/09
to google-a...@googlegroups.com
On Thu, Jan 8, 2009 at 5:34 PM, boson <dan.k...@gmail.com> wrote:
>
> On Jan 8, 1:00 pm, "Amir Michail" <amich...@gmail.com> wrote:
>> But how would you do this with a 24 hour rolling window?
>
> My solution was for a simple fixed 24-hour chunks.
>
> But you could adapt it to store 24 fixed hourly chunks in each entity
> and a timestamp to identify the window. Then roll the data back by
> the proper # of hours on each transactional write.
>
> hourly_scores = [13, 0, 0, 18, 28, ...] # always 24 items
> hourly_score = 182 # sum of the 24 items
> hourly_hour = "2009010814" # YYYYMMDDHH

What would your query be? How would you ensure that you include not
only those who scored points in the last hour but also everyone who
scored points in the last 24 hours?

Amir

>
> The update logic involves shifting and pushing and some array
> shuffling, and summing the 24 values into hourly_score. But it's all
> still relatively straightforward.
>
> I believe this would accomplish the goal (approximating a rolling 24
> hour window, with 1 hour granularity), and in a way that still only
> requires a single transaction for updates and a simple indexed query
> for fetching the results.



boson

Jan 8, 2009, 9:04:07 PM1/8/09
to Google App Engine


On Jan 8, 3:44 pm, "Amir Michail" <amich...@gmail.com> wrote:
> On Thu, Jan 8, 2009 at 5:34 PM, boson <dan.kam...@gmail.com> wrote:
>
> > On Jan 8, 1:00 pm, "Amir Michail" <amich...@gmail.com> wrote:
> >> But how would you do this with a 24 hour rolling window?
>
> > My solution was for a simple fixed 24-hour chunks.
>
> > But you could adapt it to store 24 fixed hourly chunks in each entity
>
> What would your query be?  How would you ensure that you include not
> only those who scored points in the last hour but also everyone who
> scored points in the last 24 hours?
>

Good question. I don't think the model I proposed works for *rolling*
windows... Just fixed windows like "day" or "month".

Please do update this thread if/when you come up with something!

Bill

Jan 9, 2009, 9:10:28 AM1/9/09
to Google App Engine
Amir,

This is one of those cases that's tough because App Engine's datastore
has no on-the-fly SUM ability. You'd have to write an updating
process that makes periodic calls to your server app to process recent
results and update the entities used for your 24-hour scoring table.

Here's how I'd do it:
- Record each event affecting scoring in an entity (call the model
ScoreEvent) with a timestamp. The key_name of each ScoreEvent would
be the timestamp + user info. On write of this entity, update a 24-hr
score entity (call the model Score24) for this particular user.
- Periodically do maintenance calls to your server to update the
Score24 entities. Each maintenance call would sort ScoreEvents on the
key and then fetch all ScoreEvents since the last Score24 maintenance
call (use a key to mark your place in the ScoreEvents stream).
Querying on keys was added to datastore in Nov:
http://code.google.com/appengine/docs/datastore/queriesandindexes.html#Queries_on_Keys
So for each Score24 maintenance call, you'd fetch some number of
ScoreEvent entities, make a first pass to combine all the events into
per-user deltas, then update the associated users' Score24 entities.
Then save the last ScoreEvent key that you processed in this
maintenance call.
On the next maintenance call, use that "last ScoreEvent key" in the
query just as if you were paging:

    query = ScoreEvent.gql('WHERE __key__ > :1 ORDER BY __key__', last_key)

So your Score24 entities will reflect each user's score over the last
24 hours, with some staleness depending on how frequently your
maintenance calls run.
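The first-pass combination step could be as simple as this sketch, with events modeled as (user, points) pairs (in the real version you'd read these off the fetched ScoreEvent entities, then write each affected Score24 in its own transaction):

```python
def combine_deltas(score_events):
    """Collapse a batch of score events into one delta per user,
    so each Score24 entity is written at most once per maintenance call."""
    deltas = {}
    for user, points in score_events:
        deltas[user] = deltas.get(user, 0) + points
    return deltas
```

Batching the writes this way matters on App Engine, since each Score24 update is a separate transactional datastore write.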

Now getting the top X scoring users over the last 24 hours is a query
on Score24 entities where you sort on score and fetch the top X
entities.

Best,
Bill