I'll try to state a more formal version of the problem, then offer my
suggestion.
Problem:
Assume that there is an entity type called Player, and each Player has
a score. A score is an integer. For any given Player P with a score
of X, we want to know how many other Players have scores that are
greater than or equal to X. This is their Rank.
Let's say we have the following Players and scores:
Peter 30
Ben 27
Sylvain 42
Marzia 50
Ryan 12
Peter's Rank would be 3 because Marzia, Sylvain and Peter all have
scores greater than or equal to Peter's score of 30.
Marzia's Rank would be 1, and Ryan's would be 5. The tricky part is
that Ben needs to do this efficiently for hundreds of thousands of
items ( with a fairly frequent number of updates i assume)
Getting this kind of answer is relatively simple on a SQL database,
though I don't honestly know how scalable the typical count(*) > X
solutions are. With the proper indexes, they could probably be very
fast, but i don't know if you'd get lots of contention on writing to
that table/index.
This does seem like a problem that any game with the concept of 'high
score' built in app engine would need to solve. (actually, this
problem probably generalizes even further) I'm pretty sure I will
need something like this at some point myself. So I might as well
take a stab at it now. Here are my initial thoughts... Encouraging or
disparaging comments welcome.
To solve this problem with the GAE datastore, I propose building a
ShardedRanker.
The lowest level item we need to consider is a RankedItem. A
RankedItem has a key, a value, and a reference to a RankingGroup. The
key would be a player's id, and the value would be a player's score.
A ShardedRanker would have references to a number of RankingGroups.
Each RankingGroup would contain a list of RankedItems, and be
responsible for knowing the relative rank of each item it contains.
Each RankingGroup would also be responsible for knowing the highest
and lowest scores its items contain.
When we want to know the rank of a particular item, we can get its
overall rank by combining its relative rank within its RankingGroup
with the total number of items contained in higher ranked Groups.
This will require 1 datastore read to get the info from the
RankingGroup, plus one read for each of the RankingGroups ranked
higher than the 'target' group.
When we insert a new RankedItem, we will need to update the
RankingGroup it belongs to, and maybe the index which tells us which
Group contains which ranges of scores. At some point, if a
RankingGroup gets to big, it would probably be split.
There should be lots of opportunity to memcache things here, reducing
the need for datastore reads. As the total number of items increases,
it might be necessary to create multiple levels of RankingGroups.
-peter