RankList on High Replication Datastore


Rob

Jul 4, 2011, 11:45:14 AM
to Google App Engine Ranklist
Hi,

Will RankList still work on high replication datastore if multiple
scores are entered very close (time wise) together?

Thanks
Rob

Bartholomew Furrow

Jul 8, 2011, 11:28:30 PM
to google-app-en...@googlegroups.com
Hi Rob,

I haven't tried RankList on high-replication, so I don't know the answer to your question; how close is "very close"?  I suspect, though, that batching updates will give you the scalability you need.  See:
...for a discussion of that.
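A minimal sketch of what that batching could look like (hypothetical names throughout; the only RankList call assumed here is SetScores taking a dict mapping player name to score list):

```python
# Hypothetical sketch: accumulate score updates in memory, then flush
# them to the ranker in a single SetScores call instead of one write
# per update. `ranker` stands in for a ranker.Ranker instance; the
# class and method names here are illustrative, not RankList's API.

class BatchingScoreboard(object):
    def __init__(self, ranker, flush_size=100):
        self.ranker = ranker
        self.flush_size = flush_size
        self.pending = {}  # player name -> score list

    def set_score(self, name, score):
        # A later update for the same player overwrites the earlier
        # one, so each flush writes at most one score per player.
        self.pending[name] = [score]
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.pending:
            # One batched call touches the ranker tree once per player
            # instead of once per raw update.
            self.ranker.SetScores(self.pending)
            self.pending = {}
```

In a real app the pending dict would live somewhere durable (a task queue payload, say) rather than in instance memory, but the shape of the batching is the same.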

Good luck, and let me know how it goes!
Bartholomew




Rob

Jul 9, 2011, 5:51:38 AM
to Google App Engine Ranklist
Hi,

Thanks Bartholomew.
I'm now running on high replication and haven't seen any issues with
ranklist (though I haven't tested it thoroughly yet). I am batching
the updates, which resolved some of my previous issues (the same as
those outlined in your thread).

Thanks
Rob


Bartholomew Furrow

Jul 18, 2011, 5:01:07 PM
to google-app-en...@googlegroups.com
Rob,

Thanks for following up!  If you ever come across this number, could you let us know your max updates/second with the batching -- and what your constructor call looks like for the ranker?

Cheers!
Bartholomew

Rob

Jul 21, 2011, 5:59:01 AM
to Google App Engine Ranklist
Hi,

I did a very rough test to see how batch inserts coped.
The use case: scores get updated every second, and users can check
their rank at any time with results that are pretty much correct.
There are up to 10,000 players, with scores ranging from 0 to 100,000.
Players and scores are randomly assigned for each insert.
I batched updates every 3 seconds, with roughly 130 updates arriving
in that time. The updates were batched using the fork join queue
approach described by Brett Slatkin in
http://www.youtube.com/watch?v=zSDC_TU7rtc (except with a memcache
implementation instead of the datastore for work items).
Here's a basic outline of the code used:

Inserting into fork join queue:
rand = random.randint(0, 10000)
score = random.randint(0, 100000)
# player is "Random%d" % rand
fork_join_queue_helper.add_ranker_score("Test", "Random%d" % rand, score,
                                        useMemCache=True)


Constructor for ranker:
key = datastore_types.Key.from_path("app", ranker_name)
try:
    return Ranker(datastore.Get(key)[ranker_name])
except datastore_errors.EntityNotFoundError:
    r = Ranker.Create([0, 100000], 5)
    app = datastore.Entity("app", name=ranker_name)
    app[ranker_name] = r.rootkey
    datastore.Put(app)
    time.sleep(2)
    return r

Insert:
ranker = get_ranker(ranker_name)
ranker.SetScores(scores)

Results:
Batching in 3-second intervals to update 130 records was handled
fine. The time to run was about 2 seconds to insert 130 records, so it
could probably handle more (however many can fit in the remaining
second?). That works out to at least 43 rank inserts a second.
I didn't get any write contention issues.
The cost was *really* large doing it this way. From the logs: time to run
ms=2105 cpu_ms=50142 api_cpu_ms=46875 cpm_usd=1.393001. Ouch.

Questions:
Is my branching factor too low or too high?
Is the test not representative of real data (very random scores, as
opposed to constantly increasing scores)?
Is the score range too high or too low?
When randomly assigning the user key, I could insert the same key more
than once into the batch. This may slow down the insert.
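One hypothetical way to avoid that last issue: collapse the pending batch to the most recent score per player before calling SetScores (assuming updates arrive as (player, score) tuples).

```python
def collapse_batch(updates):
    """Keep only the latest score per player from a list of
    (player_name, score) update tuples, in arrival order."""
    latest = {}
    for name, score in updates:
        latest[name] = [score]  # a later update for the same player wins
    return latest

# Three raw updates, two of them for the same player, become
# a single ranker write per player:
#   ranker.SetScores(collapse_batch(updates))
```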

Some thoughts:
In an application where scores are updated very rapidly, it would be
best to record which users' scores changed, and periodically retrieve
their current scores to run the batch update. E.g. if a user's score
updates 20 times in 1 minute, updating all 20 scores in ranklist is
wasteful; it would be better to get their score at the 20th update and
do only a single ranklist insert.

Thanks
Rob

Bartholomew Furrow

Jul 21, 2011, 7:29:50 PM
to google-app-en...@googlegroups.com
Thanks for sending such a detailed email!  This is easily the deepest exploration I've seen of ranklist's capabilities.
 
The updates were batched using fork join queue approach
described by Brett Slatkin in http://www.youtube.com/watch?v=zSDC_TU7rtc
(except with memcache implementation instead of datastore for work
items).

Without knowing the details, I'll just say that might be dangerous, since memcache offers essentially no consistency guarantees.  You could put something in, and then the machine holding that thing could crash.

ms=2105 cpu_ms=50142 api_cpu_ms=46875 cpm_usd=1.393001. ouch.

Eesh.  I guess that's what happens when you batch things up: you get to see their prices all nicely aggregated together!
 
Questions:

Is my branching factor too low or too high?

I'd guess that your branching factor is way too low: each node is really tiny, so you're interacting with a lot of nodes in any given operation.  On the other hand, maybe changing it wouldn't reduce the number of nodes you're interacting with very much, but would increase their size more than proportionately.  In Code Jam we run with a factor of 100; I think it's worth experimenting to see which ends up being faster, since that's pretty much the one lever a user of this library has.
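To put rough numbers on that (a back-of-the-envelope sketch; the depth formula below is an approximation of how a ranker-style tree covers the score range, not RankList's actual internals):

```python
import math

def tree_depth(num_scores, branching_factor):
    """Approximate number of layers in a tree that covers
    `num_scores` possible scores; each score update walks one
    root-to-leaf path, touching roughly one node per layer."""
    return math.ceil(math.log(num_scores, branching_factor))

# Rob's setup: 100001 possible scores (0 - 100000), branching factor 5
print(tree_depth(100001, 5))    # roughly 8 nodes touched per update
# Code Jam's branching factor of 100 cuts that substantially
print(tree_depth(100001, 100))  # roughly 3 nodes touched per update
```

Fewer, fatter nodes per write is the trade-off the branching factor controls, which is why it's worth measuring rather than guessing.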

Is the test not representative of real data (very random scores, as
opposed to constantly increasing scores)?

I think that's likely true.  I suspect that most scenarios would have scores increasing (or decreasing) by a small amount relative to the total size of the range.

Is the score range too high or too low?

I would guess too low; for most purposes I'd expect a large number of possible scores. You could store 100000 scores in one Entity.

When randomly assigning user key, I could insert the same key more
than once into the batch. This may slow down the insert.

Yeah, I dunno what that does!

Some thoughts:
In an application where scores are updated very rapidly, it would be
best to record which users' scores changed, and periodically retrieve
their current scores to run the batch update. E.g. if a user's score
updates 20 times in 1 minute, updating all 20 scores in ranklist is
wasteful; it would be better to get their score at the 20th update and
do only a single ranklist insert.

Yes, for sure.  That's how we do it: when a user submits something, we mark the scoreboard row as row["dirty"] = True.  Then we query for dirty rows and update the scores for those people.
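That dirty-row pattern might be sketched like this (hypothetical helper names; a plain dict stands in for the scoreboard rows, where the real version would query the datastore for dirty rows, and `ranker` is anything exposing SetScores):

```python
# Minimal sketch of the dirty-row pattern: mark a row on every score
# change, then periodically sweep dirty rows into one batched update.

def mark_dirty(rows, name, score):
    # Called on every score change; repeated changes just overwrite.
    rows[name] = {"score": score, "dirty": True}

def sweep(rows, ranker):
    # Collect current scores for dirty rows only: a player whose score
    # changed 20 times since the last sweep still costs one write.
    batch = {name: [row["score"]]
             for name, row in rows.items() if row["dirty"]}
    if batch:
        ranker.SetScores(batch)
        for name in batch:
            rows[name]["dirty"] = False
```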

Thanks again for the excellent thread!
Bartholomew