Hi,
I just ran a very rough test to see how batch inserts cope.
The use case: scores get updated every second, and users can check
their rank at any time and it's pretty much correct.
There are up to 10 000 players, with scores ranging from 0 - 100 000.
Players and scores are randomly assigned for each insert.
I batched updates every 3 seconds, with roughly 130 updates arriving
in that time. The updates were batched using the fork-join queue
approach described by Brett Slatkin in
http://www.youtube.com/watch?v=zSDC_TU7rtc
(except with a memcache implementation instead of the datastore for
work items).
Here's a basic outline of the code used.
Inserting into fork join queue:
rand = random.randint(0, 10000)
score = random.randint(0, 100000)
# player is "Random%d" % rand
fork_join_queue_helper.add_ranker_score(
    "Test", "Random%d" % rand, score, useMemCache=True)
Constructor for ranker:
def get_ranker(ranker_name):
    key = datastore_types.Key.from_path("app", ranker_name)
    try:
        return Ranker(datastore.Get(key)[ranker_name])
    except datastore_errors.EntityNotFoundError:
        # First use: create the ranker and store its root key.
        r = Ranker.Create([0, 100000], 5)
        app = datastore.Entity("app", name=ranker_name)
        app[ranker_name] = r.rootkey
        datastore.Put(app)
        time.sleep(2)
        return r
Insert:
ranker = get_ranker(ranker_name)
ranker.SetScores(scores)
Results:
Batching updates in 3-second intervals was handled fine for 130
records. Each batch took about 2 seconds to run, so it could probably
handle more (however many fit in the spare 1 second?). So it handles
at least 43 rank inserts a second.
I didn't get any write contention issues.
Cost was *really* large doing it this way. From the logs: time to run
ms=2105 cpu_ms=50142 api_cpu_ms=46875 cpm_usd=1.393001. Ouch.
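As a rough back-of-envelope check on those numbers, here is a small
sketch that derives per-insert figures from the log line. The function
name batch_stats and the "burst" rate (inserts divided by the request's
wall-clock time) are my own additions for illustration, not something
the logs report.

```python
def batch_stats(updates, interval_s, run_ms, cpu_ms, api_cpu_ms):
    """Derive rough per-insert figures from one batch's log line."""
    return {
        # Rate actually sustained: one batch per interval.
        "sustained_per_s": updates / float(interval_s),
        # Rate while the batch request is running (derived, not logged).
        "burst_per_s": updates / (run_ms / 1000.0),
        # Billed CPU per single rank insert.
        "cpu_ms_per_insert": cpu_ms / float(updates),
        # Fraction of the CPU bill that is API (datastore) time.
        "api_share": api_cpu_ms / float(cpu_ms),
    }


# Figures quoted above: 130 updates per 3 second batch.
stats = batch_stats(updates=130, interval_s=3, run_ms=2105,
                    cpu_ms=50142, api_cpu_ms=46875)
for name in sorted(stats):
    print("%s = %.2f" % (name, stats[name]))
```

By this estimate each insert costs roughly 386 cpu_ms, and about 93% of
the bill is API time, so the ranker's datastore work is where the cost
goes.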
Questions:
Is my branching factor too low or too high?
Is the test not representative of real data (very random scores, as
opposed to constantly increasing scores)?
Is the score range too high or too low?
When randomly assigning user keys, I could insert the same key more
than once into a batch. This may slow down the insert.
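On the duplicate-key point, one way to rule it out would be to collapse
the batch before handing it to the ranker. This is only a sketch: it
assumes the batch arrives as a list of (player, score) pairs in arrival
order, and dedupe_batch is a hypothetical helper, not part of the
ranklist library. The result would still need shaping into whatever
ranker.SetScores expects.

```python
def dedupe_batch(updates):
    """Keep only the last score seen for each player in a batch.

    updates: iterable of (player, score) pairs in arrival order.
    Returns a dict mapping player -> most recent score.
    """
    latest = {}
    for player, score in updates:
        latest[player] = score  # later entries overwrite earlier ones
    return latest


# Example batch where "Random7" shows up twice.
batch = [("Random7", 120), ("Random42", 900), ("Random7", 450)]
deduped = dedupe_batch(batch)
print(len(batch), "updates collapsed to", len(deduped))
```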
Some thoughts:
In an application where scores are updated very rapidly, it would be
best to record the users whose scores changed, and periodically
retrieve their current scores to run the batch update. E.g. if a
user's score updates 20 times in 1 minute, doing 20 ranklist updates
is wasteful. It would be better to read their score after the 20th
update and do a single ranklist insert.
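A minimal sketch of that idea, assuming everything lives in one process
(on App Engine the set of changed users would have to live in memcache
or the datastore instead). DirtyScoreTracker and get_current_score are
hypothetical names made up for this example.

```python
class DirtyScoreTracker(object):
    """Remember which players changed; read their scores only at flush."""

    def __init__(self, get_current_score):
        # get_current_score: callable taking a player name and
        # returning that player's latest score.
        self._get_current_score = get_current_score
        self._dirty = set()

    def mark_changed(self, player):
        # Marking the same player many times still leaves one entry.
        self._dirty.add(player)

    def flush(self):
        """Return {player: current score} for changed players and reset."""
        players, self._dirty = self._dirty, set()
        return dict((p, self._get_current_score(p)) for p in players)


# One player's score changes 20 times between two flushes.
scores = {"Random7": 0}
tracker = DirtyScoreTracker(scores.get)
for _ in range(20):
    scores["Random7"] += 10
    tracker.mark_changed("Random7")

pending = tracker.flush()  # a single entry holding the final score
print(pending)
```

The dict returned by flush() is what would go to the ranker in one
batch, so 20 score changes cost one ranklist insert.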
Thanks
Rob