A suggestion for efficient counters in datastore

vrypan

Apr 30, 2008, 6:09:08 AM
to Google App Engine
I've been trying to deal with a simple "counter" problem, like keeping
count of pageviews. As I've found out, doing this in the datastore is
not trivial.
This post grew out of a good discussion here:
<http://groups.google.com/group/google-appengine/browse_thread/thread/007dedb7d65bdf4f>

Here is the code I've come up with.
An example usage would be as simple as adding a line like the following
in each of your pages (where page_id is a unique string identifying
each page):
Acc(page_id).inc()
Getting the total count is as simple as
Acc(page_id).val()
(Because the total is computed by summing the counter instances, it may
not be accurate in the middle of a traffic spike, but it's good enough
for web analytics.)

-- begin code --
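import datetime
import random

from google.appengine.ext import db


class AccVals(db.Model):
    # One partial counter ("counter instance") of the counter `cluster`.
    cluster = db.StringProperty(required=True)
    count = db.IntegerProperty(required=True)
    updated = db.DateTimeProperty(auto_now=True)
    rand = db.FloatProperty()


class Acc(object):
    def __init__(self, name, init=0):
        # Seconds; an instance updated more recently than this is "busy".
        self.__sec = 0.1
        self.__name = name
        self.__init = init

    def inc(self):
        def trans(key):
            # Read, increment and write one instance in a transaction.
            obj = AccVals.get(key)
            obj.count += 1
            obj.put()
            self.__val = obj.count

        # Pick one instance of this counter at random.
        q = db.Query(AccVals).filter('cluster =', self.__name) \
                             .filter('rand >', random.random()).get()
        if q:
            if (datetime.datetime.now() - q.updated <
                    datetime.timedelta(0, self.__sec)):
                # Updated very recently: create a new instance instead.
                obj = AccVals(cluster=self.__name, count=self.__init,
                              rand=random.random())
                key = obj.put()
            else:
                key = q.key()
        else:
            # No instance matched: create one that every query can match.
            obj = AccVals(cluster=self.__name, count=self.__init, rand=1.0)
            key = obj.put()

        db.run_in_transaction(trans, key)
        # Note: this is the count of the updated instance, not the total.
        return self.__val

    def val(self):
        # Sum the counts of all instances of this counter.
        total = 0
        q = AccVals.all()
        q.filter('cluster =', self.__name)
        for r in q:
            total += r.count
        return total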

-- end code --

It behaves relatively well, and it looks like it can scale no matter
how much traffic or how many traffic spikes you have.

If you look into it, you will see that a "counter instance" is chosen
at random. You may be tempted to use the instance that was updated
least recently ( order('updated').get() ), but it turns out that
during a traffic spike (or a spike in whatever it is your counters
count) the indexes are not updated soon enough, and the query returns
the most recently updated records instead :-) Selecting a random
instance costs little in low traffic and works much better in high
traffic. I've also seen that, after a while, you end up with roughly
the number of counter instances needed to handle the traffic of the
specific counter with few transaction collisions.
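[Editor's sketch] The selection rule described above can be tried
without the datastore. The following is a minimal in-memory simulation,
not the author's code: the class ShardedCounterSim, its explicit `now`
argument and the helper run() are hypothetical names introduced for
illustration, and the sketch assumes that the 'rand >' query returns
the matching instance with the smallest rand.

```python
import random


class ShardedCounterSim:
    """In-memory stand-in for the AccVals instances of one counter."""

    def __init__(self, sec=0.1, seed=0):
        self.sec = sec                  # plays the role of self.__sec
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.shards = []                # dicts: count, updated, rand

    def _new_shard(self, now, rand):
        shard = {"count": 0, "updated": now, "rand": rand}
        self.shards.append(shard)
        return shard

    def inc(self, now):
        """Count one event happening at time `now` (in seconds)."""
        threshold = self.rng.random()
        matches = [s for s in self.shards if s["rand"] > threshold]
        if not matches:
            # No instance yet: create one that every later pick can match.
            shard = self._new_shard(now, 1.0)
        else:
            # Assumption: the query yields the smallest matching rand.
            shard = min(matches, key=lambda s: s["rand"])
            if now - shard["updated"] < self.sec:
                # Picked instance is "busy": add a fresh instance instead.
                shard = self._new_shard(now, self.rng.random())
        shard["count"] += 1
        shard["updated"] = now

    def val(self):
        return sum(s["count"] for s in self.shards)


def run(events, gap):
    """Send `events` increments spaced `gap` seconds apart."""
    sim = ShardedCounterSim()
    for i in range(events):
        sim.inc(now=i * gap)
    return sim


quiet = run(events=100, gap=1.0)     # one hit per second
burst = run(events=1000, gap=0.001)  # one hit per millisecond
print(len(quiet.shards), quiet.val())
print(len(burst.shards), burst.val())
```

With hits spaced further apart than `sec`, the single initial instance
is always reused; during the burst, new instances are created whenever
the picked one was touched less than `sec` seconds earlier. The
simulation ignores transaction collisions and index lag, so it only
illustrates how the number of instances follows the traffic.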

There is one interesting point: the value of self.__sec. I set it to
0.1 seconds, but this is just a value that looked good after some
tests. I have the impression that this value is *related* to some kind
of "global AppEngine constant", measuring the time it takes for a
transaction to complete and safely propagate to the rest of the
infrastructure. I guess this varies, depending on the resource
allocation done for a specific app. Could someone from the AppEngine
development team give us some insight on this?

As I've mentioned before, I'm a Python newbie, so use the code above
at your own risk :-)

Johan Carlsson

May 24, 2008, 11:54:27 AM
to Google App Engine
On Apr 30, 12:09 pm, vrypan <vry...@gmail.com> wrote:
...

Interesting!
If I'm not mistaken, this is a Virtually Synchronized Counter?

Here's a thought: would it be possible to keep the count partitions
in memory instead of in partitioned entities, and only write to the
datastore at random intervals or when the script ends (in a
try/finally statement), keeping the sum in a single entity with a
cached copy in memory?

The in-memory copy would be updated on writes and would otherwise be
an estimate of the real number.

The entity counter would always represent a true value, even though
not all writes may have been applied yet. The in-memory copy would
always show a value <= the real counter value.
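[Editor's sketch] One possible reading of this idea, in plain Python
with no datastore calls. The names BufferedCounter, flush_prob and
handle_request are hypothetical, and `stored` merely stands in for the
single summed entity; this is an illustration under those assumptions,
not a tested App Engine implementation.

```python
import random


class BufferedCounter:
    """Buffers increments in memory and writes them out occasionally."""

    def __init__(self, flush_prob=0.1, seed=0):
        self.stored = 0                 # stands in for the single entity
        self.pending = 0                # increments held only in memory
        self.flush_prob = flush_prob    # chance of a write per increment
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def inc(self):
        self.pending += 1
        # Write at random intervals rather than on every increment.
        if self.rng.random() < self.flush_prob:
            self.flush()

    def flush(self):
        # A single write carries all buffered increments.
        self.stored += self.pending
        self.pending = 0


def handle_request(counter, events):
    """Count `events` hits and record the stored value after each one."""
    seen = []
    try:
        for _ in range(events):
            counter.inc()
            seen.append(counter.stored)
    finally:
        # Write whatever is still buffered when the script ends.
        counter.flush()
    return seen


counter = BufferedCounter()
seen = handle_request(counter, events=50)
print(counter.stored, counter.pending)
```

While the request runs, the stored value never exceeds the number of
hits counted so far, and the flush in the finally clause brings it up
to date at the end. Increments that are still pending are lost if the
process dies before that flush.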

Regards,
Johan