Issue #117


Prateek Malhotra

unread,
Mar 13, 2012, 5:04:34 PM3/13/12
to kay-...@googlegroups.com
Continuing Discussion:

Well, with pull queues you can batch the updates instead of doing them all at once. Whether that works depends on the application itself. For instance, if the application keeps counters for back-end reporting, updating the counters may only need to happen every 12 or 24 hours. During the day, tasks would be added to the pull queue, and a cron job launched every 12 or 24 hours would lease all of the waiting tasks, count/aggregate them in memory, then update each counter. No write limit is reached this way, and you end up with significantly fewer writes. I'd rather use up Task Queue resources than rack up RPC counts and datastore writes (those cost a TON more $$).

If the counters are needed for real-time information, then obviously sharded counters would be the best option (in fact I use sharded counters myself for a critical part of my application). But depending on the requirements of the app, you could just run the cron job that computes from the pull queues more frequently (e.g. every 5 minutes).
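A minimal sketch of the in-memory aggregation step described above. The payload format and function name are hypothetical; on App Engine the cron handler would obtain the payloads via taskqueue.Queue.lease_tasks() before aggregating.

```python
from collections import Counter

def aggregate_payloads(payloads):
    """Sum up counter-increment payloads in memory.

    Each payload is assumed to be a "counter_name:delta" string,
    the kind of tiny message a frontend handler might add to a
    pull queue instead of writing to the datastore directly.
    """
    totals = Counter()
    for payload in payloads:
        name, _, delta = payload.partition(":")
        totals[name] += int(delta)
    return dict(totals)

# The cron job would lease the waiting tasks, aggregate them like
# this, then do a single datastore write per counter.
print(aggregate_payloads(["views:1", "views:1", "clicks:2", "views:3"]))
```

With four queued increments collapsed into two writes, the savings grow with traffic: a day of per-request tasks becomes one write per counter per cron run.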

I saw a more recent Google I/O piece on this method.

Let me know what you think.

Paolo Casciello

unread,
Mar 14, 2012, 6:40:00 AM3/14/12
to kay-...@googlegroups.com
Right, the problem with sharded counters is the high RPCs usage... :(

Using Pull Queues is a better solution for deferred counters.
We could implement both, each with the correct disclaimer.
Sharded is easy: just a built-in object to use standalone, for example c = Counter(MyModel, 'Views').

For deferred it could be a similar object from the frontend perspective, plus some modifications in app.yaml, settings, and cron to make it work.
Kay could then have a dedicated pull queue, let's say "kay-pqueues-deferred-counters", and a task consumer to count/aggregate/write.
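As a sketch, the app-side wiring could look something like this (the queue name is the one proposed above; the cron URL and schedule are hypothetical):

```yaml
# queue.yaml: the dedicated pull queue
queue:
- name: kay-pqueues-deferred-counters
  mode: pull

# cron.yaml: the consumer that leases, aggregates, and writes
cron:
- description: flush deferred counters
  url: /_kay/flush_deferred_counters
  schedule: every 12 hours
```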

What do you think?

What's the max number of tasks in a pull queue?

Prateek Malhotra

unread,
Mar 14, 2012, 11:40:05 AM3/14/12
to kay-...@googlegroups.com
I believe the daily quota for the number of tasks is the maximum allowed in any single queue: 1,000,000,000 API calls a day and 10,000,000,000 stored tasks.

Not exactly sure how your syntax for sharded counters would work; why are you passing in a Model?

Having support for the two makes the most sense. Just be sure to clearly state the pros/cons for each method in the documentation along with helpful tutorials.

Thanks,
Prateek

Paolo Casciello

unread,
Mar 15, 2012, 1:22:11 PM3/15/12
to kay-...@googlegroups.com
Ok so the limits are high enough for a counter's typical use case :)

The implementation I have in mind is generalized, but can optionally be linked to a model.
Imagine a class defined as follows:

class ShardedCounter(db.Model):
    # kind of the linked model; '' means a global counter
    mod = db.StringProperty(required=True, default='')
    name = db.StringProperty(required=True)
    count = db.IntegerProperty(required=True, default=0)

Use cases are:
 1) When you need to add a count property to a model:
     c = get_sharded_counter(model=Post, name="comments")
     This gets the sharded counter named "comments", which is linked to Post.key.
 2) When you need to keep a global counter:
     c = get_sharded_counter(name="site_logins")
     This gets the sharded counter named "site_logins", whose "mod" property is ''.

OK, the above class could also be enhanced to use only the name property, defining it as the concatenation of mod.key and name.
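For instance, the "use only the name property" variant could boil down to a key-naming helper like this (names and format are illustrative only, not a proposed Kay API):

```python
def counter_key_name(name, mod=''):
    """Build a single key name for a ShardedCounter entity.

    Collapses the two use cases above into one key: "mod" is the
    linked model's kind (use case 1), or '' for a global counter
    (use case 2).
    """
    return "%s:%s" % (mod, name)

print(counter_key_name("comments", mod="Post"))  # Post:comments
print(counter_key_name("site_logins"))           # :site_logins
```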

This is only one possible implementation I have in mind :D

What do you think?

Prateek Malhotra

unread,
Mar 17, 2012, 1:00:07 PM3/17/12
to kay-...@googlegroups.com
I see what you mean now, but we could leave it to the user to specify a counter name with the model key built in, so that Kay wouldn't have to track it. That implementation seems to assume a counter is always based around a model, but this may not be the case. If we are going to provide helper functions for possible use cases, we should cover all the common ones, but the scope of that may be too large. Providing the basic means to perform counting should be sufficient, and we can let developers come up with their own ways to leverage counter names.

I think a better scope for this implementation would be to accommodate the common SQL aggregate functions. COUNT() is what counters cover, but SUM, AVG, etc. could easily be managed in a similar fashion and would be a great feature to add to Kay. This would emphasize pre-calculation, which is how developers should track aggregates, and it would make them easy to maintain.
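A sketch of how SUM/AVG/COUNT could ride on the same pull-queue machinery: each leased task updates a running (count, total) pair, and AVG falls out of the other two. The class and field names are hypothetical; a real version would persist the pair on a (possibly sharded) datastore entity.

```python
class RunningAggregate(object):
    """Pre-calculated COUNT/SUM/AVG, updated incrementally."""

    def __init__(self):
        self.count = 0   # COUNT()
        self.total = 0   # SUM()

    def add(self, value):
        self.count += 1
        self.total += value

    @property
    def avg(self):
        # AVG() is derived from the other two, never stored
        return self.total / float(self.count) if self.count else 0.0

agg = RunningAggregate()
for value in [10, 20, 30]:      # hypothetical values from leased tasks
    agg.add(value)
print(agg.count, agg.total, agg.avg)  # 3 60 20.0
```

The key property is that every aggregate here is maintainable with a constant amount of state per counter, so the cron consumer never has to scan the underlying entities.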

Also, going forward, would it be a bad idea to implement NDB-based models/functions? We can always expose non-tasklet functions for basic usage, but since NDB is almost out of experimental (possibly in the next SDK release), Kay can provide tasklets to help with concurrency.

Example of how to wrap a tasklet (note that NDB returns values from tasklets by raising ndb.Return):

@ndb.tasklet
def get_sharded_counter_tasklet(name):
  # check memcache first via NDB's context API
  counter = yield ndb.get_context().memcache_get(name)
  if counter is not None:
    raise ndb.Return(counter)
  ...
  raise ndb.Return(counter)

def get_sharded_counter(name):
  # synchronous wrapper for callers that don't care about tasklets
  return get_sharded_counter_tasklet(name).get_result()

- Prateek

--
You received this message because you are subscribed to the Google Groups "kay-users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/kay-users/-/E6dcmYqLhdEJ.

To post to this group, send email to kay-...@googlegroups.com.
To unsubscribe from this group, send email to kay-users+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/kay-users?hl=en.

Paolo Casciello

unread,
Mar 23, 2012, 5:55:50 AM3/23/12
to kay-...@googlegroups.com
I agree.
And I also agree that NDB is now stable enough to base new features on, where it gives advantages.

Just a clarification: why do you use ndb.get_context().memcache_get(name)?
NDB checks in-process cache -> memcache -> datastore automatically, as far as I know.

At least as of the latest docs.

Bye,
  Paolo

Prateek Malhotra

unread,
Mar 23, 2012, 8:00:15 PM3/23/12
to kay-...@googlegroups.com
NDB caches entity get() calls in memcache and in process memory. I'm not sure whether memcache_get() also pulls from the in-process cache, but NDB will auto-batch memcache calls either way. Also, with the sharded counters method there isn't a single datastore entity you fetch to get the aggregate; rather, there are a number of sharded counters that you aggregate into a single value. You can update this value in memcache as you update the counters to speed up lookups.
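A toy version of that read path, with a plain dict standing in for memcache and per-shard value lists standing in for the datastore entities (all names hypothetical):

```python
def get_count(name, cache, shards):
    """Serve the aggregate from cache; fall back to summing shards."""
    if name in cache:
        return cache[name]             # cheap path: cached aggregate
    total = sum(shards.get(name, []))  # expensive path: fan-in over shards
    cache[name] = total                # cache the aggregate for next time
    return total

shards = {"views": [3, 1, 4]}  # three shard entities for one counter
cache = {}
print(get_count("views", cache, shards))  # 8, summed from the shards
print(get_count("views", cache, shards))  # 8, served from the cache
```

In the real thing, increments would also bump the cached aggregate so readers rarely pay the fan-in cost.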

I actually have a lot of optimizations in place for using sharded counters with NDB, improvements on the example code you previously linked to. When it's time to build this feature out, let me know and I can review your code or provide my own.

-Prateek
