Minimizing writes to the HRD

mikegleasonjr

Aug 22, 2011, 12:33:19 PM
to Google App Engine
Hi,

I'm designing an app that provides a service through a public API
exposed to the world.

People use my service via the API, so one call to my app is one API
call. All calls are read-only calls to the datastore.

I want to track each API call (have the total count of API calls for
each account). My business model revolves around the API usage by its
members. So when a client makes an API call, I increment the call
count in its entity. I want to follow the principle that if you're
using the API more, you will pay more. I want to follow that App
Engine philosophy regarding the business model.

But it seems a little hard on the HRD to do a 'put' on an entity for
every API call.

What I can live with is an alternative solution: maybe I can have an
application-wide dictionary that keeps track of the call counts in
memory and flushes its counts to the datastore every hour
(via a cron job). The worst that could happen is that I would lose an
hour of API usage if there's downtime. I can live with that.

I'm pretty new to Java, and considering the nature of the distributed
environment, what would be your strategy for implementing such a
behavior? I guess an in-memory dictionary on each web server instance
would work anyway: each instance would add its own API call counts to
the datastore every hour.

So a cron job would be necessary, and a locking mechanism when
accessing this global dictionary.
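
A minimal sketch of the per-instance dictionary idea described above, assuming plain Java with no App Engine APIs. The class name `InstanceCallCounter` and its methods are hypothetical. It uses a `ConcurrentHashMap` of `AtomicLong` values, so no explicit lock is needed; a periodic job would call `drain()` and add the returned counts to the datastore. Counts held by an instance that shuts down before the next drain would be lost.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Per-instance, in-memory API call counter (hypothetical class, for illustration only). */
final class InstanceCallCounter {
    private final ConcurrentHashMap<String, AtomicLong> counts = new ConcurrentHashMap<>();

    /** Called on every API request; thread-safe without an explicit lock. */
    void record(String accountId) {
        counts.computeIfAbsent(accountId, k -> new AtomicLong()).incrementAndGet();
    }

    /** Returns the counts accumulated since the last drain and resets them to zero. */
    Map<String, Long> drain() {
        Map<String, Long> snapshot = new HashMap<>();
        for (Map.Entry<String, AtomicLong> e : counts.entrySet()) {
            // getAndSet is atomic, so a concurrent record() lands in this drain or the next.
            long n = e.getValue().getAndSet(0);
            if (n > 0) {
                snapshot.put(e.getKey(), n);
            }
        }
        return snapshot;
    }
}
```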

Maybe the HRD is fast enough to sustain one of my clients, who can make
up to 100 API calls/sec. But I also want the app to use as few
resources as possible, to keep the costs down for my clients and the
speed up :)

What are your ideas!?

Thanks!

Barry Hunter

Aug 22, 2011, 2:04:20 PM
to google-a...@googlegroups.com
Memory is not going to cut it. Your 'instances' are put up and torn
down all the time. So while instance memory might work for short-term
caching, you can't rely on it. *

If you can live with occasional losses, just use memcache incr() -
memcache can still lose data**

http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/memcache/MemcacheService.html#increment(java.lang.Object, long)

Then have periodic tasks that write the value from memcache into
the datastore.
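
A rough sketch of that pattern, assuming stand-ins for both services so it runs on plain Java: `FakeCache` only mimics a memcache-style atomic increment, and a `HashMap` plays the datastore. All class, method, and key names here are hypothetical; a real app would call `MemcacheService.increment` and write an entity instead.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for memcache; only mimics the calls this sketch needs. */
final class FakeCache {
    private final Map<String, Long> values = new HashMap<>();

    /** Adds delta to the counter, treating a missing key as 0; returns the new value. */
    synchronized long increment(String key, long delta) {
        long updated = values.getOrDefault(key, 0L) + delta;
        values.put(key, updated);
        return updated;
    }

    /** Returns the current counter value, or 0 if the key is missing (e.g. evicted). */
    synchronized long get(String key) {
        return values.getOrDefault(key, 0L);
    }
}

/** Counts API calls in the cache and periodically moves them to a durable total. */
final class UsageTracker {
    private final FakeCache cache;
    // Stand-in for the datastore: account id -> total calls written so far.
    private final Map<String, Long> datastoreTotals = new HashMap<>();

    UsageTracker(FakeCache cache) {
        this.cache = cache;
    }

    /** Request path: one cheap cache increment, no datastore write. */
    void recordCall(String accountId) {
        cache.increment("calls:" + accountId, 1);
    }

    /** Periodic job: add the pending count to the durable total, then subtract it from the cache. */
    void flush(String accountId) {
        String key = "calls:" + accountId;
        long pending = cache.get(key);
        if (pending == 0) {
            return; // nothing to write, so skip the datastore entirely
        }
        datastoreTotals.merge(accountId, pending, Long::sum);
        // Subtract rather than delete, so increments that arrived after get() are kept.
        cache.increment(key, -pending);
    }

    long total(String accountId) {
        return datastoreTotals.getOrDefault(accountId, 0L);
    }
}
```

The request path never touches the datastore; only `flush` does, once per account per period. If the cache drops a key between flushes, the calls counted since the last flush are lost, which is the trade-off accepted above.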


* Unless you use backends. They are resident, and less likely to suffer
memory loss. But in general, from what I understand, backends wouldn't be
a good fit to implement the actual API serving.


** But if you experience regular data loss, just increase the
frequency of tasks.

> --
> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
>
>

Mike Gleason jr Couturier

Aug 22, 2011, 2:32:58 PM
to google-a...@googlegroups.com
Thanks Barry,

For the regular writes from memcache to the HRD, would you set up a cron job?

In that case my API service would be pretty efficient, I guess, with read-only calls from my clients.

Thank you again

Barry Hunter

Aug 22, 2011, 2:47:28 PM
to google-a...@googlegroups.com
You could use cron, or more dynamic tasks.

It depends partly on how many clients you have. One of the issues you face
is that you can't "query" memcache to see which keys are waiting to
be written. So you either have to just try them all, or maintain some
sort of list.

If it's a small number (under say 1000?) then a cron job running a check for
all clients would probably work fine.

But if there are many more than that, especially if only a few are active in
any period, then it's a huge overhead to check them all when many will
still be missing/zero.

In that case, when a hit from a client comes in, you queue up a task
unique to the client. Set an ETA for an hour later, or similar.
That task then fires later, wipes the current value from memcache,
and adds it to the datastore.

A nice feature is that you can use task names to prevent adding more tasks for
a given user. So name the task like "client_month_day_hour" or
similar. Then successive attempts (because you try on every hit) will
just fail. The first one exists and will run.
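
A small sketch of the naming trick, assuming plain Java with a `HashSet` standing in for the task queue. The class `FlushTaskScheduler`, the `flush_` prefix, and the exact name layout are hypothetical. On App Engine itself, adding a task whose name is already taken is rejected by the queue (in the Java API this shows up as `TaskAlreadyExistsException`), which is what the set's `add` return value imitates here.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

/** Schedules at most one flush task per client per hour (illustrative stand-in). */
final class FlushTaskScheduler {
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("yyyy_MM_dd_HH").withZone(ZoneOffset.UTC);

    // Stand-in for the task queue: remembers which task names were already added.
    private final Set<String> queuedNames = new HashSet<>();

    /** Builds a name such as "flush_acme_2011_08_22_14": same client and hour, same name. */
    static String taskName(String clientId, Instant now) {
        // Assumption: client ids stay distinct after replacing characters outside [A-Za-z0-9_-].
        String safeId = clientId.replaceAll("[^A-Za-z0-9_-]", "_");
        return "flush_" + safeId + "_" + HOUR.format(now);
    }

    /** Returns true if a new task was queued, false if one already exists for this hour. */
    synchronized boolean scheduleFlush(String clientId, Instant now) {
        return queuedNames.add(taskName(clientId, now));
    }
}
```

Calling `scheduleFlush` on every hit is safe: only the first call in a given hour for a given client queues anything, and later calls are no-ops.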

(But the cron version is easier. Try that first. If it's not keeping
up, or is too slow, then try the task queue version.)

Mike Gleason jr Couturier

Aug 22, 2011, 3:29:55 PM
to google-a...@googlegroups.com
Thanks, I'll give it a shot.