Question about datastore cost optimization by reusing old entities for new data


Rishi Arora

Sep 15, 2011, 3:31:59 PM
to google-a...@googlegroups.com
In my app I have an entity group that stores user activity data - like a high-level application log of user activity.  This is the entity group that I expect to grow the fastest over time, and I have decided to keep only the last 3 months of "user activity" data to avoid indefinite datastore growth.  Regarding how to maintain this 3-month moving window, the obvious answer was: run a daily cron job that deletes entities older than 3 months, with some upper bound on the number of deletes per day.  However, a better solution comes to mind, and I'm wondering if someone can comment on whether it is indeed better:

When I need to create a new User Activity entity, I should first search for an entity that is 3+ months old.  If I find one, I'll "overwrite" it with the new user activity.  This costs me one read and one write operation (plus two more writes, because there are two indexes associated with this entity kind).  That's a total of 1 read and 3 writes with the new solution.  In my current solution, I only have 3 writes and no reads, but 3 months later the entity has to be deleted, which incurs an additional read and 3 writes (one for the entity itself and 2 for the two indexes).  So my theory is that my current solution costs a total of 1 read + 6 writes, and my new solution saves me 3 writes.  Anything wrong with this theory?  Oh, and also, I save the CPU time too, because I don't need to run a cron job every day.
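
For concreteness, here is roughly what I have in mind - a minimal sketch assuming the Python runtime and the google.appengine.ext.db API (the UserActivity model and its property names are made up purely for illustration):

from datetime import datetime, timedelta

from google.appengine.ext import db


class UserActivity(db.Model):
    # Illustrative properties: 'user' and 'timestamp' stand in for the
    # two indexed properties mentioned above.
    user = db.StringProperty()
    timestamp = db.DateTimeProperty()
    # TextProperty is never indexed, so it adds no index writes.
    action = db.TextProperty()


def log_activity(user, action):
    cutoff = datetime.utcnow() - timedelta(days=90)
    # Search for a reusable entity older than 3 months (costs 1 read).
    stale = UserActivity.all().filter('timestamp <', cutoff).get()
    if stale is None:
        stale = UserActivity()  # nothing stale yet: create a new entity
    # Overwrite in place: 1 entity write plus the index writes.
    stale.user = user
    stale.action = action
    stale.timestamp = datetime.utcnow()
    stale.put()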

Thanks in advance.
-Rishi

Gopal Patel

Sep 16, 2011, 5:17:49 AM
to google-a...@googlegroups.com
Nice idea, but you are not counting the datastore cycles spent on the "search for an entity that is 3+ months old" query....
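
For example, a keys-only query would keep that search cheap, since keys-only results are billed as small datastore operations rather than full entity reads - and since the plan is to overwrite the old entity anyway, its old property values are never needed.  A sketch, reusing the hypothetical UserActivity model from the first post:

from datetime import datetime, timedelta

from google.appengine.ext import db


def overwrite_stale(user, action):
    cutoff = datetime.utcnow() - timedelta(days=90)
    # keys_only: the query returns bare keys (a small operation), not
    # full entities, so no entity read is charged for the search.
    stale_key = (UserActivity.all(keys_only=True)
                 .filter('timestamp <', cutoff)
                 .get())
    # The db model constructor accepts an explicit key, so the stale
    # entity can be overwritten without ever being read; if nothing
    # stale was found, key=None simply creates a new entity.
    entity = UserActivity(key=stale_key, user=user, action=action,
                          timestamp=datetime.utcnow())
    entity.put()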


Rishi Arora

Sep 16, 2011, 5:45:22 AM
to google-a...@googlegroups.com
Ah yes, good point.  So I suppose I should assume I'm not saving any CPU cycles: the cycles spent in an end-of-day cron job deleting old data will now be spent searching for old entities at the time of writing new ones.  I also found out that reading an entity and writing it back with a different value for an indexed property causes 3 writes - one for the entity, and two for the index: one when the old index entry is deleted, and a second when the new entry is written.  So I'm only saving one write.  But if I find a way to preserve the value of one of the two indexed properties when overwriting an old entity with new data, I'm saving two writes.

To summarize, my best guess at this point is:

Current method: 1 read, 6 writes, plus CPU cycles in an end-of-day cron job.
New method: 1 read, 4 writes, plus CPU cycles spent searching for old entities that can be reused.
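
One way to preserve one of the two indexed values - assuming, purely for illustration, that the two indexes are on 'user' and 'timestamp' - is to reuse only the same user's stale entities, so the 'user' index rows never change and only the 'timestamp' index entries get deleted and rewritten.  (The combined equality + inequality filter needs a composite index on user and timestamp.)  A sketch:

def log_activity_reusing_same_user(user, action):
    cutoff = datetime.utcnow() - timedelta(days=90)
    # Reuse only entities that already belong to this user: the indexed
    # 'user' value never changes, so its index rows stay untouched and
    # only the 'timestamp' index entries are deleted and rewritten.
    stale = (UserActivity.all()
             .filter('user =', user)
             .filter('timestamp <', cutoff)
             .get())
    if stale is None:
        stale = UserActivity(user=user)
    stale.action = action
    stale.timestamp = datetime.utcnow()
    stale.put()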

Daniel Hans

Sep 15, 2011, 5:34:25 PM
to google-a...@googlegroups.com
On Thu, Sep 15, 2011 at 9:31 PM, Rishi Arora <rishi...@ship-rack.com> wrote:
[...] So, my theory is that my current solution has a total of 1 read + 6 writes, and my new solution saves me 3 writes.  Anything wrong with this theory?  Oh and also, I save the CPU time too because I don't need to run a cron job every day.

It may be a minor downside, but if you do not run a cron job every day with the new approach, you may be left with some obsolete entities that are more than three months old.  Also, if you want to do all of that in an end-user thread (search for an old entity, overwrite it, and save it), there is actually more work per request.  But of course you can delegate that job to a background task.
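
For example, with the deferred library the user-facing handler only enqueues the work (assuming the hypothetical log_activity function sketched earlier in the thread):

from google.appengine.ext import deferred

# Push the search-and-overwrite work onto the task queue so it adds no
# latency to the user's request.  (The deferred builtin must be
# enabled in app.yaml.)
deferred.defer(log_activity, user, action)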

Best,
Daniel