Entity Sizing / Grouping wrt New Pricing

Steve

Sep 16, 2011, 2:18:43 AM
to google-a...@googlegroups.com
I'm looking for some opinions on to what degree I should aggregate my now-small entities into larger ones.  Presently I have 50,000 "Day" entities, where each entity represents a day.

They are relatively small, with 6 float, 2 bool, 1 int, and 1 string property.  No property indexes.  Datastore statistics says they average 161 bytes of data and 80 bytes of metadata (241 bytes total).

80% of my user requests are GETs in which I read:
70% of the time, 10 day entities
25% of the time, 30 day entities
05% of the time, 365 day entities

20% of my user requests are POSTs in which I read & write:
75% of the time, 1 day entity
15% of the time, 7 day entities
10% of the time, ~15 day entities

Since the new pricing is going to charge me per entity read and per entity write (and thankfully no property indexes here), I think I should look at reducing how many reads and writes are involved.  I could very easily chunk these individual day entities into groups of 10, or groups of 30.  That would (by my rough guess on metadata savings) put the entity size around 2k or 6k respectively.
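
Concretely, something like this is what I have in mind -- a rough sketch, names made up, using the standard google.appengine.ext.db API:

    import json
    from google.appengine.ext import db

    CHUNK_DAYS = 10  # or 30 -- the trade-off in question

    class DayChunk(db.Model):
        # All CHUNK_DAYS days serialized into one unindexed blob, so a
        # chunk still costs exactly one entity read / one entity write.
        payload = db.TextProperty()  # JSON: {day_ordinal: day_data, ...}

    def chunk_key_name(day_ordinal):
        # Key each chunk by the ordinal of its first day, so any day
        # maps straight to its chunk without a query.
        return 'chunk:%d' % (day_ordinal - day_ordinal % CHUNK_DAYS)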

I am wondering where the line is between retrieving fewer entities and each entity becoming too big because of the overhead of unwanted days.  With a 10 day chunk, my most frequent GET request would usually need 2 entities (4k), where only half the data was in the needed range.  At a 30 day chunk, usually 1 entity would suffice (6k), but 4k of that would be unwanted overhead.
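
Back-of-the-envelope numbers for my GET mix, under the assumption that an N-day window with chunk size C and uniformly random alignment touches 1 + (N - 1) / C chunks on average:

    # (probability, days requested) from the GET breakdown above
    GET_MIX = [(0.70, 10), (0.25, 30), (0.05, 365)]

    def expected_reads_per_get(chunk_days):
        return sum(p * (1 + (days - 1.0) / chunk_days)
                   for p, days in GET_MIX)

    for c in (1, 10, 30):
        print('%2d-day chunks: %.1f entity reads per GET'
              % (c, expected_reads_per_get(c)))
    # 1-day chunks: 32.8, 10-day chunks: 4.2, 30-day chunks: 2.1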

I'm having a hard time forming an internal model for what the impact of serializing & deserializing the overhead days would be.  I wish appstats wasn't just for RPCs.  I'm guessing the extra time to transfer the larger entities to/from the datastore is relatively minimal with Google's network infrastructure.  But now that CPU is throttled down to 600MHz, I don't know what kind of latencies I'd be adding with serialization.

Right now my most common POST operation is to put 1 entity of 241 bytes.  With a 30 day chunk, that would still be a single entity put, but 6k in size.
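
With chunking, that POST becomes a read-modify-write of the whole chunk -- a sketch using the hypothetical helpers above (wrap it in db.run_in_transaction if concurrent POSTs can race):

    def update_day(day_ordinal, day_data):
        name = chunk_key_name(day_ordinal)
        chunk = DayChunk.get_by_key_name(name) or DayChunk(key_name=name)
        days = json.loads(chunk.payload) if chunk.payload else {}
        days[str(day_ordinal)] = day_data
        chunk.payload = json.dumps(days)
        chunk.put()  # still one entity write, just ~6k instead of 241 bytes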

Any opinions, ideas, gut feelings, etc?

Cheers,
Steve

Joops

Sep 16, 2011, 10:03:49 AM
to Google App Engine

Going through a similar process myself.
(combining multiple entities into single entities as bundles of JSON)

I think it's a good idea, and I think you will want to do experiments
to see what size works best for you.
I have made it so I can tweak the point at which my entities are
forked into separate entities.
I found that deserialization was slow when my single entities were too
large.

Don't forget memcache! (so if you need the data for today, you can
just grab this month's data from memcache)
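
Something like this (key scheme made up, standard memcache API; the
loader is whatever you already use):

    from google.appengine.api import memcache

    def get_month(year, month):
        key = 'month:%04d-%02d' % (year, month)
        data = memcache.get(key)
        if data is None:
            data = load_month_from_datastore(year, month)  # your loader
            memcache.set(key, data)
        return data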

J

Steve

Sep 16, 2011, 1:13:25 PM
to google-a...@googlegroups.com
Thanks for the input.  If you don't mind me asking, how large were your entities when you noticed deserialization taking a long time?

Cheers,
Steve

Jeff Schnitzer

Sep 16, 2011, 2:40:56 PM
to google-a...@googlegroups.com
The first question I would ask is: Is this really a problem? Have
you looked at your bill and decided for certain that the savings would
be worth changing your code & possibly cluttering your business logic?

That said, I would be tempted to see what optimization you can make
with memcache. Seems like your write load is light and your read load
is heavy, so there's probably a lot of opportunity here without
fundamentally changing your data architecture.
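
E.g., something along these lines (just a sketch, not knowing your
code; it assumes your Day entities pickle cleanly -- otherwise
serialize them with db.model_to_protobuf first):

    from google.appengine.api import memcache

    def get_days(key_names):
        found = memcache.get_multi(key_names)
        missing = [k for k in key_names if k not in found]
        if missing:
            # One batch datastore get; only the misses are billed reads.
            entities = Day.get_by_key_name(missing)
            fresh = dict((k, e) for k, e in zip(missing, entities)
                         if e is not None)
            memcache.set_multi(fresh)
            found.update(fresh)
        return found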

Jeff

Steve

Sep 16, 2011, 3:40:40 PM
to google-a...@googlegroups.com
Hi Jeff,

Yes, it's really a concern.  Previously, fetching 20 small sequential entities cost about the same as fetching 1 large entity 20x the size of the small ones.  Now it costs 10-20x as much because of the per-read (and, for puts, per-write) pricing.

I'm already using memcache extensively, but thanks for the suggestion in case I wasn't.  It's a big help... assuming Google doesn't start charging per get/put next year.

I'm not keen on doing extra work either, but I think now is the time to do it.  I'm refactoring a lot of other things to target the new billing realities.  Startup times used to be a problem; now it's keeping idle instances around unnecessarily.  I might as well adjust the entity read/write load now, while I'm up to my elbows in the code, rather than have to come back in a year when it's not all fresh in my head.

Cheers,
Steve
