Relationship Issue: What would be the best way to handle this?


spierce7

Feb 16, 2012, 4:37:25 AM
to objectify-appengine
Hey guys, I'm fairly new to Objectify, and I had a quick question for
the best way to do something:

Let's say I have an application that allows people to send and receive
messages (think e-mail for simplicity). When my app loads, I don't
want to load every single message from every single contact that's
sent a message to a given user. That would seem to be a waste.
Instead, I want to load all of the contacts that a user has messages
from (read or unread) so that I can display a list of the contacts on
my app, and when the user clicks on a given contact I want to load all
of the messages from that contact to display to the user.

What I was doing was having an entity of type Message that holds a
reference to the contact. However, I can't find a good
way of doing this without loading all of the Messages for an account.
I read the Objectify wiki on relationships, and I still can't think of
a good way to do this that isn't extremely inefficient.

I'm trying to use as few Reads, and Writes as possible, and where
possible I'm trying to use Smalls instead of Reads (Overall cost to
run my app is a big concern of mine while I'm making this).

On Objectify, how should I be doing this?

Thanks,
Scott

Jeff Schnitzer

Feb 16, 2012, 12:15:03 PM
to objectify...@googlegroups.com
There are three main options when dealing with "aggregation data" like what you describe:

1) Calculate it when you need it

You've concluded, rightly I think, that this is too expensive.

2) Calculate it at batch intervals and store this result

Not very satisfying since it involves a delay.  Plus you don't want to comb through your entire database every night.

3) Update the aggregation when the data changes

This approach involves a little more work every time the data changes, but it's almost certainly what you want to do.

Create some sort of collection of contacts for each user.  When a message arrives, make sure a sender contact exists for that recipient.  Maybe you also want to delete the contact when the recipient deletes the last message from a sender.

Be careful not to bump into entity group transaction rate limits (one write per second).  I'll walk you through some options:

 1) You could store a list of contacts in each recipient:

class Person {
   @Id Long id;
   Set<Key<Person>> contacts;
}

This would be a distinct problem if, say, the recipient received mail from 20 new people all at once.  This is almost certainly a bad idea.  On the other hand, it's blazingly fast and efficient to look up who your contacts are.  A minor improvement would be to move this into a separate entity parented by the person so you aren't always loading that data:

class Contacts {
   @Parent Key<Person> owner;
   @Id long id = 1;   // there's only ever one of these per person, and it should have a predictable key for fetching
   Set<Key<Person>> contacts;
}

Of course, the Set in a single entity gives you a 50,000 entry limit.  It might be slightly smaller than this if you hit the 1M entity size limit first.  If your keys are ~20 chars, it'll be about the same.  If this is an issue you could allow multiple Contacts entities, at which point you have something that looks like the Relation Index Entity pattern from Brett Slatkin's 2009 Google I/O talk:  http://www.youtube.com/watch?v=AgaL6NGpkB8

 2) You could store a list of contacts in the other direction

class Person {
   @Id Long id;
   @Index Set<Key<Person>> contactOf;
}

This makes it a bit more expensive to find out who your contacts are - you need a keys-only query, not a simple get-by-key.  But you aren't really limited by the entity write rate anymore.  People probably don't send more than one message per second, and if they send out 1000 messages in bulk, you can update the contactOf in a single transaction.

As above, you probably want to move this index into a separate entity:

class Contacts {
   @Parent Key<Person> person;
   @Id long id = 1;   // there's only ever one of these per person, and it should have a predictable key for fetching
   Set<Key<Person>> of;
}

 3) You could also store these contacts in a completely separate entity

class Contact {
   @Parent Key<Person> person;
   @Id Long id;
   @Index Key<Person> owner;
}

This is really just a less-space-efficient way of doing solution #2.

The important thing is to keep updating this structure when every message is sent or received.
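
To make the write-rate trade-off between options 1 and 2 concrete, here is a minimal in-memory sketch in plain Java. It is not Objectify code: the class ContactIndexSketch, its maps, and its method names are hypothetical stand-ins, with a HashMap playing the role of the datastore and a counter approximating how many writes each person's entity absorbs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model; no datastore or Objectify calls are made.
class ContactIndexSketch {
    // Option 1: recipient id -> ids of everyone who has messaged them.
    final Map<Long, Set<Long>> contacts = new HashMap<>();
    // Option 2: sender id -> ids of the recipients this sender is a contact of.
    final Map<Long, Set<Long>> contactOf = new HashMap<>();
    // Writes absorbed by each person's entity under each option.
    final Map<Long, Integer> forwardWrites = new HashMap<>();
    final Map<Long, Integer> reverseWrites = new HashMap<>();

    void recordMessage(long sender, long recipient) {
        // Option 1 writes to the recipient's entity when a new sender appears.
        if (contacts.computeIfAbsent(recipient, k -> new HashSet<>()).add(sender)) {
            forwardWrites.merge(recipient, 1, Integer::sum);
        }
        // Option 2 writes to the sender's entity when a new recipient appears.
        if (contactOf.computeIfAbsent(sender, k -> new HashSet<>()).add(recipient)) {
            reverseWrites.merge(sender, 1, Integer::sum);
        }
    }

    // Option 1 lookup: stands in for a single get-by-key.
    Set<Long> contactsForward(long recipient) {
        Set<Long> found = contacts.getOrDefault(recipient, Collections.emptySet());
        return new TreeSet<>(found);
    }

    // Option 2 lookup: this scan stands in for a keys-only query on the
    // indexed contactOf property.
    Set<Long> contactsReverse(long recipient) {
        Set<Long> result = new TreeSet<>();
        for (Map.Entry<Long, Set<Long>> entry : contactOf.entrySet()) {
            if (entry.getValue().contains(recipient)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

If 20 distinct senders message one recipient, the forward index in this model records 20 writes against the recipient's entity, while the reverse index records one write against each sender. Both lookups return the same set of contacts; only the cost profile differs.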

Jeff

spierce7

Feb 16, 2012, 6:27:22 PM
to objectify...@googlegroups.com
Jeff,

Thanks for your comprehensive reply. I have a few questions:

1. What are the "entity group transaction rate limits"? You mention it's 1 write per second. Are you saying that in a given transaction, I can only write to a specific entity group once per second? What about another transaction that's happening in parallel on the same entity group? Is that going to affect it? What about on another App Engine instance? What if I just choose not to use a transaction? It wouldn't be the end of the world if they were out of sync. I tried looking online, but I couldn't find anything on this.

2. Here is my problem. These messages aren't based solely on a person's account receiving a message from someone. I need to be able to get a list of all the people the user has sent a message to or received a message from, limited to messages currently stored on the server (I'll be pruning the database every 48 hours or so to get rid of old messages). Based on that, which method would you suggest? Right now I'm kind of afraid that we're going to be charged a fortune in read and write fees on the database. If I understand your example 2 correctly, every time I receive or send a message, I need to load up the Contacts entity for the given user and verify that this person's account is in the set. That sounds pretty expensive too.

What if I had a contact Parent entity, with a list of all their messages, but then any time a new message is sent or received, I would have to load this entire entity, and add 1 item to the list and then save it again? That would be even more expensive wouldn't it?

Thanks,
Scott

Jeff Schnitzer

Feb 16, 2012, 7:10:15 PM
to objectify...@googlegroups.com
On Thu, Feb 16, 2012 at 6:27 PM, spierce7 <spie...@gmail.com> wrote:
> Jeff,
>
> Thanks for your comprehensive reply. I have a few questions:
>
> 1. What are the "entity group transaction rate limits"? You mention it's 1 write per second. Are you saying that in a given transaction, I can only write to a specific entity group once per second? What about another transaction that's happening in parallel on the same entity group? Is that going to affect it? What about on another App Engine instance? What if I just choose not to use a transaction? It wouldn't be the end of the world if they were out of sync. I tried looking online, but I couldn't find anything on this.

No no, I mean that any given entity group can sustain at most 1 write per second.  If you have a number of transactions contending to write to an entity group, about one per second will succeed while all the others roll back with ConcurrentModificationException.

And that's before XG transactions.  That will divide throughput somewhat more... although exactly how much more depends on the amount of contention on the "other" entity group.  Figure a 2-EG transaction will be half the throughput of a 1-EG transaction... but I would really love to see some benchmarks.
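
Since contended transactions fail with ConcurrentModificationException, callers typically need some retry policy. Below is a minimal sketch of retrying with exponential backoff in plain Java. The names ContentionRetry and withRetries and the parameters are hypothetical, the Supplier is assumed to wrap one whole idempotent transaction, and no datastore API is called.

```java
import java.util.ConcurrentModificationException;
import java.util.function.Supplier;

// Hypothetical helper; the Supplier stands in for "run one transaction".
class ContentionRetry {

    // Runs work up to maxAttempts times, doubling the delay after each
    // ConcurrentModificationException. Rethrows the last failure if every
    // attempt loses the race.
    static <T> T withRetries(int maxAttempts, long baseDelayMillis, Supplier<T> work) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConcurrentModificationException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return work.get();
            } catch (ConcurrentModificationException e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    sleep(baseDelayMillis << attempt); // base, 2x base, 4x base, ...
                }
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", ie);
        }
    }
}
```

Retrying only smooths over short bursts. If an entity group is asked to absorb far more than about one write per second for a sustained period, most attempts will still fail, which is why the data model itself needs to spread writes across entity groups.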
 
> 2. Here is my problem. These messages aren't based solely on a person's account receiving a message from someone. I need to be able to get a list of all the people the user has sent a message to or received a message from, limited to messages currently stored on the server (I'll be pruning the database every 48 hours or so to get rid of old messages). Based on that, which method would you suggest? Right now I'm kind of afraid that we're going to be charged a fortune in read and write fees on the database. If I understand your example 2 correctly, every time I receive or send a message, I need to load up the Contacts entity for the given user and verify that this person's account is in the set. That sounds pretty expensive too.

So if I follow that right the problem is not just that you have a lot of messages, but you have a refcounting problem too. You want to show a tree:

--- MESSAGES FROM:
  + Bob
  + Fred
  + Sally

...but only show Fred when there are current messages from Fred.  And Fred should go away in 48 hours when the last of his messages are purged.

In RDBMS-land it's a "select distinct (sender) from messages" or something like that... which requires scanning all the messages and doesn't work very well at scale.  That said, this would probably be easy to solve in MongoDB, especially if the number of messages per user is actually fairly small.  But you'd have to be careful about scaling that up.

> What if I had a contact Parent entity, with a list of all their messages, but then any time a new message is sent or received, I would have to load this entire entity, and add 1 item to the list and then save it again? That would be even more expensive wouldn't it?

No, that would be terrible.  At the very least you could just store a refcount and save that.  That would be somewhat less than double the cost of creating and deleting all those messages.

But you don't really need a refcount, you just need a boolean state.  When you create the first Message, load the Contact(Recipient, Sender) entity... if it doesn't exist, create it.  Whenever a message is deleted, query to see if there are additional messages from that sender, and if there are none, delete the Contact(Recipient, Sender).  At the very least you've reduced the # of writes considerably.
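
The boolean-state approach can be sketched as follows. This is a plain-Java, in-memory model under stated assumptions, not Objectify code: ContactTrackingMailbox, Message, ContactKey, and the contactWrites counter are all hypothetical, and the list and set stand in for the Message and Contact kinds in the datastore.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical in-memory model of Contact(Recipient, Sender) as a boolean state.
class ContactTrackingMailbox {
    static final class Message {
        final long id, recipient, sender;
        Message(long id, long recipient, long sender) {
            this.id = id; this.recipient = recipient; this.sender = sender;
        }
    }

    static final class ContactKey {
        final long recipient, sender;
        ContactKey(long recipient, long sender) {
            this.recipient = recipient; this.sender = sender;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof ContactKey)) return false;
            ContactKey k = (ContactKey) o;
            return recipient == k.recipient && sender == k.sender;
        }
        @Override public int hashCode() { return Objects.hash(recipient, sender); }
    }

    private final List<Message> messages = new ArrayList<>();
    private final Set<ContactKey> contacts = new HashSet<>();
    private long nextMessageId = 1;
    int contactWrites = 0; // counts creations and deletions of Contact rows

    // Store the message; create the Contact only if it does not exist yet.
    long deliver(long sender, long recipient) {
        long id = nextMessageId++;
        messages.add(new Message(id, recipient, sender));
        if (contacts.add(new ContactKey(recipient, sender))) contactWrites++;
        return id;
    }

    // Delete the message; drop the Contact only when no message from that
    // sender to that recipient remains.
    void delete(long messageId) {
        Message removed = null;
        for (Iterator<Message> it = messages.iterator(); it.hasNext();) {
            Message m = it.next();
            if (m.id == messageId) { it.remove(); removed = m; break; }
        }
        if (removed == null) return;
        for (Message m : messages) {
            if (m.recipient == removed.recipient && m.sender == removed.sender) return;
        }
        if (contacts.remove(new ContactKey(removed.recipient, removed.sender))) contactWrites++;
    }

    boolean hasContact(long recipient, long sender) {
        return contacts.contains(new ContactKey(recipient, sender));
    }
}
```

In this model, three messages from the same sender cost one Contact write on arrival and one more when the last message is deleted, rather than one write per message as a refcount would require.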

Jeff

spierce7

Feb 16, 2012, 8:19:03 PM
to objectify-appengine
Thanks for your reply.

So if I understand you correctly, you are recommending that I use
option 3 above?

class Contact {
   @Parent Key<Person> userId;   // Key of the person's user account
   @Id Long id;
   // Key of the other person that the given user either sent a message to
   // or received a message from.
   @Index Key<Person> otherPersonId;
}

Then, every time I send a message, or receive a message, I can check to
see if the above entity exists for the user, and the id of the other
person. If it doesn't exist, then I create it. Then when the person
loads the page I can query the Contact entity for all entities that
have that person's given userId, and thus I will have a list of all of
the recent communication that the user has had with users, and I can
display it to them.

Sorry for trying to be so thorough, but I'm trying to get this right
the first time around. :-) I really appreciate your help so far.

Also, is there any place I can read about a given entity group only
being able to receive 1 write per second? I took a look and didn't see
anything on this in any of the documentation.

Thanks again,
Scott

spierce7

Feb 17, 2012, 1:03:47 AM
to objectify-appengine
I suppose that it's also worth noting that in that class I need to be
able to sort by Date, so I have to throw an indexed Date variable in
there as well.
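
As a rough illustration of that idea, here is an in-memory sketch in plain Java of a per-user contact list ordered by most recent activity. The names RecentContact, RecentContacts, touch, and lastMessageAt are hypothetical. In the datastore, the filtering and sorting done here in memory would presumably be an ancestor query ordered by the indexed date property.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical row: one per (user, other person) pair, with a sortable date.
class RecentContact {
    final long userId;
    final long otherPersonId;
    Instant lastMessageAt;

    RecentContact(long userId, long otherPersonId, Instant lastMessageAt) {
        this.userId = userId;
        this.otherPersonId = otherPersonId;
        this.lastMessageAt = lastMessageAt;
    }
}

// Hypothetical in-memory stand-in for the Contact kind.
class RecentContacts {
    private final List<RecentContact> rows = new ArrayList<>();

    // Call on every send or receive: create the row if missing, otherwise
    // move its date forward.
    void touch(long userId, long otherPersonId, Instant when) {
        for (RecentContact row : rows) {
            if (row.userId == userId && row.otherPersonId == otherPersonId) {
                if (when.isAfter(row.lastMessageAt)) row.lastMessageAt = when;
                return;
            }
        }
        rows.add(new RecentContact(userId, otherPersonId, when));
    }

    // Ids of the people this user has exchanged messages with, newest first.
    List<Long> recentContacts(long userId) {
        List<RecentContact> mine = new ArrayList<>();
        for (RecentContact row : rows) {
            if (row.userId == userId) mine.add(row);
        }
        mine.sort(Comparator.comparing((RecentContact r) -> r.lastMessageAt).reversed());
        List<Long> ids = new ArrayList<>();
        for (RecentContact row : mine) ids.add(row.otherPersonId);
        return ids;
    }
}
```

One consequence worth weighing: updating the date on every message turns each send or receive into a write on the Contact row, which gives back some of the write savings of the boolean-state approach.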

David Fuelling

Feb 18, 2012, 2:08:02 PM
to objectify...@googlegroups.com
Hey Jeff, great discussion here!  

One quick clarification: Did you mean to write 50,000 for the entry limit on multi-valued properties?  From the docs, there seems to still be a 5k index limit per Entity, or (apparently) unlimited entries if a Multivalued property is not indexed (of course, with a 1M limit).

I'm trying to figure out how quickly one would need to implement Brett's "RIE" pattern for multi-valued properties.

Thanks!
david

Jeff Schnitzer

Feb 20, 2012, 10:29:27 AM
to objectify...@googlegroups.com
Sorry - yes, it's 5k.

Jeff