DB Model design, fan-out, lots of writes?

Chris

unread,

Nov 5, 2009, 11:35:26 AM11/5/09

to Google App Engine

Hi All

I'm looking for some input on Db.Model design for the following
scenario:

1) A User can send a message which will to anywhere between 1 to 5000
receipients with 200-2000 receipients being by far the the most
common.
2) Each recipient is expected to receive between 1 and 50 messages a
day.
3) When a receipent has read a message it needs to be flagged as read.

For the distribtion of messages to receipients I took inspiration from
this http://www.youtube.com/watch?v=AgaL6NGpkB8
(About 16 minutes into the video) he suggests a model like this:

class Message(db.Model):
sender=db.StringProperty()
body=db.TextProperty()

class MessageIndex(db.Model):
recipient = db.StringListProperty()

with Message and MessageIndex being in the same entity group. In short
the benefit to this design is supposedly that I can do a key only
query on
the MessageIndex for a particular user. From the MessageIndex keys
returned for the recepient I can extract the actual Message entity
keys and fetch those directly by key.

That's all well and good...but then I get to 3)...recipients needing
to flag messages as read. For that I'm contemplating something like
this:

class MessageReadIndex(db.Model):
recipient=db.StringProperty()
month=db.IntegerProperty()
messagesRead = db.StringListProperty(indexed=False)

When a recipient asks for a list of messages it will be sorted by
date, newest messages first, and paged (think gmail).
In the same page request I can query the MessageReadIndex for the user
and month(s) in question. From here I can loop through each message in
memory and
check to see if it has already been read.

When the recipient clicks a message to read it I can also retrieve the
MessageReadIndex entity and append the Message Id to the messagesRead
property and put() the entity.
This last bit is what has be a bit worried. It will be quite a few
writes from every recipient every day...again think gmail ;-) Not
indexing the messagesRead
property should help minimize the number of index entries that need
updating ...but still. Am I being overly paranoid and prematurely
optimizing at an unreasonably
level? Does anybody have any better ideas as for how to handle this?

Thanks in advance for your CPU time!

/Chris

Bjorn L

unread,

Dec 1, 2009, 9:03:23 AM12/1/09

to Google App Engine

i too am facing a similar problem, was thinking of a design similar to
yours. anyone have any comments on Chris' design?

On Nov 5, 5:35 pm, Chris <cskjoldb...@gmail.com> wrote:
> Hi All
>

> I'm looking for some input onDb.Modeldesign for the following

> scenario:
>
> 1) A User can send a message which will to anywhere between 1 to 5000
> receipients with 200-2000 receipients being by far the the most
> common.
> 2) Each recipient is expected to receive between 1 and 50 messages a
> day.
> 3) When a receipent has read a message it needs to be flagged as read.
>
> For the distribtion of messages to receipients I took inspiration from

> thishttp://www.youtube.com/watch?v=AgaL6NGpkB8
> (About 16 minutes into the video) he suggests amodellike this:

peterk

unread,

Dec 1, 2009, 5:43:35 PM12/1/09

to Google App Engine

An extra write when someone clicks to read a message doesn't seem
massively expensive...at least not from a latency point of view. You
probably would be OK.

However you could optimise these approach further with a write-behind
cache for example.

That could be particularly effective if you expect a user's changes to
their messagereadindex to be bunched together in time (likely in a
gmail style app).

So you would:

1) Write the message_read flag to cache (i.e. "user abc read message
xyz"
2) then have a task that flushes these changes to messagereadindexes
every couple of minutes. After every write to the cache, check if a
task is running or scheduled to run, and if there is none, enqueue a
task and tell it to run in a minute or two.
3) In the meantime more messages might be read, with more writes to
cache.
4) Then your task would kick in, hoover up the flags in the cache, and
write them to the DB. So if one user read multiple messages in that
couple of minutes, you'd be turning what was multiple DB writes into
one write. And assuming your task was looking after index changes for
multiple users (as it probably should) then you also could batch
together your puts for those messagereadindexes for another little
saving over seperate writes for each message_read action.

The downside to this is that it is possible the cache will go wonky
between a user adding a flag to the cache and the time a task comes
around to persist it to the DB. But if the time to task-run is
reasonably low then it shouldn't be a problem most of the time
(according to google anyway). In the odd case where a flag was
dropped, it doesn't seem like it would be a critical problem (the
result would just be that a message appeared to be unread rather than
read).

Julian Namaro

unread,

Dec 1, 2009, 11:43:40 PM12/1/09

to Google App Engine

There is a better solution to this problem.
You can use a special entity MessageRead with no property and a
key_name= message_id + recipient_id

indexes = keys_only query on MessageIndexes
for k in indexes:
keys.append( k.parent() )
keys.append( db.Key.from_path('MessageRead', k.parent().id() +
recipient_id) )
messages = db.get(keys)

You can put MessageRead in the same entity group than Message to do
transactional "mark as read", but then as Peterk said there might be
contention if the message has thousands of recipients. The other
option is to leave it a root entity and write it in a task queue after
a message is read.

On Nov 6, 1:35 am, Chris <cskjoldb...@gmail.com> wrote:
> Hi All
>
> I'm looking for some input on Db.Model design for the following
> scenario:
>
> 1) A User can send a message which will to anywhere between 1 to 5000
> receipients with 200-2000 receipients being by far the the most
> common.
> 2) Each recipient is expected to receive between 1 and 50 messages a
> day.
> 3) When a receipent has read a message it needs to be flagged as read.
>
> For the distribtion of messages to receipients I took inspiration from

> thishttp://www.youtube.com/watch?v=AgaL6NGpkB8

peterk

unread,

Dec 2, 2009, 4:20:01 AM12/2/09

to Google App Engine

Nice job encoding the necessary info in keys :)

Is there value in putting the MessageRead entity in the same entity
group as the message?

Not sure but don't think you'd need updates to MessageRead to be
transactional with any updates to Message or other MessageRead
entities for the same message?

Keep them in their own entity groups and you won't have a contention
problem. A write behind cache could still be useful for reducing
writes though, if you can handle the risk of the odd dropped update.

Julian Namaro

unread,

Dec 2, 2009, 11:31:52 PM12/2/09

to Google App Engine

> Is there value in putting the MessageRead entity in the same entity
> group as the message?
>

Just to do transactional "mark as read". I agree in most cases it's
not worth the trouble.

Reply all

Reply to author

Forward