MongoDB suitability for append-only event-source stream?


yatesco

Jan 17, 2012, 10:57:55 AM
to mongod...@googlegroups.com
Hi all,

I am contemplating using event sourcing for our next architecture, with a single append-only store holding the events. A number of discrete components will consume those events and build up their own internal state. At run time each component will consume events in near real time; occasionally, however, we will introduce a new component which will need to read the entire event log. Imagine each event has an incrementing id: each component essentially asks "give me all the events since X", and the polling frequency ensures each query returns only the 100 or so most recent events.
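Roughly, each poll would look something like this (a pymongo sketch; the "events" collection and "seq" field are illustrative names, not a fixed schema):

    # Per-component poll: fetch up to 100 events newer than the last
    # sequence number this component has seen. Collection and field
    # names ("events", "seq") are illustrative only.
    from pymongo import MongoClient, ASCENDING

    client = MongoClient()
    events = client.eventstore.events
    events.create_index([("seq", ASCENDING)], unique=True)  # fast "since X" scans

    def poll(last_seq, batch_size=100):
        """Return up to batch_size events with seq > last_seq, oldest first."""
        return list(events.find({"seq": {"$gt": last_seq}})
                          .sort("seq", ASCENDING)
                          .limit(batch_size))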

The number of events will quickly run into the millions, but *most* of the time the working set will be very small. My question is: what happens when a new component appears and suddenly asks for the entire event stream, which absolutely won't fit in memory? I don't want the new reader to stall the writer or the existing readers.

One strategy I thought of was to set up two mirrors: all the run-time readers read from one slave (which effectively emulates a persistent topic), whilst the second slave is there purely to serve new components.
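If we went that route, I imagine we could tag the replica set members and route each class of reader accordingly; a pymongo sketch (the tag values are made up, and members would need matching tags in the replica set config):

    # Route run-time readers and bootstrap readers to different
    # secondaries via replica set member tags. Tag values ("realtime",
    # "bootstrap") are illustrative.
    from pymongo import MongoClient
    from pymongo.read_preferences import Secondary

    client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")

    realtime_events = client.eventstore.events.with_options(
        read_preference=Secondary(tag_sets=[{"use": "realtime"}]))
    bootstrap_events = client.eventstore.events.with_options(
        read_preference=Secondary(tag_sets=[{"use": "bootstrap"}]))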

Each component will also have an asynchronous job which snapshots its own state, and those snapshots will be stored in mongo as well, but I need to do some tests to make sure those writes don't impact the readers.
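The snapshot write itself would be something simple like this (again a pymongo sketch; the "snapshots" collection and its fields are illustrative):

    # Asynchronous snapshot job: persist the component's state plus the
    # last event sequence it applied, so a restart can load the snapshot
    # and replay only events with seq > last_seq.
    import datetime

    def save_snapshot(db, component, state, last_seq):
        db.snapshots.insert_one({
            "component": component,
            "state": state,            # must be BSON-serialisable
            "last_seq": last_seq,      # replay resume point
            "created_at": datetime.datetime.utcnow(),
        })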

Any thoughts?

Col

Adam C

Jan 23, 2012, 12:19:23 PM
to mongodb-user
Col,

Let me make sure I have this all straight. It sounds like you will
have a reasonably write-heavy application, with a relatively small
working data set (the ~100 most recent events) in terms of reads.

Occasionally a new component/node will be brought online and will
need to read all events, not just the most recent, in order to
bootstrap itself; its subsequent reads then revert to the normal pattern.

Given that you don't want to negatively impact writes on the
primary when you are bringing up a new node, I think this fits
most easily into a replica set configuration. The majority of your
reads (and all of your writes, of course) hit the primary. When you
run a query that is abnormally large (a new node, or a node playing
catch-up), you can set the slaveOk option and direct that long read
to the secondaries in the replica set. Think of this option more as
"prefer slave" than "it's OK to use a slave": any query with the
option set will hit a secondary rather than the primary, giving
you the behavior you want.
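With a current driver the same routing is expressed as a read
preference (the successor to the slaveOk flag); a pymongo sketch,
where apply_event is a hypothetical per-component handler:

    # Normal reads and all writes go to the primary; the long
    # bootstrap/catch-up scan is directed at a secondary via a
    # read preference.
    from pymongo import MongoClient, ReadPreference

    client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")
    events = client.eventstore.events  # reads hit the primary by default

    catchup = events.with_options(
        read_preference=ReadPreference.SECONDARY_PREFERRED)

    last_seq = 0
    for doc in catchup.find({"seq": {"$gt": last_seq}}).sort("seq", 1):
        apply_event(doc)        # hypothetical handler
        last_seq = doc["seq"]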

As always, what really needs to happen is some testing with real data
to model this and determine how you need to scale, but based on the
description, this would be the approach I would start with.

Adam.

yatesco

Jan 23, 2012, 1:26:14 PM
to mongod...@googlegroups.com
Thanks Adam. I was thinking along the same lines (particularly about the proof-by-test approach!).