Scaling Streama

Christos Pappas

Jun 27, 2012, 4:09:17 AM
to str...@googlegroups.com
Streama provides a very simple way to publish activity streams using Mongoid. The implementation takes advantage of MongoDB's Array field type, which allows a single activity document to be delivered to all followers. It does this by storing the receivers of an activity in an array field, so a single query can retrieve a user's activity stream.
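
To make the schema concrete, here is a minimal in-memory sketch of the idea in plain Ruby. It is not Streama's actual code or API: the `ArrayFieldStore` class and the `verb`, `actor`, and `receivers` field names are assumptions made for illustration, and a Ruby array stands in for the MongoDB collection.

```ruby
# Each Activity stands in for one MongoDB document.
# Field names are illustrative assumptions, not Streama's real schema.
Activity = Struct.new(:id, :verb, :actor, :receivers)

class ArrayFieldStore
  def initialize
    @activities = [] # stand-in for the activities collection
  end

  # Publishing writes exactly one document, however many receivers there are.
  def publish(verb, actor, receivers)
    activity = Activity.new(@activities.size + 1, verb, actor, receivers.dup)
    @activities << activity
    activity
  end

  # One pass over the collection: keep the documents whose receivers
  # array contains the user (the role an array-field query plays in MongoDB).
  def stream_for(user)
    @activities.select { |activity| activity.receivers.include?(user) }
  end

  def document_count
    @activities.size
  end
end

store = ArrayFieldStore.new
store.publish("new_post", "alice", %w[bob carol])
store.publish("new_comment", "bob", %w[alice])

puts store.stream_for("bob").map(&:verb).inspect # prints ["new_post"]
puts store.document_count                        # prints 2
```

Two activities produce two documents regardless of how many followers receive them, which is what keeps this approach cheap for small sites.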

This method is perfect for small sites that need a quick and easy way to provide activity streams. However, it probably won't scale very well once you start sharding your data or have thousands of followers per user.

To scale effectively, you'd need to be able to shard on the receiving user so that queries hit a single server. Currently, MongoDB doesn't support sharding on an array field. There is a ticket for this, but it doesn't look like anyone is working on it.


joe1chen on GitHub has created a fork of Streama that implements a scalable solution. For many of you it may be overkill, however, as it duplicates each activity, along with any cached data, for every receiver, even if you aren't sharding.
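
For comparison, here is a rough sketch of the general "one copy per receiver" idea in plain Ruby. It is not the code from joe1chen's fork: the `FanOutStore` class, the field names, and the CRC32-based shard routing are all assumptions for illustration, with in-memory arrays standing in for shard servers.

```ruby
require "zlib"

# One copy of the activity is stored per receiver.
# Field names are illustrative assumptions.
ActivityCopy = Struct.new(:verb, :actor, :receiver)

class FanOutStore
  def initialize(shard_count)
    @shards = Array.new(shard_count) { [] } # each inner array mimics one shard
  end

  # Route by receiver so that a user's whole stream lives on one shard.
  # CRC32 is used only because it is stable across runs; real shard routing
  # in MongoDB works differently.
  def shard_for(receiver)
    Zlib.crc32(receiver) % @shards.size
  end

  # Publishing writes one document per receiver (the duplication cost).
  def publish(verb, actor, receivers)
    receivers.each do |receiver|
      @shards[shard_for(receiver)] << ActivityCopy.new(verb, actor, receiver)
    end
  end

  # Reading touches a single shard only.
  def stream_for(user)
    @shards[shard_for(user)].select { |copy| copy.receiver == user }
  end

  def document_count
    @shards.sum(&:size)
  end
end

store = FanOutStore.new(4)
store.publish("new_post", "alice", %w[bob carol dave])

puts store.stream_for("bob").map(&:verb).inspect # prints ["new_post"]
puts store.document_count                        # prints 3
```

The trade-off is visible in the counts: one activity with three receivers becomes three documents, in exchange for reads that only ever touch the receiver's own shard.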

So my question to anyone using Streama is: does the current schema suit what you are doing? Should the schema change to support sharding based on receivers and, if so, is duplicating activities per receiver the way to go?

Let me know if you have any ideas on how to improve Streama and make it more scalable while keeping it simple enough for smaller sites.

Christos

Patrick Mulder

Aug 15, 2012, 5:30:19 PM
to str...@googlegroups.com
On Wed, Jun 27, 2012 at 10:09 AM, Christos Pappas
<christo...@gmail.com> wrote:

> Let me know if you have any ideas on how to improve Streama and make it more
> scalable while keeping it simple enough for smaller sites.
>

I was browsing the chapter on Redis today in the book "Seven Databases
in Seven Weeks": http://pragprog.com/book/rwdata/seven-databases-in-seven-weeks

They talk about the design of a "Polyglot Persistent Service" (also
described here: http://martinfowler.com/bliki/PolyglotPersistence.html).

I have to study it in a bit more detail, but maybe persistence with
support from Redis could be interesting.

Maybe interesting for discussion: why is MongoDB a good fit for
persisting ActivityStreams in the first place?

Another question: would the idea of load_instance(:actor), etc. work
with other persistence schemes too?