I'm writing a chat application and I'm considering having an object per day, with all the messages for that day embedded in a list. Alternatively, I could just have the messages in a collection of their own.
e.g. chatDay = { messages : [ { Id : ..., Msg : "Hello" }, { Id : ..., Msg : "Oh hi" } ] }
vs
{ Id : ..., Msg : "Hello" } { Id : ..., Msg : "Oh hi" }
Obviously the former is going to mean a lot fewer individual objects being pulled back and forth, so I think it might be more efficient for reading and more convenient for paging. Are there any performance considerations here? Would things start to degrade if a 'day' container started to get large due to a huge quantity of messages? (N.B. obviously it's going to go horribly wrong if I hit the object size limit!)
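To make the two shapes concrete, here is a rough sketch in plain JavaScript. The field names (id, chat, ts, msg) and the sample messages are made up for illustration, and nothing here talks to a database; it only shows how the same messages look as one document per message versus one document per chat per day.

```javascript
// Hypothetical sample messages; the field names are illustrative only.
const messages = [
  { id: 1, chat: "general", ts: new Date("2012-08-10T09:15:00Z"), msg: "Hello" },
  { id: 2, chat: "general", ts: new Date("2012-08-10T09:16:00Z"), msg: "Oh hi" },
  { id: 3, chat: "general", ts: new Date("2012-08-11T08:00:00Z"), msg: "Morning" },
];

// Option 1: one document per chat per day, with the messages embedded in an array.
function toDayBuckets(msgs) {
  const buckets = new Map();
  for (const m of msgs) {
    const day = m.ts.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    const key = m.chat + "|" + day;
    if (!buckets.has(key)) {
      buckets.set(key, { chat: m.chat, day: day, messages: [] });
    }
    buckets.get(key).messages.push({ id: m.id, ts: m.ts, msg: m.msg });
  }
  return Array.from(buckets.values());
}

// Option 2: one document per message; the array above already has that shape.
const perMessageDocs = messages;
const dayDocs = toDayBuckets(messages);

console.log(dayDocs.length, "day documents vs", perMessageDocs.length, "message documents");
```

The bucketed form keeps growing as the day goes on, which is where the size-limit concern above comes from; the per-message form keeps every document small and fixed once written.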
> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongodb-user@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user+unsubscribe@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb
> On Fri, Aug 10, 2012 at 12:57 PM, Daniel Harman <daniel....@gmail.com> wrote:
Although, having said that, it doesn't really talk about the performance difference between modifying an existing object (with, for example, a push) and inserting a new object into a collection. Are there any general principles to consider here?
On Friday, August 10, 2012 7:18:29 PM UTC-4, Daniel Harman wrote:
> Although having said that it doesn't really talk about performance difference between modifying an existing object with for example a push vs inserting a new object into a collection. Are there any general principles to consider here?
An update and an insert will probably be very comparable _if_ the document has not grown beyond the size of its currently allocated block. If the update has to move the document, then the insert is going to be faster.
The other issue to consider is that each move or delete leaves a hole where the document used to be. MongoDB is not very good at managing those holes unless you ensure the documents are of uniform size. A secondary effect is that, after a while, the collection of holes in the database slows down all allocations (straight inserts and updates that move) as it scans the growing free lists.
In 2.2, TTL collections will switch the collection to a "power of 2 allocator". In theory that fixes the fragmentation problem at the expense of one, potentially, extra index with a TTL of "forever" and a little wasted space.
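As a rough illustration of why the allocation strategy matters for a document that keeps growing, here is a toy model in plain JavaScript. It is not MongoDB's actual allocator: it simply assumes a document has to move whenever it outgrows the space reserved for it, and compares reserving exactly the needed size with rounding up to the next power of two. The 100-byte message size and the 1000 pushes are arbitrary, made-up numbers.

```javascript
// Round a size in bytes up to the next power of two.
function nextPowerOf2(n) {
  let p = 1;
  while (p < n) p *= 2;
  return p;
}

// Count how many times a growing document outgrows its allocation.
// `allocate` maps a required size to the size actually reserved.
function countMoves(messageSizes, allocate) {
  let docSize = 0;
  let allocated = 0;
  let moves = 0;
  for (const size of messageSizes) {
    docSize += size;
    if (docSize > allocated) {
      if (allocated > 0) moves += 1; // the first allocation is an insert, not a move
      allocated = allocate(docSize);
    }
  }
  return moves;
}

// 1000 pushes of a hypothetical 100-byte message into one "day" document.
const pushes = new Array(1000).fill(100);
const exactFitMoves = countMoves(pushes, (n) => n);
const powerOf2Moves = countMoves(pushes, nextPowerOf2);
console.log({ exactFitMoves, powerOf2Moves });
```

In this model the exact-fit document moves on nearly every push, while the power-of-two document moves only a handful of times, at the cost of some reserved but unused space.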
For me the question is: do you ever plan to delete the documents? If not, then use a document per message and some smart indexing to group records for faster access. The data will be packed into memory/disk as tightly as possible. You will still get temporal/spatial correlation, since MongoDB will always append the messages to the end of the allocated extents.
If you will delete records, it's a toss-up based on the primary usage pattern, but you will want the TTL collection's power of 2 allocator.
It depends on a lot of factors (update/move rate, deleting, ratio of
inserts to updates, queries and ordering, etc), but it is a good
approach and one that works well for very active short write loads and
mostly reads, like an activity stream or time-based logging.
On Sun, Aug 12, 2012 at 7:16 AM, MKN Web Solutions
<mich...@mknwebsolutions.com> wrote:
> Can anyone from the MongoDB engineering team confirm that this approach is
> ideal? I just want to verify that this scheme is being used and works well.
> On Friday, August 10, 2012 2:55:06 PM UTC-4, oct wrote:
>> I think this article may answer your question...
Hi Rob,
Thanks for the in-depth answer. It suggests I've now implemented the wrong approach and would be better off going back to an object per message. So I guess I could index by a date field (with no time on it) to get the effect I have now. However, given that the messages are very small (think IRC, not email), I am left wondering whether this is going to cause a lot of seeking to load up messages by day. They will be temporally correlated of course, but in a table with a whole load of different chats going on they will all be interleaved. Is that something I should be able to ignore?
Alternatively, can I force a minimum block size for a table? I'm not sure it makes sense in terms of disk space consumption, but it's worth considering.
I don't ever plan to delete documents, and it's likely messages will be cached locally on the web server anyway, so perhaps seek time is not a huge concern.
On Saturday, August 11, 2012 5:52:19 PM UTC+1, Rob Moore wrote:
Echoing Scott's comment about there being a lot of variables, but...
If the MongoDB cluster is sized to keep the last N days (hours) in memory then "seeking" isn't an issue except when going back beyond that horizon.
You can index on the full timestamp and then simply do a range query, e.g.:
{ timestamp : { $gte : ISODate("2012-08-12T00:00:00Z"), $lt : ISODate("2012-08-13T00:00:00Z") } }
The B-Tree indexes that MongoDB uses are designed to efficiently answer this type of query.
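A small sketch of building that kind of one-day range filter in plain JavaScript. The helper names dayRangeFilter and matchesRange are made up for this example, the day is assumed to be a UTC day, and the matcher is just an in-memory stand-in to show which documents the bounds select; it is not a MongoDB API. The lower bound uses $gte so that a message stamped exactly at midnight is included.

```javascript
// Build a half-open [start, end) filter covering one UTC day.
function dayRangeFilter(isoDay) {
  const start = new Date(isoDay + "T00:00:00Z");
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000);
  return { timestamp: { $gte: start, $lt: end } };
}

// Minimal in-memory check for the $gte/$lt pair above (illustration only).
function matchesRange(doc, filter) {
  const range = filter.timestamp;
  return doc.timestamp >= range.$gte && doc.timestamp < range.$lt;
}

const filter = dayRangeFilter("2012-08-12");
const docs = [
  { timestamp: new Date("2012-08-11T23:59:59Z"), msg: "too early" },
  { timestamp: new Date("2012-08-12T00:00:00Z"), msg: "first of the day" },
  { timestamp: new Date("2012-08-12T18:30:00Z"), msg: "evening" },
  { timestamp: new Date("2012-08-13T00:00:00Z"), msg: "next day" },
];
const sameDay = docs.filter((d) => matchesRange(d, filter));
console.log(sameDay.map((d) => d.msg));
```

Because the filter is built from the full timestamp, there is no need for a separate date-only field just to select a day.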
You can also create a compound index on { timestamp : 1, chat_name : 1 }, and it should speed up a query using both a range on timestamp and a range or value for chat_name.
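One thing that may be worth checking with a compound index is the field order. The sketch below is only a toy model of key ordering (a sorted array standing in for an index), not a statement about how MongoDB's query planner behaves. With made-up entries for two interleaved chats, it counts how many consecutive entries lie between the first and last match for a query on one chat over a time range, under each of the two possible orderings.

```javascript
// Hypothetical index entries: one per message, two chats interleaved in time.
const entries = [];
for (let minute = 0; minute < 6; minute++) {
  entries.push({ timestamp: minute, chat_name: minute % 2 === 0 ? "cats" : "dogs" });
}

// Compare two entries field by field, in the order given by `fields`.
function compareBy(fields) {
  return (a, b) => {
    for (const f of fields) {
      if (a[f] < b[f]) return -1;
      if (a[f] > b[f]) return 1;
    }
    return 0;
  };
}

// Number of consecutive entries between the first and last match, inclusive.
function scannedSpan(sorted, matches) {
  const hits = sorted.map(matches);
  const first = hits.indexOf(true);
  const last = hits.lastIndexOf(true);
  return first === -1 ? 0 : last - first + 1;
}

// One chat, over a range of timestamps.
const wanted = (e) => e.chat_name === "cats" && e.timestamp >= 0 && e.timestamp < 6;
const byTimeThenChat = entries.slice().sort(compareBy(["timestamp", "chat_name"]));
const byChatThenTime = entries.slice().sort(compareBy(["chat_name", "timestamp"]));
console.log(scannedSpan(byTimeThenChat, wanted), scannedSpan(byChatThenTime, wanted));
```

In this model the matches are contiguous when the chat name comes first and interleaved with the other chat when the timestamp comes first, so it may be worth testing both orders against the real query pattern.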
The only other option I know of (besides the bucketed documents) to group the messages into chats is to use a collection per "chat", but I'd not recommend that unless you can enumerate the chats beforehand. I have heard of issues with scaling the collection count into the thousands, but I prefer to just not go there.
The only mechanism I know of for controlling the allocation of blocks in MongoDB is the TTL Collections. I'm eagerly awaiting the 2.2.0 release so I can take it for a spin on my current project.
On Sunday, August 12, 2012 6:44:40 PM UTC-4, Daniel Harman wrote: