14 Million records a day
From: MoroSwitie <moroswi...@gmail.com>
Date: Thu, 15 Mar 2012 04:58:18 -0700 (PDT)
Local: Thurs, Mar 15 2012 7:58 am
Subject: [mongodb-user] 14 Million records a day

We currently have around 2500 VPS servers that we need to monitor.
All data needs to be stored in a database for at least 3 months (if enough
storage space is available this could be extended).
If the initial test is successful, 3 other data center locations will be
added and we would be talking about 10,000 VPS servers that we need to
monitor.
We are going to set up a test environment to compare MySQL against MongoDB.

We currently don't have any experience with MongoDB, so we have a lot of
reading to do and we need to let go of the traditional SQL normalization
rules.
In order for us to set up a representative test environment we are looking
for some pointers.

Read/Write Ratio:
We will be inserting 10,000 records a minute 24/7 (get the list of VPS servers,
check availability, and as soon as we receive a response to our check, store
the record/document).
This will probably cause a peak each minute. 10,000 * 1440 minutes =
14,400,000 records a day.
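
As a quick sanity check on the numbers above, here is a rough volume estimate. The insert rate is the one stated in the post; treating "3 months" as 90 days is an assumption made only for illustration.

```python
# Back-of-the-envelope volume estimate for the monitoring workload.
CHECKS_PER_MINUTE = 10_000   # insert rate stated in the post
MINUTES_PER_DAY = 24 * 60    # 1440
RETENTION_DAYS = 90          # assumption: "3 months" taken as 90 days

records_per_day = CHECKS_PER_MINUTE * MINUTES_PER_DAY
records_retained = records_per_day * RETENTION_DAYS
avg_inserts_per_second = CHECKS_PER_MINUTE / 60

print(f"{records_per_day:,} records per day")
print(f"{records_retained:,} records kept at full retention")
print(f"{avg_inserts_per_second:.0f} inserts per second on average")
```

Because the checks fire once a minute, the real peak rate is likely to be well above the per-second average.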

What would be the correct way to store this in MongoDB?
1) Create a new document for each update:
{
   "vpsId":"xxxxxxxx",
   "dateTime":"2012-03-01 12:00:01",
   "status":"success",
   "responseTime":0.04

}

2) Create a new document for each day, and add each update to the existing document:
{
   "vpsId":"xxxxxxxx",
   "date":"2012-03-01",
   "updates":[
      {
         "dateTime":"2012-03-01 12:00:01",
         "status":"success",
         "responseTime":0.04
      },
      {
         "dateTime":"2012-03-01 12:01:01",
         "status":"success",
         "responseTime":0.05
      },
      {
         "dateTime":"2012-03-01 12:02:01",
         "status":"success",
         "responseTime":0.03
      }
   ]

}
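
A minimal sketch of how option 2 could be written, assuming one document per server per day and an upsert so that the first check of the day creates the document. The helper name `daily_push_update` is hypothetical; it only builds the filter and update documents and does not talk to a database.

```python
from datetime import datetime


def daily_push_update(vps_id, when, status, response_time):
    """Return (filter, update) for appending one check to the day's document."""
    doc_filter = {"vpsId": vps_id, "date": when.strftime("%Y-%m-%d")}
    update = {
        "$push": {
            "updates": {
                "dateTime": when,
                "status": status,
                "responseTime": response_time,
            }
        }
    }
    return doc_filter, update


flt, upd = daily_push_update(
    "xxxxxxxx", datetime(2012, 3, 1, 12, 0, 1), "success", 0.04
)
# With a driver such as PyMongo, this pair would typically be passed as
# collection.update_one(flt, upd, upsert=True); that call is not executed here.
print(flt)
print(upd)
```

Option 1 needs no such helper: each check is simply inserted as its own document.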

There will be a lot less reading of data compared to inserting new
documents.
Any recommendations would be highly appreciated.

Regards,
Moro.

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/mongodb-user/-/-53a2JOws1wJ.
To post to this group, send email to mongodb-user@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


 
From: Eliot Horowitz <el...@10gen.com>
Date: Thu, 15 Mar 2012 08:18:18 -0400
Local: Thurs, Mar 15 2012 8:18 am
Subject: Re: [mongodb-user] 14 Million records a day
I would probably go with #2 and group all updates for a given day in 1 document.
Will be easier to purge old ones and shard.

From: EastGhostCom <mikes.google.acco...@brenden.com>
Date: Thu, 15 Mar 2012 06:54:05 -0700 (PDT)
Local: Thurs, Mar 15 2012 9:54 am
Subject: [mongodb-user] Re: 14 Million records a day

IMO, consider shortening field names so as not to waste space (because field
names are stored with every record):

"vpsId" ===> "vid"
"dateTime" ===> "dt" (store this data using MongoDate())
"status" ===> "st"
"responseTime" ===> "rt"

You might also be able to squish the responseTime data into the microsecond
portion of the MongoDate().
Use codes for the "status" field (0 = success, 1 = net unreach, 2 = unreach,
etc.)
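
A rough sketch of the renaming above, applied to the one-document-per-check layout. The `compact` helper and the two lookup tables are hypothetical names; the status codes are the three listed in this message. Since BSON stores each field name as a string inside every document, the saving from the names alone is the difference in name lengths.

```python
from datetime import datetime

# Long field name -> short field name, as proposed above.
FIELD_MAP = {"vpsId": "vid", "dateTime": "dt", "status": "st", "responseTime": "rt"}
# Status text -> numeric code, using the codes listed above.
STATUS_CODES = {"success": 0, "net unreach": 1, "unreach": 2}


def compact(record):
    """Rename fields, encode the status, and parse the timestamp string."""
    out = {}
    for long_name, value in record.items():
        if long_name == "status":
            value = STATUS_CODES[value]
        elif long_name == "dateTime":
            value = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        out[FIELD_MAP[long_name]] = value
    return out


# Bytes saved per record from the shorter names alone (ASCII names).
bytes_saved_per_record = sum(len(k) - len(v) for k, v in FIELD_MAP.items())
bytes_saved_per_day = bytes_saved_per_record * 14_400_000

print(compact({
    "vpsId": "xxxxxxxx",
    "dateTime": "2012-03-01 12:00:01",
    "status": "success",
    "responseTime": 0.04,
}))
print(bytes_saved_per_record, "bytes per record,", bytes_saved_per_day, "bytes per day")
```

Replacing the status string with an integer saves a little more on top of this; the figure above counts only the field names.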

From: Andy O'Neill <one...@energyhub.net>
Date: Thu, 15 Mar 2012 12:58:28 -0700 (PDT)
Local: Thurs, Mar 15 2012 3:58 pm
Subject: Re: [mongodb-user] 14 Million records a day

On Thursday, March 15, 2012 8:18:18 AM UTC-4, Eliot Horowitz wrote:

> I would probably go with #2 and group all updates for a given day in 1
> document.
> Will be easier to purge old ones and shard.

While grouping your updates in documents is useful for index size reasons
and deletion purposes, I'd be hesitant to add a whole day's worth of
updates for a given server in a single document as a list as recommended by
Eliot. We found that appending to the end of an array of subdocuments
(using the $push operator) showed very poor performance when the list grew
to a few hundred subdocuments long - our best guess is that $push is an
O(n) operation. Our data was a similar size to yours. You may also find
that you have to 'prepad' the document when you create it to make sure it
isn't moved as you append updates to it during the day.

We solved/worked around the problem by using hourly documents instead of
daily documents, with arrays that are smaller than 100 subdocuments, but
our data is 5 minute resolution so this may not work for you. Another
solution is to use a more deeply nested structure instead of an array, if
it suits your use case:

{
  "vpsId":"xxxxxxxx",
  "date":"2012-03-01",
  "updates" : {
    0 : {
      0 : {
        // record for 00:00
        "status":"success",
        "responseTime":0.04
      },
      1 : {
        // record for 00:01
      },...
    },
    1 : {
      0: {
        // record for 01:00
      },
      ...
    },
    ...
  }

}

In this case you can access any given minute's record using dot notation
(updates.12.17 for 12:17)
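
A minimal sketch of the nested layout, assuming the day document is created up front with a placeholder for every minute and each check then overwrites its own slot through dot notation. Both helper names are hypothetical, and the keys are strings because BSON field names must be strings. Whether these placeholders are enough to stop the document from being moved is something to verify in a benchmark.

```python
from datetime import datetime


def preallocated_day(vps_id, date_str):
    """Build a day document with an empty slot for every hour and minute."""
    updates = {
        str(hour): {
            str(minute): {"status": None, "responseTime": None}
            for minute in range(60)
        }
        for hour in range(24)
    }
    return {"vpsId": vps_id, "date": date_str, "updates": updates}


def minute_set_update(when, status, response_time):
    """Return a $set update addressing one minute's slot, e.g. updates.12.17."""
    path = f"updates.{when.hour}.{when.minute}"
    return {
        "$set": {
            f"{path}.status": status,
            f"{path}.responseTime": response_time,
        }
    }


doc = preallocated_day("xxxxxxxx", "2012-03-01")
upd = minute_set_update(datetime(2012, 3, 1, 12, 17, 1), "success", 0.04)
print(len(doc["updates"]), "hours,", len(doc["updates"]["12"]), "minutes per hour")
print(upd)
```

The update touches a fixed path instead of appending to an array, so its cost should not depend on how many checks have already been stored that day.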

I recommend writing a database-only benchmark program that writes data in
the format you intend to use, with the load you intend to handle, and see
if you notice performance issues, rather than building the whole app and
having to go back and change the data schema later.
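
A bare-bones sketch of such a benchmark harness. Everything here is an assumption for illustration: `run_benchmark` and `make_batch` are hypothetical names, the batch size is arbitrary, and the `write` callable is a stand-in (a list append) for whatever driver call performs the real insert or update.

```python
import time


def run_benchmark(write, make_batch, minutes):
    """Write one batch per simulated minute; return the seconds each batch took."""
    timings = []
    for minute in range(minutes):
        batch = make_batch(minute)
        start = time.perf_counter()
        for doc in batch:
            write(doc)
        timings.append(time.perf_counter() - start)
    return timings


def make_batch(minute, servers=1000):
    """Generate one synthetic check result per server for the given minute."""
    return [
        {"vpsId": f"vps{n:05d}", "minute": minute,
         "status": "success", "responseTime": 0.04}
        for n in range(servers)
    ]


sink = []  # stand-in for the database
timings = run_benchmark(sink.append, make_batch, minutes=3)
print(len(sink), "documents written")
print([f"{t:.4f}s" for t in timings])
```

Running it long enough to cover a full simulated day would show whether the per-minute write time creeps up as documents grow, which is the sawtooth pattern described next.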

One tell-tale sign for us was that our disk write rate had a huge sawtooth
pattern: every day it would start small, and by the end of the day we
were writing 3x more data to disk per second, even though our data comes in
at a constant rate. We still see the sawtooth, and it has the same slope,
but it only rises for an hour before returning to the low rate. This was
*not* due to documents being moved.

From: Chris Westin <cwes...@yahoo.com>
Date: Thu, 15 Mar 2012 13:07:20 -0700 (PDT)
Local: Thurs, Mar 15 2012 4:07 pm
Subject: Re: [mongodb-user] 14 Million records a day

The increase in write rate over time is probably due to the oplog records
(for replication) growing as the array append operations are made
idempotent before being written there.

Chris

From: Andy O'Neill <one...@energyhub.net>
Date: Thu, 15 Mar 2012 17:18:42 -0700 (PDT)
Local: Thurs, Mar 15 2012 8:18 pm
Subject: Re: [mongodb-user] 14 Million records a day

On Thursday, March 15, 2012 4:07:20 PM UTC-4, Chris Westin wrote:

> The increase in write rate over time is probably due to the oplog records
> (for replication) growing as the array append operations are made
> idempotent before being written there.

That's interesting, thanks for the insight. Is there any workaround,
considering that "subdocuments in arrays" is something regularly suggested
by mongo experts?

Are you saying that the entire array is written to the oplog, rather than
just the $push and element? Why is that necessary?

From: Timothy Hawkins <tim.hawk...@mac.com>
Date: Fri, 16 Mar 2012 08:27:42 +0800
Local: Thurs, Mar 15 2012 8:27 pm
Subject: Re: [mongodb-user] 14 Million records a day

I think it's required; think of what happens if you get two $push updates to the same array very close to each other.


From: MoroSwitie <moroswi...@gmail.com>
Date: Fri, 16 Mar 2012 04:12:46 -0700 (PDT)
Local: Fri, Mar 16 2012 7:12 am
Subject: Re: [mongodb-user] 14 Million records a day

If I understand correctly, we should try to keep our arrays below 100
sub-documents. I will try your suggested method as well. I'm currently
planning how to do our test, and will include this pattern too. You
are totally right about not building the entire app first. I can remember
some old projects that were built that way. That is why we
now always try to simulate the expected load, to see how the database
holds up. I'd rather spend an extra week testing database load than have to
start over from scratch 6 months later.

From: Andy O'Neill <one...@energyhub.net>
Date: Fri, 16 Mar 2012 08:42:08 -0700 (PDT)
Local: Fri, Mar 16 2012 11:42 am
Subject: Re: [mongodb-user] 14 Million records a day

On Thursday, March 15, 2012 8:27:42 PM UTC-4, Tim Hawkins wrote:

> I think its required, think of what happens if you get two $push updates
> to the same array very close to each other.

I see. I didn't realize the oplog had to be idempotent; I assumed it would
rely on some kind of ordering, but now I've learned about the replayability
of the oplog. In the comments for SERVER-3407
<https://jira.mongodb.org/browse/SERVER-3407>, Eliot
suggests that oplog idempotency would not be required once journalling was
in place. Does anyone know anything else about that? For our use case it
would certainly remove a massive amount of disk writes.

A related question: could using $addToSet instead of $push possibly improve
the performance in this situation? Presumably that is idempotent without
writing the entire array to the oplog, at the cost of having to seek
through the existing elements before inserting. If so, it'd be nice to see
that in the documentation. If I get a chance I'll test it, but I'm
interested to hear what everyone else thinks.

From: Eliot Horowitz <el...@10gen.com>
Date: Sat, 17 Mar 2012 01:59:44 -0400
Local: Sat, Mar 17 2012 1:59 am
Subject: Re: [mongodb-user] 14 Million records a day
$push does not put the entire array in the oplog, just a precondition
and what to change.
