We currently have around 2500 VPS servers that we need to monitor. All data needs to be stored in a database for at least 3 months (if enough storage space is available, this could be extended). If the initial test is successful, 3 other data center locations will be added and we would be talking about 10.000 VPS servers that we need to monitor. We are going to set up a test environment to compare MySQL against MongoDB.
We currently don't have any experience with MongoDB, so we have a lot of reading to do and we need to let go of the traditional SQL normalization rules. In order to set up a representative test environment, we are looking for some pointers.
Read/Write Ratio: We will be inserting 10000 records a minute 24/7 (get the list of VPS servers, check availability, and as soon as we receive a response to a check, store the record/document). This will probably cause a peak each minute. 10000 * 1440 minutes = 14.400.000 records a day.
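As a rough sizing aid for the test environment, the arithmetic above can be extended to the whole retention window. This is only a sketch: reading "3 months" as 90 days is an assumption, and the function name is made up for illustration.

```python
def records_retained(checks_per_minute: int, retention_days: int) -> int:
    """Total records held if every check is kept for retention_days."""
    minutes_per_day = 24 * 60  # 1440
    return checks_per_minute * minutes_per_day * retention_days


per_day = records_retained(10_000, 1)
per_quarter = records_retained(10_000, 90)  # assumption: 3 months ~ 90 days

print(per_day)      # 14400000
print(per_quarter)  # 1296000000
```

So at 10.000 servers the collection would hold roughly 1.3 billion per-check records before any purge runs, which is worth keeping in mind when comparing the two document layouts.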
What would be the correct way to store this in MongoDB

1) create a new document for each update
{
  "vpsId":"xxxxxxxx",
  "dateTime":"2012-03-01 12:00:01",
  "status":"success",
  "responseTime":0.04
}
2) create a new document for each day, and add update to existing document
{
  "vpsId":"xxxxxxxx",
  "date":"2012-03-01",
  "updates":[
    {
      "dateTime":"2012-03-01 12:00:01",
      "status":"success",
      "responseTime":0.04
    },
    {
      "dateTime":"2012-03-01 12:01:01",
      "status":"success",
      "responseTime":0.05
    },
    {
      "dateTime":"2012-03-01 12:02:01",
      "status":"success",
      "responseTime":0.03
    }
  ]
}
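To make the difference between the two options concrete, here is a minimal sketch of what the application would hand to the driver in each case. The helper names are hypothetical, and the code only builds plain dictionaries, so it runs without a database.

```python
from datetime import datetime


def per_check_document(vps_id, checked_at, status, response_time):
    """Option 1: one standalone document per check."""
    return {
        "vpsId": vps_id,
        "dateTime": checked_at,
        "status": status,
        "responseTime": response_time,
    }


def daily_push_update(vps_id, checked_at, status, response_time):
    """Option 2: (query, update) pair that appends a check to the day's document."""
    query = {"vpsId": vps_id, "date": checked_at.strftime("%Y-%m-%d")}
    entry = {"dateTime": checked_at, "status": status, "responseTime": response_time}
    return query, {"$push": {"updates": entry}}


when = datetime(2012, 3, 1, 12, 0, 1)
doc = per_check_document("xxxxxxxx", when, "success", 0.04)
query, update = daily_push_update("xxxxxxxx", when, "success", 0.04)
print(query)  # {'vpsId': 'xxxxxxxx', 'date': '2012-03-01'}
```

With a current PyMongo, for example, option 1 would presumably go through `collection.insert_one(doc)` and option 2 through `collection.update_one(query, update, upsert=True)`, so that the first check of the day creates the daily document.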
There will be a lot less reading of data compared to inserting new documents. Any recommendations would be highly appreciated.
Regards, Moro.
-- You received this message because you are subscribed to the Google Groups "mongodb-user" group. To view this discussion on the web visit https://groups.google.com/d/msg/mongodb-user/-/-53a2JOws1wJ. To post to this group, send email to mongodb-user@googlegroups.com. To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
On Thu, Mar 15, 2012 at 7:58 AM, MoroSwitie <moroswi...@gmail.com> wrote:
> What would be the correct way to store this in MongoDB
> 1) create a new document for each update
> 2) create a new document for each day, and add update to existing document
IMO, consider minimizing field names, to not waste space (because field names are stored w/ every record):
"vpsId" ===> "vid"
"dateTime" ===> "dt" (store this data using MongoDate())
"status" ===> "st"
"responseTime" ===> "rt"
You might also be able to squish the responseTime data into the microsecond portion of the MongoDate().
Use codes for the "status" field (0 = success, 1 = net unreach, 2 = unreach, etc.)
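A small sketch of that mapping, assuming the application keeps readable names internally and shortens them just before writing. The lookup tables and the `compact` helper are illustrative names only, and a Python `datetime` stands in for `MongoDate()`.

```python
from datetime import datetime

# Short field names and status codes as suggested above.
FIELD_MAP = {"vpsId": "vid", "dateTime": "dt", "status": "st", "responseTime": "rt"}
STATUS_CODES = {"success": 0, "net unreach": 1, "unreach": 2}


def compact(record):
    """Return a copy of record with short keys, a coded status and a real date."""
    out = {}
    for long_name, value in record.items():
        if long_name == "status":
            value = STATUS_CODES[value]
        elif long_name == "dateTime":
            value = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        out[FIELD_MAP[long_name]] = value
    return out


verbose = {
    "vpsId": "xxxxxxxx",
    "dateTime": "2012-03-01 12:00:01",
    "status": "success",
    "responseTime": 0.04,
}
small = compact(verbose)
print(sorted(small))  # ['dt', 'rt', 'st', 'vid']
```

Storing the timestamp as a native date rather than a string also tends to save space and lets range queries compare dates directly.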
On Thursday, March 15, 2012 8:18:18 AM UTC-4, Eliot Horowitz wrote:
> I would probably go with #2 and group all updates for a given day in 1
> document.
> Will be easier to purge old ones and shard.
While grouping your updates in documents is useful for index size reasons and deletion purposes, I'd be hesitant to add a whole day's worth of updates for a given server in a single document as a list as recommended by Eliot. We found that appending to the end of an array of subdocuments (using the $push operator) showed very poor performance when the list grew to a few hundred subdocuments long - our best guess is that $push is an O(n) operation. Our data was a similar size to yours. You may also find that you have to 'prepad' the document when you create it to make sure it isn't moved as you append updates to it during the day.
We solved/worked around the problem by using hourly documents instead of daily documents, with arrays that are smaller than 100 subdocuments, but our data is 5 minute resolution so this may not work for you. Another solution is to use a more deeply nested structure instead of an array, if it suits your use case:
{
  "vpsId":"xxxxxxxx",
  "date":"2012-03-01",
  "updates" : {
    0 : {
      0 : { // record for 00:00
        "status":"success",
        "responseTime":0.04
      },
      1 : { // record for 00:01
      },...
    },
    1 : {
      0: { // record for 01:00
      },
      ...
    },
    ...
  }
}
In this case you can access any given minute's record using dot notation (updates.12.17 for 12:17)
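As a sketch of how a write to that nested layout might be built, the snippet below derives the dot-notation path from the check time and wraps it in a `$set`. The function name is hypothetical, and nothing here talks to a server.

```python
from datetime import datetime


def minute_slot_update(vps_id, checked_at, status, response_time):
    """Build a (query, update) pair that fills the hour/minute slot for one check."""
    query = {"vpsId": vps_id, "date": checked_at.strftime("%Y-%m-%d")}
    path = f"updates.{checked_at.hour}.{checked_at.minute}"  # e.g. updates.12.17
    update = {"$set": {path: {"status": status, "responseTime": response_time}}}
    return query, update


query, update = minute_slot_update(
    "xxxxxxxx", datetime(2012, 3, 1, 12, 17, 1), "success", 0.04
)
print(update)
# {'$set': {'updates.12.17': {'status': 'success', 'responseTime': 0.04}}}
```

Because each check targets its own slot, the update does not append to an ever-growing array. Whether that actually performs better at your load is something the benchmark should confirm.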
I recommend writing a database-only benchmark program that writes data in the format you intend to use, with the load you intend to handle, and see if you notice performance issues, rather than building the whole app and having to go back and change the data schema later.
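A bare-bones shape for such a benchmark might look like the following. It is a sketch under assumptions: `write` is whatever callable performs one database write in your chosen schema, and here a plain list stands in for the database so the harness itself can be run anywhere. All names are illustrative.

```python
import time


def synthetic_checks(n_servers, n_minutes):
    """Yield one fake check result per server per minute."""
    for minute in range(n_minutes):
        for server in range(n_servers):
            yield {
                "vpsId": f"vps{server:05d}",
                "minute": minute,
                "status": "success",
                "responseTime": 0.04,
            }


def run_benchmark(write, records, batch_size=10_000):
    """Call write(record) for every record; return (batch_number, seconds) per batch."""
    timings = []
    batch_start = time.perf_counter()
    for i, record in enumerate(records, start=1):
        write(record)
        if i % batch_size == 0:
            now = time.perf_counter()
            timings.append((i // batch_size, now - batch_start))
            batch_start = now
    return timings


# Stand-in for the database: a list. Swap store.append for a real write call.
store = []
timings = run_benchmark(store.append, synthetic_checks(100, 3), batch_size=100)
for batch, seconds in timings:
    print(f"batch {batch}: {seconds:.6f}s")
```

With one batch per simulated minute, a steady rise in the per-batch time over a simulated day is the kind of pattern to look for when comparing schemas.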
One tell-tale sign for us was our disk write rate had a huge sawtooth pattern where every day it would start small and by the end of the day we were writing 3X more data to disk per second, even though our data comes in at a constant rate. We still see the sawtooth, and it has the same slope, but it only rises for an hour before returning to the low rate. This was *not* due to documents being moved.
The increase in write rate over time is probably due to the oplog records (for replication) growing as the array append operations are made idempotent before being written there.
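A toy model of why that would be so (this is plain Python, not MongoDB's actual oplog format): replaying a logged append twice changes the result, while replaying a logged "set the array to this state" does not. The price is that the logged entry grows with the array.

```python
import copy


def replay(doc, op, times):
    """Apply the same logged operation `times` times to a copy of doc."""
    result = copy.deepcopy(doc)
    for _ in range(times):
        op(result)
    return result


def push_op(value):
    """Logged as 'append this element': not safe to replay."""
    return lambda doc: doc["updates"].append(value)


def set_op(full_array):
    """Logged as 'the array is now exactly this': safe to replay."""
    return lambda doc: doc.__setitem__("updates", list(full_array))


before = {"updates": ["12:00", "12:01"]}

print(replay(before, push_op("12:02"), 2))
# {'updates': ['12:00', '12:01', '12:02', '12:02']}
print(replay(before, set_op(["12:00", "12:01", "12:02"]), 2))
# {'updates': ['12:00', '12:01', '12:02']}
```

If the logged entry does carry the whole array, its size grows with every append during the day, which would match a write rate that climbs and then resets when a new document starts.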
On Thursday, March 15, 2012 4:07:20 PM UTC-4, Chris Westin wrote:
> The increase in write rate over time is probably due to the oplog records
> (for replication) growing as the array append operations are made
> idempotent before being written there.
That's interesting, thanks for the insight. Is there any workaround, considering that "subdocuments in arrays" is something regularly suggested by mongo experts?
Are you saying that the entire array is written to the oplog, rather than just the $push and element? Why is that necessary?
On Thursday, 15 March 2012 20:58:28 UTC+1, Andy O'Neill wrote:
> We solved/worked around the problem by using hourly documents instead of
> daily documents, with arrays that are smaller than 100 subdocuments, but
> our data is 5 minute resolution so this may not work for you. Another
> solution is to use a more deeply nested structure instead of an array, if
> it suits your use case:
>
> In this case you can access any given minute's record using dot notation
> (updates.12.17 for 12:17)
>
> I recommend writing a database-only benchmark program that writes data in
> the format you intend to use, with the load you intend to handle, and see
> if you notice performance issues, rather than building the whole app and
> having to go back and change the data schema later.
If I understand correctly, we should try to keep our arrays below 100 sub-documents. I will try your suggested method and include this pattern in the test I'm currently planning. You are totally right about not building the entire app first. I can remember some old projects that were built that way. That is why we now always try to simulate the expected load, to see how the database holds up. I'd rather spend an extra week testing database load than have to start over from scratch 6 months later.
On Thursday, March 15, 2012 8:27:42 PM UTC-4, Tim Hawkins wrote:
> I think its required, think of what happens if you get two $push updates
> to the same array very close to each other.
I see. I didn't realize the oplog had to be idempotent; I assumed it would rely on some kind of ordering, but now I've learned about the replayability of the oplog. In the comments for SERVER-3407 (https://jira.mongodb.org/browse/SERVER-3407) Eliot suggests that oplog idempotency would not be required once journalling was in place. Does anyone know anything else about that? For our use case it would certainly remove a massive amount of disk writes.
A related question: could using $addToSet instead of $push possibly improve the performance in this situation? Presumably that is idempotent without writing the entire array to the oplog, at the cost of having to seek through the existing elements before inserting. If so, it'd be nice to see that in the documentation. If I get a chance I'll test it, but I'm interested to hear what everyone else thinks.
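To make the trade-off in that question concrete, here is a toy model of set-style insertion in plain Python. It says nothing about what MongoDB actually writes to the oplog, which is exactly the open question. It only shows that replaying the same insert leaves the array unchanged, and that each insert has to scan the existing elements first.

```python
def add_to_set(array, element):
    """Append element only if absent; return how many comparisons were needed."""
    comparisons = 0
    for existing in array:
        comparisons += 1
        if existing == element:
            return comparisons  # already present, nothing to do
    array.append(element)
    return comparisons


updates = []
entry = {"dateTime": "2012-03-01 12:00:01", "status": "success", "responseTime": 0.04}

first = add_to_set(updates, entry)   # scans 0 elements, then appends
second = add_to_set(updates, entry)  # scans 1 element, finds it, appends nothing
print(len(updates), first, second)   # 1 0 1
```

One caveat with set semantics: two readings that are identical in every field would collapse into one, so each subdocument needs something unique in it, such as its timestamp.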