db size


orlin

Aug 11, 2010, 12:39:04 PM8/11/10
to mongodb-user
How much DB storage space is needed to store 2 billion floating point
numbers (everything completely indexed)? Would I have to shard? The
data doesn't change - once generated, I just need to search it fast:
basically compare >= and <= (possibly several ranges in the same
query) and maybe also by +/- sign. At what size do you recommend
sharding, and can I run several shard servers on the same physical
machine? I need a cheap solution for the time being. The 2 billion
was an example of what I may start out with - but what about 10-20
times that? 20 to 40 billion floats would make me quite happy. How
much RAM would I need? Should the hardware upgrade be linear - does
moving from 2 to 20 billion floats mean a 10-times more powerful
server is necessary? I'm thinking of a Linode box...
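
For illustration, a minimal sketch of the kind of range query meant above, using the official Node.js "mongodb" driver; the collection name "floats" and the field names "aa"/"bb" are made up.

import { MongoClient } from "mongodb";

async function rangeQuery(): Promise<void> {
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  try {
    const coll = client.db("demo").collection("floats");

    // One closed range on "aa" plus a sign check on "bb"; both conditions must hold.
    const cursor = coll.find({
      aa: { $gte: 0.25, $lte: 0.75 }, // range on an indexed float field
      bb: { $gt: 0 },                 // keep only positive values
    });

    for await (const doc of cursor) {
      console.log(doc);
    }
  } finally {
    await client.close();
  }
}

rangeQuery().catch(console.error);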

Orlin

Kyle Banker

Aug 11, 2010, 2:44:51 PM8/11/10
to mongod...@googlegroups.com
The amount of space really depends on how big your documents are. What do your documents look like? Each float will take up 8 bytes, and you need to add the overhead for a document, the _id, the index on _id, and the index on the float field. With 2 billion floats, I'd roughly estimate 30-40 gigs of data, minimum (again, this can vary a lot depending on how you structure your documents, how long your key names are, etc. - you should use short key names).

With that much data, I don't think you'd need to shard, especially if you start with a medium-sized Linode box. At a minimum, you want to keep the indexes in RAM. As long as you can accomplish that with a single box, you shouldn't need to shard; if you can't, then sharding is an option. Scaling up should be roughly linear.
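
As a quick illustration of the key-name point: BSON stores every key name inside every document, so long names are paid for once per document. A small sketch using calculateObjectSize from the standalone "bson" package (the same serializer the Node.js driver uses); the field names are made up.

import { calculateObjectSize } from "bson";

// Same two double values, stored under long vs. short key names.
const longKeys  = { temperatureReading: 3.14, pressureReading: 2.71 };
const shortKeys = { t: 3.14, p: 2.71 };

console.log(calculateObjectSize(longKeys));   // larger document
console.log(calculateObjectSize(shortKeys));  // smaller; the difference is repeated in every document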


orlin

Aug 11, 2010, 4:37:45 PM8/11/10
to mongodb-user
The _id is a fixed 16-character string. The floats are divided equally
into 22 fields, and there is nothing else to the structure (i.e. it's a
flat document). I just figured out that I only need indexes on half of
them. So let's say only 1 billion floats would be indexed (= how much
RAM?). That's spread over 11 float fields - does that mean more memory
than a single index 11 times bigger? The indexed fields have
2-character names and the non-indexed ones 3-character names. I
appreciate your help with knowing what to expect.
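
For illustration, a sketch of what one such document might look like - the concrete field names and values below are made up; only the shape (flat document, 16-character string _id, 11 indexed 2-character float fields, 11 unindexed 3-character float fields) comes from the description above.

// Hypothetical example document; names and values are placeholders.
const exampleDoc = {
  _id: "ABCDEF0123456789",                            // fixed 16-character string
  // 11 indexed float fields with 2-character names (aa .. ak)
  aa: 0.12, ab: -3.4, ac: 7.89, ad: 0.5, ae: -1.1, af: 2.2,
  ag: 3.3, ah: -4.4, ai: 5.5, aj: -6.6, ak: 7.7,
  // 11 non-indexed float fields with 3-character names (baa .. bak)
  baa: 0.1, bab: 0.2, bac: 0.3, bad: 0.4, bae: 0.5, baf: 0.6,
  bag: 0.7, bah: 0.8, bai: 0.9, baj: 1.0, bak: 1.1,
};

console.log(exampleDoc);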

I plan to run Node.js for queries against this Mongo data, and also
Redis, on the same server. Obviously Redis needs memory too, so it's
important to know how much Mongo will use up. Which is the medium
http://linode.com box? They are actually named by the amount of RAM,
and there are bigger ones than those on the home page. At first
glance there don't seem to be any cost savings -- but that really
depends on how much memory Mongo needs and how much I'll have left
after I pick the appropriate plan. Disk space is obviously not an
issue with a VPS. Is there a reason I might not want to run Mongo and
Redis on the same box?

orlin

Aug 11, 2010, 4:50:44 PM8/11/10
to mongodb-user
To add to the above (just in case) -- there are about 90 million
documents (2 billion / 22). Call it 100 million if it makes the
calculations easier. And there are 12 indexes: 11 on the float fields
plus 1 on the 16-character _id.
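
A back-of-envelope sketch for these numbers, for whatever it's worth: the document sizes follow from the BSON encoding, but the per-index-entry overhead below is purely an assumption, to be replaced with measurements from a prototype.

// Documents: 90M flat docs, 16-char string _id, 11 doubles with 2-char names,
// 11 doubles with 3-char names.
const numDocs = 90e6;

// BSON element sizes: type byte + key name + '\0' + value.
const idBytes      = 1 + 4 + (4 + 16 + 1);   // string _id: type + "_id\0" + int32 length + 16 chars + '\0'
const indexedField = 1 + 3 + 8;              // double with a 2-char name
const plainField   = 1 + 4 + 8;              // double with a 3-char name
const docBytes     = 4 + idBytes + 11 * indexedField + 11 * plainField + 1; // + document envelope

// Indexes: per-entry overhead (pointers, B-tree bookkeeping) is an assumed figure.
const assumedEntryOverhead = 20;             // assumption - measure, don't trust
const idIndexBytes    = numDocs * (16 + assumedEntryOverhead);
const floatIndexBytes = 11 * numDocs * (8 + assumedEntryOverhead);

console.log({
  dataGB:  (numDocs * docBytes) / 1e9,              // roughly 27-28 GB of raw BSON
  indexGB: (idIndexBytes + floatIndexBytes) / 1e9,  // roughly 31 GB with the assumed overhead
});

Under these assumptions the indexes alone end up in the tens of gigabytes, but the overhead figure is a guess - a prototype measurement is the way to pin it down.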

Thanks,

Orlin

Kyle Banker

Aug 11, 2010, 5:10:54 PM8/11/10
to mongod...@googlegroups.com
To get the most accurate number, you should build a small prototype. That'll give you the best idea of how big the various collections and indexes are likely to be.
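
A minimal sketch of such a prototype, assuming a hypothetical "floats" collection with made-up field names: insert a batch of representative documents, then read the collStats command output (data size, total index size, per-index sizes) and extrapolate to the target document count.

import { MongoClient } from "mongodb";

async function prototypeSizes(): Promise<void> {
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  try {
    const db = client.db("sizetest");
    const coll = db.collection("floats");

    // Index the 11 two-character float fields (only "aa" shown here).
    await coll.createIndex({ aa: 1 });

    // Insert a batch of representative documents with random values.
    const docs = Array.from({ length: 10_000 }, (_, i) => ({
      _id: i.toString().padStart(16, "0"),   // stand-in for the 16-char string _id
      aa: Math.random(),
      bbb: Math.random(),
      // ...remaining fields omitted for brevity
    }));
    await coll.insertMany(docs);

    // collStats reports sizes in bytes; scale linearly to the target document count.
    const stats = await db.command({ collStats: "floats" });
    const scale = 90_000_000 / docs.length;
    console.log("data size (scaled GB):",  (stats.size * scale) / 1e9);
    console.log("index size (scaled GB):", (stats.totalIndexSize * scale) / 1e9);
    console.log("per-index sizes (bytes):", stats.indexSizes);
  } finally {
    await client.close();
  }
}

prototypeSizes().catch(console.error);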

The problem with running Redis and MongoDB on the same node is that they will contend for RAM. If the instance has enough RAM to accommodate both databases with their intended usage and data sizes, then go for it. But in a serious production situation, you'd probably want separate servers for these.



orlin

Aug 11, 2010, 5:50:43 PM8/11/10
to mongodb-user
On Aug 11, 8:10 pm, Kyle Banker <k...@10gen.com> wrote:
> To get the most accurate number, you should build a small prototype. That'll
> give you the best idea of how big the various collections and indexes are
> likely to be.

I don't need complete estimation accuracy - just some guess about how
much memory MongoDB will want for this. Any ideas? Even with a
prototype, how can I tell how big the indexes are (when fully in
memory)? Create the collection, put 10 records in it, get some
numbers (from where?) and multiply by 10 million (since it "grows
linearly")? There is also the footprint of MongoDB itself... is that
a constant for any given version? Suppose there are no other
collections. How does it add up?

Does anyone have a MongoDB calculator? Give it a model & a number of
documents -- and it tells you how many resources that would take :)
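
For what it's worth, a toy version of such a calculator - purely BSON arithmetic plus an assumed per-index-entry overhead, so treat its output as a rough lower bound rather than an authoritative number.

// Toy size calculator: flat documents made of doubles plus a string _id.
// The per-index-entry overhead is an assumption; real numbers come from a prototype.
interface ModelSpec {
  numDocs: number;
  idStringLength: number;              // fixed-length string _id
  doubleFieldNameLengths: number[];    // one entry per double field
  indexedFieldNameLengths: number[];   // subset of the above that is indexed
  assumedIndexEntryOverhead: number;   // bytes per index entry beyond the key itself
}

function estimateBytes(m: ModelSpec): { dataBytes: number; indexBytes: number } {
  // BSON string element: type byte + "_id\0" + int32 length + chars + '\0'.
  const idElement = 1 + 4 + 4 + m.idStringLength + 1;
  // BSON double element: type byte + name + '\0' + 8-byte value.
  const doubleElements = m.doubleFieldNameLengths
    .reduce((sum, len) => sum + 1 + len + 1 + 8, 0);
  const docBytes = 4 + idElement + doubleElements + 1;   // + document envelope

  const idIndex = m.numDocs * (m.idStringLength + m.assumedIndexEntryOverhead);
  const fieldIndexes = m.indexedFieldNameLengths.length *
    m.numDocs * (8 + m.assumedIndexEntryOverhead);

  return { dataBytes: m.numDocs * docBytes, indexBytes: idIndex + fieldIndexes };
}

// The model from this thread: 90M docs, 16-char _id, 11 indexed 2-char fields,
// 11 unindexed 3-char fields, with a guessed 20 bytes of overhead per index entry.
console.log(estimateBytes({
  numDocs: 90_000_000,
  idStringLength: 16,
  doubleFieldNameLengths: [...Array(11).fill(2), ...Array(11).fill(3)],
  indexedFieldNameLengths: Array(11).fill(2),
  assumedIndexEntryOverhead: 20,
}));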