database size limit and file size limit


Oliver

May 10, 2011, 12:27:48 PM
to mongod...@googlegroups.com
hi, all

Very new to MongoDB, so I'm sorry to ask such an FAQ item, but I couldn't get a clear picture from searching past postings and the online docs.
Here are the assumptions about my operational environment: 64-bit OS, with a Lustre file system currently providing 250 TB of storage space, and future expansion to PB scale.
We have a 'fat' node available for MongoDB testing, with at least 16 GB of memory and possibly more.

My question is actually two-fold:

- what is the _theoretical_ limit on MongoDB's database size and on the size of a binary file you can store in it?

- what is the practical limit (again for both db size and binary file size) given my particular configuration, and what are the limiting factors? A previous post linked to an actual deployed production system that quoted 1TB - 3TB per node; is that really the practical limit?

TIA

Oliver

Mathias Stearn

May 10, 2011, 2:05:27 PM
to mongod...@googlegroups.com
Each data file will never be more than 2GB. We use a signed 32-bit int to index inside of each file, so we can't use more than 2^31 bytes. We have a soft limit of 16000 files per db [1], which limits you to about 32TB in a single DB per process. It is mostly just used as a sanity check, so you could easily change that number and recompile to raise the limit. That said, you will soon run into the 48-bit virtual address space limitation of current x86_64 CPUs, of which only 47 bits are available to user space, leading to a hard limit of just under 128TB. Also, if you plan to use durability, we double-map every file, so you are limited to around 64TB. Of course, no one is anywhere near that limit on a single node. Once sharded, there isn't a theoretical hard limit on the number of nodes; we aren't sure the current implementation will scale much past 100 shards, but we think the current design will be fine up to about 1000 shards, though it hasn't been tested at that scale.
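
As a rough back-of-the-envelope check (the constants below just restate the numbers above; nothing is read from the MongoDB source), the arithmetic works out like this:

// Back-of-the-envelope check of the limits described above.
#include <cstdint>
#include <iostream>

int main() {
    const uint64_t TB = 1ULL << 40;

    const uint64_t maxFileBytes  = 1ULL << 31;                    // signed 32-bit offset -> 2GB per file
    const uint64_t maxFilesPerDb = 16000;                         // soft limit per database
    const uint64_t maxDbBytes    = maxFileBytes * maxFilesPerDb;  // roughly 32TB per db

    const uint64_t userAddrBytes = 1ULL << 47;                    // usable x86_64 user address space
    const uint64_t withJournal   = userAddrBytes / 2;             // durability double-maps every file

    std::cout << "per db (soft limit): ~" << maxDbBytes / TB << " TB\n"
              << "address space cap:    " << userAddrBytes / TB << " TB\n"
              << "with durability:      " << withJournal / TB << " TB\n";
    return 0;
}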

On a practical level, you are more likely to be performance-limited by disk IOPS and RAM. If you are primarily doing archival, where data is never updated, your primary index is ordered by insert time (ObjectId is designed around this property), and you mostly access the recent end of your data, you should be able to pack several TB onto a single high-end node. However, if you do frequent random access, you will need a much higher RAM-to-data ratio.
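
To make the insert-time ordering concrete: the first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds, which is why the default _id index keeps recently inserted documents clustered together. A minimal sketch (the sample id below is made up for illustration):

// Minimal sketch: recover the embedded insert timestamp from an ObjectId's
// leading 4 bytes. The sample id is hypothetical.
#include <ctime>
#include <iostream>
#include <string>

std::time_t objectIdTimestamp(const std::string& oidHex) {
    // First 8 hex chars == first 4 bytes == seconds since the Unix epoch.
    return static_cast<std::time_t>(std::stoul(oidHex.substr(0, 8), nullptr, 16));
}

int main() {
    std::string oid = "4dc963f4a0b1c2d3e4f50607";  // hypothetical ObjectId
    std::time_t t = objectIdTimestamp(oid);
    std::cout << "inserted around: " << std::asctime(std::gmtime(&t));
    return 0;
}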

[1] https://github.com/mongodb/mongo/blob/master/db/diskloc.h#L49


Oliver

May 10, 2011, 2:17:48 PM
to mongod...@googlegroups.com
Mathias,

Appreciate the detailed reply. Some further clarification questions:

- when you say "each data file will never be more than 2GB", you are talking about a file stored inside the database, not the database file itself, correct?

- and does this apply to 64-bit OSes as well?

- Is there a reason why you use a signed 32-bit index in the first place? The datasets we are dealing with mostly come from the scientific community, from large simulation model runs, where 2GB files are not uncommon at all ... usually written once and read multiple times. I'd love to store meta-data and raw data in a single place, but I guess this is not the use case MongoDB is designed for?

Thanks

Oliver
 

Mathias Stearn

May 10, 2011, 2:29:07 PM
to mongod...@googlegroups.com
I mean a single file in the file system. A database in MongoDB spans
multiple files, so this doesn't impose a limit on data set size.

Yes, this applies to 64-bit OSes as well. We use the same data-file format on all platforms, so you can freely move your files around.

The 32-bit offset is mostly a space optimization. It allows us to point anywhere within a single db using a 64-bit DiskLoc struct with a 32-bit file number and a 32-bit offset into that file. This is what mongo uses internally when storing "pointers" to disk.
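
A rough sketch of that layout (not the actual definition; see diskloc.h in the source for the real thing):

// Rough sketch of the DiskLoc idea: a 32-bit file number plus a signed 32-bit
// byte offset addresses any record in a database, which is also why a single
// data file tops out at 2^31 bytes.
#include <cstdint>

struct DiskLocSketch {
    int32_t fileNum;  // which dbname.<n> data file
    int32_t offset;   // signed byte offset within that file (hence the 2GB cap)
};

static_assert(sizeof(DiskLocSketch) == 8, "fits in a single 64-bit word");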

There is no reason you can't store huge data sets in MongoDB. We have
some users with multiple TB of data.
