The collection had 1 billion docs and 2 or 3 indexes (including the default _id).
We created indexes on some fields, and each index used between 19 and 30 GB.
db.a.stats()
{
    "ns" : "pex.a",
    "count" : 1000000000,
    "size" : 496000000000,
    "avgObjSize" : 496,
    "storageSize" : 513716248896,
    "numExtents" : 260,
    "nindexes" : 5,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 0,
    "userFlags" : 1,
    "totalIndexSize" : 249163877984,
    "indexSizes" : {
        "_id_" : 32464467728,
        "CustID_1_employeeid_1" : 78537625824,
        "EventID_1" : 33235791568,
        "Evtdate_1" : 27904459072,
        "CustID_1" : 77021533792
    },
    "ok" : 1
}
db.stats()
{
    "db" : "pex",
    "collections" : 3,
    "objects" : 1000000012,
    "avgObjSize" : 495.99999507200005,
    "dataSize" : 496000001024,
    "storageSize" : 513716265280,
    "numExtents" : 262,
    "indexes" : 5,
    "indexSize" : 249163877984,
    "fileSize" : 774797000704,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 5
    },
    "extentFreeList" : {
        "num" : 0,
        "totalSize" : 0
    },
    "ok" : 1
}
The complete database used 770GB on disk.
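As a quick sanity check, the figures reported above can be cross-checked with plain arithmetic. The sketch below is ordinary JavaScript (it should also run when pasted into the mongo shell). The values are copied by hand from the two stats outputs, and the variable and function names are only illustrative.

```javascript
// Values copied from the db.a.stats() and db.stats() output above.
var collStats = {
  count: 1000000000,
  avgObjSize: 496,
  size: 496000000000,
  totalIndexSize: 249163877984,
  indexSizes: {
    "_id_": 32464467728,
    "CustID_1_employeeid_1": 78537625824,
    "EventID_1": 33235791568,
    "Evtdate_1": 27904459072,
    "CustID_1": 77021533792
  }
};
var dbFileSize = 774797000704; // "fileSize" from db.stats()

// Sum the per-index sizes. All values are far below 2^53, so the sum is exact.
function sumIndexSizes(indexSizes) {
  var total = 0;
  for (var name in indexSizes) {
    if (Object.prototype.hasOwnProperty.call(indexSizes, name)) {
      total += indexSizes[name];
    }
  }
  return total;
}

var indexSum = sumIndexSizes(collStats.indexSizes);
// Does count * avgObjSize reproduce the reported data size?
var dataMatches = collStats.count * collStats.avgObjSize === collStats.size;
// Do the individual index sizes add up to totalIndexSize?
var indexMatches = indexSum === collStats.totalIndexSize;
// File size in decimal gigabytes (10^9 bytes).
var fileSizeGB = dbFileSize / 1e9;
```

Both checks hold for these numbers, and fileSizeGB comes out at about 774.8, in line with the roughly 770 GB on disk.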
We ran a first test: we enabled dedup on the NetApp volume. The space saving was not very large.
Then we tested compression at the volume level, and there the saving was huge.
Very good.
So now the performance impact: how does MongoDB react when we try a find()?
Surprise: when I started the first find(), on a field that is normally indexed, it just kept running and running, so I investigated a little:
db.a.getIndexes() --> no indexes anymore, strange.
db.a.stats() --> oops, only 79 million docs. Where are my 1 billion docs?
db.a.count()
79547252
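One way to tell whether the documents are really gone or only the collection metadata is off would be to compare the fast count with a count obtained by actually iterating over the documents. This is a minimal sketch, assuming that count() without a query is answered from stored collection statistics while find().itcount() walks the cursor. compareCounts is a hypothetical helper, not a built-in, and on a collection of this size the full scan will take a long time.

```javascript
// Hypothetical helper: compare the metadata-based count with a scanned count.
// "coll" is expected to behave like a mongo shell collection object,
// i.e. to provide count() and find().itcount().
function compareCounts(coll) {
  var metadataCount = coll.count();         // fast count, no query predicate
  var scannedCount = coll.find().itcount(); // iterates over every document
  return {
    metadataCount: metadataCount,
    scannedCount: scannedCount,
    mismatch: metadataCount !== scannedCount
  };
}
```

In the mongo shell this could be called as printjson(compareCounts(db.a)). A mismatch would suggest that the documents are still there and only the stored statistics are wrong.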
To continue testing (I have also updated a Jira ticket at mongodb.com with the same info), I had to create an index on a field again :-( It takes time to index 1 billion rows ... or rather 79 million now.
So I ran db.a.ensureIndex({EventID: 1}) and the index build started. I waited for it to reach 100% of the 79 million docs ... but WAIT: it doesn't stop there, it continues ... 101% ... 1256% ... and it only stops when it reaches 1 billion rows.
2014-04-02T08:48:36.017+0200 [conn1] Index Build: 79418500/79547252 99%
2014-04-02T08:48:39.014+0200 [conn1] Index Build: 79553000/79547252 100%
2014-04-02T08:48:51.548+0200 [conn1] Index Build: 79590800/79547252 100%
2014-04-02T08:48:54.012+0200 [conn1] Index Build: 79842700/79547252 100%
2014-04-02T08:48:57.013+0200 [conn1] Index Build: 80429000/79547252 101%
2014-04-02T08:49:09.099+0200 [conn1] Index Build: 80854100/79547252 101%
2014-04-02T08:49:12.014+0200 [conn1] Index Build: 81354900/79547252 102%
2014-04-02T21:27:47.443+0200 [conn1] Index Build: 999305900/79547252 1256%
2014-04-02T21:27:50.015+0200 [conn1] Index Build: 999672800/79547252 1256%
2014-04-02T21:28:20.017+0200 [conn1] Index: (2/3) BTree Bottom Up Progress: 1727800/1000000000 0%
2014-04-02T21:28:30.008+0200 [conn1] Index: (2/3) BTree Bottom Up Progress: 4027300/1000000000 0%
2014-04-02T21:28:40.280+0200 [conn1] Index: (2/3) BTree Bottom Up Progress: 6417700/1000000000 0%
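The odd percentages in the "Index Build" lines can be reproduced from the two numbers on each line. The sketch below parses such a line and recomputes the percentage. parseIndexBuildLine is an illustrative helper, and the truncating division is an assumption that happens to match every line shown above, not something confirmed from the server source.

```javascript
// Parse an "Index Build: done/total pct%" log line and recompute the percentage.
function parseIndexBuildLine(line) {
  var m = /Index Build: (\d+)\/(\d+) (\d+)%/.exec(line);
  if (!m) {
    return null; // not an "Index Build" progress line
  }
  var done = parseInt(m[1], 10);
  var total = parseInt(m[2], 10);
  return {
    done: done,
    total: total,
    loggedPercent: parseInt(m[3], 10),
    // Assumption: the percentage is truncated, which matches the lines shown.
    computedPercent: Math.floor(done * 100 / total)
  };
}

var last = parseIndexBuildLine(
  "2014-04-02T21:27:50.015+0200 [conn1] Index Build: 999672800/79547252 1256%");
```

For that last line, done is 999672800 and total is 79547252, so the recomputed value is 1256, the same as logged. The total equals the db.a.count() result, while the processed figure climbs to nearly 1 billion, which suggests that the build walks the actual documents and only the expected total is taken from the wrong count.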
Any ideas?
I suspect MongoDB gets confused by the compression at the storage level and relies on some Linux/OS call to compute some of this information.
Regards,
Eric