Compression of MongoDB on NetApp storage: strange reaction


Eric Franckx

Apr 7, 2014, 3:21:55 AM
to mongod...@googlegroups.com
Hi,
we made some tests with MongoDB to see how it behaves in our landscape.
We installed:
  • MongoDB 2.6 rc1
  • SUSE Linux 11 SP3 64-bit in a VM with 4 CPUs and 20 GB of RAM
  • A NetApp volume of 900 GB
  • A collection of 1 billion docs with an average size of 0.5 KB

The collection had 1 billion docs and 2 or 3 indexes (including the default _id).

We created indexes on some fields, and each index used between 19 and 30 GB.

db.a.stats()

{
        "ns" : "pex.a",
        "count" : 1000000000,
        "size" : 496000000000,
        "avgObjSize" : 496,
        "storageSize" : 513716248896,
        "numExtents" : 260,
        "nindexes" : 5,
        "lastExtentSize" : 2146426864,
        "paddingFactor" : 1,
        "systemFlags" : 0,
        "userFlags" : 1,
        "totalIndexSize" : 249163877984,
        "indexSizes" : {
                "_id_" : 32464467728,
                "CustID_1_employeeid_1" : 78537625824,
                "EventID_1" : 33235791568,
                "Evtdate_1" : 27904459072,
                "CustID_1" : 77021533792
        },
        "ok" : 1
}

db.stats()

{
        "db" : "pex",
        "collections" : 3,
        "objects" : 1000000012,
        "avgObjSize" : 495.99999507200005,
        "dataSize" : 496000001024,
        "storageSize" : 513716265280,
        "numExtents" : 262,
        "indexes" : 5,
        "indexSize" : 249163877984,
        "fileSize" : 774797000704,
        "nsSizeMB" : 16,
        "dataFileVersion" : {
                "major" : 4,
                "minor" : 5
        },
        "extentFreeList" : {
                "num" : 0,
                "totalSize" : 0
        },
        "ok" : 1
}
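For reference, the gap between fileSize and dataSize + indexSize is just preallocated/overhead space in the data files. A quick arithmetic sketch using the values from the db.stats() output above:

```python
# Values taken from the db.stats() output above (bytes).
data_size = 496_000_001_024
index_size = 249_163_877_984
file_size = 774_797_000_704

used = data_size + index_size
overhead = file_size - used

print(f"data + indexes: {used / 1e9:.1f} GB")       # ~745.2 GB
print(f"file size:      {file_size / 1e9:.1f} GB")  # ~774.8 GB
print(f"overhead:       {overhead / 1e9:.1f} GB")   # ~29.6 GB preallocated
```

That roughly 30 GB of slack is normal for MMAPv1-style preallocated data files and is exactly the kind of space a filer-level dedup or compression pass can reclaim.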

The complete database used 770 GB on disk.

As a first test we activated dedup on the NetApp volume. The space savings were not very great.

Then we tested compression at the volume level, and there the savings were huge.

Very good...

Now for the performance impact: how does MongoDB react when we try a find...

Surprise... when I started the first find() on an indexed field (which should normally be fast), it kept running and running, so I tried to investigate a little:

db.a.getIndexes() --> no indexes anymore, strange...

db.a.stats() --> oops, only 79 million docs. Where are my 1 billion docs?????

db.a.count()
79547252

So to continue the tests (I have also updated a Jira ticket at MongoDB with the same info), I had to create an index on a field again :-( ... it takes time to index 1 billion rows... in fact 79 million now.

So I started db.a.ensureIndex({EventID: 1}) and the index generation began... I waited for it to reach 100% of the 79 million docs... but WAIT: it doesn't stop, it continues... 101%... 1256%... and only stops when it reaches 1 billion rows:

2014-04-02T08:48:36.017+0200 [conn1]            Index Build: 79418500/79547252  99%
2014-04-02T08:48:39.014+0200 [conn1]            Index Build: 79553000/79547252  100%
2014-04-02T08:48:51.548+0200 [conn1]            Index Build: 79590800/79547252  100%
2014-04-02T08:48:54.012+0200 [conn1]            Index Build: 79842700/79547252  100%
2014-04-02T08:48:57.013+0200 [conn1]            Index Build: 80429000/79547252  101%
2014-04-02T08:49:09.099+0200 [conn1]            Index Build: 80854100/79547252  101%
2014-04-02T08:49:12.014+0200 [conn1]            Index Build: 81354900/79547252  102%

2014-04-02T21:27:47.443+0200 [conn1]            Index Build: 999305900/79547252 1256%
2014-04-02T21:27:50.015+0200 [conn1]            Index Build: 999672800/79547252 1256%
2014-04-02T21:28:20.017+0200 [conn1]            Index: (2/3) BTree Bottom Up Progress: 1727800/1000000000       0%
2014-04-02T21:28:30.008+0200 [conn1]            Index: (2/3) BTree Bottom Up Progress: 4027300/1000000000       0%
2014-04-02T21:28:40.280+0200 [conn1]            Index: (2/3) BTree Bottom Up Progress: 6417700/1000000000       0%
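The >100% figures are consistent with the build scanning all ~1 billion surviving records while reporting progress against the stale count of 79,547,252. A small sketch that reproduces the percentages from the log lines above:

```python
import re

# Two progress lines copied from the mongod log above.
log_lines = [
    "2014-04-02T08:48:57.013+0200 [conn1]  Index Build: 80429000/79547252  101%",
    "2014-04-02T21:27:47.443+0200 [conn1]  Index Build: 999305900/79547252 1256%",
]

pattern = re.compile(r"Index Build: (\d+)/(\d+)\s+(\d+)%")
for line in log_lines:
    done, total, reported = map(int, pattern.search(line).groups())
    computed = done * 100 // total
    # The computed percentage matches what mongod logged: the denominator
    # (the cached document count) is wrong, not the progress counter.
    print(f"{done}/{total} -> computed {computed}%, logged {reported}%")
```

In other words, the scan itself still sees all the records; only the collection's cached count is off, which points at corrupted collection metadata rather than a broken index build.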


Any idea?

I think MongoDB gets confused by the compression at the storage level, and relies on some Linux/OS-level information to compute these figures.

Regards,


Eric


 





David Hows

Apr 13, 2014, 8:48:54 PM
to mongod...@googlegroups.com
Hi Eric,

MongoDB does not do any testing with systems such as NetApp, and as such Your Mileage May Vary with their usage.

As you are aware, MongoDB memory-maps its entire file storage into RAM and then pages documents in and out of RAM as needed. If the filer is not 100% transparent about what is deduplicated, then you should expect issues like this to arise, as the filesystem is not guaranteed to be in the state MongoDB expects.

If you were building an index when you enabled de-dupe and compression at the file level, I could believe that the pages in memory and on disk now differ, and this would be the source of your problems.

What happens if you restart your server (or flush the OS page cache) and the MongoDB service? Does that fix things?
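A minimal sketch of that restart-and-flush sequence on Linux (the init-script path and service name are assumptions; adjust for your SUSE 11 setup). These commands need root and should only be run on a test box:

```shell
# Stop mongod cleanly so the data files are in a consistent state.
sudo /etc/init.d/mongod stop

# Flush dirty pages to disk, then drop the page cache, dentries and inodes
# so nothing stale from before the dedup/compression pass stays mapped.
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches

# Restart mongod and re-check the collection.
sudo /etc/init.d/mongod start
```

If the stale in-memory pages were the problem, db.a.count() and db.a.getIndexes() should reflect the real on-disk state after this.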

Thanks,
David