Understanding Collections Stats

A. Jalil @AJ

Oct 8, 2015, 6:29:06 PM
to mongodb-user
Hello,

I ran the command below to get stats on my collections, and I'd like to understand some of the fields. I am using the salesdata collection as an example.
Please see my questions below:

> db.printCollectionStats()


salesdata
{
        "sharded" : false,
        "primary" : "rs1",
        "ns" : "sales.salesdata",
        "count" : 3,558,022,                          
        "size" : 1,764,778,912,                                
        "avgObjSize" : 496,
        "storageSize" : 9,305,935,856,
        "numExtents" : 25,
        "nindexes" : 4,
        "lastExtentSize" : 2146426864,
        "paddingFactor" : 1,
        "systemFlags" : 1,
        "userFlags" : 1,
        "totalIndexSize" : 1142252608,
        "indexSizes" : {
                "_id_" : 116965856,
                "uniqueIdx" : 629306720,
                "orgId" : 178907232,
                "rawFileId" : 217072800
        },
        "ok" : 1,
        "$gleStats" : {
                "lastOpTime" : Timestamp(0, 0),
                "electionId" : ObjectId("5615777c6fb113b4ad702911")
        }
}


>> "count" : 3,558,022: is this the number of documents in the salesdata collection?
>> "size" : 1,764,778,912: is this the actual size of my salesdata collection?
>> "storageSize" : 9,305,935,856: is this the storage actually used by my salesdata collection, or the storage allocated to it? If not, how do I find the actual size of this collection at the DB level rather than the OS level?

>> The stats above were generated by db.printCollectionStats(), which gave me stats for all collections in my database. How do I get similar stats for only the 10 largest collections?


Thank you!
@AJ

Stephen Steneker

Oct 8, 2015, 10:21:03 PM
to mongodb-user

On Friday, 9 October 2015 09:29:06 UTC+11, A. Jalil @AJ wrote:

I ran the command below to get stats on my collections, and I'd like to understand some of the fields. I am using the salesdata collection as an example.
Please see my questions below:

> db.printCollectionStats()

Hi AJ,

The output of this shell helper comes from the collStats command; the documentation describes the common fields:
   http://docs.mongodb.org/manual/reference/command/collStats/#output

Note: the fields in collection stats may vary based on your version of MongoDB and the configured storage engine.
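As a rough illustration of how count, size, avgObjSize and storageSize relate, here is a minimal sketch using the salesdata figures quoted earlier in the thread (thousands separators removed). It assumes the reading given in the collStats documentation: count is the number of documents, size is the total size of the collection's records, and storageSize is the storage allocated to the collection. The summarizeStats helper and its field names are made up for this example and are not part of the shell:

```javascript
// Figures copied from the salesdata output posted above
var stats = {
    count: 3558022,
    size: 1764778912,
    avgObjSize: 496,
    storageSize: 9305935856
};

// Hypothetical helper: derive a few values from a collStats-style document
function summarizeStats(s) {
    return {
        // average document size is the data size divided by the document count
        avgObjSize: s.size / s.count,
        // bytes allocated to the collection beyond the size of its records
        allocatedBeyondData: s.storageSize - s.size,
        // allocated storage relative to the data size
        storageToDataRatio: s.storageSize / s.size
    };
}

var summary = summarizeStats(stats);
```

For these figures the allocated storage is roughly 5.3 times the data size. With other storage engines the relationship can look quite different (for example, compression can make storageSize smaller than size), so treat the ratio as descriptive only.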

You can see how the shell helper is implemented by running the command in the shell without the parentheses, e.g.:

> db.printCollectionStats
function (scale) {
    if (arguments.length > 1) {
        print("printCollectionStats() has a single optional argument (scale)");
        return;
    }
    if (typeof scale != 'undefined') {
        if(typeof scale != 'number') {
            print("scale has to be a number >= 1");
            return;
        }
        if (scale < 1) {
            print("scale has to be >= 1");
            return;
        }
    }
    var mydb = this;
    this.getCollectionNames().forEach(
        function(z) {
            print( z );
            printjson( mydb.getCollection(z).stats(scale) );
            print( "---" );
        }
    );
}

>> The stats above were generated by db.printCollectionStats(), which gave me stats for all collections in my database. How do I get similar stats for only the 10 largest collections?

Write a script for the mongo shell (or in your favourite programming language) to iterate over all the collections, sort by your definition of “large” (e.g. by data size, count, or some other criterion), and then limit to the top 10 results.
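A minimal sketch of the sort-and-limit step, done in memory on an array of stats documents. The topCollections helper and the sample data are made up for illustration; in the mongo shell the array would come from the collection names and stats() calls shown in the comment:

```javascript
// Hypothetical helper: return the n largest entries by a given stats field
function topCollections(statsList, field, n) {
    return statsList
        .slice() // copy, so the caller's array is not reordered
        .sort(function (a, b) { return b.stats[field] - a.stats[field]; })
        .slice(0, n);
}

// In the mongo shell the input could be built like this (not run here):
//   var statsList = db.getCollectionNames().map(function (name) {
//       return { name: name, stats: db.getCollection(name).stats() };
//   });

// Made-up sample data so the sketch runs standalone
var sample = [
    { name: "orders", stats: { count: 50, storageSize: 8192 } },
    { name: "users", stats: { count: 200, storageSize: 4096 } },
    { name: "logs", stats: { count: 10, storageSize: 16384 } }
];

var largestByStorage = topCollections(sample, "storageSize", 2);
var largestByCount = topCollections(sample, "count", 1);
```

Passing the field name as a parameter keeps the definition of “large” up to the caller.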

A quick way to do this would be to save the output of collection stats into a temporary collection and then use normal MongoDB queries (find, aggregate) to filter and/or manipulate the data:

// Save details in a "temp.collstats" collection
var collStats = db.getSiblingDB('temp').collstats;

// Clear old stats
collStats.drop();

// Get stats for all collections in the current database
db.getCollectionNames().forEach(
    function (collName) {
        // Ignore MMAP system.* collections
        if (!collName.startsWith("system.")) {
            collStats.insert({ _id: collName, stats: db.getCollection(collName).stats() });
        }
    }
);

// Find top 10 collections by storage size
collStats.find(
   {},
   { "_id": 1, "stats.count": 1, "stats.size": 1, "stats.storageSize": 1}
).sort({ "stats.storageSize" : -1}).limit(10);

// Find top 10 collections by document count
collStats.find(
   {},
   { "_id": 1, "stats.count": 1, "stats.size": 1, "stats.storageSize": 1}
).sort({ "stats.count" : -1}).limit(10);

I've only done a quick test to make sure my example JS works (MongoDB 3.0 shell). Depending on your collection and index names, you may need to clean up some of the stats data before it can be inserted as a MongoDB document. For example, auto-generated index names for embedded fields can include ".", which would need to be replaced with a character that is valid in a field name (see also: http://docs.mongodb.org/manual/core/document/#field-names).
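One possible way to do that cleanup is sketched below. The sanitizeKeys helper, the "_" replacement character and the sample index name are all assumptions chosen for this example; it copies a stats document and replaces "." in every field name, leaving values untouched:

```javascript
// Hypothetical helper: copy a value, replacing "." in every field name.
// Only arrays and plain objects are walked; other values (numbers, strings,
// or shell types such as ObjectId and Timestamp) are returned as they are.
function sanitizeKeys(value, replacement) {
    if (Array.isArray(value)) {
        return value.map(function (v) { return sanitizeKeys(v, replacement); });
    }
    if (value !== null && typeof value === "object" && value.constructor === Object) {
        var out = {};
        Object.keys(value).forEach(function (key) {
            var safeKey = key.split(".").join(replacement);
            out[safeKey] = sanitizeKeys(value[key], replacement);
        });
        return out;
    }
    return value;
}

// Made-up stats fragment with an index on an embedded field
var sampleStats = {
    nindexes: 2,
    indexSizes: { "_id_": 8176, "address.city_1": 8176 }
};

var cleaned = sanitizeKeys(sampleStats, "_");
```

In the loop above, the insert could then store sanitizeKeys(db.getCollection(collName).stats(), "_") instead of the raw stats document.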

Hope that helps set you on the right path ;-).

Regards,
Stephen

A. Jalil @AJ

Oct 9, 2015, 2:53:17 PM
to mongodb-user
Thanks Stephen! This is very helpful indeed.

I am trying to figure out why I get the error below when I try to run one of the queries you posted. I am using v2.6.4; I'm not sure if this has to do with my mongo version.

mongos> collStats.find(
...    {},
...    { "_id": 1, "stats.count": 1, "stats.size": 1, "stats.storageSize": 1}
...  ).sort({ "stats.storageSize" : -1}).limit(10);
2015-10-09T13:39:58.900-0500 ReferenceError: collStats is not defined


Then I replaced the double quotes with single quotes, but I still get the same error.

mongos> collStats.find(
...    {},
...    { '_id': 1, 'stats.count': 1, 'stats.size': 1, 'stats.storageSize': 1}
... ).sort({ 'stats.count' : -1}).limit(10);
2015-10-09T13:49:45.080-0500 ReferenceError: collStats is not defined


Thank you!

Stephen Steneker

Oct 9, 2015, 4:57:55 PM
to mongodb-user

On Saturday, 10 October 2015 05:53:17 UTC+11, A. Jalil @AJ wrote:
Thanks Stephen! This is very helpful indeed.

I am trying to figure out why I get the error below when I try to run one of the queries you posted. I am using v2.6.4; I'm not sure if this has to do with my mongo version.

Hi AJ,

collStats is a variable that I defined at the top of the example JavaScript. Before running the find() queries you need to run that definition, along with the code that iterates over the collections and saves the stats:

// Save details in a "temp.collstats" collection
var collStats = db.getSiblingDB('temp').collstats;

// Clear old stats
collStats.drop();

// Get stats for all collections in the current database
db.getCollectionNames().forEach(
    function (collName) {
        // Ignore MMAP system.* collections
        if (!collName.startsWith("system.")) {
            collStats.insert({ _id: collName, stats: db.getCollection(collName).stats() });
        }
    }
);

Regards,
Stephen
