database-level fragmentation statistics


Charity Majors

Mar 7, 2013, 7:04:06 PM
to mongod...@googlegroups.com
Hi,

Is there any way to tell how fragmented a database is?  I'm familiar with the collection-level padding factor, but I don't see anything in db.stats() that would help me calculate or infer how compacted or fragmented the db is.

The reason I ask is that we started seeing spikes in latency that correlate to lines like this, with a large number of objects scanned:

Thu Mar  7 14:22:46 [conn414392] warning: newExtent 169465 scanned

We then compacted all the collections in the databases that were reporting a large number of objects scanned, and snapshotted/rotated in a new primary.  But the number of newExtents scanned hasn't gone down any since the compaction, which is confusing.  We're still seeing the same spikes.
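One way to check whether the warnings really are unchanged after compaction is to pull the scanned counts out of the log and compare them over time. The sketch below is plain JavaScript and assumes the log lines look exactly like the one quoted above; `parseNewExtentWarnings` is a hypothetical helper name, not something MongoDB ships.

```javascript
// Extract "newExtent N scanned" warnings from mongod log text.
// Assumption: lines follow the format of the example quoted in this thread.
function parseNewExtentWarnings(logText) {
  var pattern = /^(.*?)\s+\[(conn\d+)\]\s+warning: newExtent (\d+) scanned\s*$/;
  var hits = [];
  logText.split("\n").forEach(function (line) {
    var m = pattern.exec(line);
    if (m) {
      hits.push({
        timestamp: m[1],              // raw timestamp text, left unparsed
        connection: m[2],             // e.g. "conn414392"
        scanned: parseInt(m[3], 10)   // number reported in the warning
      });
    }
  });
  return hits;
}

// Usage with the line from the post plus an unrelated line that is ignored.
var sample = "Thu Mar  7 14:22:46 [conn414392] warning: newExtent 169465 scanned\n" +
             "Thu Mar  7 14:22:47 [conn414392] some other message";
var warnings = parseNewExtentWarnings(sample);
```

Running this over the logs from before and after the primary rotation would give two lists of `scanned` values to compare.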

I'd like to be able to tell how effective the database compactions were -- how fragmented they were before the switch, and how fragmented they are after.  Is there any way to do that?
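There is no fragmentation field in db.stats(), but a rough proxy can be derived from the sizes it does report. The sketch below assumes the result carries `dataSize`, `storageSize`, `indexSize` and `fileSize` in bytes, and that `storageSize` does not already include indexes. `fragmentationEstimate` is a hypothetical helper and the ratios are a heuristic, not an official metric; `fileSize` also counts preallocated files, so the second ratio overstates the real slack.

```javascript
// Rough, unofficial fragmentation heuristic from a db.stats() result.
// Assumption: dataSize, storageSize, indexSize and fileSize are all in bytes.
function fragmentationEstimate(stats) {
  var allocated = stats.storageSize + stats.indexSize;
  return {
    // Bytes allocated to collection extents per byte of document data
    // (padding and deleted records push this above 1).
    storageToData: stats.dataSize > 0 ? stats.storageSize / stats.dataSize : null,
    // Share of the data files not currently assigned to any collection or index.
    unassignedFileFraction: stats.fileSize > 0
      ? Math.max(0, 1 - allocated / stats.fileSize)
      : null
  };
}

// Made-up numbers for illustration only.
var estimate = fragmentationEstimate({
  dataSize: 40e9, storageSize: 60e9, indexSize: 10e9, fileSize: 100e9
});
```

In the mongo shell the input would come from `db.stats()`; saving the result before and after a compaction gives two estimates to compare.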

Asya Kamsky

Mar 10, 2013, 7:38:52 PM
to mongod...@googlegroups.com
You say you compacted some of the collections in the database but then ask how to tell how effective the database compaction was.

The problem is that the compact command works at the collection level, not the DB level.  If you want to repair a database (which is sort of like compacting every collection and then compacting the database), you would use the repairDatabase command (http://docs.mongodb.org/manual/reference/command/repairDatabase/). This will both rewrite the DB and release the freed-up space back to the operating system.  Compacting collections does not give the space back to the OS; it just gets reused when you continue writing to that collection/database.
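To make the distinction concrete: compact takes one collection name per invocation, while repairDatabase is issued once for the whole database. The sketch below only builds the command documents so it can run anywhere; `buildCompactCommands` is a hypothetical helper, and skipping `system.` collections is an assumption of this sketch.

```javascript
// Build one compact command document per collection name.
// Assumption: collections whose names start with "system." are skipped.
function buildCompactCommands(collectionNames) {
  return collectionNames
    .filter(function (name) { return name.indexOf("system.") !== 0; })
    .map(function (name) { return { compact: name }; });
}

var commands = buildCompactCommands(["events", "users", "system.indexes"]);

// In the mongo shell, against a non-production copy, this might be used as:
//   buildCompactCommands(db.getCollectionNames()).forEach(function (cmd) {
//     printjson(db.runCommand(cmd));   // per-collection; space stays inside the data files
//   });
//   db.repairDatabase();               // whole database; rewrites the data files
```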

Asya

Charity Majors

Mar 11, 2013, 2:05:50 PM
to mongod...@googlegroups.com
What we're wondering is if there's any way to tell, in aggregate, how fragmented a database is.

There doesn't even seem to be any way to really tell how fragmented a collection is, other than guessing based on the per-collection padding factor (which can change quickly from moment to moment).  What would be nice is the equivalent of the redis fragmentation ratio, or the percona plugins that let you know how much fragmentation was scanned by an individual explain query.  Does Mongo have anything like that currently, or planned for 2.4?
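Absent a built-in ratio, one approximation is to roll the per-collection numbers up by hand. The sketch below assumes each collection stats() result carries `ns`, `size` (data bytes) and `storageSize` (allocated bytes); `summarizeCollections` is a hypothetical helper. It only sees space still attached to existing collections, so extents freed by dropped collections would not show up here.

```javascript
// Aggregate per-collection stats into a database-wide slack estimate.
// Assumption: each entry has ns, size and storageSize, both sizes in bytes.
function summarizeCollections(collStats) {
  var totals = { size: 0, storageSize: 0 };
  var perCollection = collStats.map(function (s) {
    totals.size += s.size;
    totals.storageSize += s.storageSize;
    return { ns: s.ns, slackBytes: s.storageSize - s.size };
  });
  // Largest slack first, so the best compaction candidates lead the list.
  perCollection.sort(function (a, b) { return b.slackBytes - a.slackBytes; });
  return {
    totalSlackBytes: totals.storageSize - totals.size,
    slackFraction: totals.storageSize > 0 ? 1 - totals.size / totals.storageSize : null,
    worst: perCollection
  };
}

// Made-up numbers for illustration only.
var summary = summarizeCollections([
  { ns: "data.a", size: 100, storageSize: 400 },
  { ns: "data.b", size: 300, storageSize: 400 }
]);
```

In the mongo shell the input could be gathered with something like `db.getCollectionNames().map(function (n) { return db.getCollection(n).stats(); })`.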



Asya Kamsky

Mar 12, 2013, 7:07:41 AM
to mongod...@googlegroups.com
Charity,

Since you can make a copy of your database files (a file system snapshot would be the simplest way, if your file system supports it), you can copy your production files and bring up MongoDB 2.4.0-rc2, which is now available, against that copy.

You would then be able to use the storage visualizer tools described here: http://blog.mongodb.org/post/36157256399/storage-viz-storage-visualizers-and-commands-for though please note that these are not for use on a production system, which is why running a separate instance against a copy of your data directory is mandatory.

Please let us know if the tools help. While they are an experimental part of MongoDB, it would be great to have feedback about their usefulness.

Asya

Charity Majors

Mar 15, 2013, 10:18:09 PM
to mongod...@googlegroups.com
Hmm.  Couldn't get the storage visualizers to work.  Kept complaining that the database didn't exist.  If I ran lynx against the url generated by the tool, it said this:

{ "$err" : "Invalid ns [data.]", "code" : 16256 }

even though the db definitely does exist.

However, the command line tools were pretty neat.  I played around a bit with getDiskStorageStats() and getPagesInRAM().  These look pretty great, definitely a step in the right direction.  Unfortunately they don't really give us what we need right now, which is per-database fragmentation stats, not per-collection fragmentation stats.

I feel like this will be pretty useful for most people, who find themselves deleting things from collections quite often.  Slightly less useful to our use case, where we have deleted huge numbers of collections from the database.

Thanks.  Was pretty neat to see our data successfully load under 2.4.0rc3.  :)