Thanks for the write-up. Having checked out the plugin and what it looks at, here is a quick review of some of the monitors:
Looks good, will likely work as expected:
Replica State, Connections, Flush Time (though you might want to tweak the flush time thresholds to match your disk subsystem - they will vary widely depending on what you are using)
Potentially Good - need tweaking:
Replication Lag - the alarm threshold should be set relatively high, since the optime this check uses may not always be up to date on the polled server for the remote replica set members, i.e. it will occasionally report a non-zero value when there is in fact no lag
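To make the replication lag point concrete, here is a minimal sketch of how lag can be derived from optimes and why a single non-zero reading should not page anyone. It assumes a dict shaped like the output of replSetGetStatus (members with name, stateStr and optimeDate). The SustainedLagAlarm class, its threshold and the "N consecutive polls" rule are my own illustration, not what the plugin does.

```python
from datetime import datetime, timedelta


def replication_lag_seconds(status):
    """Return {member name: lag in seconds} for each SECONDARY.

    `status` is assumed to look like replSetGetStatus output. The optimes
    for remote members can be slightly stale on the server being polled,
    so small non-zero values do not necessarily mean real lag.
    """
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        return {}  # no primary visible, lag is undefined
    lags = {}
    for m in members:
        if m["stateStr"] == "SECONDARY":
            delta = (primary["optimeDate"] - m["optimeDate"]).total_seconds()
            lags[m["name"]] = max(delta, 0.0)  # clamp stale/negative readings
    return lags


class SustainedLagAlarm:
    """Hypothetical helper: fire only after several consecutive high polls."""

    def __init__(self, threshold_seconds, required_polls):
        self.threshold_seconds = threshold_seconds
        self.required_polls = required_polls
        self._streak = {}

    def update(self, lags):
        firing = []
        for name, lag in lags.items():
            if lag > self.threshold_seconds:
                self._streak[name] = self._streak.get(name, 0) + 1
            else:
                self._streak[name] = 0  # one good reading resets the streak
            if self._streak[name] >= self.required_polls:
                firing.append(name)
        return firing


# Example poll with made-up member names and timestamps.
now = datetime(2012, 1, 1, 12, 0, 0)
sample = {"members": [
    {"name": "a:27017", "stateStr": "PRIMARY", "optimeDate": now},
    {"name": "b:27017", "stateStr": "SECONDARY",
     "optimeDate": now - timedelta(seconds=5)},
]}
alarm = SustainedLagAlarm(threshold_seconds=60, required_polls=3)
firing = alarm.update(replication_lag_seconds(sample))
```

A high threshold on its own, as suggested above, is the simpler option. Requiring consecutive breaches is just one other way to ride out the occasional stale optime.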
Arbitrary - may not be meaningful:
The collection count, database count and index miss alarms are going to be highly situational, depending on your usage and schema, but I suppose they will catch things like a fat-fingered command creating lots of collections/dbs.
Possibly bad, potential for a lot of false positives:
Memory Usage - this check only uses resident memory and makes a bad assumption about how resident memory is used in MongoDB. The resident memory size will always grow (assuming data is continually added/read) unless mongod is restarted. This is not a problem; it is simply how memory-mapped files interact with the OS, with pages being paged in/out as needed. Given enough time, this check will always alarm unless the threshold is set to a meaningless value greater than available RAM.
Lock Time - this check is going to be tricky to alarm on, because it will spike with write volume and will usually be higher on the primary of a set than on the secondaries. I'm not sure it will be very helpful as an alarm, and it will probably produce a lot of false positives.
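If you do want to keep a lock time alarm, a rough sketch of one way to make it less noisy is below. It assumes two serverStatus-shaped samples that carry cumulative globalLock.totalTime and globalLock.lockTime counters (check what your MongoDB version actually reports, since the field layout differs between releases). The separate primary/secondary limits and the function names are hypothetical, not part of the plugin.

```python
def lock_ratio(prev, curr):
    """Fraction of time spent holding the lock between two samples.

    Both arguments are assumed to be serverStatus-shaped dicts with
    cumulative globalLock.totalTime and globalLock.lockTime counters.
    Returns None when the counters went backwards (e.g. mongod restarted).
    """
    d_total = curr["globalLock"]["totalTime"] - prev["globalLock"]["totalTime"]
    d_lock = curr["globalLock"]["lockTime"] - prev["globalLock"]["lockTime"]
    if d_total <= 0 or d_lock < 0:
        return None
    return d_lock / d_total


def lock_alarm(prev, curr, is_primary, primary_limit, secondary_limit):
    """Compare the lock ratio against a role-specific limit.

    The limits are placeholders: the primary takes the writes, so it
    normally needs a more generous limit than the secondaries.
    """
    ratio = lock_ratio(prev, curr)
    if ratio is None:
        return False  # cannot judge this interval, skip it
    limit = primary_limit if is_primary else secondary_limit
    return ratio > limit
```

Working from the delta between two polls, rather than the lifetime average, at least shows what is happening now. It will still spike with write bursts, so treat it as a graph first and an alarm second.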
Adam.