I do quite a bit of monitoring of our *many* Isilon clusters where I work. I have a push-to-Graphite script that handles the capacity bits. I'm not using the API, for ideological reasons (the push vs. pull holy war):
https://gist.github.com/scr512/3798123b739ce5da5339
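The linked gist is the actual script. As a rough illustration of the push model only, here is a minimal sketch of sending metrics to Graphite over Carbon's plaintext protocol (one `<path> <value> <timestamp>` line per metric, TCP port 2003 by default). The metric names, the host `graphite.example.com`, and the helper names are hypothetical, and how the capacity numbers are collected is deliberately left out:

```python
import socket
import time


def format_metrics(metrics, timestamp=None):
    """Render {metric_path: value} as Carbon plaintext lines: '<path> <value> <timestamp>\n'."""
    # Use the current Unix time unless a timestamp is supplied; Carbon expects whole seconds.
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Sort by path so the payload is deterministic (handy for logging and testing).
    return "".join(f"{path} {value} {ts}\n" for path, value in sorted(metrics.items()))


def push_to_graphite(payload, host="graphite.example.com", port=2003, timeout=5.0):
    """Open a TCP connection to a Carbon plaintext listener and send the payload."""
    # Hypothetical host; 2003 is Carbon's default plaintext port.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("ascii"))
```

Usage would look like `push_to_graphite(format_metrics({"isilon.cluster01.ifs.bytes_used": used}))`, run from cron or a loop on whatever box collects the numbers. Since the collector initiates the connection, nothing on the Graphite side has to know about or poll the clusters, which is the point of the push approach.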
We primarily use Grafana, Tessera and Dashing for our visualization and dashboard needs. For threshold alerting we use Seyren, and we're evaluating Bosun and Riemann for more advanced alerting.
All of these (except Riemann, since it's a stream processing tool) use the data stored in Graphite as the source of truth.
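Because Graphite is the source of truth, a threshold check boils down to querying Graphite's render API with `format=json` and comparing the latest datapoint to warn/error levels. The sketch below shows that idea under those assumptions. It is not how Seyren is implemented, and the Graphite URL, target, and threshold values are placeholders:

```python
import json
import urllib.parse
import urllib.request


def fetch_series(graphite_url, target, window="-10min"):
    """Query Graphite's render API for one target; returns a list of {target, datapoints}."""
    query = urllib.parse.urlencode({"target": target, "from": window, "format": "json"})
    with urllib.request.urlopen(f"{graphite_url}/render?{query}", timeout=10) as resp:
        return json.load(resp)


def latest_value(series):
    """Return the most recent non-null value; datapoints are [value, timestamp] pairs."""
    for value, _ts in reversed(series.get("datapoints", [])):
        if value is not None:
            return value
    return None


def evaluate(series_list, warn, error):
    """Classify each series as OK, WARN, ERROR, or UNKNOWN from its latest value."""
    states = {}
    for series in series_list:
        value = latest_value(series)
        if value is None:
            state = "UNKNOWN"  # no recent data is worth alerting on by itself
        elif value >= error:
            state = "ERROR"
        elif value >= warn:
            state = "WARN"
        else:
            state = "OK"
        states[series["target"]] = state
    return states
```

Skipping trailing nulls matters because the newest Graphite bucket is often still empty when the query runs.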
--
Hi John. Fellow Isilon user here. I have been looking at the API for a little while to see if I can extract some stats from it to put through a monitoring and alerting system. If you don't mind me asking, what stats do you pull via the REST API? I have grabbed a bunch of stats via SNMP, but some are not available that way.

One thing I want to monitor is namespace reads, because we have found that when they are too high our cluster feels sluggish to the user base and they complain. I was trying to monitor this:

/platform/1/statistics/current?key=node.ifs.heat.getattr.total&devid=all

I think I have the right thing to monitor, but when I compare it to the stats I get from InsightIQ for the same time periods, they are not close. That makes me think I either have the wrong stat or need to do some math on the resulting polled number.
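Two common sources of that kind of mismatch are worth ruling out: `devid=all` returns one value per node, while a cluster-level view may show the sum, and a stat might be a running counter instead of a per-second rate. The sketch below polls the endpoint and covers both cases. It assumes HTTPS with Basic auth on port 8080, a response shaped like `{"stats": [{"devid": ..., "key": ..., "time": ..., "value": ...}]}`, and a single number per node in `value`. All of these are assumptions to verify against your OneFS version, since heat statistics may instead return per-path lists, which would need different aggregation:

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request


def fetch_current_stat(host, user, password, key, devid="all", port=8080, verify_tls=False):
    """GET /platform/1/statistics/current for one key (assumes HTTPS Basic auth on port 8080)."""
    query = urllib.parse.urlencode({"key": key, "devid": devid})
    url = f"https://{host}:{port}/platform/1/statistics/current?{query}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Clusters often ship self-signed certificates; only skip verification on trusted networks.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=15) as resp:
        return json.load(resp)


def cluster_total(response):
    """Sum per-node values, skipping nodes that returned no value."""
    return sum(
        float(entry["value"])
        for entry in response.get("stats", [])
        if entry.get("value") is not None
    )


def counter_rate(prev_value, prev_time, cur_value, cur_time):
    """Per-second rate between two samples, for use if the stat turns out to be cumulative."""
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (cur_value - prev_value) / elapsed
```

Polling twice and comparing `cluster_total` against `counter_rate` on the same samples should show quickly which interpretation lines up with InsightIQ.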