Best way to collect JVM / JMX telemetry?

Kevin Burton

Dec 20, 2014, 12:36:57 PM
to kairosd...@googlegroups.com
I figured you guys would have a solid answer for this.

What's the best way to collect JVM telemetry/stats for daemons (via JMX, I assume)?

I'm just talking about basic JVM/GC stuff, like the number of threads, memory/heap utilization, off-heap memory utilization (via direct buffers), and GC stats including gc_util (the amount of time spent in GC), the number of GCs over time, etc.
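For reference, here's a minimal sketch of reading those numbers from the standard platform MXBeans inside a JVM (`JvmStats` and its methods are made-up names for illustration). It treats gc_util as GC time divided by wall-clock time between two samples. That's an assumed definition, and concurrent collectors such as CMS can report time that isn't stop-the-world, so treat the result as approximate.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reads basic stats from this JVM's platform MXBeans.
final class JvmStats {
    // Sum of CollectionCount across all collectors (young + old); a monotonic counter.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means "undefined" for this collector
            if (count > 0) total += count;
        }
        return total;
    }

    // Sum of CollectionTime in milliseconds across all collectors; also a monotonic counter.
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long ms = gc.getCollectionTime(); // -1 means "undefined" for this collector
            if (ms > 0) total += ms;
        }
        return total;
    }

    // Bytes held by direct (off-heap) NIO buffers, or -1 if no "direct" pool is reported.
    static long directBufferBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) return pool.getMemoryUsed();
        }
        return -1;
    }

    // gc_util (assumed definition): GC time / wall-clock time between two samples.
    static double gcUtil(long gcMsBefore, long gcMsAfter, long wallMsBefore, long wallMsAfter) {
        long wall = wallMsAfter - wallMsBefore;
        return wall <= 0 ? 0.0 : (double) (gcMsAfter - gcMsBefore) / wall;
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("threads=" + ManagementFactory.getThreadMXBean().getThreadCount());
        System.out.println("heap.used=" + heap.getUsed() + " heap.max=" + heap.getMax());
        System.out.println("direct.used=" + directBufferBytes());

        long gc0 = totalGcTimeMs(), t0 = System.currentTimeMillis();
        Thread.sleep(1_000); // sample interval
        long gc1 = totalGcTimeMs(), t1 = System.currentTimeMillis();
        System.out.println("gc.count=" + totalGcCount() + " gc_util=" + gcUtil(gc0, gc1, t0, t1));
    }
}
```

This only works in-process (i.e. for daemons you control), which is the same limitation Codahale Metrics has.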

It doesn't look like tcollector has many options:


I was using Codahale Metrics for this, but the problem is that it only works for daemons I control. External daemons like Cassandra, Elasticsearch, ActiveMQ, etc. all need to be monitored via JMX, since I can't easily jump in and patch them.
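For daemons you don't control, the same MXBeans are reachable remotely once the target JVM is started with the JMX agent enabled (e.g. `-Dcom.sun.management.jmxremote.port=...`; authentication and SSL settings are up to you). A rough sketch using only the JDK's `javax.management` API; `JmxReader` is a hypothetical helper, and with no arguments it reads its own JVM so it can be tried locally.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: reads JVM attributes through any MBeanServerConnection (local or remote).
final class JmxReader {
    // Default RMI service URL used by the JDK's out-of-the-box JMX agent.
    static JMXConnector connect(String host, int port) throws IOException {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        return JMXConnectorFactory.connect(url);
    }

    // Reads a numeric attribute; for CompositeData attributes (e.g. HeapMemoryUsage) pass a key.
    static long readLong(MBeanServerConnection conn, String objectName, String attribute, String key)
            throws Exception {
        Object value = conn.getAttribute(new ObjectName(objectName), attribute);
        if (key != null) value = ((CompositeData) value).get(key);
        return ((Number) value).longValue();
    }

    static void print(MBeanServerConnection conn) throws Exception {
        System.out.println("heap.used=" + readLong(conn, "java.lang:type=Memory", "HeapMemoryUsage", "used"));
        System.out.println("threads=" + readLong(conn, "java.lang:type=Threading", "ThreadCount", null));
        System.out.println("direct.used=" + readLong(conn, "java.nio:type=BufferPool,name=direct", "MemoryUsed", null));
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            // e.g. java JmxReader cassandra01 7199
            try (JMXConnector connector = connect(args[0], Integer.parseInt(args[1]))) {
                print(connector.getMBeanServerConnection());
            }
        } else {
            print(ManagementFactory.getPlatformMBeanServer()); // this JVM, for a local test
        }
    }
}
```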

I could go the collectd route, but then I'm running both tcollector and collectd, and I have to use the collectd KairosDB plugin (not sure if it's stable). I guess that's not the end of the world if it works, though, because collectd has a ton of plugins.

Brian Hawkins

Dec 21, 2014, 12:23:00 AM
to kairosd...@googlegroups.com
I answered this in my other post. We use collectd and Jeff's modified collectd plugin for KairosDB: https://github.com/jsabin/collectd-kairosdb

Brian

allan bailey

Dec 22, 2014, 10:25:09 AM
to kairosd...@googlegroups.com
There's also Jolokia, which you can use to scrape the Java MBean/JMX info over HTTP. Its agent installs into a running process.

I've used it to collect the JMX data from Cassandra.
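As a hedged illustration: Jolokia's JVM agent serves JMX reads over HTTP, by default on port 8778 under `/jolokia`, with URLs of the form `/jolokia/read/<mbean>/<attribute>`, and it answers with a JSON object whose `value` field holds the attribute. The sketch below (hypothetical `JolokiaRead` class, Java 11+ for `java.net.http`) builds such a URL and fetches the response. It only percent-encodes spaces, so MBean names containing `/` or `!` would still need Jolokia's own escaping rules.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for Jolokia's REST "read" operation.
final class JolokiaRead {
    // Builds <base>/read/<mbean>/<attribute>. Sketch only: spaces (e.g. in "PS Scavenge")
    // are percent-encoded, but '/' and '!' inside names are NOT escaped here.
    static String readUrl(String baseUrl, String mbean, String attribute) {
        String path = mbean + "/" + attribute;
        return baseUrl + "/read/" + path.replace(" ", "%20");
    }

    // Plain GET; returns the raw JSON body.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String base = args.length > 0 ? args[0] : "http://localhost:8778/jolokia";
        String url = readUrl(base, "java.lang:type=Memory", "HeapMemoryUsage");
        System.out.println(url);
        System.out.println(fetch(url));
    }
}
```

The same URL can be tried with curl before writing any code, which is a quick way to check the agent is attached.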

-allan
--
Allan Bailey
zirpu...@gmail.com

There are 2 hard problems in computer science:
caching, naming, off-by-1 errors.

Will Fraley

Dec 22, 2014, 1:53:25 PM
to kairosd...@googlegroups.com
We use OpenNMS to collect data from JMX/SNMP devices (e.g. JVMs, dumb switches, PDUs, UPSes, load balancers, appliances, etc.). The only gotcha is that OpenNMS exports the data in Google Protocol Buffers format (http://www.opennms.org/wiki/Performance_Data_TCP_Export), so we have a small piece of Java middleware that ingests the incoming metrics and converts them to KairosDB-style metrics with the appropriate format, tags, etc.
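The middleware itself isn't public, so this is only a guess at what its last step could look like: formatting one sample as a KairosDB REST datapoints object (the body for `POST /api/v1/datapoints` is a JSON array of these, with millisecond timestamps and tags as a JSON object). The `KairosJson` class is hypothetical, and the character whitelist reflects the restriction older KairosDB versions placed on metric and tag names; check the docs for the version you run.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical formatter: one sample -> one KairosDB REST "datapoints" JSON object.
final class KairosJson {
    // Replace anything outside the characters older KairosDB versions accepted in
    // metric/tag names (alphanumerics, '.', '/', '-', '_') with '_'.
    // Because of this, no further JSON string escaping is needed below.
    static String sanitize(String s) {
        return s.replaceAll("[^A-Za-z0-9./_-]", "_");
    }

    static String toJson(String metric, long timestampMs, Number value, Map<String, String> tags) {
        if (!Double.isFinite(value.doubleValue())) {
            throw new IllegalArgumentException("JSON cannot represent " + value);
        }
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":\"").append(sanitize(metric)).append('"')
          .append(",\"datapoints\":[[").append(timestampMs).append(',').append(value).append("]]")
          .append(",\"tags\":{");
        boolean first = true;
        // TreeMap gives a stable key order, which makes the output easy to compare.
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(sanitize(tag.getKey())).append("\":\"")
              .append(sanitize(tag.getValue())).append('"');
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> tags = new TreeMap<>();
        tags.put("host", "cassandra01");
        tags.put("collector", "ConcurrentMarkSweep");
        System.out.println(toJson("jvm.gc.collection_count", System.currentTimeMillis(), 1234L, tags));
    }
}
```

KairosDB versions of that era also required at least one tag per metric, so a `host` tag is a sensible minimum.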

We end up with heap graphs like this:

Example (from a wacky app) CMS GC rate graph:


JVM-related bits I pull via OpenNMS are:
  • Garbage Collection
    • Young Generation Collectors 
      • Copy
      • PS Scavenge
      • ParNew
      • YoungOptimized (JRockit)
    • Old Generation Collectors
      • MarkSweepCompact
      • PS MarkSweep
      • ConcurrentMarkSweep
      • OldOptimized (JRockit)
    • Metrics we collect for each garbage collector type above
      • CollectionCount (incrementing counter)
      • CollectionTime (incrementing counter)
      • Last GC thread count (gauge)
      • Last GC duration (gauge - milliseconds)
      • Last GC end time (gauge)
  • JVM Memory (Metrics below collected separately for each of: JVM global, PS Eden Space, PS Survivor Space, PS Perm Gen, PS Old Gen, Par Eden Space, Par Survivor Space, CMS Perm Gen, CMS Old Gen)
    • Memory usage committed (gauge)
    • Memory usage init (gauge)
    • Memory usage max (gauge)
    • Memory usage used (gauge)
  • JVM Threading
    • ThreadCount (gauge)
    • PeakThreadCount (gauge)
    • DaemonThreadCount (gauge)
    • CurrentThreadCpuTime (gauge)
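Since CollectionCount and CollectionTime are incrementing counters, graphing a GC rate means taking the delta between samples. A small sketch (hypothetical `CounterRate` class) that does this and treats a drop in the counter as a restart of the monitored JVM; its `main` also lists the collector and memory-pool names the local JVM reports, which is where names like PS Scavenge or ConcurrentMarkSweep come from.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: turns an incrementing counter (e.g. CollectionCount) into a per-second rate.
final class CounterRate {
    private long lastValue = -1;
    private long lastTimeMs = -1;

    // Rate since the previous sample; NaN on the first sample or if the counter went
    // backwards (typically a restart of the monitored JVM).
    double update(long value, long timeMs) {
        double rate = Double.NaN;
        if (lastValue >= 0 && value >= lastValue && timeMs > lastTimeMs) {
            rate = (value - lastValue) * 1000.0 / (timeMs - lastTimeMs);
        }
        lastValue = value;
        lastTimeMs = timeMs;
        return rate;
    }

    public static void main(String[] args) {
        // Names depend on the GC flags, e.g. "PS Scavenge"/"PS MarkSweep" for the
        // parallel collector or "ParNew"/"ConcurrentMarkSweep" for CMS.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("gc " + gc.getName() + " count=" + gc.getCollectionCount()
                + " timeMs=" + gc.getCollectionTime());
        }
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                System.out.println("pool " + pool.getName() + " used=" + usage.getUsed()
                    + " committed=" + usage.getCommitted() + " max=" + usage.getMax());
            }
        }
    }
}
```

Alternatively, the raw counters can be stored as-is and the rate computed at query time with KairosDB's `rate` aggregator, which keeps the collector side simpler.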

All quite easy with OpenNMS as outlined here:


I have a request in to open-source the middleware queuing system that does the OpenNMS->KairosDB translation, so I'll post a note here if that ever gets approved.

Kevin Burton

Dec 22, 2014, 3:08:09 PM
to kairosd...@googlegroups.com
So basically you install the plugin and then get the data out as JSON?

Kevin