How are you monitoring the HBase cluster behind OpenTSDB?

Adam Steffes

Feb 19, 2014, 9:00:23 PM

to open...@googlegroups.com

How are folks monitoring the HBase cluster backing their OpenTSDB installation? We're concerned that pumping the hundreds of metrics from each regionserver into OpenTSDB, and from there into HBase, will create a feedback loop and drive HBase load even higher. Adding nodes to the cluster doesn't seem like it will help, since each new regionserver will simply add more metrics to store.

HBase was built with native Ganglia and Graphite support for its own metrics, but we really don't want to manage our systems with multiple tools. Also, a dedicated HBase cluster for OpenTSDB, kept separate from the production apps, would still need to be monitored itself.

So what are all of you doing to monitor HBase?

-Adam

Stephen Wood

Feb 20, 2014, 1:31:50 PM

to Adam Steffes, OpenTSDB

We monitor host-level metrics on individual servers in Zabbix. In Zabbix we keep it to simple monitoring (load, CPU, etc.).

We store the data in Zabbix for only a short period so that the server stays lightweight. It's really there just to alert us to anomalous behavior (high CPU steal time, services flapping, the OOM killer, etc.).

That being said, we've used the built-in Ganglia support in HBase and like it. It's just missing the alerting component.

--

Stephen Wood

www.heystephenwood.com

Eric Newton

Feb 20, 2014, 2:44:23 PM

to Adam Steffes, OpenTSDB

I would guess that HBase can take at least 10K updates per node *per second*. A few hundred updates every few minutes isn't going to generate much of a load.

-Eric
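
Eric's estimate can be checked with a little arithmetic, which also addresses Adam's worry that adding nodes won't help. The sketch below is a back-of-envelope model, not a benchmark: the 300 metrics per regionserver and the 60-second reporting interval are hypothetical inputs, the 10,000 writes/s per node is Eric's guess, and it assumes each data point costs exactly one HBase write (it ignores any extra write amplification on the OpenTSDB or HBase side). The function name is made up for illustration.

```python
"""Back-of-envelope: how much write load does HBase self-monitoring add?"""


def self_monitoring_load(nodes, metrics_per_node, interval_s, writes_per_s_per_node):
    """Return (monitoring writes/s, fraction of the cluster's write capacity).

    Assumes every regionserver reports `metrics_per_node` data points every
    `interval_s` seconds and that each data point costs one HBase write.
    """
    monitoring_writes = nodes * metrics_per_node / interval_s
    cluster_capacity = nodes * writes_per_s_per_node
    return monitoring_writes, monitoring_writes / cluster_capacity


if __name__ == "__main__":
    # Hypothetical inputs: 300 metrics per regionserver every 60 s,
    # against Eric's guess of 10,000 writes/s per node.
    for nodes in (5, 50):
        writes, fraction = self_monitoring_load(nodes, 300, 60, 10_000)
        print(f"{nodes:>3} nodes: {writes:.0f} writes/s = {fraction:.2%} of capacity")
```

Both cluster sizes come out at 0.05% of write capacity because the node count cancels: under these assumptions each new regionserver adds its own metrics and its own write capacity in the same proportion, so the self-monitoring share stays flat as the cluster grows.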

Jesus Orosco

Feb 21, 2014, 2:12:38 PM

to open...@googlegroups.com

We're also using Zabbix for host-level stuff. Beyond that, I'm scraping the HBase status page and JMX output and sending those stats up with zabbix_sender. Request count per server has proved invaluable.
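
The zabbix_sender half of Jesus's approach can be sketched as follows, assuming the per-server request counts have already been scraped into a dict. zabbix_sender can read an input file (`-i`) in which each line is `<hostname> <key> <value>`, and `-z` names the Zabbix server. The item key `hbase.regionserver.requests`, the hostnames, and the helper names are hypothetical; the key would have to match a trapper item configured on each host in Zabbix.

```python
import os
import tempfile


def zabbix_sender_lines(request_counts, key="hbase.regionserver.requests"):
    """Format {host: count} as zabbix_sender input, one '<host> <key> <value>' per line.

    The default item key is hypothetical; it must match a Zabbix trapper item.
    Hostnames are assumed to contain no whitespace.
    """
    return "".join(
        f"{host} {key} {count}\n" for host, count in sorted(request_counts.items())
    )


def write_input_file(request_counts):
    """Write the zabbix_sender input to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(prefix="hbase-zabbix-", suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(zabbix_sender_lines(request_counts))
    return path


def zabbix_sender_command(server, input_path):
    """Argument list for zabbix_sender: -z is the Zabbix server, -i the input file."""
    return ["zabbix_sender", "-z", server, "-i", input_path]


if __name__ == "__main__":
    # Hypothetical request counts scraped from each regionserver.
    counts = {"rs2.example.com": 5120, "rs1.example.com": 4388}
    path = write_input_file(counts)
    # Printed rather than executed; pass the list to subprocess.run() to send it.
    print(" ".join(zabbix_sender_command("zabbix.example.com", path)))
```

Batching all hosts into one input file means one zabbix_sender invocation per scrape cycle rather than one per value.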

Kevin Ortman

Feb 21, 2014, 9:11:19 PM

to Adam Steffes, open...@googlegroups.com

We use Cloudera.

https://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Diagnostics-Guide/Cloudera-Manager-Diagnostics-Guide.html

Thanks,
Kevin

Venkata Laxmi N Ganji

Mar 3, 2015, 5:55:37 PM

to open...@googlegroups.com

Hello Jesus, can you share some resources about your JMX-based solution for HBase?

Pradeep Chhetri

Mar 5, 2015, 5:54:27 AM

to Venkata Laxmi N Ganji, OpenTSDB

Hi,

Have a look at tcollector's HBase and Hadoop collector scripts (e.g. https://github.com/OpenTSDB/tcollector/blob/master/collectors/lib/hadoop_http.py).

They basically hit the /jmx HTTP endpoint of the Hadoop HDFS NameNode, DataNodes, HBase Master, and HBase RegionServers.

For example, if your HBase master status page is at http://hbase-master.xyz.com:60010/master-status, then you can hit http://hbase-master.xyz.com:60010/jmx to get JMX stats for the HBase master. The same works for the other daemons. I guess this JMX endpoint came with Hadoop 2.0.

Thank you.

--

Pradeep Chhetri

In the world of Linux, who needs Windows and Gates...
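
A minimal sketch of Pradeep's /jmx approach, assuming the endpoint returns JSON of the form `{"beans": [{...}, ...]}` with one object per MBean, and that the optional `qry` parameter takes a JMX ObjectName pattern to narrow the result. It turns every numeric bean attribute into a line in the `metric timestamp value tag=value` shape that tcollector collectors print. This is not tcollector's actual code: the function names, the `hbase.master` prefix, and the sample bean in the demo are illustrative. Two caveats: identical attribute names in different beans would collide under one prefix (narrow with `qry` or add a per-bean tag), and some JMX attribute names may need sanitizing before they are valid OpenTSDB metric names.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_jmx(base_url, query=None, timeout=10):
    """GET <base_url>/jmx (optionally ?qry=<ObjectName pattern>) and decode the JSON."""
    url = base_url.rstrip("/") + "/jmx"
    if query:
        url += "?qry=" + urllib.parse.quote(query)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def jmx_to_tsdb_lines(jmx, prefix, host, timestamp):
    """Turn numeric bean attributes into 'metric timestamp value host=...' lines."""
    lines = []
    for bean in jmx.get("beans", []):
        for attr, value in sorted(bean.items()):
            # Skip strings, nested objects, and booleans (bool is an int subclass).
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            lines.append(f"{prefix}.{attr} {timestamp} {value} host={host}")
    return lines


if __name__ == "__main__":
    # Offline demo with a hand-made response (bean and attribute names are
    # illustrative). Against a live master you would instead call:
    #   jmx = fetch_jmx("http://hbase-master.xyz.com:60010", query="Hadoop:service=HBase,*")
    jmx = {"beans": [{"name": "Hadoop:service=HBase,name=Master,sub=Server",
                      "modelerType": "Master,sub=Server",
                      "averageLoad": 12.5}]}
    for line in jmx_to_tsdb_lines(jmx, "hbase.master", "hbase-master", int(time.time())):
        print(line)
```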

Jonathan Creasy

Mar 5, 2015, 4:57:53 PM

to Pradeep Chhetri, Venkata Laxmi N Ganji, OpenTSDB

I don't have it in front of me, but off the top of my head these metrics are on our TSDB dashboard:

UIDs used (rate)

TSDB HTTP Latency

TCollector Lines Sent (4 lines)

Average HBase Latency (50th, 75th, 95th percentiles)

TSDB Graph cache (hits/misses)

Compaction Queue Size

Datanode Disk IO (read / write requests)

Namenode Disk IO (read / write requests)

Hadoop Datanodes (HDFS Blocks Read/Written)

Hadoop Datanodes (HDFS Bytes Read/Written)

1m Load Max (namenodes, datanodes, edgenodes)

1m Load Avg (namenodes, datanodes, edgenodes)

There may be a few others.
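
Several of the panels in Jonathan's list (UIDs used, HDFS blocks and bytes read/written) are cumulative counters graphed as per-second rates. OpenTSDB can do this conversion at query time with its rate option; the sketch below only illustrates the underlying arithmetic and is not OpenTSDB's implementation. It assumes samples arrive as `(timestamp_seconds, counter_value)` pairs sorted by time, and it treats a drop in the counter as a reset (for example, a daemon restart), skipping that interval instead of reporting a negative rate. The sample values are made up.

```python
def counter_rates(samples):
    """Per-second rates from (timestamp_s, counter_value) samples sorted by time.

    A decrease in the counter is treated as a reset (e.g. a daemon restart), and
    that interval is skipped rather than reported as a negative rate.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0 or v1 < v0:
            continue
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


if __name__ == "__main__":
    # Hypothetical one-minute samples of a cumulative counter such as HDFS blocks written;
    # the last sample simulates a counter reset after a restart.
    samples = [(0, 100), (60, 160), (120, 280), (180, 10)]
    for ts, rate in counter_rates(samples):
        print(f"t={ts}s rate={rate:.2f}/s")
```

Skipping the reset interval loses one point but avoids a huge negative spike on the dashboard; an alternative is to assume the counter restarted at zero and use the new value divided by the interval.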
