OpenTSDB metric design and HBase performance - 1000+ tag values


Rauno Tuul

Jun 29, 2018, 9:06:53 AM
to OpenTSDB
Hi,

I'm looking for some guidance on how to achieve a low-latency OpenTSDB 2.4 cluster on CDH 5.15 (HBase 1.2.0, Hadoop 2.6.0).
To replace our old, lagging cluster, I started off with an empty cluster on 6 regionservers (6 cores, 24 GB RAM, 6 x 300 GB 10K SAS disks). The number of unique metrics is really small (fewer than 50), but we have a lot of tag values (hosts, interfaces).
The tag 'host' has 4000 values and 'interface' has 1700 values. We have approx. 160000 unique host+interface tag pairs for the metric interface.octets.in.
Each metric is collected at a 5 or 10 minute interval and is stored in OpenTSDB with appends and salted UIDs. The amount of data appended to TSDB daily is approx. 5..6 GB (LZO compressed, replication=3).
In addition, there's a tcollector running on each cluster member. The cluster is handling these constant writes really well and is not loaded at all.
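For context, a single data point from our collectors looks roughly like this when written through the TSD's HTTP API (just a minimal sketch to show the tag layout; the host/port and value are made up, and tcollector actually streams points over the telnet-style interface):

    import time
    import requests

    # One data point for the high-cardinality metric; host/port are assumptions.
    datapoint = {
        "metric": "interface.octets.in",
        "timestamp": int(time.time()),
        "value": 1234567,
        "tags": {
            "host": "cisco1",      # ~4000 distinct values
            "interface": "Gi0/1",  # ~1700 distinct values
        },
    }

    # /api/put accepts a list of data points; ?details returns per-point errors.
    resp = requests.post("http://localhost:4242/api/put?details", json=[datapoint])
    print(resp.status_code, resp.text)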

Things get nasty when I try to fetch, for example, 6 metrics (interface.errors.in, interface.errors.out, interface.octets.in, interface.octets.out, interface.packets.in, interface.packets.out) with Grafana:
- start 2d-ago
- host=cisco1
- interface=Gi0/1
It takes about 10..12 seconds to get the data for these 3 graphs.
Opentsdb "stats": "avgAggregationTime":0.685272,"avgHBaseTime":10851.092426,"avgQueryScanTime":8147.49039,"avgScannerTime":10851.133419,"avgScannerUidToStringTime":0.0,"avgSerializationTime":0.859616,"dpsPostFilter":1150,"dpsPreFilter":1150,"emittedDPs":1144,"maxAggregationTime":0.623238,"maxHBaseTime":8147.251419,"maxQueryScanTime":7452.411409,"maxScannerUidtoStringTime":0.0,"maxSerializationTime":0.743387,"maxUidToStringTime":0.152958,"processingPreWriteTime":8150.691069,"rowsPostFilter":98,"rowsPreFilter":98,"successfulScan":32,"totalTime":8152.314927,"uidPairsResolved":0}}
On the OS side I can see that all cores on all regionserver nodes are 100% utilized by HBase. There is no I/O wait or network congestion.
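For reference, the Grafana query above is roughly equivalent to this direct call against the TSD's /api/query endpoint (a sketch only; the host/port and the "sum" aggregator are assumptions, not necessarily what Grafana sends):

    import requests

    metrics = [
        "interface.errors.in", "interface.errors.out",
        "interface.octets.in", "interface.octets.out",
        "interface.packets.in", "interface.packets.out",
    ]

    # One sub-query per metric, all filtered to a single host+interface pair.
    query = {
        "start": "2d-ago",
        "queries": [
            {
                "metric": m,
                "aggregator": "sum",
                "tags": {"host": "cisco1", "interface": "Gi0/1"},
            }
            for m in metrics
        ],
    }

    resp = requests.post("http://localhost:4242/api/query", json=query)
    print(len(resp.json()), "result sets returned")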


On the other hand, if I request graphs with fewer tag values (for example tcollector's tsd.rpc.received), the graphs appear almost instantly:
"stats {"avgAggregationTime":2.104052,"avgHBaseTime":4.427962,"avgQueryScanTime":18.620267,"avgScannerTime":6.278119,"avgScannerUidToStringTime":0.0,"avgSerializationTime":2.127678,"dpsPostFilter":75612,"dpsPreFilter":75612,"emittedDPs":1435,"maxAggregationTime":2.104052,"maxHBaseTime":18.12299,"maxQueryScanTime":18.620267,"maxScannerUidtoStringTime":0.0,"maxSerializationTime":2.127678,"maxUidToStringTime":0.008968,"processingPreWriteTime":33.139573,"rowsPostFilter":1176,"rowsPreFilter":1176,"successfulScan":16,"totalTime":48.605377,"uidPairsResolved":0}

I tried to beef up the heap and cache sizes, but I'm still not satisfied with the results. Maybe the hardware of the nodes is not sufficient for the task, even if the volumes and rates are modest. As we use TSDB appends, there are no major compactions either.
Cluster overall statistics:
- Requests Per Second: 10000..14000
- Num. Regions: 34

Cluster configuration (the most important parameters only):
tsd.core.auto_create_metrics = true
tsd.core.meta.enable_realtime_ts = true
tsd.core.meta.enable_realtime_uid = true
tsd.storage.enable_appends = true
tsd.storage.salt.buckets = 16
tsd.storage.salt.width = 1
tsd.storage.use_otsdb_timestamp = false
tsd.uid.use_mode = true

HBase max heap = 16G
hbase.hregion.memstore.chunkpool.maxsize = 0.6
hbase.regionserver.handler.count = 60
hbase.hregion.max.filesize = 2147483648
hbase.client.write.buffer = 8388608
hbase.rs.cacheblocksonwrite = true
hbase.rs.evictblocksonclose = false
hfile.block.bloom.cacheonwrite = true
hfile.block.index.cacheonwrite = true
hbase.block.data.cachecompressed = true
hbase.bucketcache.blockcache.single.percentage = 0.50
hbase.bucketcache.blockcache.multi.percentage = 0.49
hbase.bucketcache.blockcache.memory.percentage = 0.01
hbase.regionserver.global.memstore.size = 0.3
hfile.block.cache.size = 0.5


I would like to know: is there a limit beyond which the number of unique tag values is no longer reasonable? Or could someone advise a configuration change that would improve read performance?
Of course we could append the hostname to the metric name and definitely achieve better read response times, but that would exclude cross-host aggregation. (The aggregation isn't currently usable anyway, as we hit https://github.com/OpenTSDB/opentsdb/issues/839 .)

Thanks in advance,
--
rauno

stan.p....@gmail.com

Jul 6, 2018, 6:31:04 AM
to OpenTSDB
Hey Rauno, your setup looks quite reasonable. We are still running OpenTSDB 2.3, so I'd be interested to hear your impressions of 2.4; in particular, have you had any issues with queries other than the performance problems mentioned in this thread? Do you use rollups and/or pre-aggregates?

Just to double-check for the usual suspects, do you notice any hotspot regions in the cluster, i.e. requests going to only a couple of the regions when the queries happen? Shouldn't be since you have salting on but just in case...

Could you please let me know if the following Cloudera metrics show any spikes when you run the queries?

- ipc_process_time_99th_percentile
- ipc_queue_time_99th_percentile
- ipc_queue_rate
- jvm_runnable_threads
- jvm_timed_waiting_threads

I'm thinking having appends on, as well as enabling the realtime meta flags, could affect read performance. The IPC queue rate would be a good indication.

Another possibility is that the RegionServer handler queues are getting congested; the aforementioned CDH metrics should give you a clue there as well.

Do you run a single instance of the queries for each metric or do the dashboards have graphs with the same metrics? Querying the same data many times in parallel may degrade read performance in some rare cases.

Rauno Tuul

Jul 9, 2018, 7:37:24 AM
to OpenTSDB
Hi,

There was no technical reason for choosing 2.4 - as I'm setting up a fresh cluster, I simply chose the latest available release. No other issues, only the ones mentioned above.
We started off with salting, so hotspotting isn't the problem - all HBase regionservers were equally loaded.

A few days ago I truncated the HBase tables and shifted the hostname into the metric name (as recommended here: http://opentsdb.net/docs/build/html/user_guide/writing/index.html , "Shift to Metric"). Instead of one metric, I now have 7000+ "interface.octets.out.*" metrics. I also lost the historical tcollector performance data, so I can't validate the peak usage on IPC queues etc.
I know that I lost the ability to aggregate metrics across hosts, but 99% of our needs don't require it anyway, so we prefer the performance.
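For anyone wanting to do the same, the change on the collector side is essentially just folding the host tag into the metric name before the point is sent. A rough sketch (not our exact code):

    def shift_host_to_metric(datapoint):
        # Fold the 'host' tag into the metric name, e.g.
        #   interface.octets.out {host=cisco1, interface=Gi0/1}
        #     -> interface.octets.out.cisco1 {interface=Gi0/1}
        # This trades cross-host aggregation for much narrower scans per query.
        tags = dict(datapoint["tags"])
        host = tags.pop("host")
        return {
            **datapoint,
            "metric": "{}.{}".format(datapoint["metric"], host),
            "tags": tags,
        }

    # Example:
    # shift_host_to_metric({"metric": "interface.octets.out",
    #                       "tags": {"host": "cisco1", "interface": "Gi0/1"},
    #                       "timestamp": 1530000000, "value": 42})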

The query performance is now roughly 10..15 times better. Every query is answered in less than a second, and all Grafana dashboards load within a couple of seconds. For example, loading a dashboard for all ports of a single switch (22 ports, 3 graphs per port (octets, packets, errors), from 5d-now) takes less than 10 seconds in total, and a lot of that is browser latency. Earlier it took at least a minute for 2 days of data...

The most common dashboard use case is a dashboard with 3..4 graphs for a single port (traffic, packets, errors, signal quality). Each metric is used mostly once. Packet rate metrics have additional tags, e.g. unicast/multicast/broadcast. But querying in parallel wasn't the issue.

Our old cluster is not using appends, and the compaction runs take nearly 40 minutes out of each hour. Trying to switch on appends there resulted in massive write latency and overall performance got even worse. That's when we decided to build a new cluster.
The newer cluster with better hardware handles the appends quite well, and the realtime meta flags are on as well.
I use a variable query in Grafana to get a list of all interfaces per host: Query: tag_values(interface.octets.out.$host, interface) and Regex: /.*\/.*/
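As far as I understand, Grafana's tag_values() resolves through the TSD's lookup endpoint behind the scenes; a direct equivalent for one host would look roughly like this (that mapping is my assumption, the host/port are made up, and the exact schema may differ between TSD versions, so double-check against the docs):

    import requests

    # Assumed rough equivalent of tag_values(interface.octets.out.$host, interface)
    # for host=cisco1 after the "shift to metric" change.
    lookup = {
        "metric": "interface.octets.out.cisco1",
        "tags": [{"key": "interface", "value": "*"}],
    }
    resp = requests.post("http://localhost:4242/api/search/lookup", json=lookup)
    for ts in resp.json().get("results", []):
        print(ts["tags"].get("interface"))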

regards,
--
rauno