Hi,
I'm looking for some guidance on how to achieve a low-latency OpenTSDB 2.4 cluster on CDH 5.15 (HBase 1.2.0, Hadoop 2.6.0).
To replace our old lagging cluster, I started off with an empty cluster on 6 regionservers (6 cores, 24 GB RAM, 6 x 300 GB 10K SAS disks each). The number of unique metrics is really small (fewer than 50), but we have a lot of tag values (hosts, interfaces).
Tag 'host' has 4000 values and 'interface' has 1700 values. We have approx 160000 unique host+interface tag pairs for the metric interface.octets.in. Each metric is collected at a 5 or 10 minute interval and is stored to OpenTSDB with appends and salted UIDs. The amount of daily data appended to TSDB is approx 5..6 GB (LZO compressed, replication=3).
In addition, there's a tcollector running on each cluster member. The cluster is handling these constant writes really well and is not loaded at all.
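
For context, if I read the OpenTSDB docs right, the row key layout with these settings (salt width 1, default 3-byte UIDs) is:

  [salt 1B][metric UID 3B][base hour 4B][tagk 'host' 3B][tagv 3B][tagk 'interface' 3B][tagv 3B]

i.e. one row per series per hour, so interface.octets.in alone produces ~160000 new rows per hour, spread over 16 salt buckets.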
Things get nasty when I try to fetch, for example, 6 metrics (interface.errors.in, interface.errors.out, interface.octets.in, interface.octets.out, interface.packets.in, interface.packets.out) with Grafana:
- start 2d-ago
- host=cisco1
- interface=Gi0/1
It takes about 10..12 seconds to get the data for these 3 graphs.
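
For reference, one of the sub-queries, issued directly against the HTTP API (host, port and the sum aggregator here are placeholders for whatever Grafana actually sends), looks like this:

  curl -g 'http://tsd-host:4242/api/query?start=2d-ago&m=sum:interface.octets.in{host=cisco1,interface=Gi0/1}'

(-g stops curl from glob-expanding the braces.)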
Opentsdb "stats": "avgAggregationTime":0.685272,"avgHBaseTime":10851.092426,"avgQueryScanTime":8147.49039,"avgScannerTime":10851.133419,"avgScannerUidToStringTime":0.0,"avgSerializationTime":0.859616,"dpsPostFilter":1150,"dpsPreFilter":1150,"emittedDPs":1144,"maxAggregationTime":0.623238,"maxHBaseTime":8147.251419,"maxQueryScanTime":7452.411409,"maxScannerUidtoStringTime":0.0,"maxSerializationTime":0.743387,"maxUidToStringTime":0.152958,"processingPreWriteTime":8150.691069,"rowsPostFilter":98,"rowsPreFilter":98,"successfulScan":32,"totalTime":8152.314927,"uidPairsResolved":0}}
On the OS side I see that all cores on all regionserver nodes are 100% utilized by HBase. There is no iowait and no network congestion.
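
If I understand the storage model correctly, that CPU burn adds up: with ~160000 host+interface series per metric (assuming similar cardinality across the interface.* metrics) and one row per series per hour, a 2-day query has to filter on the order of

  160000 rows/hour x 48 hours = ~7.7M rows per metric
  6 metrics x 7.7M rows = ~46M rows examined

of which only 98 make it back to the TSD (rowsPreFilter:98), so the regionservers would be spending their time evaluating row-key filters rather than doing I/O.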
On the other hand, if I request graphs with fewer tag values (for example tcollector's tsd.rpc.received), the graphs appear almost instantly:
"stats {"avgAggregationTime":2.104052,"avgHBaseTime":4.427962,"avgQueryScanTime":18.620267,"avgScannerTime":6.278119,"avgScannerUidToStringTime":0.0,"avgSerializationTime":2.127678,"dpsPostFilter":75612,"dpsPreFilter":75612,"emittedDPs":1435,"maxAggregationTime":2.104052,"maxHBaseTime":18.12299,"maxQueryScanTime":18.620267,"maxScannerUidtoStringTime":0.0,"maxSerializationTime":2.127678,"maxUidToStringTime":0.008968,"processingPreWriteTime":33.139573,"rowsPostFilter":1176,"rowsPreFilter":1176,"successfulScan":16,"totalTime":48.605377,"uidPairsResolved":0}
I tried beefing up the heap and cache sizes, but I'm still not satisfied with the results. Maybe the hardware of the nodes is not sufficient for the task, even if the volumes and rates are modest. As we use TSDB appends, there are no TSD row compactions going on either.
Cluster overall statistics:
- Requests Per Second: 10000..14000
Cluster configuration (the most important parameters only):
tsd.core.auto_create_metrics = true
tsd.core.meta.enable_realtime_ts = true
tsd.core.meta.enable_realtime_uid = true
tsd.storage.enable_appends = true
tsd.storage.salt.buckets = 16
tsd.storage.salt.width = 1
tsd.storage.use_otsdb_timestamp = false
tsd.uid.use_mode = true
HBase max heap = 16G
hbase.hregion.memstore.chunkpool.maxsize = 0.6
hbase.regionserver.handler.count = 60
hbase.hregion.max.filesize = 2147483648
hbase.client.write.buffer = 8388608
hbase.rs.cacheblocksonwrite = true
hbase.rs.evictblocksonclose = false
hfile.block.bloom.cacheonwrite = true
hfile.block.index.cacheonwrite = true
hbase.block.data.cachecompressed = true
hbase.bucketcache.blockcache.single.percentage = .50
hbase.bucketcache.blockcache.multi.percentage = .49
hbase.bucketcache.blockcache.memory.percentage = .01
hbase.regionserver.global.memstore.size = .3
hfile.block.cache.size = .5
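
For what it's worth, my back-of-the-envelope math on the 16G heap with these settings (assuming both pools are on-heap):

  memstore:    16G x 0.3 = 4.8G
  block cache: 16G x 0.5 = 8.0G

i.e. 80% of the heap is already committed to those two pools.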
I would like to know: is there a limit beyond which the number of unique tag values isn't reasonable anymore? Or could someone advise a configuration change that would improve read performance?
Of course we could append the hostname to the metric name and get definitely better read response times, but that would rule out cross-host aggregation (which isn't currently usable for us anyway, as we hit https://github.com/OpenTSDB/opentsdb/issues/839). A sketch of what I mean is below.
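
Something like this (hypothetical naming, just to illustrate the trade-off):

  current:     interface.octets.in{host=cisco1,interface=Gi0/1}
  alternative: interface.octets.in.cisco1{interface=Gi0/1}

With the hostname baked into the metric UID, a query only scans that host's rows, but summing or averaging across hosts in a single query is no longer possible.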
Thanks in advance,
--
rauno